Why process data in chunks with pandas?
Processing data in chunks is used when a dataset is too large to fit entirely into your computer’s available memory (RAM) at once.
Instead of loading the whole file, you read a manageable portion (a chunk) into memory, process it, potentially aggregate results, and then move on to the next chunk.
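The read-process-aggregate loop above can be sketched as follows. This is a minimal illustration, not pandas-specific API beyond read_csv: the in-memory buffer (io.StringIO) stands in for a large file on disk, and the column name 'amount' is hypothetical.

```python
import io
import pandas as pd

# Simulate a large CSV with an in-memory buffer; a real case would pass a file path.
csv_data = "amount\n" + "\n".join(str(i) for i in range(1, 101))

total = 0.0
row_count = 0
# chunksize=25 keeps at most 25 rows in memory at a time.
for chunk in pd.read_csv(io.StringIO(csv_data), chunksize=25):
    total += chunk['amount'].sum()   # aggregate this chunk's result
    row_count += len(chunk)

mean_amount = total / row_count      # combine per-chunk results at the end
```

Only running totals survive between iterations, so peak memory stays proportional to the chunk size, not the file size.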
How do you initiate chunked reading using a pandas read_* function?
You pass the chunksize parameter to the read_* function (e.g., read_csv, or read_json with lines=True). The function then returns an iterator (a TextFileReader) that yields DataFrames of at most chunksize rows, rather than a single DataFrame.
Example: To read a CSV file 10,000 rows at a time:
import pandas as pd
chunk_iterator = pd.read_csv('large_data.csv', chunksize=10000)
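To show the iterator being consumed, here is a common pattern: filter each chunk and concatenate only the surviving rows. The buffer and the column name 'value' are hypothetical stand-ins for a real file and schema.

```python
import io
import pandas as pd

# Stand-in for 'large_data.csv'.
csv_data = "value\n" + "\n".join(str(i) for i in range(10))

chunk_iterator = pd.read_csv(io.StringIO(csv_data), chunksize=4)

# Keep only rows matching a condition from each chunk, then combine them.
filtered = pd.concat(chunk.loc[chunk['value'] >= 5] for chunk in chunk_iterator)
```

Note that the iterator is exhausted after one pass; call read_csv again if you need to iterate over the file a second time.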