What does CSV stand for?
Comma-Separated Values. It’s a plain text format where data is organized in rows, with values in each row separated by a delimiter (like a comma).
What is the built-in Python module for handling CSVs?
The csv module. (import csv)
csv: How do you open a CSV file for reading?
Use with open('filename.csv', mode='r', newline='') as file: (the csv docs recommend newline='' for reading as well as writing).
csv: What object do you create to read the file row by row?
A csv.reader object: csv_reader = csv.reader(file)
csv: When iterating over a csv.reader, what data type is each row?
A list of strings.
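A minimal sketch of the reading cards above. The file name people.csv and its contents are made up for illustration; the example writes the sample file itself so it can run anywhere.

```python
import csv
import os
import tempfile

# Create a small sample file so the example is self-contained (hypothetical data).
path = os.path.join(tempfile.mkdtemp(), "people.csv")
with open(path, mode="w", newline="") as file:
    file.write("name,age\nAda,36\nLinus,54\n")

# Open the file for reading and iterate over it row by row.
with open(path, mode="r", newline="") as file:
    csv_reader = csv.reader(file)
    rows = list(csv_reader)

# Every row is a list of strings, including the header row and the numbers.
for row in rows:
    print(row)
```

Note that '36' and '54' come back as strings; convert them with int() if you need numbers.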
csv: How do you read a CSV file where the first row is a header?
Use csv.DictReader. Each row will be a dictionary where keys are the header names.
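A short sketch of csv.DictReader, assuming a made-up two-column file. io.StringIO stands in for an open file so nothing needs to exist on disk.

```python
import csv
import io

# In-memory stand-in for an open CSV file (hypothetical data).
sample = io.StringIO("name,age\nAda,36\nLinus,54\n")

# The first row is consumed as the header and becomes the dictionary keys.
csv_reader = csv.DictReader(sample)
records = list(csv_reader)

# Values are still strings, so convert them when numbers are needed.
ages = [int(record["age"]) for record in records]

print(csv_reader.fieldnames)
print(records[0]["name"], ages[0])
```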
csv: When opening a file for writing with the csv module, what two arguments are crucial?
mode='w' (to write) and newline='' (to prevent blank lines between rows on Windows).
csv: What object do you create to write to a CSV file?
A csv.writer object: csv_writer = csv.writer(file)
csv: How do you write a single row of data (as a list)?
csv_writer.writerow(['data1', 'data2', 'data3'])
csv: How do you write multiple rows at once (from a list of lists)?
csv_writer.writerows(list_of_lists)
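The writing cards above, put together as one sketch. The file name out.csv and the sample rows are assumptions for illustration only.

```python
import csv
import os
import tempfile

# Hypothetical output location and data.
path = os.path.join(tempfile.mkdtemp(), "out.csv")
header = ["name", "age"]
list_of_lists = [["Ada", 36], ["Linus", 54]]

# mode="w" to write, newline="" so the csv module controls line endings.
with open(path, mode="w", newline="") as file:
    csv_writer = csv.writer(file)
    csv_writer.writerow(header)           # one row
    csv_writer.writerows(list_of_lists)   # many rows at once

# Read the raw text back to see exactly what was written.
with open(path, mode="r", newline="") as file:
    content = file.read()

print(repr(content))
```

By default csv.writer ends each row with '\r\n', which is why newline='' matters: without it, Windows would translate the '\n' again and produce blank lines.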
csv: How do you specify a different delimiter (like a semicolon)?
Pass it as an argument: csv.reader(file, delimiter=';')
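A brief sketch of the delimiter argument on both the reader and the writer, using made-up semicolon-separated data held in memory.

```python
import csv
import io

# Semicolon-separated input (hypothetical data).
sample = io.StringIO("name;age\nAda;36\n")
rows = list(csv.reader(sample, delimiter=";"))

# csv.writer accepts the same delimiter argument.
output = io.StringIO()
csv_writer = csv.writer(output, delimiter=";")
csv_writer.writerows(rows)
written = output.getvalue()

print(rows)
print(repr(written))
```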
What is the most popular third-party library for CSV manipulation, especially for data analysis?
The pandas library.
pandas: How do you read a CSV file into a DataFrame?
import pandas as pd, then df = pd.read_csv('filename.csv')
pandas: What is a DataFrame?
A 2D labeled data structure, like a spreadsheet or SQL table, with columns and rows.
pandas: How do you access a single column (a “Series”) from a DataFrame?
df['column_name']
pandas: How do you filter rows (e.g., all rows where ‘age’ > 30)?
df[df['age'] > 30]
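A small sketch of reading, selecting a column, and filtering with pandas. It assumes pandas is installed; the data is invented, and io.StringIO is used in place of a file path (pd.read_csv accepts either).

```python
import io

import pandas as pd

# In-memory stand-in for 'filename.csv' (hypothetical data).
sample = io.StringIO("name,age\nAda,36\nLinus,54\nGrace,29\n")
df = pd.read_csv(sample)

# Selecting one column returns a Series.
ages = df["age"]

# A boolean mask keeps only the rows where the condition is True.
over_30 = df[df["age"] > 30]

print(over_30)
```

Unlike the csv module, read_csv infers column types, so 'age' is already numeric here.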
pandas: How do you add a new column to the DataFrame?
df['new_column_name'] = [value1, value2, …] (the list needs one value per row).
pandas: How do you save a DataFrame back to a new CSV file?
df.to_csv('new_file.csv')
pandas: When using to_csv(), how do you prevent it from saving the DataFrame’s row numbers (the index)?
df.to_csv('new_file.csv', index=False)
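A sketch of adding a column and comparing to_csv() with and without the index. It assumes pandas is installed, and the DataFrame contents are made up. Calling to_csv() without a path returns the CSV text as a string, which makes the difference easy to see.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"name": ["Ada", "Linus"], "age": [36, 54]})

# Add a new column; the list must have one value per row.
df["city"] = ["London", "Helsinki"]

# With no path argument, to_csv() returns the CSV text instead of writing a file.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index)
print(without_index)
```

With the default settings the first column holds the row numbers 0 and 1 under an empty header; index=False drops that column.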