How do we read in a csv using pandas?
pd.read_csv('data/file.csv')
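A minimal self-contained sketch. Because data/file.csv is not available here, an in-memory string wrapped in io.StringIO stands in for the file (read_csv accepts a path or any file-like object). The column names and values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for data/file.csv
csv_text = """Name,Sex,Count,Year
Mary,F,7065,1880
John,M,9655,1880
Anna,F,2604,1880
"""

# read_csv takes a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (3, 4): 3 rows, 4 columns
```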
How do we access a column in pandas?
df['Column']
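A small sketch on a made-up table comparing bracket access with .get (which these notes use later). Both return a Series; the difference shows up when the column name does not exist.

```python
import pandas as pd

# Hypothetical two-row table for illustration
df = pd.DataFrame({'Name': ['Mary', 'John'], 'Year': [1880, 1880]})

names = df['Name']      # a Series; df['Nope'] would raise a KeyError
maybe = df.get('Nope')  # .get returns None for a missing column instead of raising
print(names.tolist())   # ['Mary', 'John']
```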
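How do we get how many unique names there were for each year?
df['Year'].value_counts() (this counts the rows for each year, which equals the number of unique names only if each name appears at most once per year)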
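A sketch on a hypothetical baby-names table with one row per name per year, showing that value_counts on Year counts the rows recorded for each year.

```python
import pandas as pd

# Hypothetical baby-names data: one row per (name, year)
df = pd.DataFrame({
    'Name': ['Mary', 'John', 'Anna', 'Mary', 'Luke'],
    'Year': [1880, 1880, 1880, 1881, 1881],
    'Count': [7065, 9655, 2604, 6919, 120],
})

# Number of rows (names recorded) for each year
per_year = df['Year'].value_counts()
print(per_year.sort_index())  # 1880 -> 3, 1881 -> 2
```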
How do we query for a given year?
df[df['Year'] == 1800]
If we want to get the number of unique names for a certain year, how do we do that?
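df[df['Year'] == 1800]['Name'].nunique() (by contrast, df[df['Year'] == 1800].value_counts('Name') gives how many rows each name has in that year, not a single count)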
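A sketch on made-up data where "Mary" appears twice in 1800 (once per sex), to show the difference between counting rows per name and counting distinct names. The Sex column is an assumption for illustration.

```python
import pandas as pd

# Hypothetical data: Mary is recorded twice in 1800 (F and M)
df = pd.DataFrame({
    'Name': ['Mary', 'John', 'Mary', 'Anna', 'Mary'],
    'Sex': ['F', 'M', 'M', 'F', 'F'],
    'Year': [1800, 1800, 1800, 1800, 1801],
})

in_1800 = df[df['Year'] == 1800]
counts = in_1800['Name'].value_counts()  # rows per name: Mary 2, John 1, Anna 1
n_unique = in_1800['Name'].nunique()     # number of distinct names
print(n_unique)  # 3
```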
How many babies were recorded per year?
df.groupby('Year')['Count'].sum()
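A sketch on hypothetical data: grouping by Year and summing Count adds up the babies recorded under every name in each year.

```python
import pandas as pd

# Hypothetical baby-names data
df = pd.DataFrame({
    'Name': ['Mary', 'John', 'Anna', 'Mary', 'Luke'],
    'Year': [1880, 1880, 1880, 1881, 1881],
    'Count': [7065, 9655, 2604, 6919, 120],
})

# Total babies recorded each year (sum of Count within each Year group)
babies_per_year = df.groupby('Year')['Count'].sum()
print(babies_per_year)  # 1880 -> 19324, 1881 -> 7039
```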
What does the following statement give us?
(df
    .assign(first_letter=df['Name'].str[0])
    .query('first_letter == "L"')
    .groupby('Year')
    ['Count']
    .sum()
    .plot())
It plots a line graph of the total number of babies whose names start with “L” in each year
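A sketch of the same chain on made-up data, stopping just before .plot() so the Series being plotted can be inspected directly.

```python
import pandas as pd

# Hypothetical baby-names data
df = pd.DataFrame({
    'Name': ['Liam', 'Lucy', 'Mary', 'Luke', 'Anna'],
    'Year': [2000, 2000, 2000, 2001, 2001],
    'Count': [100, 50, 70, 30, 20],
})

l_counts = (df
    .assign(first_letter=df['Name'].str[0])  # first character of each name
    .query('first_letter == "L"')            # keep only names starting with L
    .groupby('Year')
    ['Count']
    .sum())                                  # .plot() would draw this Series
print(l_counts)  # 2000 -> 150, 2001 -> 30
```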
How would we make a function to create a name graph for any specific name?
def name_graph(name):
    return (df
            .query(f'Name == "{name}"')
            .groupby('Year')
            ['Count']
            .sum()
            .plot(title=f'Number of Babies Born Named "{name}" Per Year'))
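A sketch of the data half of name_graph, without plotting. The helper name_counts is hypothetical and the data is made up; it returns the Series that name_graph would pass to .plot().

```python
import pandas as pd

# Hypothetical baby-names data
df = pd.DataFrame({
    'Name': ['Lucy', 'Lucy', 'Mary', 'Lucy'],
    'Year': [2000, 2000, 2000, 2001],
    'Count': [50, 10, 70, 30],
})

def name_counts(name):
    """Hypothetical helper: the per-year totals that name_graph would plot."""
    return (df
            .query(f'Name == "{name}"')
            .groupby('Year')
            ['Count']
            .sum())

print(name_counts('Lucy'))  # 2000 -> 60, 2001 -> 30
```

As an alternative to building the query string with an f-string, query can also refer to a local variable with @, e.g. df.query('Name == @name').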
What does df.head(2) return?
the first two rows of the DataFrame df (with all of its columns)
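A quick check on a made-up 3×3 table that head(2) keeps rows, not columns.

```python
import pandas as pd

# Hypothetical table: 3 rows, 3 columns
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})

top = df.head(2)
print(top.shape)  # (2, 3): two rows, all three columns
```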
What does this code do?
whoa = np.random.choice([True, False], size=len(dogs))
(dogs[whoa]
    .groupby('size')
    .max()
    .get('longevity')
)
whoa is a random Boolean array with one True/False value per row of dogs, so dogs[whoa] keeps a random subset of the rows. That subset is grouped by the size column, .max() takes the maximum of each column within each group, and .get('longevity') pulls out the longevity column. The result is the maximum longevity for each dog size within a random sample, so it can change every time the code runs.
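A sketch on a small made-up stand-in for the dogs table (the breed values and numbers are hypothetical). A fixed seed makes the random mask reproducible; whatever rows are chosen, each sampled maximum can be no larger than the maximum over the full table.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the dogs table
dogs = pd.DataFrame({
    'breed': ['Beagle', 'Pug', 'Collie', 'Mastiff', 'Corgi', 'Boxer'],
    'size': ['small', 'small', 'medium', 'large', 'small', 'medium'],
    'longevity': [12.3, 11.0, 12.5, 6.5, 11.8, 8.8],
})

np.random.seed(42)  # fixed seed so the "random" mask is reproducible
whoa = np.random.choice([True, False], size=len(dogs))  # one True/False per row

sample_max = (dogs[whoa]        # keep only rows where whoa is True
              .groupby('size')
              .max()            # column-wise max within each size group
              .get('longevity'))
print(whoa)
print(sample_max)
```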
What is numpy’s main object?
the array
What are the two traits of numpy arrays?
they are homogeneous (all values have the same type) and potentially multidimensional
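A short sketch of both traits: mixing an int and a float upcasts every element to one shared dtype, and nesting lists produces a 2-dimensional array.

```python
import numpy as np

mixed = np.array([1, 2.5, 3])            # ints are upcast so all elements share one dtype
grid = np.array([[1, 2, 3], [4, 5, 6]])  # a 2-dimensional array

print(mixed.dtype)            # float64
print(grid.ndim, grid.shape)  # 2 (2, 3)
```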
What does np.arange(10) give us?
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
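A sketch of np.arange with an explicit start and step: as with Python's range, the stop value is excluded.

```python
import numpy as np

first = np.arange(10)         # 0 through 9; stop (10) is excluded
stepped = np.arange(2, 10, 3) # start=2, stop=10 (exclusive), step=3
print(stepped)                # [2 5 8]
```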
How do we import pandas into a jupyter notebook?
import pandas as pd
What does df.get('Population') return?
a series
What is a series?
like an array but with an index
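A sketch on a made-up table showing that .get returns a Series whose values carry the DataFrame's index labels.

```python
import pandas as pd

# Hypothetical table indexed by state label
df = pd.DataFrame({'Population': [1000, 3000]}, index=['A', 'B'])
pop = df.get('Population')

print(type(pop).__name__)  # Series
print(pop.index.tolist())  # ['A', 'B']: the labels travel with the values
print(pop.to_numpy())      # the underlying array of values
```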
When can we perform arithmetic operations with two series?
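whenever the operation makes sense element-wise; pandas matches values by index label, so two series with the same index behave like arrays, while labels found in only one of them produce NaN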
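A sketch of index alignment with two made-up series that share only some labels: matching labels are added, and labels present in just one series become NaN.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
b = pd.Series([10, 20, 30], index=['y', 'z', 'w'])

total = a + b  # values are matched by index label, not by position
print(total)   # y -> 12.0, z -> 23.0, w and x -> NaN
```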
How do we assign a new column?
df.assign(new_col=df.get('Population') / df.get('Land Area'))
When assigning a new column, what do we not want to do?
put quotes around the name of the new column
If we want to use the dataframe with a newly assigned column, what must we do?
assign the result to a variable, like new_df = df.assign(…), because .assign returns a new DataFrame and leaves df unchanged
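A sketch on a made-up table tying the last three answers together: the new column name is an unquoted keyword argument, and the original DataFrame is not modified.

```python
import pandas as pd

# Hypothetical table for illustration
df = pd.DataFrame({
    'State': ['A', 'B'],
    'Population': [1000, 3000],
    'Land Area': [10, 20],
})

# The new column name is a keyword argument, so it has no quotes
new_df = df.assign(Density=df.get('Population') / df.get('Land Area'))

print('Density' in df.columns)         # False: df itself is unchanged
print(new_df.get('Density').tolist())  # [100.0, 150.0]
```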
What kind of methods can we use on series?
.min(), .max(), .mean()
How would we get the median of a series?
df.get('Density').median()
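A sketch of the summary methods on a made-up Density series; with an even number of values, the median is the average of the two middle ones.

```python
import pandas as pd

density = pd.Series([100.0, 150.0, 40.0, 300.0])

print(density.min(), density.max())  # 40.0 300.0
print(density.mean())                # 147.5
print(density.median())              # 125.0, the mean of 100.0 and 150.0
```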
How do we get descriptive values of a column?
using .describe() on the specific column
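A sketch on a made-up column: for numeric data, .describe() returns a Series with the count, mean, standard deviation, min, quartiles and max.

```python
import pandas as pd

df = pd.DataFrame({'Density': [100.0, 150.0, 40.0, 300.0]})

summary = df.get('Density').describe()
print(summary)  # count, mean, std, min, 25%, 50%, 75%, max
```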
What is the syntax to sort a dataframe?
.sort_values(by='column_name')
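A sketch on a made-up table: sort_values sorts ascending by default, and ascending=False puts the largest values first.

```python
import pandas as pd

df = pd.DataFrame({'State': ['A', 'B', 'C'], 'Density': [150.0, 40.0, 300.0]})

ascending = df.sort_values(by='Density')                    # smallest first (default)
descending = df.sort_values(by='Density', ascending=False)  # largest first
print(descending.get('State').tolist())  # ['C', 'A', 'B']
```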