Bloomberg 2nd Technical Flashcards by Jacquelyn Garcia

how do you import pandas?

import pandas as pd

how do you read in a csv?

df = pd.read_csv("data.csv")
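
To try this without a file on disk, here is a minimal sketch that assumes a made-up CSV (the `name`, `major`, and `gpa` columns are hypothetical). `read_csv` takes a path or any file-like object, so an in-memory buffer stands in for "data.csv":

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for "data.csv"
csv_text = "name,major,gpa\nAna,math,3.9\nBen,cs,3.1\nCy,cs,3.6\n"

# The first line of the text becomes the column names
df = pd.read_csv(io.StringIO(csv_text))

print(df)
```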

How do I get the first 4 rows of a df?

df.head(4)

What is the default number of rows for head()?

5

How do I get the last 5 rows for df?

df.tail()

How do I get a random sample of 5 rows from the df?

df.sample(5)

How do I get the number of rows and columns of a df?

df.shape
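
The row-peeking calls above can be tried together. This is only a sketch on a made-up 10-row frame (the `id` and `gpa` columns are hypothetical); `random_state` is passed so the random sample is repeatable:

```python
import pandas as pd

# Hypothetical 10-row frame
df = pd.DataFrame({"id": range(1, 11), "gpa": [3.0 + 0.1 * i for i in range(10)]})

top4 = df.head(4)                       # first 4 rows
default_head = df.head()                # no argument: first 5 rows
last5 = df.tail()                       # no argument: last 5 rows
picked = df.sample(3, random_state=0)   # 3 random rows, repeatable
rows, cols = df.shape                   # shape is an attribute, not a method
```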

how do I get the dtypes + non-null counts?

df.info()

how do I get the column names of a df?

df.columns

how do I get the data types per column?

df.dtypes

how do I get the count of missing values per column?

df.isna().sum()

how do we get % missing per column?

df.isna().mean() * 100

how do we get the total number of missing values across the whole df?

df.isna().sum().sum()
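
The three missing-value calls build on each other: `isna()` gives a True/False frame, `sum()` counts the Trues per column, `mean()` gives their fraction, and a second `sum()` adds the columns up. A small sketch on a made-up frame (column names and values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 2 missing gpa values, 1 missing major
df = pd.DataFrame({
    "gpa": [3.5, np.nan, 3.9, np.nan],
    "major": ["cs", "math", None, "cs"],
})

missing_per_col = df.isna().sum()        # count of missing values per column
pct_missing = df.isna().mean() * 100     # percent missing per column
total_missing = df.isna().sum().sum()    # one number for the whole df
```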

how do we drop rows with any missing values?

df.dropna()

how do we do a simple fill in missing values?

df.fillna(0)
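
Worth remembering: `dropna()` and `fillna()` return a new df and leave the original alone unless you reassign. A sketch on a made-up frame (the columns are hypothetical); the per-column mean fill is one common alternative to filling with 0, not the only option:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one gap in each column
df = pd.DataFrame({"gpa": [3.5, np.nan, 3.9], "credits": [12.0, 15.0, np.nan]})

complete = df.dropna()     # keeps only rows with no missing values
filled = df.fillna(0)      # every missing value becomes 0

# Fill only gpa, using that column's mean; credits keeps its gap
filled_mean = df.fillna({"gpa": df["gpa"].mean()})
```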

how do we get descriptive statistics like count, mean, std, quantiles, etc. for every numerical column?

df.describe()

how do we get descriptive statistics like count, mean, std, quantiles, etc. for every column?

df.describe(include="all")

how do we get descriptive statistics like count, mean, std, quantiles, etc. for a specific column?

df['col_name'].describe()
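
A quick sketch of how the three `describe()` variants differ, using a made-up frame with one numeric and one text column (both column names are hypothetical). With `include="all"`, text columns get rows such as `unique` and `top` in place of mean and std:

```python
import pandas as pd

# Hypothetical frame: one numeric column, one text column
df = pd.DataFrame({"gpa": [3.0, 3.5, 4.0], "major": ["cs", "cs", "math"]})

numeric_only = df.describe()             # numeric columns only
everything = df.describe(include="all")  # numeric and non-numeric columns
one_column = df["gpa"].describe()        # a single column, returned as a Series
```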

how do we get the mean of a column?

df['col'].mean()

how do we get the median of a column?

df['col'].median()

how do we get the standard deviation of a column?

df['col'].std()

how do we get the minimum value of a column?

df['col'].min()

how do we get the max value of a column?

df['col'].max()

how do we get the index for the maximum value in each column?

df.idxmax()

how do we get the index for the maximum value in each row?

df.idxmax(axis="columns")

how would we query to get the student with the highest gpa?

df.loc[df['gpa'].idxmax()]

how would we query to get the student with the lowest gpa?

df.loc[df['gpa'].idxmin()]
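
Why `loc` and not `iloc` here: `idxmax()` and `idxmin()` return an index label, not a row position. A sketch with made-up string labels to make that visible (the names, labels, and values are hypothetical):

```python
import pandas as pd

# Hypothetical students, indexed by string labels instead of 0, 1, 2
df = pd.DataFrame(
    {"name": ["Ana", "Ben", "Cy"], "gpa": [3.9, 3.1, 3.6]},
    index=["s1", "s2", "s3"],
)

best_label = df["gpa"].idxmax()      # the label of the max row
best = df.loc[best_label]            # full row for the highest gpa
worst = df.loc[df["gpa"].idxmin()]   # full row for the lowest gpa
```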

if we are trying to query into a df with multiple conditions what should we do?

wrap each condition with parentheses

how do we query when all conditions should be met?

join the conditions with & (logical AND)

how do we query when either condition should be met?

join the conditions with | (logical OR)
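
A sketch of both kinds of multi-condition query on a made-up students frame (the columns and the cutoff are hypothetical). pandas uses `&` for AND and `|` for OR, and each condition needs its own parentheses because those operators bind tighter than `==` or `>`:

```python
import pandas as pd

# Hypothetical students
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Di"],
    "major": ["math", "cs", "cs", "bio"],
    "gpa": [3.9, 3.1, 3.6, 2.8],
})

# AND: cs majors who also have gpa above 3.5
both = df[(df["major"] == "cs") & (df["gpa"] > 3.5)]

# OR: anyone who is a cs major or has gpa above 3.5
either = df[(df["major"] == "cs") | (df["gpa"] > 3.5)]
```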

how do we select multiple columns?

df[['col1', 'col2', 'col3']]

how do we sort by one column?

df.sort_values(by='gpa', ascending=False)

how do we sort by multiple columns?

df.sort_values(by=['year', 'gpa'], ascending=[True, False])
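
With a list for `by`, the `ascending` list lines up position by position: here year goes low to high, and gpa goes high to low within each year. A sketch on made-up data (names and values are hypothetical):

```python
import pandas as pd

# Hypothetical students across two years
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Di"],
    "year": [2, 1, 2, 1],
    "gpa": [3.9, 3.1, 3.6, 3.4],
})

# year ascending, then gpa descending inside each year
ranked = df.sort_values(by=["year", "gpa"], ascending=[True, False])
```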

What is the default sort order of sort_values()?

ascending=True

how would we group by a category and compute a statistic like mean?

df.groupby("col1")["gpa"].mean()

how would we group by a category and get counts per category?

df.groupby("col1")["id"].count()

how would we get multiple aggregations like min/max/mean per a group?

df.groupby("major")["gpa"].agg(['min', 'max', 'mean'])

how would we filter to group on major and find majors with average gpa greater than 3.5?

df.groupby("major")['gpa'].mean().loc[lambda x: x>3.5]
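
The groupby cards can be chained together. A sketch on made-up data (majors and gpas are hypothetical): the group means come back as a Series indexed by major, which is why `.loc` with a lambda can filter it:

```python
import pandas as pd

# Hypothetical students
df = pd.DataFrame({
    "major": ["cs", "cs", "math", "math", "bio"],
    "gpa": [3.2, 3.6, 3.8, 3.6, 3.0],
})

mean_by_major = df.groupby("major")["gpa"].mean()                # one mean per major
stats = df.groupby("major")["gpa"].agg(["min", "max", "mean"])   # one column per stat
high = mean_by_major.loc[lambda s: s > 3.5]                      # majors averaging above 3.5
```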

how do we compute outliers using the IQR method?

q1 = df['gpa'].quantile(0.25)
q3 = df['gpa'].quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

how do we actually use IQR to query for all outliers?

outliers = df[(df['gpa'] < lower) | (df['gpa'] > upper)]
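
Putting the fences and the query together, as a sketch on made-up gpas with one obviously odd value (the data is hypothetical):

```python
import pandas as pd

# Hypothetical gpas; 0.5 is the planted outlier
df = pd.DataFrame({"gpa": [3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 0.5]})

q1 = df["gpa"].quantile(0.25)
q3 = df["gpa"].quantile(0.75)
iqr = q3 - q1

# Fences sit 1.5 IQRs outside the quartiles
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Rows below the lower fence or above the upper fence
outliers = df[(df["gpa"] < lower) | (df["gpa"] > upper)]
```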

what visualizations can we use for numeric variables?

histograms and box plots

what are histograms good for?

to show distribution shape

how would we create a histogram of gpa scores?

df['gpa'].plot(kind='hist')

what are box plots good for?

for outliers and to show quartiles and median

how do we create a boxplot for the gpa column?

df.boxplot(column='gpa')

how do we visualize categorical variables?

bar plot and horizontal bar plots

how would we visualize counts per major with a bar plot?

df['major'].value_counts().plot(kind='bar')

how would we visualize counts per major with a horizontal bar plot?

df['major'].value_counts().plot(kind='barh')

how do we do an inner merge?

pd.merge(df1, df2, on="student_id", how="inner")

how do we do a left merge?

pd.merge(df1, df2, on="id", how="left")

how would we specify a merge that keeps the union of keys from both dfs?

how="outer"
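
A side-by-side sketch of the three merge types on two made-up tables (the table and column names are hypothetical). Student 1 has no grade and student 4 has no name, so each `how` keeps a different set of rows:

```python
import pandas as pd

# Hypothetical tables that only partly overlap on student_id
students = pd.DataFrame({"student_id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})
grades = pd.DataFrame({"student_id": [2, 3, 4], "gpa": [3.1, 3.6, 2.8]})

inner = pd.merge(students, grades, on="student_id", how="inner")  # ids in both
left = pd.merge(students, grades, on="student_id", how="left")    # every student
outer = pd.merge(students, grades, on="student_id", how="outer")  # ids from either
```

Rows with no match on the other side get NaN in the columns that came from it.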

what is loc based on?

label based

what is iloc based on?

integer position-based

how would we get the first 5 rows and the first 3 columns of a df?

df.iloc[0:5, 0:3]
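
One difference that is easy to trip over: `iloc` slices leave out the end position, like normal Python slices, while `loc` slices include the end label. A sketch on a made-up frame with string labels (all names are hypothetical):

```python
import pandas as pd

# Hypothetical frame with string row labels
df = pd.DataFrame({"a": [10, 20, 30], "b": [1, 2, 3]}, index=["x", "y", "z"])

by_label = df.loc["y", "a"]          # row label "y", column label "a"
by_position = df.iloc[1, 0]          # second row, first column

block = df.iloc[0:2, 0:1]            # positions 0-1 and column 0; end excluded
label_slice = df.loc["x":"y", "a"]   # labels "x" through "y"; end included
```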

how would we get the average beak length of a penguin?

df['beak_length_mm'].mean()

how would we get a count of how many students per major?

df['major'].value_counts()

how would we get the average salary per department?

df.groupby('department')['salary'].mean()

how would we find departments with missing salary values?

df.loc[df['salary'].isna(), 'department'].unique()

how would we get the 6th row of a df?

df.iloc[5]

how would I read in an excel sheet?

pd.read_excel("titanic.xlsx")

how would I read in a json file?

pd.read_json("students.json")
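
Like `read_csv`, `read_json` accepts a path or a file-like object. A sketch that assumes the file holds a list of records (the field names are hypothetical) and uses an in-memory buffer in place of "students.json":

```python
import io

import pandas as pd

# Hypothetical JSON contents: a list of records, one per student
json_text = '[{"name": "Ana", "gpa": 3.9}, {"name": "Ben", "gpa": 3.1}]'

# Each record becomes a row and each key becomes a column
df = pd.read_json(io.StringIO(json_text))
```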

(61 cards)