Bloomberg 2nd Technical Flashcards

(61 cards)

1
Q

how do you import pandas?

A

import pandas as pd

2
Q

how do you read in a csv?

A

df = pd.read_csv("data.csv")
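
To try card 2 without a file on disk, here is a minimal sketch that reads CSV text from memory. The column names and values are made up for illustration; with a real file you would pass the path, as in the card.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for "data.csv".
csv_text = "name,major,gpa\nAna,Math,3.9\nBen,CS,3.4\n"

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.shape)  # (2, 3): two data rows, three columns
```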

3
Q

How do I get the first 4 rows of a df?

A

df.head(4)

4
Q

What is the default number of rows for head()?

A

5

5
Q

How do I get the last 5 rows of a df?

A

df.tail()

6
Q

How do I get a random sample of 5 rows from the df?

A

df.sample(5)

7
Q

How do I get the number of rows and columns of a df?

A

df.shape
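
Cards 3 to 7 can be checked together on a small frame. This is only a sketch; the data is invented, and `random_state` is passed solely to make the sample repeatable.

```python
import pandas as pd

# Made-up data: 7 rows, 2 columns.
df = pd.DataFrame({
    "name": list("ABCDEFG"),
    "gpa": [3.1, 3.5, 2.9, 3.8, 3.3, 3.6, 4.0],
})

first_four = df.head(4)                      # first 4 rows
default_head = df.head()                     # no argument: first 5 rows
last_five = df.tail()                        # no argument: last 5 rows
random_three = df.sample(3, random_state=0)  # 3 random rows, reproducible
n_rows, n_cols = df.shape                    # shape is an attribute, not a method

print(n_rows, n_cols)  # 7 2
```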

8
Q

how do I get the dtypes and non-null counts?

A

df.info()

9
Q

how do I get the column names of a df?

A

df.columns

10
Q

how do I get the data types per column?

A

df.dtypes

11
Q

how do I get the count of missing values per column?

A

df.isna().sum()

12
Q

how do we get the % of missing values per column?

A

df.isna().mean() * 100

13
Q

how do we get the total number of missing values across the whole df?

A

df.isna().sum().sum()
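
The three missing-value cards (11 to 13) build on the same boolean mask. A small sketch with invented data shows how they relate: `isna()` gives True/False per cell, `sum()` counts the Trues per column, `mean()` gives their fraction, and a second `sum()` collapses the per-column counts into one number.

```python
import numpy as np
import pandas as pd

# Made-up data with 2 missing gpa values and 1 missing major.
df = pd.DataFrame({
    "gpa": [3.9, np.nan, 3.4, np.nan],
    "major": ["Math", "CS", None, "CS"],
})

missing_count = df.isna().sum()        # per column: gpa 2, major 1
missing_pct = df.isna().mean() * 100   # per column: gpa 50.0, major 25.0
total_missing = df.isna().sum().sum()  # whole frame: 3

print(missing_count)
print(missing_pct)
print(total_missing)
```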

14
Q

how do we drop rows with any missing values?

A

df.dropna()

15
Q

how do we do a simple fill of missing values (for example, with 0)?

A

df.fillna(0)
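
One detail worth remembering for cards 14 and 15: both `dropna()` and `fillna()` return a new DataFrame and leave the original untouched unless you assign the result back. A short sketch with invented data:

```python
import numpy as np
import pandas as pd

# Made-up data: row 1 is missing gpa, row 2 is missing credits.
df = pd.DataFrame({
    "gpa": [3.9, np.nan, 3.4],
    "credits": [30, 45, np.nan],
})

dropped = df.dropna()   # keeps only rows with no missing values (row 0)
filled = df.fillna(0)   # replaces every missing value with 0

# The original frame still has its 2 missing values.
print(len(dropped), filled.isna().sum().sum(), df.isna().sum().sum())
```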

16
Q

how do we get descriptive statistics like count, mean, std, quantiles, etc. for every numerical column?

A

df.describe()

17
Q

how do we get descriptive statistics like count, mean, std, quantiles, etc. for every column, including non-numeric ones?

A

df.describe(include="all")

18
Q

how do we get descriptive statistics like count, mean, std, quantiles, etc. for a specific column?

A

df['col_name'].describe()
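
The three `describe()` variants in cards 16 to 18 differ only in which columns they cover. A sketch on invented data, with one text column and one numeric column:

```python
import pandas as pd

# Made-up data: "major" is text, "gpa" is numeric.
df = pd.DataFrame({
    "major": ["CS", "Math", "CS"],
    "gpa": [3.8, 3.9, 3.2],
})

numeric_only = df.describe()             # numeric columns only
everything = df.describe(include="all")  # adds unique/top/freq for text columns
one_column = df["gpa"].describe()        # a Series of stats for one column

print(numeric_only)
print(everything)
print(one_column)
```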

19
Q

how do we get the mean of a column?

A

df['col'].mean()

20
Q

how do we get the median of a column?

A

df['col'].median()

21
Q

how do we get the standard deviation of a column?

A

df['col'].std()

22
Q

how do we get the minimum value of a column?

A

df['col'].min()

23
Q

how do we get the max value of a column?

A

df['col'].max()

24
Q

how do we get the index for the maximum value in each column?

A

df.idxmax()

25
how do we get the index for the maximum value in each row?
df.idxmax(axis="columns")
26
how would we query to get the student with the highest gpa?
df.loc[df['gpa'].idxmax()]
27
how would we query to get the student with the lowest gpa?
df.loc[df['gpa'].idxmin()]
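
Cards 24 to 27 all rest on the fact that `idxmax()` and `idxmin()` return index labels, which is why the result is passed to `loc`. A sketch with invented frames and string index labels to make that visible:

```python
import pandas as pd

# Made-up student data with string labels as the index.
df = pd.DataFrame(
    {"name": ["Ana", "Ben", "Cy"], "gpa": [3.9, 3.4, 3.7]},
    index=["s1", "s2", "s3"],
)

top_label = df["gpa"].idxmax()               # the label "s1", not the position 0
top_student = df.loc[top_label]              # full row for the highest gpa
low_student = df.loc[df["gpa"].idxmin()]     # full row for the lowest gpa

# Made-up numeric scores to show the per-column and per-row forms.
scores = pd.DataFrame({"math": [90, 70], "art": [80, 95]}, index=["s1", "s2"])
best_row_per_column = scores.idxmax()                # math -> s1, art -> s2
best_column_per_row = scores.idxmax(axis="columns")  # s1 -> math, s2 -> art

print(top_student["name"], low_student["name"])
```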
28
if we are filtering a df on multiple conditions, what should we do?
wrap each condition in parentheses
29
how do we combine conditions when all of them must be met?
&
30
how do we combine conditions when at least one of them must be met?
|
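
Cards 28 to 30 fit together in one filter expression. The parentheses matter because `&` and `|` bind more tightly than comparisons such as `==` and `>` in Python. A sketch on invented data:

```python
import pandas as pd

# Made-up data.
df = pd.DataFrame({
    "major": ["CS", "Math", "CS", "Art"],
    "gpa": [3.8, 3.9, 3.2, 3.0],
})

# AND: CS majors whose gpa is also above 3.5 (row 0 only).
both = df[(df["major"] == "CS") & (df["gpa"] > 3.5)]

# OR: CS majors, or anyone with gpa above 3.5 (rows 0, 1, 2).
either = df[(df["major"] == "CS") | (df["gpa"] > 3.5)]

print(both)
print(either)
```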
31
how do we select multiple columns?
df[['col1', 'col2', 'col3']]
32
how do we sort by one column?
df.sort_values(by='gpa', ascending=False)
33
how do we sort by multiple columns?
df.sort_values(by=['year', 'gpa'], ascending=[True, False])
34
What is the default sort order of sort_values()?
ascending=True
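
A quick check of the sorting cards on invented data. With no `ascending` argument, `sort_values()` sorts from smallest to largest; passing a list sets the direction per column.

```python
import pandas as pd

# Made-up data.
df = pd.DataFrame({
    "year": [2, 1, 2, 1],
    "gpa": [3.5, 3.9, 3.8, 3.2],
})

default_sorted = df.sort_values(by="gpa")                # smallest gpa first
descending = df.sort_values(by="gpa", ascending=False)   # largest gpa first

# year ascending, then gpa descending within each year.
multi = df.sort_values(by=["year", "gpa"], ascending=[True, False])

print(multi)
```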
35
how would we group by a category and compute a statistic like mean?
df.groupby("col1")["gpa"].mean()
36
how would we group by a category and get counts per category?
df.groupby("col1")["id"].count()
37
how would we get multiple aggregations like min/max/mean per group?
df.groupby("major")["gpa"].agg(['min', 'max', 'mean'])
38
how would we group by major and keep only the majors with an average gpa greater than 3.5?
df.groupby("major")['gpa'].mean().loc[lambda x: x>3.5]
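
The groupby cards (35 to 38) can be run side by side. This is a sketch on invented data; note that `count()` counts non-null values in the chosen column, so it assumes `id` has no missing entries.

```python
import pandas as pd

# Made-up data.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "major": ["CS", "CS", "Math", "Math", "Art"],
    "gpa": [3.8, 3.4, 3.9, 3.7, 3.0],
})

mean_gpa = df.groupby("major")["gpa"].mean()                    # one mean per major
counts = df.groupby("major")["id"].count()                      # non-null ids per major
summary = df.groupby("major")["gpa"].agg(["min", "max", "mean"])  # one column per stat

# Filter the per-major means: keep majors averaging above 3.5.
high = mean_gpa.loc[lambda s: s > 3.5]

print(summary)
print(high)
```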
39
how do we compute the outlier bounds using the IQR method?
q1 = df['gpa'].quantile(0.25)
q3 = df['gpa'].quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
40
how do we use the IQR bounds to select all outliers?
outliers = df[(df['gpa'] < lower) | (df['gpa'] > upper)]
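
Cards 39 and 40 combine into one runnable sketch. The gpa values are invented, with one deliberately low value so the filter has something to catch.

```python
import pandas as pd

# Made-up data: 0.5 is far below the rest.
df = pd.DataFrame({"gpa": [3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 0.5]})

q1 = df["gpa"].quantile(0.25)  # first quartile
q3 = df["gpa"].quantile(0.75)  # third quartile
iqr = q3 - q1                  # interquartile range

lower = q1 - 1.5 * iqr         # anything below this is flagged
upper = q3 + 1.5 * iqr         # anything above this is flagged

outliers = df[(df["gpa"] < lower) | (df["gpa"] > upper)]
print(outliers)  # only the 0.5 row
```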
41
what visualizations can we use for numeric variables?
histograms and box plots
42
what are histograms good for?
to show distribution shape
43
how would we create a histogram of gpa scores?
df['gpa'].plot(kind='hist')
44
what are box plots good for?
to spot outliers and to show the quartiles and median
45
how do we create a boxplot for the gpa column?
df.boxplot(column='gpa')
46
how do we visualize categorical variables?
bar plots and horizontal bar plots
47
how would we visualize counts per major with a bar plot?
df['major'].value_counts().plot(kind='bar')
48
how would we visualize counts per major with a horizontal bar plot?
df['major'].value_counts().plot(kind='barh')
49
how do we do an inner merge?
pd.merge(df1, df2, on="student_id", how="inner")
50
how do we do a left merge?
pd.merge(df1, df2, on="id", how="left")
51
what value of how gives a join that keeps the keys from both dfs (a union)?
'outer'
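
The three merge types in cards 49 to 51 differ only in which keys survive. A sketch with two invented frames that share `student_id` values 2 and 3:

```python
import pandas as pd

# Made-up frames: ids 1-3 on the left, ids 2-4 on the right.
students = pd.DataFrame({"student_id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})
grades = pd.DataFrame({"student_id": [2, 3, 4], "gpa": [3.4, 3.7, 3.9]})

inner = pd.merge(students, grades, on="student_id", how="inner")  # keys in both: 2, 3
left = pd.merge(students, grades, on="student_id", how="left")    # all left keys: 1, 2, 3
outer = pd.merge(students, grades, on="student_id", how="outer")  # union: 1, 2, 3, 4

print(inner)
print(left)   # gpa is NaN for student 1
print(outer)  # name is NaN for student 4
```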
52
what is loc based on?
label-based
53
what is iloc based on?
integer position-based
54
how would we get the first 5 rows and the first 3 columns of a df?
df.iloc[0:5, 0:3]
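
The difference between `loc` and `iloc` is easiest to see when the index is not 0, 1, 2. A sketch with an invented frame and string labels; note that `iloc` slices exclude the end position while `loc` slices include the end label.

```python
import pandas as pd

# Made-up data with string row labels.
df = pd.DataFrame(
    {"a": [10, 20, 30], "b": [1, 2, 3], "c": [7, 8, 9]},
    index=["x", "y", "z"],
)

by_label = df.loc["y", "a"]   # row labelled "y", column labelled "a"
by_position = df.iloc[1, 0]   # second row, first column (same cell)

block = df.iloc[0:2, 0:2]               # positions 0-1: end is excluded
label_slice = df.loc["x":"y", "a":"b"]  # labels x-y and a-b: end is included

print(by_label, by_position)
print(block)
```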
55
how would we get the average beak length of a penguin?
df['beak_length_mm'].mean()
56
how would we count the students in each major?
df['major'].value_counts()
57
how would we get the average salary per department?
df.groupby('department')['salary'].mean()
58
how would we find departments with missing salary values?
df.loc[df['salary'].isna(), 'department'].unique()
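
For the salary cards (57 and 58), it helps to separate two results: the rows that have a missing salary, and the department names those rows belong to. A sketch on invented data:

```python
import numpy as np
import pandas as pd

# Made-up data: one Eng row and the only HR row lack a salary.
df = pd.DataFrame({
    "department": ["Eng", "Eng", "Ops", "HR"],
    "salary": [100.0, np.nan, 80.0, np.nan],
})

rows_missing = df[df["salary"].isna()]  # the full rows with no salary
depts_missing = df.loc[df["salary"].isna(), "department"].unique()  # just the names

# mean() skips missing values, so Eng averages its one known salary.
avg_salary = df.groupby("department")["salary"].mean()

print(rows_missing)
print(depts_missing)
print(avg_salary)
```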
59
how would we get the 6th row of a df?
df.iloc[5]
60
how would I read in an excel sheet?
pd.read_excel("titanic.xlsx")
61
how would I read in a json file?
pd.read_json("students.json")
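
To try the JSON card without a file, here is a minimal sketch that reads JSON text from memory. The records are invented; with a real file you would pass the path, as in the card. Reading .xlsx files with `read_excel` additionally needs an Excel engine such as openpyxl installed, so it is not shown here.

```python
import io

import pandas as pd

# Hypothetical JSON content standing in for "students.json".
json_text = '[{"name": "Ana", "gpa": 3.9}, {"name": "Ben", "gpa": 3.4}]'

# A list of records becomes one row per record.
df = pd.read_json(io.StringIO(json_text))

print(df)
```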