Notebook 7 COPY Flashcards by Dylan Ottey

What are the two main pandas data structures?
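
The two main structures are the Series (one-dimensional, labelled) and the DataFrame (two-dimensional table). A minimal sketch with made-up values:

```python
import pandas as pd

# A Series is a one-dimensional labelled array.
lengths = pd.Series([5.1, 4.9, 4.7], name="sepal_length")

# A DataFrame is a two-dimensional table; each column is a Series.
df = pd.DataFrame({"sepal_length": [5.1, 4.9, 4.7],
                   "sepal_width": [3.5, 3.0, 3.2]})

print(type(lengths).__name__, lengths.shape)
print(type(df).__name__, df.shape)
print(type(df["sepal_length"]).__name__)
```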

import statement for pandas?

import pandas as pd

How to read a csv in pandas?
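
A minimal sketch using pd.read_csv. The CSV text and the file name in the comment are made up for illustration; an in-memory string stands in for a file on disk.

```python
import io
import pandas as pd

# Stand-in for a file on disk. With a real file you would pass its path,
# e.g. pd.read_csv("iris.csv") (hypothetical file name).
csv_text = "sepal_length,sepal_width,species\n5.1,3.5,setosa\n4.9,3.0,setosa\n"

iris = pd.read_csv(io.StringIO(csv_text))
print(iris)
```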

How do we look at the first 10 and last 10 entries in a dataframe?

What is an issue that arises here?

Can't view both at the same time with head() and tail(). Use the name of the dataframe, e.g. just “iris”, to display both the first and last 5 entries in a dataframe.

Doing this also displays the number of rows and columns in the dataframe.

How do you “print” both the head and tail of a table in the same command?
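
A sketch of the head/tail cards, assuming a hypothetical 100-row dataframe. Printing both in one call and stacking them with pd.concat are two possible ways, not necessarily the one used in the notebook.

```python
import pandas as pd

# Hypothetical 100-row frame standing in for the notebook's data.
df = pd.DataFrame({"value": range(100)})

first_10 = df.head(10)   # first 10 rows (head() alone gives 5)
last_10 = df.tail(10)    # last 10 rows (tail() alone gives 5)

# Both in one statement: print them together...
print(df.head(), df.tail(), sep="\n")

# ...or stack them into a single table.
both = pd.concat([df.head(), df.tail()])
print(both.shape)
```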

What does df.info() report?

What does df.describe() report?
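
A small sketch of both df.info() and df.describe() on a made-up four-row dataframe, to show what each one reports:

```python
import pandas as pd

df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3, 5.8],
    "species": ["setosa", "setosa", "virginica", "virginica"],
})

# info(): index range, column names, non-null counts, dtypes, memory usage.
df.info()

# describe(): count, mean, std, min, quartiles and max of the numeric columns.
summary = df.describe()
print(summary)
```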

How would we pull the column “sepal_length” from the iris dataframe?

What are the three ways of indexing and selecting data in a dataframe?
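
A sketch assuming the three ways meant here are plain [] indexing, label-based .loc and position-based .iloc (the card's original answer is not shown). The small iris frame is made up.

```python
import pandas as pd

iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3],
    "sepal_width": [3.5, 3.0, 3.3],
    "species": ["setosa", "setosa", "virginica"],
})

col = iris["sepal_length"]                   # 1) [] with a column name
by_label = iris.loc[0:1, ["sepal_length"]]   # 2) .loc: labels, end label included
by_position = iris.iloc[0:1, 0:2]            # 3) .iloc: positions, end excluded

print(col, by_label, by_position, sep="\n")
```

Note the difference in the row slice: .loc[0:1] returns two rows, .iloc[0:1] returns one.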

For the iris dataframe, display two new dataframes: df1 contains the columns “sepal_length” and “sepal_width” selected using column-name indexing, and df2 contains the same columns selected using slicing.
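
One possible answer, sketched on a made-up three-row iris frame. It assumes the two sepal columns sit next to each other at the start of the table, which is what makes slicing work.

```python
import pandas as pd

iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3],
    "sepal_width": [3.5, 3.0, 3.3],
    "petal_length": [1.4, 1.4, 6.0],
    "species": ["setosa", "setosa", "virginica"],
})

# Column-name indexing: pass a list of names.
df1 = iris[["sepal_length", "sepal_width"]]

# Slicing by position: all rows, first two columns.
df2 = iris.iloc[:, 0:2]

# Slicing by label also works; the end label is included.
df3 = iris.loc[:, "sepal_length":"sepal_width"]

print(df1)
print(df2)
```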

How do we print the shape of a dataframe?

df.shape

What are the two ways to extract the design matrix X and vector of targets y from the iris Dataframe?
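
A sketch assuming the two ways are selecting by column name and selecting by position (the card's original answer is not shown). The iris dataframe is rebuilt here from scikit-learn's copy of the data, with column names chosen to match the ones used in these cards.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

# Stand-in for the notebook's iris dataframe.
bunch = load_iris()
features = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
iris = pd.DataFrame(bunch.data, columns=features)
iris["species"] = bunch.target_names[bunch.target]

# Way 1: by column name.
X = iris[features].to_numpy()
y = iris["species"].to_numpy()

# Way 2: by position (every column but the last, then the last one).
X2 = iris.iloc[:, :-1].to_numpy()
y2 = iris.iloc[:, -1].to_numpy()

print(X.shape, y.shape)
```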

How can we find the mean of a df using np?

np.mean(df)
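
Whether np.mean(df) returns one mean per column or a single number may depend on the installed pandas version, so this sketch converts to a NumPy array and states the axis explicitly. The values are made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})

# Mean of each column (axis=0 runs down the rows).
col_means = np.mean(df.to_numpy(), axis=0)

# Mean of every value in the table.
overall = np.mean(df.to_numpy())

print(col_means, overall)
```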

How can we sort a dataframe based on a specific column in ascending order?

For descending order, set ascending=False. ascending=True is the default, so for ascending order it can be dropped from the code altogether.
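
A minimal sketch with sort_values on a made-up dataframe:

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"],
                   "sepal_length": [6.3, 4.9, 5.1]})

# Ascending is the default, so the argument can be left out.
asc = df.sort_values(by="sepal_length")

# Descending needs ascending=False.
desc = df.sort_values(by="sepal_length", ascending=False)

print(asc)
print(desc)
```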

What do we need to check if our CSV isn't read into separate columns?

Need to check how it is separated; in this case it was separated by a semicolon.
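
A sketch of the semicolon case. The CSV text is made up (the column names only resemble a wine-quality table); the point is the sep argument of pd.read_csv.

```python
import io
import pandas as pd

csv_text = "fixed acidity;pH;quality\n7.4;3.51;5\n7.8;3.20;5\n"

# Default separator is a comma, so everything lands in one column.
wrong = pd.read_csv(io.StringIO(csv_text))

# Telling pandas the separator gives the expected three columns.
right = pd.read_csv(io.StringIO(csv_text), sep=";")

print(wrong.shape, right.shape)
```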

What is the import statement for seaborn?

import seaborn as sns

The alias sns comes from the initials of Samuel Norman Seaborn.

What does the following code display?

sns.pairplot(iris, hue='species')
plt.show()

How do I adjust this function to display only the lower corner (as it is a mirror of the upper corner) and adjust the markers of the graphs?

How do you use seaborn to plot the following graph for the iris dataset?

What does it show?

Removing the inner="quartile" argument replaces the quartile lines with a small box plot.

Also, because seaborn orients these plots by the categorical variable, swapping the x and y arguments orients the graph horizontally.
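
The graph itself is not shown on this card; the note about inner quartiles suggests a violin plot, so this sketch assumes sns.violinplot. Which column was plotted is also an assumption (sepal_length per species here), and the iris dataframe is a stand-in built from scikit-learn's copy.

```python
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris

# Stand-in for the notebook's iris dataframe.
bunch = load_iris()
iris = pd.DataFrame(
    bunch.data,
    columns=["sepal_length", "sepal_width", "petal_length", "petal_width"],
)
iris["species"] = bunch.target_names[bunch.target]

# One violin per species; inner="quartile" draws the quartiles as lines.
ax = sns.violinplot(data=iris, x="species", y="sepal_length", inner="quartile")

# Swapping x and y would lay the violins out horizontally:
# sns.violinplot(data=iris, x="sepal_length", y="species", inner="quartile")
```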

How do I create a heatmap for the data set “flights” with seaborn?
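
A sketch of the usual pattern: pivot the long table into a month-by-year grid, then pass it to sns.heatmap. In the notebook the data would come from seaborn's "flights" example dataset; a tiny hand-typed table with the same column names stands in for it here so the sketch runs without a download.

```python
import pandas as pd
import seaborn as sns

# Tiny stand-in with the same columns as the "flights" dataset.
flights_long = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "passengers": [112, 118, 115, 126],
})

# Rows = month, columns = year, cell values = passengers.
flights = flights_long.pivot(index="month", columns="year", values="passengers")

# annot=True writes the value in each cell; fmt="d" formats it as an integer.
ax = sns.heatmap(flights, annot=True, fmt="d")
```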

What do we use scikit-learn for?

What are the import statements for some of the main ML algos we will use?

from sklearn import linear_model, tree

Given our design matrix X and label vector y, how can I use SVM to train and make a prediction on the following unseen array?

X_unseen = np.array([[6.7, 2.8, 5.2, 2.1]])
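
A minimal sketch with svm.SVC and its default settings. X and y are taken from scikit-learn's copy of iris here as a stand-in for the notebook's design matrix and labels, so the prediction comes back as a class number (0, 1 or 2).

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import load_iris

# Stand-in for the notebook's X and y.
X, y = load_iris(return_X_y=True)

clf = svm.SVC()      # default RBF kernel
clf.fit(X, y)        # train on the whole design matrix

X_unseen = np.array([[6.7, 2.8, 5.2, 2.1]])
pred = clf.predict(X_unseen)
print(pred)
```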

How do I adjust this code to use the decision tree classifier?
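
A sketch of the same train-and-predict steps with tree.DecisionTreeClassifier swapped in. The data is again scikit-learn's iris as a stand-in, and random_state=0 is only there to make the run repeatable.

```python
import numpy as np
from sklearn import tree
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Only the estimator changes; fit and predict are called the same way.
clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

X_unseen = np.array([[6.7, 2.8, 5.2, 2.1]])
pred = clf.predict(X_unseen)
print(pred)
```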

What are the advantages of the decision tree classifier?

What are the disadvantages of the decision tree classifier?

What is the code for loading and printing a confusion matrix? What does the matrix show?

ADD TO THIS

How does a decision tree classifier work?

ADD TO THIS from user guide

What is the import code for train_test_split?

Why don't we test on the same data that our model learns from?

How do we create test and training data for the iris data set using the train_test_split function?

When testing for different hyperparameters, why is there still a risk of overfitting?
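
A minimal sketch of train_test_split on scikit-learn's iris data. The 30% test size and the random_state are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 30% of the rows for testing; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

print(X_train.shape, X_test.shape)
```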

How do we solve this issue?

How do we implement the cross-validation procedure here?

1) Load the diabetes data set
2) Split the data and targets into training and test sets
3) Create the linear regression
4) Train the model
5) Make predictions using the testing set
6) Plot the target as a function of bmi (body mass index)
7) Plot the target as a function of bp (average blood pressure)
8) Plot truth vs prediction
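
Steps 1 to 5 above can be sketched as follows. The split size and random_state are arbitrary, and the plotting steps are left out to keep the sketch short.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# 1) Load the data.
X, y = load_diabetes(return_X_y=True)

# 2) Split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 3) Create and 4) train the model.
reg = LinearRegression()
reg.fit(X_train, y_train)

# 5) Predict on the held-out set and score the fit.
y_pred = reg.predict(X_test)
print(r2_score(y_test, y_pred))
```

For the plots, bmi and bp are columns 2 and 3 of X in scikit-learn's copy of this data set.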

What is the least squares solution? What is the big-O of OLS?
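
A sketch of the standard answer, following the scikit-learn user guide. OLS picks the weights w that minimise the residual sum of squares; when X has full column rank the minimiser has the closed form below.

```latex
\min_{w} \; \lVert Xw - y \rVert_2^2
\qquad\Longrightarrow\qquad
\hat{w} = (X^{\top} X)^{-1} X^{\top} y
```

For X of shape (n_samples, n_features), the user guide gives the cost as O(n_samples * n_features^2), assuming n_samples >= n_features.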

How does Ridge regression differ from OLS?

Has the same time complexity as OLS

How do we implement the Ridge model?

What is the classifier variant of Ridge in scikit-learn?

How do we set the regularisation parameter for RidgeCV and RidgeClassifierCV?
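
A sketch of the Ridge cards on the diabetes data. Ridge takes a fixed alpha; RidgeCV takes a list of candidate alphas and keeps the best one by built-in cross-validation (RidgeClassifierCV accepts alphas in the same way). The alpha values here are arbitrary.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge, RidgeCV

X, y = load_diabetes(return_X_y=True)

# Plain Ridge: alpha is the regularisation strength, chosen by hand.
ridge = Ridge(alpha=1.0).fit(X, y)

# RidgeCV: pass candidate alphas and let cross-validation choose.
alphas = np.logspace(-3, 3, 13)
ridge_cv = RidgeCV(alphas=alphas).fit(X, y)

print(ridge_cv.alpha_)   # the alpha that was selected
```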

What is the Lasso model?

How do we implement the Lasso model?

How do we set the regularisation parameters for Lasso?
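
A sketch of the Lasso cards on the diabetes data. Lasso takes a fixed alpha; LassoCV is one of the estimators that picks alpha by cross-validation. The alpha value and fold count are arbitrary.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, LassoCV

X, y = load_diabetes(return_X_y=True)

# Plain Lasso: L1 penalty with a hand-picked alpha.
lasso = Lasso(alpha=0.1).fit(X, y)

# LassoCV: searches a path of alphas with 5-fold cross-validation.
lasso_cv = LassoCV(cv=5, random_state=0).fit(X, y)

print(lasso.coef_)       # the L1 penalty can set some coefficients to exactly zero
print(lasso_cv.alpha_)   # the alpha that was selected
```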

What else does the estimator LassoLarsIC use to find its optimal alphas?

What is the equivalence between the alpha of lasso and the regularisation parameter of SVM?

How do I adjust this code for the Ridge regression?

How do I adjust this code for the Lasso regression?

Why do we use the standardscaler?

What does a confusion matrix show?

What is the code to use svm to classify some data and then create a confusion matrix?

What are the three ways to scale the data?

What does the StandardScaler do?

Once we have initialised scaled data, how do we implement it into code using an SVM classifier and print its confusion matrix and score?
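
A sketch of scaling followed by an SVM, with the confusion matrix and score printed. The notebook's winequality data is not available here, so scikit-learn's built-in wine data set (load_wine) is used as a stand-in.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import load_wine
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in data set; the notebook uses winequality instead.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the scaler on the training data only, then apply it to both sets.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

clf = svm.SVC().fit(X_train_s, y_train)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, clf.predict(X_test_s))
score = clf.score(X_test_s, y_test)   # accuracy on the test set

print(cm)
print(score)
```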

What classifier isn't that sensitive to scaling?

How do we implement the DecisionTreeClassifier model using scaling for the winequality data set? Perform the classification, then display the confusion matrix and clf.score. How does it compare to when you use unscaled data (repeat the task with unscaled data)?

Not that much different

What are the two ways to show a confusion matrix based on this unscaled train_test data of the winequality dataset?

What are hyperparameters and how do they relate to Cross-validation?

What is the simplest way to use cross-validation?

What happens when the cv argument is an integer in cross_val_score?
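
A minimal sketch with cross_val_score and cv=5 on iris. With an integer cv and a classifier, scikit-learn uses stratified k-fold splitting with that many folds.

```python
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# cv=5: five folds, one score per fold.
scores = cross_val_score(svm.SVC(), X, y, cv=5)

print(scores)
print(scores.mean(), scores.std())
```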

How do the cross_validate and cross_val_score differ?

What does cross_val_predict do? When is it appropriate or inappropriate to use?

What function do we use to find the names and current values for all parameters for a given estimator?
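
The method is get_params(), available on every scikit-learn estimator. A short sketch on an SVC with default settings:

```python
from sklearn import svm

clf = svm.SVC()

# Dictionary of parameter names and their current values.
params = clf.get_params()
print(params)
```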

What does a hyperparameter search generally consist of?

What are the two generic approaches for parameter searches?

What is the Exhaustive Grid Search method in scikit-learn?

How do we tune the decision threshold of the classifier once the model has been trained?

How can we tune the decision threshold through different strategies controlled by the parameter scoring?

What do you need to note regarding the internal cross-validation?

What are all the import statements needed for cross validation?

Run the svm.SVC classifier on the unscaled data

How do we code this?

When tuning hyperparameters with a manual search in a loop, what do we do once we have calculated the scores to decide which hyperparameter is the best?

How do we perform a hyperparameter grid search using GridSearchCV? How do you perform a hyperparameter grid search over both C and gamma at the same time?
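
One way to sketch the manual search: score each candidate with cross-validation, then take the candidate with the highest mean score using np.argmax. Iris and the list of C values are stand-ins chosen for illustration.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

C_values = [0.01, 0.1, 1, 10, 100]

# Mean 5-fold accuracy for each candidate C.
mean_scores = [
    cross_val_score(svm.SVC(C=C), X, y, cv=5).mean() for C in C_values
]

# The best hyperparameter is the one with the highest mean score.
best_C = C_values[int(np.argmax(mean_scores))]
print(best_C, max(mean_scores))
```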
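
A sketch of GridSearchCV over C and gamma together: every combination in param_grid is scored by cross-validation. Iris and the grid values are stand-ins; the notebook works on the scaled winequality data.

```python
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# 3 values of C times 3 values of gamma = 9 combinations.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}

search = GridSearchCV(svm.SVC(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)   # best (C, gamma) pair
print(search.best_score_)    # its mean cross-validated score
```

With the default refit=True, search.best_estimator_ is already refitted on all the data passed to fit, using the best parameters.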

If things have gone well, you should find that with C and gamma set you can now predict approximately 66% of the cases correctly. This is a substantial improvement over the approximately 45% found at the beginning of this notebook before doing any scaling or hyperparameter tuning. Note also that the 66%

What do we do once we have the best set of hyperparameters?

What is the default number of folds that GridSearchCV() uses?

cv = 5

Notebook 7 COPY Flashcards

(76 cards)