Notebook 8 Flashcards

(72 cards)

1
Q

What are Categorical Variables?

A
2
Q

What import statement is needed for OneHotEncoder?

A
3
Q

What is Ordinal encoding?

A
4
Q

What is one-hot and dummy encoding?

A
5
Q

How do we use pandas to apply one-hot or dummy encoding to a DataFrame?

A
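The answer above is blank; a minimal sketch with made-up data, using pandas' get_dummies:

```python
import pandas as pd

# Made-up example data with one categorical column
df = pd.DataFrame({"colour": ["red", "green", "blue", "green"]})

# One-hot encoding: one indicator column per category
one_hot = pd.get_dummies(df, columns=["colour"])

# Dummy encoding: drop the first category to avoid a redundant column
dummy = pd.get_dummies(df, columns=["colour"], drop_first=True)
```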
6
Q

How do we change this one-hot encoding to dummy encoding?

A
7
Q

How do we use scikit-learn for one-hot encoding of a pandas DataFrame? How is it different from the pandas approach?

A

Unless sparse_output=False is set, the encoder returns a sparse matrix, and this is what is printed:

<Compressed Sparse Row sparse matrix of dtype 'float64'
with 5 stored elements and shape (5, 3)>

8
Q

How do we use scikit-learn's encoder to get dummy encoding?

A
9
Q

When do you use dummy encoding vs. one-hot encoding?

A
10
Q

Say you have used scikit-learn for one-hot encoding; how do we convert the result back into a DataFrame? What do we need to be mindful of?

A
11
Q

What are mixed or heterogeneous data types?

A

Datasets that have both numerical and categorical features, that is, mixed data types; also called multiple variable types or heterogeneous data.

12
Q

How would you code to perform one-hot and dummy encoding on a mixed dataset, laptop_price.csv?

A
13
Q

For a mixed dataset, how could you code to explicitly specify the categorical features for one-hot encoding?

How could you separate out the numerical and categorical columns?

A
14
Q

How is inconsistent preprocessing a common pitfall?

A
15
Q

Common pitfalls: What is Data leakage?

A
16
Q

How can we avoid data leakage when pre-processing?

A
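The answer above is blank; a sketch of the usual rule (with made-up data): fit any preprocessing on the training split only, then reuse the fitted transformer on the test split.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Made-up data standing in for a real dataset
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the scaler on the training split ONLY, then reuse it on the test split;
# fitting on all the data would leak test-set statistics into training
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```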
17
Q

What preprocessing could we perform that would lead to data leakage?

A
18
Q

How can we correct for this preprocessing error that is causing data leakage?

A
19
Q

What property do some scikit-learn objects have inherently? How does this end up leading to preprocessing errors?

A
20
Q

What does the random_state parameter determine?

A
21
Q

What happens to our estimators if we pass RandomState instances as random_state?

A

MIGHT NEED ADJUSTING

22
Q

What are CV Splitters?

A
23
Q

When do we want to pass an integer vs. a RandomState instance to an estimator?

A
24
Q

When RandomState instances are passed to CV splitters, what occurs?

A
25
General recommendation: Getting reproducible results across multiple executions?
26
General recommendation: Robustness of cross-validation results?
27
For the mixed dataset laptop_price, how would I create test and training data for the target y from column Price (Euro) with test size = 0.2? How would I pull out from the design matrix X the categorical and numerical columns?
28
For our laptop_price dataset, what function can I use to pre-process our data? Generally, what will it do to our different data types, and how do I code this?
29
How can we make the code more readable and slightly more sophisticated?
30
Using this information, can you create test and training data for the design matrix X of the laptop_price dataset, then preprocess the data before we test it, noting this is a mixed dataset?
31
What is imputation? What does the SimpleImputer function do? Apply it to this dataset?
You can change the 'strategy' to 'median' to replace missing values with the median instead.
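A minimal sketch of SimpleImputer on made-up data (the card's real dataset is not shown here):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up data with missing entries (NaN)
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [7.0, np.nan]])

# Replace each NaN with its column mean; strategy='median' uses the median
imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)
```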
32
How does the SimpleImputer work (scikit-learn)?
33
What are some more sophisticated scaling methods?
34
What are the MinMaxScaler and MaxAbsScaler functions?
Scaling features to lie between a given minimum and maximum value, often between zero and one, or so that the maximum absolute value of each feature is scaled to unit size. This can be achieved using MinMaxScaler or MaxAbsScaler, respectively. This can be useful if you know that your data is bounded in some range and is not Gaussian, and for preserving zero entries in sparse data.
35
What does the RobustScaler function do?
If your data contains many outliers, scaling using the mean and variance of the data is likely to not work very well. For such data you want to use RobustScaler, which scales data using the median and interquartile range (IQR), making it less sensitive to outliers. Typical examples of such data are housing prices and income data, where outliers are common.
36
Example of MinMaxScaler on a dataset?
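The answer above is blank; a minimal sketch with a made-up single-feature dataset:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up single-feature data
X = np.array([[1.0], [3.0], [5.0]])

# Rescale the feature linearly onto the [0, 1] range
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
```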
37
Why are duplicated rows a problem and how do we deal with them?
38
What are the import statements for all our scoring and performance measures, including all their prerequisites, that would probably be needed in machine learning modelling?
Add which import corresponds to which measure.
39
For the label vector 'Outcome' of the diabetes dataset: load the dataset, train_test_split the data, and scale the data?
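The answer above is blank; a sketch of the usual steps. The column names and values below are made-up stand-ins for loading the real file (presumably something like pd.read_csv("diabetes.csv")):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Tiny made-up stand-in for the real diabetes dataset
df = pd.DataFrame({
    "Glucose": [148, 85, 183, 89, 137, 116, 78, 115],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1, 25.6, 31.0, 35.3],
    "Outcome": [1, 0, 1, 0, 1, 0, 1, 0],
})

X = df.drop(columns=["Outcome"])  # design matrix
y = df["Outcome"]                 # label vector

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Scale using statistics from the training split only
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
```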
40
How do we find the best hyperparameter for the SVM Classifier for this dataset?
41
Once my SVM Classifier has been created and the best parameters are found, how do we compute our performance measures?
Check the accuracy by hand and verify that the precision and recall for the 1 class (the positive class) are the values printed here.
42
What is an important message to keep in mind about our performance measures?
43
How do we create a hyperparameter grid search for a logistic regression?
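The answer above is blank; a minimal sketch using GridSearchCV on a made-up dataset (the parameter grid here is an illustrative choice, not the card's):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Made-up data standing in for the real dataset
X, y = make_classification(n_samples=100, random_state=0)

# Search over the inverse regularisation strength C with 5-fold CV
param_grid = {"C": [0.01, 0.1, 1, 10]}
grid = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
grid.fit(X, y)
```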
44
How do we test the performance of this regression?
45
How do we generate the ROC curve and AUC?
What does this mean though?
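A minimal sketch with made-up labels and scores: roc_curve gives the points of the curve, roc_auc_score the area under it (1.0 is perfect, 0.5 is random guessing).

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Made-up labels and predicted scores (e.g. predict_proba[:, 1])
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# ROC curve: false positive rate vs. true positive rate over thresholds
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# AUC: the area under that curve
auc = roc_auc_score(y_true, y_score)
```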
46
DO YOU ADD THE SECTION 3.4.3 SCORING STUFF??
47
How do we create a 2D grid?
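The answer above is blank; a minimal sketch using np.meshgrid (the function evaluated is a made-up example):

```python
import numpy as np

# Two 1-D coordinate axes
x = np.linspace(-2, 2, 5)
y = np.linspace(-2, 2, 5)

# meshgrid turns them into two 2-D coordinate arrays
X, Y = np.meshgrid(x, y)

# Evaluate a function at every grid point at once
Z = X**2 + Y**2
```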
48
When creating a 2-D grid, why do we separate the arrays? How do we evaluate a function on this grid?
We now have three 2D arrays: the `X` and `Y` arrays with the grid coordinates, and the array `Z` containing the value of the function at each grid point.
49
50
How do you plot contour lines for this function on our 2-D grid?
51
How do you plot filled contour regions for this function on our 2-D grid?
LOOK UP THE PROPER WAY TO FIND THE CRITICAL POINTS
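A minimal sketch of filled contours with lines overlaid (the plotted function is a made-up example with its minimum at the origin):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs headlessly
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-2, 2, 50)
X, Y = np.meshgrid(x, x)
Z = X**2 + Y**2  # made-up function to contour

fig, ax = plt.subplots()
cf = ax.contourf(X, Y, Z, levels=10, cmap="viridis")  # filled regions
cs = ax.contour(X, Y, Z, levels=10, colors="k")       # contour lines on top
fig.colorbar(cf)
```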
52
How do we add more contour levels to the plt.contourf and plt.contour functions?
use the kwarg: levels = int
53
How do you specify the levels of contours to plot? What about the linestyles themselves?
54
What does the cmap kwarg do in our contour plots?
Other cmaps: 'magma', 'plasma', 'Reds', 'jet'. Adding alpha (0 to 1) will change the opacity: the closer to 1, the more opaque it will be.
55
How do we code the decision boundaries here?
56
How do you plot the contourf and contour on the same graph?
57
58
59
Walk me through on a high level what each line of this code is doing?
- A Python list with classifiers and hyperparameter settings is created.
- Three datasets are created and put into a list.
- The lists are looped over to generate the comparison grid.
- Within the outer loop (dataset loop) you should recognise where the data is scaled and the train_test split is made.
- There is then code to plot the data.
- Within the inner loop (classifier loop) you should recognise the fitting of the training data and the computation of the score from the test data.
- There is then code to plot the results. You should recognise the functions contourf and scatter.

While more involved and sophisticated, this code more-or-less corresponds to the example in the Contour_plot notebook.
60
Locate where the code creates the linearly separable dataset. Where can you set the added noise to zero? What about when it is 20? What happens to the score when you do this?
X += 2 * rng.uniform(size=X.shape). The score decreases by about 0.5 when increasing the noise.
61
What do these lines of code generate on a graph?
- make_moons: creates two interleaving half circles
  - shuffle: whether to shuffle the samples
  - noise: standard deviation of Gaussian noise added to the data
- make_circles: makes a larger circle containing a smaller circle
  - shuffle: whether to shuffle the samples
  - noise: standard deviation of Gaussian noise added to the data
  - factor: the scale factor between the inner and outer circles, in the range [0, 1)

More noise generally makes the score worse. For make_circles the score is low for the linear SVM, as the data is not linearly separable even with soft margins.
62
For binary classification problems, what will always result in a score of about 0.5?
Either randomly guessing the class in the binary classification problem, or setting all the test data to one of the classes (in our case we can just set y_test = 0 * y_test).
63
What is the difference between setting the kernel to linear or RBF?

- Linear uses the dot product of the input samples, which creates our usual hyperplane split with straight lines.
- The Radial Basis Function (RBF), also known as the Gaussian kernel, measures the similarity between two data points in an (implicitly) infinite-dimensional feature space and then approaches classification by majority vote. The kernel function is: K(x1, x2) = exp(-γ ||x1 - x2||²)
64
What values does gamma take for RBF SVM classification?
65
How would you update this to add a Poly SVM?
Setting degree = 1 for the poly kernel should reproduce the linear case. Note that the circles data is approximately rotationally symmetric, so linear classifiers will have a hard time agreeing on an orientation. As already discussed, the scores are bad even for noise-free circles data with linear classifiers.
66
import libraries for 3D plotting?
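The answer above is blank; a sketch of one common route. In modern matplotlib, requesting projection="3d" is enough (older code also imported Axes3D from mpl_toolkits.mplot3d); the surface plotted here is a made-up example:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs headlessly
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-2, 2, 40)
X, Y = np.meshgrid(x, x)
Z = X**2 + Y**2  # made-up function to plot as a surface

fig = plt.figure()
ax = fig.add_subplot(projection="3d")  # this is what enables 3D axes
ax.plot_surface(X, Y, Z, cmap="viridis")
ax.view_init(elev=30, azim=-60)  # elevation and azimuth viewing angles
```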
67
How do we plot this using 3D plotting?
You should see a graph of the function z = f(x, y) as a surface plot in three dimensions. Surface plots make local and global minima very obvious. You should also see faint lines on the surface showing the grid. Before discussing the Python code, it is worth changing the colouring of the surface: you can add cmap='plasma', 'viridis', or 'magma'.
68
Why do we need these when making 3D plots?
69
What does this function do?
elev rotates the view above the x-y plane, e.g. elev=90 looks at the plot from above, whereas -30 looks at it from 30 degrees below. Changing the azim (azimuth) angle rotates the view of the surface around the z-axis.
70
What does adding edgecolor='k' to the ax.plot_surface function do?
71
Plotting a scatter plot in 3D?
72
What does changing the point size to 50*z_points do? What about 500*np.abs(z)? What about changing the colours to colours = y_points + 10? What about to -y_points?
500*np.abs(z) just makes the scatter points larger.