Feature Scaling Flashcards

(7 cards)

1
Q

How does Feature Scaling affect gradient descent?

A
  • it allows gradient descent to run much faster
2
Q

What is the basis for Feature Scaling?

A
  • the larger the possible range of values for a feature (like the size of a house), the more likely a model is to learn a relatively small parameter value w for it
  • if the possible values for a feature are small, then the reasonable values for its weight will be comparatively large
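This inverse relationship can be sketched with a least-squares fit on made-up housing data (the feature names, coefficients, and noise level below are purely illustrative assumptions):

```python
import numpy as np

# Hypothetical data: house size in square feet (large range) and number of
# bedrooms (small range) both contribute meaningfully to the price.
rng = np.random.default_rng(42)
size = rng.uniform(300, 2000, 100)
beds = rng.integers(0, 6, 100).astype(float)
price = 0.1 * size + 30.0 * beds + rng.normal(0, 5, 100)

# Fit a linear model: columns are [size, beds, intercept].
X = np.column_stack([size, beds, np.ones_like(size)])
w, *_ = np.linalg.lstsq(X, price, rcond=None)

# The large-range feature (size) gets a much smaller weight than the
# small-range feature (beds).
print(w)
```

Here the fitted weight for `size` comes out near 0.1 while the weight for `beds` comes out near 30, matching the intuition above.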
3
Q

How does the basic assumption of feature scaling affect the scatterplot and contour plot?

A
  • on the contour plot, the parameter of the large-range feature needs only small changes to move the cost noticeably, because even a tiny change in that parameter is multiplied by large feature values
  • the parameter of the small-range feature must change a lot before the cost reacts, so the contours are stretched into tall, narrow ellipses
  • the scatterplot of the features is basically the reverse: the large-range feature is spread along a long axis, while the small-range feature covers only a short one
4
Q

Why does Feature Scaling help gradient descent?

A
  • because of the elongated contours produced by mixing low-range and high-range features, gradient descent may bounce back and forth across the narrow valley, making only minor progress per step
  • it therefore needs many more steps to arrive at the optimum
  • transforming the features so that they all fall into a similar range lets gradient descent take a more direct path and run more efficiently
  • ideally all features have ranges of values comparable to each other
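A quick numerical sketch of this effect (the data, learning rates, and iteration counts are assumptions chosen for illustration): with raw features, the learning rate must stay tiny or gradient descent diverges, so progress crawls; after z-score normalization the same loop converges quickly.

```python
import numpy as np

def final_cost(X, y, alpha, iters):
    """Batch gradient descent for linear regression; returns the final cost."""
    w = np.zeros(X.shape[1])
    b = 0.0
    m = len(y)
    for _ in range(iters):
        err = X @ w + b - y
        w -= alpha * (X.T @ err) / m
        b -= alpha * err.mean()
    return ((X @ w + b - y) ** 2).mean() / 2

rng = np.random.default_rng(0)
size = rng.uniform(300, 2000, 50)   # large-range feature
beds = rng.uniform(0, 5, 50)        # small-range feature
X = np.column_stack([size, beds])
y = 0.1 * size + 20.0 * beds + 50.0

X_norm = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score normalization

# Raw features: a much larger alpha would diverge, and this one barely moves
# the weight of the small-range feature in 1000 steps.
print(final_cost(X, y, alpha=1e-7, iters=1000))
# Normalized features: the cost drops to essentially zero in the same budget.
print(final_cost(X_norm, y, alpha=0.1, iters=1000))
```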
5
Q

How do you scale features?

A
  1. Dividing by the maximum
    Assuming
    300 <= x1 <= 2000, then x1_scaled = x1/2000
    0 <= x2 <= 5, then x2_scaled = x2/5
  2. Mean normalization
    - rescales the original features so that they are centered around 0 and take both negative and positive values, usually between -1 and +1
    -> compute the mean of the values of each feature (e.g. 600 for x1), then x1_scaled = (x1 - mean1)/(max x1 - min x1)
  3. Z-score normalization
    - uses the standard deviation of the feature
    -> 1. compute the mean and standard deviation of each feature
    -> 2. x1_scaled = (x1 - mean1)/(std1)
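The three methods above can be sketched in a few lines of NumPy (the sample values are the ones from the card, used purely for illustration):

```python
import numpy as np

def max_scale(x):
    """Divide by the maximum: positive features end up in (0, 1]."""
    return x / x.max()

def mean_normalize(x):
    """Center around 0 and divide by the range: values roughly in [-1, 1]."""
    return (x - x.mean()) / (x.max() - x.min())

def zscore_normalize(x):
    """Subtract the mean and divide by the standard deviation."""
    return (x - x.mean()) / x.std()

# Example: house sizes between 300 and 2000
x1 = np.array([300.0, 600.0, 1200.0, 2000.0])
print(max_scale(x1))         # all values in (0, 1]
print(mean_normalize(x1))    # centered on 0, within [-1, 1]
print(zscore_normalize(x1))  # mean 0, standard deviation 1
```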
6
Q

What is the rule of thumb for feature scaling?

A
  • aim for about -1 <= xj <= 1 for each feature xj
    or -3 <= xj <= +3; the upper and lower limits should just be of similar magnitude
    even 0 <= x1 <= 3 is okay

  • ranges that are too large or too small should be rescaled, e.g.:
    -100 <= x1 <= +100 (too large)
    -0.001 <= x4 <= 0.0001 (too small)
    98.6 <= x6 <= 105 (too far from 0)

  • rescaling does almost no harm, so it should always be considered
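As a sketch, the rule of thumb could be turned into a quick range check (the thresholds 0.1 and 10 are an assumption mirroring the ballpark ranges above, not a fixed standard):

```python
import numpy as np

def needs_rescaling(x, low=0.1, high=10.0):
    """Rule-of-thumb check: flag a feature whose values stray far outside
    roughly [-1, 1] (magnitude above `high`) or are squeezed into a tiny
    band near zero (magnitude below `low`)."""
    magnitude = max(abs(x.min()), abs(x.max()))
    return magnitude > high or magnitude < low

# The example ranges from the card:
x1 = np.array([-100.0, 50.0, 100.0])   # too large a range
x4 = np.array([-0.001, 0.0001])        # too small a range
x6 = np.array([98.6, 101.0, 105.0])    # too far from 0
xok = np.array([-1.0, 0.5, 2.5])       # acceptable as-is

for name, x in [("x1", x1), ("x4", x4), ("x6", x6), ("xok", xok)]:
    print(name, "rescale" if needs_rescaling(x) else "ok")
```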
