Feature Scaling Flashcards

(7 cards)

1
Q

How does Feature Scaling affect gradient descent?

A
  • it allows gradient descent to run much faster
2
Q

What is the basis for Feature Scaling?

A
  • the larger the possible range of values for a feature (like the size of a house), the more likely a model is to learn a relatively small parameter value w for it
  • if the possible values for a feature are small, then the reasonable values for its weight will be comparatively large
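This inverse relationship can be sketched with a least-squares fit on made-up housing data (the feature names, coefficients, and noise level below are purely illustrative assumptions):

```python
import numpy as np

# Hypothetical data: house size in square feet (large range) and number of
# bedrooms (small range) both contribute meaningfully to the price.
rng = np.random.default_rng(42)
size = rng.uniform(300, 2000, 100)
beds = rng.integers(0, 6, 100).astype(float)
price = 0.1 * size + 30.0 * beds + rng.normal(0, 5, 100)

# Fit a linear model: columns are [size, beds, intercept].
X = np.column_stack([size, beds, np.ones_like(size)])
w, *_ = np.linalg.lstsq(X, price, rcond=None)

# The large-range feature (size) gets a much smaller weight than the
# small-range feature (beds).
print(w)
```

Here the fitted weight for `size` comes out near 0.1 while the weight for `beds` comes out near 30, matching the intuition above.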
3
Q

How does the basic assumption of feature scaling affect the scatterplot and contour plot?

A
  • on the contour plot, the parameter of the large-range feature needs only small changes to move the cost noticeably, because even a tiny change in that parameter is multiplied by large feature values
  • the parameter of the small-range feature must change a lot before the cost reacts, so the contours are stretched into tall, narrow ellipses
  • the scatterplot of the features is basically the reverse: the large-range feature is spread along a long axis, while the small-range feature covers only a short one
4
Q

Why does Feature Scaling help gradient descent?

A
  • because of the elongated contours produced by mixing low-range and high-range features, gradient descent may bounce back and forth across the narrow valley, making only minor progress per step
  • it therefore needs many more steps to arrive at the optimum
  • transforming the features so that they all fall into a similar range lets gradient descent take a more direct path and run more efficiently
  • ideally all features have ranges of values comparable to each other
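A quick numerical sketch of this effect (the data, learning rates, and iteration counts are assumptions chosen for illustration): with raw features, the learning rate must stay tiny or gradient descent diverges, so progress crawls; after z-score normalization the same loop converges quickly.

```python
import numpy as np

def final_cost(X, y, alpha, iters):
    """Batch gradient descent for linear regression; returns the final cost."""
    w = np.zeros(X.shape[1])
    b = 0.0
    m = len(y)
    for _ in range(iters):
        err = X @ w + b - y
        w -= alpha * (X.T @ err) / m
        b -= alpha * err.mean()
    return ((X @ w + b - y) ** 2).mean() / 2

rng = np.random.default_rng(0)
size = rng.uniform(300, 2000, 50)   # large-range feature
beds = rng.uniform(0, 5, 50)        # small-range feature
X = np.column_stack([size, beds])
y = 0.1 * size + 20.0 * beds + 50.0

X_norm = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score normalization

# Raw features: a much larger alpha would diverge, and this one barely moves
# the weight of the small-range feature in 1000 steps.
print(final_cost(X, y, alpha=1e-7, iters=1000))
# Normalized features: the cost drops to essentially zero in the same budget.
print(final_cost(X_norm, y, alpha=0.1, iters=1000))
```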
5
Q

How do you scale features?

A
  1. Dividing by the maximum
    Assuming
    300 <= x1 <= 2000, then x1_scaled = x1/2000
    0 <= x2 <= 5, then x2_scaled = x2/5
  2. Mean normalization
    - rescales the original features so that they are centered around 0 and take both negative and positive values, usually between -1 and +1
    -> compute the mean of the values of each feature (e.g. 600 for x1), then x1_scaled = (x1 - mean1)/(max x1 - min x1)
  3. Z-score normalization
    - uses the standard deviation of the feature
    -> 1. compute the mean and standard deviation of each feature
    -> 2. x1_scaled = (x1 - mean1)/(std1)
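The three methods above can be sketched in a few lines of NumPy (the sample values are the ones from the card, used purely for illustration):

```python
import numpy as np

def max_scale(x):
    """Divide by the maximum: positive features end up in (0, 1]."""
    return x / x.max()

def mean_normalize(x):
    """Center around 0 and divide by the range: values roughly in [-1, 1]."""
    return (x - x.mean()) / (x.max() - x.min())

def zscore_normalize(x):
    """Subtract the mean and divide by the standard deviation."""
    return (x - x.mean()) / x.std()

# Example: house sizes between 300 and 2000
x1 = np.array([300.0, 600.0, 1200.0, 2000.0])
print(max_scale(x1))         # all values in (0, 1]
print(mean_normalize(x1))    # centered on 0, within [-1, 1]
print(zscore_normalize(x1))  # mean 0, standard deviation 1
```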
6
Q

What is the rule of thumb for feature scaling?

A
  • aim for about -1 <= xj <= 1 for each feature xj
    or -3 <= xj <= +3; the upper and lower limits should just be of similar magnitude
    even 0 <= x1 <= 3 is okay

  • ranges that are too large or too small should be rescaled, e.g.:
    -100 <= x1 <= +100 (too large)
    -0.001 <= x4 <= 0.0001 (too small)
    98.6 <= x6 <= 105 (too far from 0)

  • rescaling does almost no harm, so it should always be considered
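As a sketch, the rule of thumb could be turned into a quick range check (the thresholds 0.1 and 10 are an assumption mirroring the ballpark ranges above, not a fixed standard):

```python
import numpy as np

def needs_rescaling(x, low=0.1, high=10.0):
    """Rule-of-thumb check: flag a feature whose values stray far outside
    roughly [-1, 1] (magnitude above `high`) or are squeezed into a tiny
    band near zero (magnitude below `low`)."""
    magnitude = max(abs(x.min()), abs(x.max()))
    return magnitude > high or magnitude < low

# The example ranges from the card:
x1 = np.array([-100.0, 50.0, 100.0])   # too large a range
x4 = np.array([-0.001, 0.0001])        # too small a range
x6 = np.array([98.6, 101.0, 105.0])    # too far from 0
xok = np.array([-1.0, 0.5, 2.5])       # acceptable as-is

for name, x in [("x1", x1), ("x4", x4), ("x6", x6), ("xok", xok)]:
    print(name, "rescale" if needs_rescaling(x) else "ok")
```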
