How does feature scaling affect gradient descent?
What is the basis for feature scaling?
How does the basic assumption of feature scaling affect the scatterplot and contour plot?
Why does feature scaling help gradient descent?
How do you scale features?
What is the rule of thumb for feature scaling?
Features whose ranges are far too large or far too small should be rescaled, for example:
-100 <= x1 <= +100 (range too large)
-0.001 <= x4 <= 0.0001 (range too small)
98.6 <= x6 <= 105 (values far from zero)
Feature rescaling does almost no harm, so it should always be considered.
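
As a rough illustration of the rule of thumb, here is a minimal Python sketch of one common way to rescale: mean normalization, (x - mean) / (max - min). The notes do not say which scaling method is intended, so the method itself is an assumption. The helper name `mean_normalize` and the sample values are made up for this example; the samples were chosen only to span the three ranges listed above.

```python
def mean_normalize(values):
    """Rescale a feature column with (x - mean) / (max - min).

    The result is centered on 0 and spans a total width of 1,
    so every value lies between -1 and 1.
    """
    lo, hi = min(values), max(values)
    spread = hi - lo
    if spread == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    mean = sum(values) / len(values)
    return [(v - mean) / spread for v in values]


# Hypothetical sample values covering the three example ranges.
features = {
    "x1": [-100.0, -50.0, 0.0, 50.0, 100.0],
    "x4": [-0.001, -0.0005, 0.0, 0.00005, 0.0001],
    "x6": [98.6, 100.0, 101.5, 103.0, 105.0],
}

scaled = {name: mean_normalize(col) for name, col in features.items()}

for name, col in scaled.items():
    print(name, [round(v, 3) for v in col])
```

After scaling, all three features occupy comparable ranges, which is the condition the rule of thumb is aiming for before running gradient descent.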