The scatter plot can be summarised by the following five numerical summaries…
mean of x, SD of x, mean of y, SD of y, and the correlation coefficient (r)
correlation coefficient (r)
a numerical summary, between -1 and 1, which measures how tightly the points cluster around a straight line (the strength and direction of the linear association).
The correlation coefficient (r) is the mean of the product of the variables in standard units
True
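A minimal R sketch of this statement, using small made-up vectors `x` and `y` (hypothetical data). Standard units are computed with the SD that divides by n, the convention under which r is exactly the mean of the products; R's `sd()` divides by n - 1, so a hypothetical helper `pop_sd()` is defined instead.

```r
# Hypothetical example data
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 4, 5, 4, 6, 8)

# SD with divisor n (R's built-in sd() uses n - 1)
pop_sd <- function(v) sqrt(mean((v - mean(v))^2))

# Convert each variable to standard units
zx <- (x - mean(x)) / pop_sd(x)
zy <- (y - mean(y)) / pop_sd(y)

# r = mean of the product of the variables in standard units
r_manual  <- mean(zx * zy)
r_builtin <- cor(x, y)

print(c(r_manual, r_builtin))  # the two values agree
```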
Properties of the Correlation Coefficient
Symmetry - The correlation coefficient is not affected by interchanging the variables.
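Scaling - The correlation coefficient is shift and scale invariant: adding a constant to either variable, or multiplying either variable by a positive constant, does not change r. (Multiplying by a negative constant flips the sign of r.)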
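The two properties can be checked numerically in R with the same hypothetical data; the particular constants (3, 10, 0.5, -2) are arbitrary choices for illustration.

```r
# Hypothetical example data
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 4, 5, 4, 6, 8)

r <- cor(x, y)

# Symmetry: interchanging the variables gives the same r
r_swapped <- cor(y, x)

# Shift and positive scale: r is unchanged
r_shifted_scaled <- cor(3 * x + 10, 0.5 * y - 2)

# Multiplying one variable by a negative constant flips the sign of r
r_negated <- cor(-2 * x, y)

print(c(r, r_swapped, r_shifted_scaled, r_negated))
```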
Outliers have no influence on ‘r’
False
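A short R illustration of why the statement is false, using hypothetical data with a strong linear trend: adding a single outlying point far from the trend drags r from close to 1 down to close to 0.

```r
# Hypothetical data with a clear linear trend plus small noise
x <- 1:10
y <- 2 * x + c(0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2, 0.4, -0.1, 0.0)

r_before <- cor(x, y)

# Add one outlier far below the trend
x_out <- c(x, 11)
y_out <- c(y, -20)

r_after <- cor(x_out, y_out)

print(c(r_before, r_after))  # a single point changes r substantially
```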
Nonlinear association can be detected by the correlation coefficient
False
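A minimal R example of a perfect but nonlinear (quadratic) relationship, chosen for illustration: y is completely determined by x, yet r is 0 because the association is not linear. A scatter plot of `x` against `y` would show the curve that r misses.

```r
# y is an exact function of x, but the relationship is curved
x <- -3:3
y <- x^2

# r measures linear association only, so it comes out as 0 here
r <- cor(x, y)
print(r)
```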
correlation coefficient in R
cor()
linear regression
lm(y~x)
how to add the regression line to an existing scatter plot
abline(lm(y ~ x), col = "…")
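A sketch of the full workflow in R with hypothetical data: fit the line with `lm()`, inspect the intercept and slope with `coef()`, draw the scatter plot, then add the line with `abline()`. The colour `"red"` is just an example value.

```r
# Hypothetical example data
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 4, 5, 4, 6, 8)

fit <- lm(y ~ x)      # least squares regression of y on x
print(coef(fit))      # intercept 1.333, slope 1 for this data

plot(x, y)                 # abline() draws on an existing plot
abline(fit, col = "red")   # add the fitted regression line
```

Passing the fitted model object to `abline()` is equivalent to `abline(lm(y ~ x), ...)`, but keeping `fit` around lets the same model be reused for residuals and predictions.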
Prediction error (residual)
the vertical distance (or ‘gap’) of a point above or below the regression line: residual = actual y - predicted y (positive above the line, negative below it)
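In R, the residuals of a fitted model can be computed by hand as observed minus fitted values, and they match `residuals()`. With an intercept in the model, the least squares residuals also sum to 0. The data are hypothetical.

```r
# Hypothetical example data
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 4, 5, 4, 6, 8)
fit <- lm(y ~ x)

# residual = observed y - predicted (fitted) y
res_manual  <- y - fitted(fit)
res_builtin <- residuals(fit)

print(round(res_builtin, 3))
print(sum(res_builtin))  # essentially 0
```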
Residual plot
graphs the residuals against x.
* If the linear fit is appropriate for the data, it should show no pattern (points scattered randomly around 0).
* Used to check the appropriateness of a linear model.
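A minimal R sketch of a residual plot that does show a pattern, assuming hypothetical data whose true relationship is quadratic. Fitting a straight line leaves residuals that are positive at both ends and negative in the middle, a U shape that indicates the linear model is not appropriate.

```r
# Hypothetical data with a curved (quadratic) relationship
x <- 1:10
y <- x^2

fit <- lm(y ~ x)     # a straight line is not the right model here
res <- residuals(fit)

# Residual plot: residuals against x, with a reference line at 0
plot(x, res, xlab = "x", ylab = "Residual")
abline(h = 0, lty = 2)

# Positive at the ends, negative in the middle: a U-shaped pattern
print(round(res, 2))
```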
is extrapolating reliable?
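no. Predicting outside the range of the observed x values can give large prediction errors, because the linear relationship may not hold there.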
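A small R illustration with hypothetical data: the true relationship is y = x^2, but over x = 1 to 5 a straight line fits reasonably well. The prediction inside that range is close; the extrapolated prediction at x = 20 is far off.

```r
# Hypothetical data: true relationship is y = x^2
x <- 1:5
y <- x^2
fit <- lm(y ~ x)   # fitted line: y = -7 + 6x

# Prediction inside the observed range of x
inside <- unname(predict(fit, newdata = data.frame(x = 3)))   # 11 (true value 9)

# Extrapolation far outside the observed range of x
outside <- unname(predict(fit, newdata = data.frame(x = 20))) # 113 (true value 400)

print(c(inside = inside, outside = outside))
```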
before predicting using a linear model you should…
check the scatter plot and the residual plot
If the vertical strips on the scatter plot show equal spread in the y direction…
then the data is homoscedastic, otherwise the data is heteroscedastic.
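One rough way to compare the spread in vertical strips is to compute the SD of y within each strip. In this R sketch the data are simulated (hypothetical), and each distinct x value is treated as one strip; for continuous x, the strips could instead be formed with `cut()`.

```r
set.seed(1)
x <- rep(1:5, each = 20)

# Homoscedastic: same spread at every x
y_homo <- 2 * x + rnorm(100, sd = 1)

# Heteroscedastic: spread grows with x
y_hetero <- 2 * x + rnorm(100, sd = x)

# SD of y within each vertical strip (each distinct x value)
sd_homo   <- tapply(y_homo, x, sd)
sd_hetero <- tapply(y_hetero, x, sd)

print(round(rbind(sd_homo, sd_hetero), 2))
```

The first row should stay roughly constant across strips, while the second should increase with x.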
homoscedastic
an assumption of equal or similar variances in different groups being compared
heteroscedastic
when the SDs of the predicted variable are not constant across different values of the independent variable (or across time periods)