What type of variables are seen in regression?
Regression: Continuous dependent and independent variables
What’s the difference between QUALITATIVELY and QUANTITAVELY?
Qualitative information describes qualities, characteristics, or experiences using words.
Quantitative information measures, counts, or quantifies using numbers and statistics.
What does a simple linear regression describe?
Simple linear regression describes the linear relationship between a predictor variable, plotted on the x-axis (distance from East Africa) , and a response variable, plotted on the y-axis (genetic diversity).
What variable regresses on the other?
We say “regress Y on X”
Ex: “regress genetic diversity on distance from Africa”
What is the observed value?
The dots on the scatter plot
What is the line that follows the general trend of the scatter plot called the fitted regression line?
Fitted regression line representing predicted values for any given value of X (proportion black). Best fit of the data → aid in making predictions in the data
What does the fitted regression line aid in?
aid in making predictions in the data
What is the predicted value?
Values along the fitted regression line
What is the Residual value e?
What algorithm does a regression model use and what does it assure?
A regression model uses an algorithm called “ordinary least squares (OLS)” that assures that the residual (deviation) values are as small as possible given the data. In other words, OLS maximizes the predicted values to be as closest as possible (in average) to the predicted values.
The regression line through a scatter of points is described by the following equation:
Y = a + bx
1) 𝒀 is referred as response variable (or also dependent variable).
2) 𝑿 is referred as explanatory variable
3) a = intercept: The predicted value of Y when X is zero (unit is the same as in Y)
4) b = slope: the rate of change in y as x changes
Why must you be careful when interpreting the intercept of a regression line?
A meaningful interpretation is only possible if X can truly be zero AND if the data include values close to zero (not the case here) → this is an issue.
What is the unit of the intercept of the regression line?
The unit attached to the intercept is the same as the response variable (i.e., years).
Why is the intercept is useful for prediction?
Because it represents the addition (offset) required to correctly position the regression line so that predictions match the observed data.
–> Different intercepts would lead to predicted values that are either too high or too low.
Define the slope
Because X is expressed in proportions (i.e., 0 to 1), then the slope is the increase of the response variable (age) when the predictor increases 100%, i.e., when X = 1.
What does Ŷ (y hat) stand for?
Predicted values on the regression line
What are residuals 𝜀?
Residual values 𝜀 are the difference (deviation) between the observed and predicted values
True or false: Each observation in the data has a predicted & residual value
TRUE
What is the purpose of the OLS or ordinary least squares?
Trying to minimize the sum of the squares of the residuals
What is the aim of a regression line?
A regression model aims at predicting the average Y based on X, i.e., predict the average male lion based on their proportion of black spots
What does the line of best fit for a regression line minimize?
The line of best fit minimizes the average distance between data and fitted line, i.e., the residuals.
How do we find the best line of a regression?
To find the best line, we must minimise the sum of the squares of the residuals
Why does the model to minimise the sum of the squares of the residuals to find the line of best fit use squares instead of square root?
Use squares such that it becomes a variance equation and can use the F-distribution
What is the H0 and HA of a t-test in statistical hypothesis testing of a regression model?
1) H0: the statistical population slope 𝛽 = 0 (i.e., Y can’t be predicted by X).
2) HA: the population slope 𝛽 ≠ 0 (i.e., Y can be predicted by X).