Univariate Data
data that consists of observations of only one characteristic
Bivariate Data
Data that includes observations of two characteristics
Explanatory variable
“x variable”: a variable that may help predict or explain changes in the response variable
Response variable
“y variable”: a variable that measures the outcome of a study
Scatterplot
shows the relationship between 2 QUANTITATIVE variables measured on the same group of individuals
What to use to describe a scatterplot
Form
Outliers/unusual features
Direction
Strength
SEE PAGE 3
Describing scatterplots
Positive Association
Large values of the explanatory tend to associate with large values of the response
Describing scatterplots
Negative association
Large values of the explanatory tend to associate with small values of the response
Describing scatterplots
No association
knowing the value of one variable does NOT help us predict the other
Describing scatterplots
Direction
positive
negative
none
Describing scatterplots
Form
Shape
Do the data points follow a linear or curved pattern
Describing scatterplots
Strength
weak
moderate
strong
Describing scatterplots
Unusual Features
outliers, gaps, clusters
Correlation (definition and symbol)
Measures the direction and strength of a linear relationship between 2 quantitative variables
correlation = r
Important properties of the correlation r
Always between -1 and 1
Indicates direction by its sign (r less than 0 = negative association, r greater than 0 = positive association)
You can only get r = -1 or r = 1 if there is a perfect linear relationship (all points lie exactly on a straight line)
Stronger relationships are closer to 1 and -1 and weaker ones are closer to 0
ONLY can be used for LINEAR relationships
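A minimal sketch of how r could be computed by hand, to check the properties above. The helper name `correlation` and the data values are made up for illustration; the formula is the standard one (sum of products of deviations divided by the square root of the product of the sums of squared deviations).

```python
import math

def correlation(xs, ys):
    """Correlation r of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the two means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]                                # made-up explanatory values
r_positive = correlation(xs, [2, 4, 6, 8, 10])      # perfect positive line: r = 1
r_negative = correlation(xs, [10, 8, 6, 4, 2])      # perfect negative line: r = -1
r_scattered = correlation(xs, [3, 1, 4, 1, 5])      # scattered points: somewhere between -1 and 1
```

The sign of the result gives the direction, and values nearer 1 or -1 mean a stronger linear relationship.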
Caution about Correlation
Correlation does NOT imply CAUSATION
Correlation does NOT measure FORM
Correlation only describes LINEAR relationships
NOT RESISTANT to outliers
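A small sketch of two of these cautions, using made-up data and a hypothetical helper `correlation`. One outlier is enough to pull r from 1 down to 0, and a perfectly curved pattern (y = x squared) also gives r = 0 even though the relationship is strong.

```python
import math

def correlation(xs, ys):
    """Correlation r of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
r_line = correlation(xs, [2, 4, 6, 8, 10])     # perfectly linear: r = 1
r_outlier = correlation(xs, [2, 4, 6, 8, 0])   # last point replaced by an outlier: r = 0

# Strong curved relationship (y = x^2), yet r = 0 because it is not linear
r_curve = correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
```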
Regression Line (definition and equation)
A line that describes how a response variable changes as the explanatory variable changes
Equation
ŷ = a + bx
ŷ = predicted response
a = y-intercept
b = slope
x = explanatory variable
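As a quick illustration of the equation, here is a sketch that plugs an x-value into ŷ = a + bx. The function name `predict` and the values of a and b are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def predict(a, b, x):
    """Predicted response y-hat = a + b*x."""
    return a + b * x

a = 1.5   # hypothetical y-intercept
b = 0.5   # hypothetical slope
y_hat = predict(a, b, 4)   # 1.5 + 0.5 * 4 = 3.5
```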
Extrapolation
Use of the regression line to make predictions for x-values outside the interval of x-values in the data used to make the scatterplot
These predictions are often NOT ACCURATE
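One way to flag extrapolation is to compare a new x-value against the smallest and largest x-values in the data. This is only a sketch; the name `is_extrapolation` and the data are made up for illustration.

```python
def is_extrapolation(x_new, xs):
    """True when x_new lies outside the range of x-values in the data."""
    return x_new < min(xs) or x_new > max(xs)

observed_x = [1, 2, 3, 4, 5]                  # made-up x-values from a scatterplot
inside = is_extrapolation(3.5, observed_x)    # False: 3.5 is within 1 to 5
outside = is_extrapolation(12, observed_x)    # True: 12 is beyond the data
```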
Residual
The difference between the actual value of y and the value of y predicted by the regression line
Equation
y-ŷ = residual
(Actual - Predicted, AP!)
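A short sketch of the residual calculation for one point. The line (a = 1.5, b = 0.5) and the observed point (4, 5) are hypothetical values used only to show the arithmetic.

```python
def residual(y_actual, y_predicted):
    """Residual = actual y minus predicted y."""
    return y_actual - y_predicted

a, b = 1.5, 0.5            # hypothetical regression line y-hat = 1.5 + 0.5x
x, y = 4, 5                # hypothetical observed point
y_hat = a + b * x          # predicted value: 3.5
res = residual(y, y_hat)   # 5 - 3.5 = 1.5
```

A positive residual means the point lies above the line; a negative residual means it lies below.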
Best line to use
the Least Squares Regression Line (LSRL)
Because it is the line that minimizes the sum of the squared residual values
A GOOD regression line
makes residuals as small as possible
Least Squares Regression Line
the line that minimizes the sum of the squared residual values
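A sketch of how the LSRL could be found, using the standard formulas (not stated in these notes): slope b = sum of products of deviations divided by the sum of squared x-deviations, and intercept a = mean of y minus b times mean of x. The function names and data are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # y-intercept
    return a, b

def sum_squared_residuals(a, b, xs, ys):
    """Sum of (actual - predicted)^2 for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]   # made-up data
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)                      # a is about 2.2, b is about 0.6
sse_lsrl = sum_squared_residuals(a, b, xs, ys)         # about 2.4
sse_other = sum_squared_residuals(2.0, 0.6, xs, ys)    # a different line: about 2.6
```

Any other line, such as the one with intercept 2.0, gives a larger sum of squared residuals on the same data.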
Residual Plot
A scatterplot that displays residuals on the y-axis and the explanatory variable on the x-axis
If a regression model is appropriate
there should be NO leftover pattern (such as a curve) in the residual plot
SEE PAGE 13
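A sketch of what an inappropriate linear model looks like in the residuals. The data are made up and deliberately curved (y = x squared), and the helper `least_squares_line` uses the standard least squares formulas, which these notes do not state. The residuals are the values that would go on the y-axis of the residual plot.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]          # curved data: y = x^2
a, b = least_squares_line(xs, ys)   # a = -7, b = 6

# Residual = actual - predicted, paired with x for the residual plot
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
# residuals are 2, -1, -2, -1, 2: positive, then negative, then positive again
```

The U-shaped run of residuals is the curved pattern that signals a line is not an appropriate model for these data.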