Scatter Plots:
A graph used to determine whether there is a relationship between two variables. The independent variable is on the horizontal axis and the dependent variable is on the vertical axis.
Line of Best Fit (trend line)
A straight line that passes as close as possible to all the points in a scatter plot.
The line is drawn with the following criteria in mind:
Outliers:
Data points that lie significantly far from the majority of the other data. They can skew a regression analysis, especially when the data set is small. More information should be sought about an outlier before deciding whether to include or exclude it from the analysis.
Correlation:
In data analysis, one variable may be affected by another. When a change in the independent variable affects the dependent variable, there is a correlation between them.
To describe a correlation, we use 3 attributes:
Linear Correlation:
Variables have a linear correlation if the changes in one variable tend to be proportional to changes in the other variable.
Correlation Coefficient:
Gives a quantitative measure of the strength of a linear correlation, or of how closely the points of a scatter plot lie to the line of best fit. It is the covariance of the two variables divided by the product of their standard deviations.
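A minimal sketch of that definition in Python. It assumes sample covariance and sample standard deviations (the n - 1 factors cancel, so the population versions give the same r). The function name pearson_r and the data are illustrative only.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient: covariance of x and y divided by the
    product of their standard deviations (sample versions, n - 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance: sum of products of deviations from the means, over n - 1
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # Sample standard deviations of each variable
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Undefined (ZeroDivisionError) if either variable is constant
    return cov / (sx * sy)

# Hypothetical points lying exactly on y = 2x + 1, so r is 1 (up to rounding)
print(pearson_r([1, 2, 3, 4], [3, 5, 7, 9]))
```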
Negative Linear Correlation Values
Strong is -1 to -0.67
Moderate is -0.67 to -0.33
Weak is -0.33 to 0
Positive Linear Correlation Values
Weak is 0 to 0.33
Moderate is 0.33 to 0.67
Strong is 0.67 to 1
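The ranges above can be turned into a small lookup. This is a sketch: the notes do not say which category a boundary value (0.33 or 0.67) belongs to, so this version assumes boundaries go to the stronger category, and it labels r = 0 as "no linear correlation". The function name describe_r is hypothetical.

```python
def describe_r(r):
    """Describe r using the strength ranges from the notes.
    Assumption: boundary values (0.33, 0.67) count as the stronger category."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    if r == 0:
        return "no linear correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size >= 0.67:
        strength = "strong"
    elif size >= 0.33:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction}"

print(describe_r(-0.5))  # moderate negative
```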
Steps for using the Correlation Coefficient Formula
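The steps themselves are not written out here. As a sketch, assuming the common computational form of the formula, r = [nΣxy - (Σx)(Σy)] / √([nΣx² - (Σx)²][nΣy² - (Σy)²]), the work breaks into computing five sums, then the numerator, then the denominator, then dividing. The function name r_from_sums is hypothetical.

```python
import math

def r_from_sums(xs, ys):
    n = len(xs)
    # Step 1: compute the five sums
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    # Step 2: numerator, n(Σxy) - (Σx)(Σy)
    numerator = n * sum_xy - sum_x * sum_y
    # Step 3: denominator, square root of [nΣx² - (Σx)²][nΣy² - (Σy)²]
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    # Step 4: divide
    return numerator / denominator

# Hypothetical data: sums are 6, 6, 13, 14, 14, so r = 3 / 6
print(r_from_sums([1, 2, 3], [1, 3, 2]))  # 0.5
```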
Linear Regressions in Desmos
You can find the equation of the line of best fit by subbing in the m and b that Desmos gives you. Just remember to write x after the m, because it's y = mx + b.
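A linear regression like the one Desmos runs is a least-squares fit, so the same m and b can be computed by hand as a cross-check. A minimal sketch using the standard least-squares formulas m = [nΣxy - (Σx)(Σy)] / [nΣx² - (Σx)²] and b = (Σy - mΣx) / n; the function name least_squares_line is hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (m, b) for the least-squares line y = mx + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope from the least-squares formula
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: the line passes through the point (mean x, mean y)
    b = (sum_y - m * sum_x) / n
    return m, b

m, b = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
print(f"y = {m}x + {b}")  # y = 2.0x + 1.0
```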
A positive relationship between two variables will sound like this: as the number of minutes increases, the cost also increases.
To find unknown y values that are within our data set, we can either
1) sub that x into the equation and solve, or 2) hover over that x value on the trend line in Desmos and read off the y.
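Option 1 is a direct substitution. A minimal sketch with a hypothetical trend line y = 0.25x + 10 (for example, cost in dollars against minutes); the function name predict_y is illustrative.

```python
def predict_y(m, b, x):
    """Substitute x into the trend line y = mx + b."""
    return m * x + b

# Hypothetical trend line y = 0.25x + 10, evaluated at x = 40
print(predict_y(0.25, 10, 40))  # 20.0
```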
Interpolation is estimating values within the range of our data set.
Extrapolation is estimating values beyond the range of the data set.
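Whether an estimate is interpolation or extrapolation depends only on where the x value falls relative to the collected x values. A sketch, assuming "within the data set" means between the smallest and largest x collected; the function name and data are hypothetical.

```python
def estimate_kind(xs, x):
    """'interpolation' if x lies between the smallest and largest
    collected x values (inclusive), otherwise 'extrapolation'."""
    if min(xs) <= x <= max(xs):
        return "interpolation"
    return "extrapolation"

minutes = [10, 20, 30, 40]  # hypothetical collected x values
print(estimate_kind(minutes, 25))  # interpolation
print(estimate_kind(minutes, 55))  # extrapolation
```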
To find unknown x values that are within our data set, we can either
1) sub in the y and solve for x, or 2) hover over the trend line as close to the y value as possible.
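Solving for x means rearranging y = mx + b into x = (y - b) / m. A minimal sketch using the same hypothetical trend line y = 0.25x + 10; the function name solve_for_x is illustrative.

```python
def solve_for_x(m, b, y):
    """Rearrange y = mx + b to x = (y - b) / m."""
    if m == 0:
        raise ValueError("a horizontal line (m = 0) cannot be solved for x")
    return (y - b) / m

# Hypothetical trend line y = 0.25x + 10: which x gives y = 20?
print(solve_for_x(0.25, 10, 20))  # 40.0
```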
A point can be rejected as an outlier if
1) There was an error in collecting or recording the data.
ex. Kathy collected data on the heights of students and their arm spans.
When recording the data, she mistakenly recorded a height of 181 cm for 101 cm. (Thus, an error in data collection.)
2) Outside factors affected the data.
ex. A company records its total revenue each month.
Last month, the workers at the company went on strike for two weeks, so last month's revenue is not typical. (Thus, an outside factor affected the data.)
The presence of an outlier can affect the analysis of data.
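To see how much one outlier can matter, the sketch below computes r for some hypothetical height and arm-span data, then again after adding a single point whose height was entered incorrectly. The data and the function name correlation are made up for illustration.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r from deviations about the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical heights (cm) and arm spans (cm)
heights = [150, 160, 170, 180]
arm_spans = [151, 159, 171, 179]
r_without = correlation(heights, arm_spans)

# Same data plus one hypothetical point with a wrongly entered height
r_with = correlation(heights + [101], arm_spans + [180])

print(f"r without outlier: {r_without:.3f}")
print(f"r with outlier:    {r_with:.3f}")
```

With these numbers, the single bad point turns a strong positive r (close to 1) into a weak negative one.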
Other factors, such as sample size and composition, also need to be considered when analyzing data.
Linear regression is only appropriate if the data appears to be linear.
Non-Linear Regression:
Definition — An analytical technique for finding a curve of best fit for data having a non-linear correlation.
Examples of other types of relationships are:
1) Quadratic: Equation of the form: y = ax^2 + bx + c
2) Cubic: Equation of the form: y = ax^3 + bx^2 + cx + d
3) Quartic: Equation of the form: y = ax^4 + bx^3 + cx^2 + dx + e
4) Exponential: Equation of the form: y = a(b)^x
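The notes do not say how these curves are fitted. As a sketch under stated assumptions: quadratic, cubic and quartic curves can be found by polynomial least squares (solving the normal equations), and one common approach for y = a(b)^x is to fit a straight line to (x, ln y), since ln y = ln a + x ln b. That log-transform method minimizes error in ln y rather than in y, so a tool that fits in the original units may report slightly different a and b. The function names polyfit and exponential_fit are hypothetical.

```python
import math

def polyfit(xs, ys, degree):
    """Least-squares polynomial fit. Returns [c0, c1, ..., c_degree] for
    y = c0 + c1*x + c2*x^2 + ... (note: constant term first, so for a
    quadratic c0 is 'c' and c2 is 'a' in y = ax^2 + bx + c)."""
    size = degree + 1
    # Normal equations A c = v, with A[i][j] = Σx^(i+j) and v[i] = Σ y*x^i
    A = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    v = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    # Gaussian elimination with partial pivoting
    for col in range(size):
        pivot = max(range(col, size), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        v[col], v[pivot] = v[pivot], v[col]
        for row in range(col + 1, size):
            factor = A[row][col] / A[col][col]
            for c in range(col, size):
                A[row][c] -= factor * A[col][c]
            v[row] -= factor * v[col]
    # Back substitution
    coeffs = [0.0] * size
    for i in reversed(range(size)):
        rest = sum(A[i][j] * coeffs[j] for j in range(i + 1, size))
        coeffs[i] = (v[i] - rest) / A[i][i]
    return coeffs

def exponential_fit(xs, ys):
    """Fit y = a(b)^x by fitting a straight line to (x, ln y).
    Every y must be positive."""
    ln_ys = [math.log(y) for y in ys]
    c0, c1 = polyfit(xs, ln_ys, 1)
    return math.exp(c0), math.exp(c1)  # a = e^(ln a), b = e^(ln b)

# Hypothetical data lying exactly on y = x^2 + 1
print(polyfit([0, 1, 2, 3, 4], [1, 2, 5, 10, 17], 2))  # close to [1, 0, 1]
# Hypothetical data lying exactly on y = 3(2)^x
print(exponential_fit([0, 1, 2, 3], [3, 6, 12, 24]))  # close to (3, 2)
```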