What is the most important principle in statistics?
Correlation doesn’t imply causation!
What are the various possibilities when 2 data sets (X and Y) are correlated?
1) X causes Y
2) Y causes X
3) X and Y partly cause one another
4) X and Y are both caused by something else
5) The correlation is just chance; there’s no causal relationship
What does ‘r’ represent?
The strength and direction of the linear correlation between 2 variables
What are the types of correlation?
1) Perfect
2) Strong
3) Moderate
4) Weak
5) No correlation
What is the difference between correlation and association?
Correlation: linear relationship between 2 variables (type of association)
Association: any relationship between 2 variables
What is the r value also known as?
PMCC
What does a higher r value mean? What does r=0 mean?
An r value closer to 1 or -1 means a stronger correlation (positive or negative). r=0 means no linear correlation at all
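As a rough illustration of what the calculator works out, r can be computed by hand from the sums of squares. This is a minimal sketch: the function name `pmcc` and the data lists are made up for the example, and it assumes paired lists of equal length in which neither variable is constant.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data, invented for illustration only
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
r = pmcc(hours, score)
print(round(r, 3))
```

A value near 1 here would indicate a strong positive correlation; reversing one list would flip the sign.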
What is interpolation? What is extrapolation?
Interpolation - inferences within the range of our data (generally reliable)
Extrapolation - inferences outside the range of our data (unreliable, as the trend may not continue)
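One way to picture the distinction is as a simple range check on the x-value being predicted from. The helper name `kind_of_estimate` and the data are hypothetical, chosen only for this sketch.

```python
def kind_of_estimate(x_new, xs):
    """Label a prediction at x_new relative to the observed x-values."""
    if min(xs) <= x_new <= max(xs):
        return "interpolation"   # inside the observed range
    return "extrapolation"       # outside the observed range

# Hypothetical observed x-values
observed_x = [10, 14, 18, 25, 30]
print(kind_of_estimate(20, observed_x))
print(kind_of_estimate(45, observed_x))
```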
What should you do when drawing a line of best fit?
1) roughly the same number of points above and below the line
2) try to keep the points as close to the line as possible overall
3) don’t let outliers influence the line
4) don’t extend the line beyond the points (not even to the origin as you can’t infer that the trend holds!)
What does this symbol mean: ≈?
Approximately equal to (an estimated value)
What is the formula for a regression line?
y=a+bx
What is a regression line?
A straight line showing the best fit for scattered data points on a graph. It shows the linear relationship between an independent variable (x-axis) and a dependent variable (y-axis)
What point will a regression line always pass through?
The mean point (x̄, ȳ): the coordinate formed by the mean of all the x values and the mean of all the y values in the dataset
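The mean-point property can be checked numerically. The sketch below assumes the usual least-squares formulas (b = Sxy / Sxx and a = ȳ - b·x̄), which these cards do not state explicitly; the function name and data are invented for the example.

```python
def regression_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx            # gradient
    a = mean_y - b * mean_x    # y-intercept
    return a, b

# Hypothetical data, invented for illustration only
xs = [1, 2, 3, 4, 5]
ys = [52, 55, 61, 64, 70]
a, b = regression_line(xs, ys)

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
# Substituting x̄ into the line should give back ȳ (up to rounding)
print(abs((a + b * mean_x) - mean_y) < 1e-9)
```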
How do you use your calculator to work out the r value of a data set?
1) press home and go to ‘regression’
2) type in your data down the column
3) select the subheading ‘graph’ then ‘regression’ then ‘linear’
4) select the subheading ‘stats’ and scroll down until you find the correlation coefficient (r) value
How do you plot a regression line?
1) Use the values of ‘a’ and ‘b’ given to you by your calculator when you work out the r value
2) Insert them into the equation: y=a+bx
3) Pick any data point in your set and insert its ‘x’ value into the equation to get y
4) Plot the (x,y) point found
5) Repeat this for several values in your data set then connect them using a straight line (this is the regression line!)
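The plotting steps amount to substituting x-values into y = a + bx and joining the resulting points. In this sketch the values of `a` and `b` are hypothetical stand-ins for whatever the calculator reports, and `points_on_line` is a made-up helper name.

```python
def points_on_line(a, b, x_values):
    """Return (x, y) points on the line y = a + b*x."""
    return [(x, a + b * x) for x in x_values]

# Hypothetical calculator output
a, b = 46.9, 4.5

# Hypothetical x-values from the data set; the smallest and largest
# are enough, since two points fix a straight line
data_x = [1, 2, 3, 4, 5]
points = points_on_line(a, b, [min(data_x), max(data_x)])
for x, y in points:
    print(x, round(y, 1))
```

Using only the smallest and largest x-values keeps the line within the data, so it is not extended beyond the points.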
What is standard deviation?
A number that tells us roughly how far, on average, the data points lie from the mean. The higher the number, the larger the spread of the data.
What is the range of possible values for ‘r’?
-1≤r≤1
What does x̄ mean?
The mean of the x values
What is standard deviation squared equal to?
The variance
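The link between standard deviation and variance can be checked with Python's `statistics` module. This sketch assumes the population versions (`pstdev`, `pvariance`), which correspond to the σ notation on these cards; the data is invented for illustration.

```python
import statistics

# Hypothetical data set
data = [2, 4, 4, 4, 5, 5, 7, 9]

sigma = statistics.pstdev(data)        # population standard deviation
variance = statistics.pvariance(data)  # population variance

# Squaring the standard deviation should give the variance (up to rounding)
print(abs(sigma ** 2 - variance) < 1e-9)
```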
How do you find outliers?
Calculate x̄ + 2σx and x̄ - 2σx to get a range of data values (this includes about 95% of the data if it is roughly normally distributed). Any values that fall outside these bounds are outliers
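The mean ± 2 standard deviations rule can be sketched as follows. The function name `find_outliers` and the data are hypothetical, and the population standard deviation is assumed to match σx.

```python
import statistics

def find_outliers(data):
    """Return values lying more than 2 standard deviations from the mean."""
    mean = statistics.fmean(data)
    sigma = statistics.pstdev(data)  # population standard deviation (assumed)
    lower = mean - 2 * sigma
    upper = mean + 2 * sigma
    return [x for x in data if x < lower or x > upper]

# Hypothetical data with one unusually large value
sample = [10, 12, 11, 13, 12, 11, 30]
print(find_outliers(sample))
```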
Define systematic sampling
This is where every nth person or item in the population is selected (after using a method to randomly select the first person)
What are the advantages and disadvantages of systematic sampling?
Advantages: can be used for quality control on a production line, should give an unbiased sample
Disadvantages: if intervals coincide with a pattern in the population then the sample could be biased
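A minimal sketch of systematic sampling, assuming the population is held in a list and the first item is chosen at random from the first n positions. The function name and the numbered population are invented for the example.

```python
import random

def systematic_sample(population, n):
    """Select every nth item after a random starting position."""
    start = random.randrange(n)   # random start among the first n items
    return population[start::n]   # then every nth item from there

# Hypothetical population: items numbered 0 to 99
population = list(range(100))
sample = systematic_sample(population, 10)
print(sample)
```

If the population had a repeating pattern with the same period as n, every selected item would fall at the same point in the pattern, which is the bias described above.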
How do you calculate the mode if there are several values repeated the same number of times? What if none of the values are repeated?
Several values repeated - write all of the repeated values, separated by commas
None - no mode
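The two cases can be sketched with a frequency count. The helper name `modes` is hypothetical; it assumes a non-empty list and follows the convention on these cards that a data set with no repeated values has no mode (returned here as an empty list).

```python
from collections import Counter

def modes(data):
    """Return all modes in ascending order, or [] if no value repeats."""
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []  # no value is repeated, so there is no mode
    return sorted(value for value, count in counts.items() if count == highest)

print(modes([1, 2, 2, 3, 3]))  # several values tied for most frequent
print(modes([1, 2, 3]))        # nothing repeated
```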
How do you use your calculator to calculate standard deviation?
1) Home button
2) Press statistics
3) Type in your data
4) select the ‘stats’ tab and then scroll down to standard deviation