Statistics Flashcards

(60 cards)

1
Q

What is the most important principle in statistics?

A

Correlation doesn’t imply causation!

2
Q

What are the various possibilities when 2 data sets (X and Y) are correlated?

A

1) X causes Y
2) Y causes X
3) X and Y partly cause one another
4) X and Y are both caused by something else
5) Correlation is just chance, there’s no causal relationship

3
Q

What does ‘r’ represent?

A

The strength and direction of the linear correlation between two variables

4
Q

What are the types of correlation?

A

1) Perfect
2) Strong
3) Moderate
4) Weak
5) No correlation

5
Q

What is the difference between correlation and association?

A

Correlation: linear relationship between 2 variables (type of association)
Association: any relationship between 2 variables

6
Q

What is the r value also known as?

A

PMCC

7
Q

What does a higher r value mean? What does r=0 mean?

A

The further r is from 0 (the closer to 1 or -1), the stronger the correlation. r=0 means no linear correlation at all
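
As a rough illustration of what the calculator does behind the scenes, here is a minimal sketch that computes r from the standard definition r = Sxy / √(Sxx × Syy). The function name `pmcc` and the data are made up for this example.

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Made-up data lying exactly on a straight line with positive gradient
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 6, 8, 10]
print(pmcc(hours, scores))  # perfect positive correlation, so r is 1
```

Reversing one of the lists gives a perfect negative correlation, so r becomes -1.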

8
Q

What is interpolation? What is extrapolation?

A

Interpolation - inferences within the range of our data (generally reliable)
Extrapolation - inferences outside the range of our data (unreliable)

9
Q

What should you do when drawing a line of best fit?

A

1) same number of points above and below the line
2) try to have all points equidistant from the line
3) don’t include outliers
4) don’t extend the line beyond the points (not even to the origin as you can’t infer that the trend holds!)

10
Q

What does this symbol mean: ≈?

A

Estimated value/ approximate

11
Q

What is the formula for a regression line?

A

y=a+bx

12
Q

What is a regression line?

A

A straight line showing the best fit for scattered data points on a graph. It shows the linear relationship between an independent variable (X-axis) and a dependent variable (Y-axis)

13
Q

What points will a regression line always pass through?

A

The mean point (x̄, ȳ): the coordinate formed by the mean of all the x values and the mean of all the y values in the data set

14
Q

How do you use your calculator to work out the r value of a data set?

A

1) press home and go to ‘regression’
2) type in your data down the column
3) select the subheading ‘graph’ then ‘regression’ then ‘linear’
4) select the subheading ‘stats’ and scroll down until you find the correlation coefficient (r) value

15
Q

How do you plot a regression line?

A

1) Use the values of ‘a’ and ‘b’ given to you by your calculator when you work out the r value
2) Insert them into the equation: y=a+bx
3) Pick any x value within the range of your data set and insert it into the equation to get y
4) Plot the (x,y) point found
5) Repeat this for several values in your data set then connect them using a straight line (this is the regression line!)
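
The steps above can be sketched in code. This is only an illustration: instead of reading ‘a’ and ‘b’ off a calculator, it assumes the usual least-squares formulas b = Sxy / Sxx and a = ȳ - b x̄, and the function name and data are invented for the example.

```python
def regression_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx            # gradient
    a = mean_y - b * mean_x    # intercept, so the line passes through the mean point
    return a, b

# Made-up data
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = regression_line(xs, ys)

# Two points are enough to draw a straight line; stay inside the data range
points_to_plot = [(x, a + b * x) for x in (min(xs), max(xs))]
print(a, b, points_to_plot)
```

Only x values between the smallest and largest data values are used, so the plotted line is not extended beyond the points.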

16
Q

What is standard deviation?

A

A measure of the typical distance of the data points from the mean. The higher the number, the larger the spread of the data.

17
Q

What is the range of possible values for ‘r’?

A

-1≤r≤1

18
Q

What does x̄ mean?

A

The mean

19
Q

What is standard deviation squared equal to?

A

The variance

20
Q

How do you find outliers?

A

Calculate x̄ + 2σx and x̄ - 2σx to get a range of data values (for roughly normally distributed data this includes about 95% of the values). Any values that fall outside these bounds are outliers
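
A minimal sketch of this check, assuming σ is the population standard deviation (Python's `statistics.pstdev`). The function name and the data are made up for illustration.

```python
import statistics

def two_sigma_outliers(data):
    """Return the values lying more than 2 standard deviations from the mean."""
    mean = statistics.mean(data)
    sigma = statistics.pstdev(data)   # population standard deviation (divisor n)
    lower = mean - 2 * sigma
    upper = mean + 2 * sigma
    return [x for x in data if x < lower or x > upper]

# Made-up data with one value far from the rest
data = [10, 12, 11, 13, 12, 11, 10, 12, 11, 40]
print(two_sigma_outliers(data))
```

Here the mean is 14.2 and σ is about 8.65, so the bounds are roughly -3.1 and 31.5, and only 40 falls outside them.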

21
Q

Define systematic sampling

A

This is where every nth person or item in the population is selected (after using a method to randomly select the first person)
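
As an illustration, systematic sampling could be sketched like this. The function name, the `interval` parameter and the numbered list of items are assumptions made for the example.

```python
import random

def systematic_sample(population, interval, rng=random):
    """Pick a random start within the first interval, then take every interval-th item."""
    start = rng.randrange(interval)   # random first pick: index 0 to interval - 1
    return population[start::interval]

# Hypothetical production line of 100 numbered items, sampling every 10th
items = list(range(1, 101))
sample = systematic_sample(items, 10)
print(sample)
```

Every run gives 10 items spaced exactly 10 apart; only the starting item changes.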

22
Q

What are the advantages and disadvantages of systematic sampling?

A

Advantages: can be used for quality control on a production line, should give an unbiased sample
Disadvantages: if intervals coincide with a pattern in the population then the sample could be biased

23
Q

How do you calculate the mode if there are several values repeated the same number of times? What if none of the values are repeated?

A

Several values repeated - write all of the repeated values separated by a comma
None - no mode
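
A short sketch of this rule, assuming a non-empty list of values. The helper name `modes` is invented for the example. Note that Python's built-in `statistics.multimode` would instead return every value when nothing repeats, so the "no mode" case is handled explicitly here.

```python
from collections import Counter

def modes(values):
    """Return all modes as a sorted list, or an empty list if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []   # no value is repeated, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 2, 2, 3, 3]))  # two values tie for most frequent
print(modes([1, 2, 3]))        # nothing repeats
```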

24
Q

How do you use your calculator to calculate standard deviation?

A

1) Home button
2) Press statistics
3) Type in your data
4) select the ‘stats’ tab and then scroll down to standard deviation

25
What is the equation for calculating **sample** standard deviation?
sx = √( Σ(x - x̄)² / (n - 1) ) The **whole term is square rooted**, not just the numerator. This is the same as one of the standard deviation equations, except that it has a divisor of n-1 instead of n
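
To see the effect of the n-1 divisor, here is a small sketch that computes the sample standard deviation by hand and compares it with Python's `statistics.stdev` (sample) and `statistics.pstdev` (population). The function name and data are made up for illustration.

```python
import math
import statistics

def sample_sd(data):
    """Sample standard deviation: divide the sum of squared deviations by n - 1."""
    n = len(data)
    mean = sum(data) / n
    sum_sq_dev = sum((x - mean) ** 2 for x in data)
    return math.sqrt(sum_sq_dev / (n - 1))   # the whole fraction is square rooted

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up values with mean 5
print(sample_sd(data))            # divisor n - 1
print(statistics.stdev(data))     # library version of the same thing
print(statistics.pstdev(data))    # divisor n, slightly smaller
```

For this data the sum of squared deviations is 32, so the population value is √(32/8) = 2 and the sample value is √(32/7), which is a little larger.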
26
What are the 2 equations for standard deviation?
1) **σ = √( Σ(x - x̄)² / n )** where ‘x’ represents an individual data point and ‘n’ is the number of data points. The **whole term** is square rooted.
2) **σ = √( Σx²/n - (x̄)² )** All terms are under the square root; Σx²/n is a separate fraction from which (x̄)² is subtracted.
**Σx² means square each individual data point separately, then add them up**
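
A quick sketch showing that the two forms give the same answer. The function names and the data are invented for this example.

```python
import math

def sd_from_deviations(data):
    """sqrt( sum of (x - mean)^2 / n )"""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / n)

def sd_from_squares(data):
    """sqrt( (sum of x^2) / n - mean^2 )"""
    n = len(data)
    mean = sum(data) / n
    sum_of_squares = sum(x ** 2 for x in data)   # square each point, then add up
    return math.sqrt(sum_of_squares / n - mean ** 2)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up values with mean 5
print(sd_from_deviations(data), sd_from_squares(data))
```

For this data Σx² = 232, so the second form gives √(232/8 - 25) = √4 = 2, matching the first.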
27
What is the standard deviation symbol?
σ
28
What equations do you use for boxplots to determine outliers?
Upper bound: UQ + 1.5 × IQR
Lower bound: LQ - 1.5 × IQR
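
As a small worked example, assuming the quartiles have already been read off the boxplot (the function name and numbers are made up):

```python
def boxplot_outlier_bounds(lq, uq):
    """Return (lower, upper) bounds; values outside them count as outliers."""
    iqr = uq - lq
    return lq - 1.5 * iqr, uq + 1.5 * iqr

# Hypothetical boxplot with LQ = 20 and UQ = 30, so IQR = 10
lower, upper = boxplot_outlier_bounds(20, 30)
print(lower, upper)   # 5.0 and 45.0

values = [3, 18, 25, 31, 50]
print([v for v in values if v < lower or v > upper])
```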
29
What are some key points when analysing histograms?
1) the area of each bar is proportional (or equal) to its frequency (A=kf)
2) the y-axis shows **frequency density**
3) bar widths can be unequal
4) bars are touching (continuous data)
30
How do you calculate frequency density?
Frequency / class width
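
A minimal sketch of the calculation, using made-up classes given as (lower bound, upper bound, frequency):

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency)
classes = [(0, 10, 5), (10, 30, 20), (30, 40, 15)]

def frequency_density(lower, upper, frequency):
    """Frequency divided by class width."""
    return frequency / (upper - lower)

densities = [frequency_density(lo, hi, f) for lo, hi, f in classes]
print(densities)   # [0.5, 1.0, 1.5]
```

The middle class has the highest frequency but not the tallest bar, because it is twice as wide as the others.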
31
What different types of distributions are there? Describe their shapes
1) symmetrical (same shape curve either side of median)
2) positive skew (the graph has a long tail in the positive direction)
3) negative skew (the graph has a long tail in the negative direction)
32
How do you find the mean of data shown in a histogram?
Σ(midpoint × frequency) / total frequency
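
A short sketch of this estimate, with made-up classes given as (lower bound, upper bound, frequency):

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency)
classes = [(0, 10, 5), (10, 30, 20), (30, 40, 15)]

def grouped_mean(classes):
    """Estimate the mean using each class midpoint weighted by its frequency."""
    total_frequency = sum(f for _, _, f in classes)
    weighted_sum = sum(((lo + hi) / 2) * f for lo, hi, f in classes)
    return weighted_sum / total_frequency

print(grouped_mean(classes))   # (5*5 + 20*20 + 35*15) / 40 = 23.75
```

This is only an estimate, since every value in a class is treated as if it sat at the midpoint.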
33
Can you be asked to draw cumulative frequency diagrams, boxplots or histograms?
No
34
What is stratified sampling?
If a population is divided into categories, a stratified sample uses the same proportion of each category in the sample as there is in the population.
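
As an illustration, the number to take from each category can be worked out like this. The function name, category names and sizes are made up, and simple rounding is assumed, which can leave the total one out when the proportions do not divide exactly.

```python
def stratified_allocation(strata_sizes, sample_size):
    """Number to sample from each stratum, in proportion to its size."""
    population = sum(strata_sizes.values())
    return {
        name: round(sample_size * size / population)
        for name, size in strata_sizes.items()
    }

# Hypothetical sixth form: 120 students in Year 12 and 80 in Year 13
allocation = stratified_allocation({"Year 12": 120, "Year 13": 80}, sample_size=20)
print(allocation)   # {'Year 12': 12, 'Year 13': 8}
```

Year 12 makes up 60% of the population, so it gets 60% of the sample.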
35
What are the advantages and disadvantages of stratified sampling?
Advantages: if the categories are mutually exclusive this should give a representative sample
Disadvantages: the extra process to decide who will be surveyed can be expensive (in terms of time or money)
36
How are cumulative frequency graphs plotted?
Endpoint (of x-axis value) against cumulative frequency
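
A minimal sketch of building the points to plot, using made-up class endpoints and frequencies:

```python
from itertools import accumulate

# Hypothetical grouped data: upper endpoint of each class and its frequency
upper_endpoints = [10, 30, 40]
frequencies = [5, 20, 15]

# Running total of the frequencies
cumulative = list(accumulate(frequencies))

# Each point is (class endpoint, cumulative frequency)
points = list(zip(upper_endpoints, cumulative))
print(points)   # [(10, 5), (30, 25), (40, 40)]
```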
37
What is useful about cumulative frequency graphs?
They allow us to measure the spread of data
38
When analysing boxplots what values should you compare?
**Range or IQR** and the **median**
39
Give 5 key facts about the large data set
1) 5 different propulsion types: petrol, diesel, electric, gas/petrol, electric/petrol
2) 3 regions: North West, London and South West
3) 2 years of registration: 2002 & 2016
4) Types of emission produced: CO2, CO and nitrogen oxides (NOx)
5) 5 car manufacturers: BMW, Ford, Vauxhall, Volkswagen, Toyota
40
What does combinations mean? What does permutations mean?
Combinations: order doesn’t matter. Permutations: order does matter.
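
A small illustration of the difference using Python's `itertools`, with three made-up items:

```python
from itertools import combinations, permutations

letters = ["A", "B", "C"]

# Choosing 2 of 3 where order doesn't matter: AB, AC, BC
combos = list(combinations(letters, 2))

# Choosing 2 of 3 where order matters: AB and BA count separately
perms = list(permutations(letters, 2))

print(len(combos), combos)   # 3 selections
print(len(perms), perms)     # 6 arrangements
```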
41
What is the value of 0! ?
1
42
What does nCr mean and what numbers can n be?
“n choose r”. n and r must both be whole numbers that are zero or above (never negative), and **n must be greater than or equal to r**
43
What is the factorial function’s symbol and what does it mean?
n! It means multiply the number by every positive whole number below it, down to 1
44
What is 4! ?
4 × 3 × 2 × 1 = 24
45
What is the formula for combinations (nCr)?
n! / (r! (n-r)!)
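
A minimal sketch of the formula in code, checked against Python's built-in `math.comb`. The function name `n_choose_r` is invented for the example.

```python
import math

def n_choose_r(n, r):
    """n! / (r! (n - r)!), for whole numbers with n >= r >= 0."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(n_choose_r(5, 2))   # 120 / (2 * 6) = 10
print(math.comb(5, 2))    # the standard library gives the same value
```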
46
How can you write “n choose r”?
(n,r) written vertically in brackets, with the n above the r like in a vector
Or nCr, where the C is full size and the n and r are written smaller: n raised to its left and r lowered to its right
47
How do you get “n choose r” up on your calculator?
Press tool box then probability then combinatorics then (n,r)
48
In (n,2) what is the lowest number n could be?
2 because n≥r
49
Simplify (n,2)
1) Write it out: n! / (2!(n-2)!)
2) Expand the factorials: n(n-1)(n-2)… / (2(n-2)(n-3)…)
3) Cancel the common factors (n-2)(n-3)… from the top and bottom of the fraction to get n(n-1)/2, i.e. ½n(n-1)
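
The simplified form can be sanity-checked numerically. This sketch compares ½n(n-1) with Python's `math.comb` for a range of n; the helper name is made up.

```python
import math

def n_choose_2_simplified(n):
    """The simplified form n(n - 1) / 2."""
    return n * (n - 1) // 2   # n(n - 1) is always even, so this is exact

# Compare with the library value for n = 2 up to 49
matches = all(n_choose_2_simplified(n) == math.comb(n, 2) for n in range(2, 50))
print(matches)   # True
```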
50
What is nCr used to calculate?
Combinations e.g. 5C2 means how many different ways there are of choosing 2 things from 5
51
What number is on the edges of Pascal’s triangle?
1
52
What number is the first row in Pascal’s triangle? What about the first column?
Row/column **zero**
53
What do the rows and columns in Pascal’s triangle represent?
Rows: how many items you’re choosing from
Columns: how many items you’re choosing from that total
(The entry tells you the number of possible combinations)
54
How do you work out the next row of Pascal’s triangle?
Each number is the sum of the two numbers diagonally above it (the edges are always 1)
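
A short sketch of this rule. The function name `next_row` is invented for the example.

```python
def next_row(row):
    """Build the next row of Pascal's triangle from the current one."""
    # Each inner entry is the sum of the two entries diagonally above it
    inner = [left + right for left, right in zip(row, row[1:])]
    return [1] + inner + [1]   # the edges are always 1

row = [1]                      # row zero
for _ in range(4):
    row = next_row(row)
print(row)                     # row 4: [1, 4, 6, 4, 1]
```

The entries of row 4 match 4C0, 4C1, 4C2, 4C3 and 4C4.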
55
What does PMCC stand for?
Product Moment Correlation Coefficient
56
State one type of emission where more than 80% of the data is known for cars in the entire UK Department for Transport Stock Vehicle Database
CO2
57
Rahman claims there is an error in the large data set because there is no data for the CO emissions of a Toyota from the SW. Use your knowledge of the data set to comment on his claim.
It’s incorrect as not all values of CO are known in the large data set
58
Anna claims the mean CO2 emissions for Fords first registered in London in 2016 is 0.485 g/km. Use your knowledge of the large data set to explain why this value must be incorrect.
CO2 emissions are in **10s and 100s** of grams per kilometre
59
Why is the large data set limited?
It doesn’t include all car types, there are only 2 years of registration, only 3 regions and **there isn’t data for CO emissions for all of the cars in the data set**
60
Denzel wants to buy a car with a propulsion type other than petrol or diesel. He takes a sample from the LDS of the CO2 emissions (g/km) of cars with one particular propulsion type. The sample is: 82, 13, 96, 49, 96, 92, 70, 81. Using your knowledge of the LDS state which propulsion type this sample is for, giving a reason for your answer
Electric/petrol as this is the only category with this many values