Statistics/Regression and Correlation/Error Analysis Flashcards

(153 cards)

1
Q

What do Descriptive Statistics do?

A

Summarise and describe the main features of a dataset

2
Q

What are Descriptive Statistics used for?

A

-Identifying trends and patterns
-Detecting outliers
-Facilitating clinical decision-making

3
Q

What are the 3 measures of central tendency?

A

-Mean
-Median
-Mode

4
Q

Give the formula to calculate the mean

A

mean = (∑xi) / n
xi - ith data item in the list of n data items
n - number of data items

5
Q

What is the mean?

A

Average of all the data points within the data set

6
Q

What is the median?

A

Middle value in an ordered dataset

7
Q

How do you work out the median for an even number of data points?

A

Average of the middle 2 numbers

8
Q

What is the mode?

A

Most frequent value in the data set

9
Q

Which is most influenced by outliers, median or mean?

A

Mean is most influenced by outliers; median is least influenced by outliers

When outliers are present (or the data is skewed), the median is a better representation or summary of the data set than the mean is
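
The effect of an outlier on the mean and the median can be checked with a short sketch. The readings below are made-up illustrative values (not data from the deck), and only Python's standard `statistics` module is used.

```python
from statistics import mean, median

# Hypothetical IOP readings in mmHg (illustrative values only).
readings = [14, 15, 15, 16, 17]
with_outlier = readings + [45]   # add one extreme value

# The mean shifts a lot when the outlier is added; the median barely moves.
print(mean(readings), median(readings))
print(mean(with_outlier), median(with_outlier))
```

Comparing the two printed lines shows the mean being pulled towards the outlier while the median stays near the centre of the bulk of the data.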

10
Q

What are the 3 measures of dispersion?

A

-Range
-Variance
-Standard Deviation

11
Q

What is the range?

A

Difference between the highest and lowest value in the data set

12
Q

What is the variance?

A

Average of the squared deviations from the mean

13
Q

How do you calculate variance?

A

variance = ∑(xi - mean)^2 / n
Subtract the mean from each data point and square the result, add all of the squared deviations up and divide by the total number of points (n)

14
Q

Why should we square the deviations when calculating variance?

A

If we don’t square each deviation then the positive and negative deviations cancel out, so the average deviation will always be zero

15
Q

What happens if we measure average of the squared deviations about a point other than the mean?
Why do we use the mean?

A

-Leads to greater variance
-Using the deviations about the mean will give us the lowest variance
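
A small numerical check of this property, as a sketch: the data set is hypothetical and `mean_squared_deviation` is a helper name introduced here for illustration.

```python
def mean_squared_deviation(data, centre):
    """Average of the squared deviations of data about a chosen centre."""
    return sum((x - centre) ** 2 for x in data) / len(data)

data = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical values, mean = 5
data_mean = sum(data) / len(data)

about_mean = mean_squared_deviation(data, data_mean)  # this is the variance
about_four = mean_squared_deviation(data, 4)          # centre below the mean
about_six = mean_squared_deviation(data, 6)           # centre above the mean
print(about_mean, about_four, about_six)  # 4.0 5.0 5.0
```

Any centre other than the mean gives a larger value, which is why the variance is defined about the mean.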

16
Q

How do you calculate Standard Deviation (SD)?

A

SD = √variance

17
Q

Give the formula to calculate Standard Deviation (SD)

A

SD = √ [ ∑(xi - mean)^2 / n ]
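
The variance and SD formulas from these cards (dividing by n) can be sketched in a few lines. The data values are hypothetical and the function names are chosen here for illustration.

```python
from math import sqrt

def variance(data):
    """Average of the squared deviations from the mean (divides by n)."""
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data) / len(data)

def standard_deviation(data):
    """SD is the square root of the variance."""
    return sqrt(variance(data))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values
print(variance(data))             # 4.0
print(standard_deviation(data))   # 2.0
```

Python's standard library offers `statistics.pvariance` and `statistics.pstdev` for the same divide-by-n calculation.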

18
Q

What 2 types of variables can you have for Probability Density Functions (PDFs)?

A

-Discrete variable (strictly described by a probability mass function)
-Continuous variable (described by a PDF)

19
Q

What does a Probability Density Function (PDF) represent?
What does a Probability Density Function (PDF) describe?

A

A PDF represents the probability distribution of a continuous random variable

PDFs describe the probability of a variable being within a certain interval

20
Q

Which part of a Probability Density Function (PDF) represents probability?

A

Area under the curve
Total area under the curve = 1
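
As a rough numerical illustration, the area under a standard normal PDF can be approximated by summing thin rectangles. This is a sketch using `statistics.NormalDist`; the helper `area` and the integration limits are choices made here, not part of the deck.

```python
from statistics import NormalDist

dist = NormalDist(mu=0, sigma=1)   # standard normal, chosen for illustration

def area(dist, a, b, steps=10_000):
    """Approximate the area under dist.pdf between a and b (midpoint rule)."""
    width = (b - a) / steps
    return sum(dist.pdf(a + (i + 0.5) * width) for i in range(steps)) * width

total = area(dist, -8, 8)          # effectively the whole curve
within_one_sd = area(dist, -1, 1)  # probability of lying within 1 SD of the mean
print(round(total, 4), round(within_one_sd, 4))
```

The first value comes out at approximately 1 (the total probability), and the second is the probability of the variable falling in the interval from -1 to 1.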

21
Q

Will a Probability Density Function (PDF) ever be negative?

A

No, a PDF will never be negative (below the x-axis) as this would imply a negative probability

22
Q

What does a PDF value at a single point (height) indicate?

A

Relative density of data around that point

23
Q

Give some key principles of a Normal Distribution

A

-Type of PDF
-Bell shaped probability distribution
-Symmetric around the mean
-Mean = median = mode
-Determined by the mean and standard deviation

24
Q

What can a Normal Distribution be useful for?

A

-Can define what is normal and help to identify deviations that might be pathological; usefulness in screening

-Statistical Inference

-Predictive Power (can help predict patient outcomes)

-Clinical Applications (screening and diagnosis, VA assessments)

25
When would we use a t-distribution?
Used when the sample size is small and/or the population standard deviation is unknown
26
What does the shape of a t-distribution depend on?
Shape of the t-distribution depends on the degrees of freedom (df)
-For small values of df the t-distribution is wider than the Normal Distribution
-For df > 30 the two are almost the same
27
What are degrees of freedom (df)?
df is the number of independent values that can vary in a statistical calculation
For a single sample, df = n - 1, where n is sample size
28
When comparing groups, what are some things to look out for?
Small Sample Sizes:
-Reduced statistical power
Confounding Variables:
-Variables that influence both the independent and dependent variables
Bias:
-Selection bias, measurement bias or reporting bias can affect results
29
What are the types of t-tests? Give an example of each
Independent (two-sample) t-Test:
-Compares the means of two independent groups
Example: Comparing IOP between patients with and without glaucoma
Paired t-Test:
-Compares means within the same group at two different times or conditions
Example: Comparing VA before and after LASIK surgery in the same Px's
One-Sample t-Test:
-Compares the mean of a single group to a known value or theoretical expectation
Example: Comparing the mean axial length in a study population to a standard axial length for a given age group
30
What are assumptions of a t-test?
-Data is approximately normally distributed
-Scale of measurement is continuous
-Homogeneity of variance (equal variances in groups) for independent t-tests
-Observations are independent
31
What is H0 and H1?
H0 - null hypothesis: There is no difference in ...
H1 - alternative hypothesis: There is a difference in ...
32
Can you add Standard Deviations together?
No
33
How can you add Standard Deviations together?
For independent sources of variation: convert the Standard Deviations to variances by squaring them, add the variances, and then take the square root at the end to convert back to a Standard Deviation
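A minimal sketch of this square, add, square-root procedure. It assumes the sources of variation are independent, and `combined_sd` is a name introduced here for illustration.

```python
from math import sqrt

def combined_sd(*sds):
    """Combine SDs of independent sources: square, add, then square root."""
    return sqrt(sum(sd ** 2 for sd in sds))

# Hypothetical SDs of 3 and 4 units: the combined SD is 5, not 3 + 4 = 7.
print(combined_sd(3, 4))   # 5.0
```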
34
What can straight line regression be used to predict?
Predicts/estimates one variable based on another
35
Give the format of the straight line regression equation
Y = a + bX
Y - dependent variable (what you measure)
X - independent variable (what you change)
a - intercept
b - slope (gradient)
36
What would be on the x-axis? What would be on the y-axis?
x-axis - independent variable (what you change)
y-axis - dependent variable (what you measure)
37
What is Near Point of Convergence (NPC)?
The closest distance at which your eyes can turn inward (converge) to focus on a single object
38
What are residuals?
A residual is the difference between a value and the line (the difference between a measured value and the corresponding predicted value)
39
Does the best fit line try to minimise residuals?
Yes, the best fit line minimises the (squared) residuals
It is as close as possible to as many points as possible
40
What sign can residuals be?
-Can be +ve or -ve; for the best fit line they always cancel out to 0 when summed
-To avoid them cancelling out, square them
41
Give the formula to calculate the Sum of Squared Residuals (SSR)
SSR = ∑ [ yi - (a + b × xi) ]^2
yi - actual data value
a + b × xi - predicted data value
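The SSR formula can be tried out with a short sketch. The data points are hypothetical, and `fit_line` uses the standard least-squares formulas for the slope and intercept, which the cards do not state and are brought in here as an assumption.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for Y = a + bX."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

def ssr(xs, ys, a, b):
    """Sum of squared residuals: sum of [yi - (a + b * xi)]^2."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]          # hypothetical independent variable
ys = [2, 4, 5, 4, 5]          # hypothetical measured values
a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(round(a, 3), round(b, 3))        # 2.2 0.6
print(round(sum(residuals), 6))        # residuals of the best fit line sum to ~0
print(round(ssr(xs, ys, a, b), 3))     # 2.4
print(round(ssr(xs, ys, 2.0, 0.7), 3)) # any other line gives a larger SSR
```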
42
What is Correlation?
Correlation measures the strength and direction of a relationship between two variables (tells you how closely related variables are to each other)
Correlation measures how close the data is to a straight line
43
What values can correlation range between?
From -1 to 1
44
What does a correlation of 1 mean?
-Means the data falls exactly on a line sloping upwards
-Perfect positive correlation
-Both variables increase together
45
What does a correlation of 0 mean?
-Means the data does not fit on a line (or the line is horizontal)
-No linear relationship (random association)
46
What does a correlation of -1 mean?
-Means the data falls exactly on a line sloping downwards
-Perfect negative correlation
-When one variable increases, the other decreases
47
If you swap x and y (axis), does this have any effect on correlation?
Swapping X and Y has no effect on the correlation as the correlation is symmetrical in x and y
48
If you swap x and y (axis), does this have any effect on a straight line?
Swapping X and Y produces a different line as a straight line is not symmetrical in x and y
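Both swapping results can be checked numerically. This sketch uses the standard Pearson correlation and least-squares slope formulas (assumed here, since the cards do not give them) on hypothetical data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def slope(xs, ys):
    """Least-squares slope when predicting ys from xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]

print(pearson_r(xs, ys), pearson_r(ys, xs))  # identical: correlation is symmetric
print(slope(xs, ys))   # about 0.6: predicting y from x
print(slope(ys, xs))   # about 1.0: predicting x from y, not simply 1 / 0.6
```

If the swapped fit were the same line drawn with the axes exchanged, its slope would be 1 / 0.6, which it is not.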
49
What is Categorical data?
Data where the information naturally falls into distinct categories (e.g. one of two categories, such as yes/no)
50
How do you calculate Odds Ratio?
(A/C) / (B/D) = AD / BC
51
What should the Odds Ratio equal if there is no relationship?
1; anything much bigger or smaller than 1 suggests a relationship
52
What happens to the Odds Ratio if you swap either the rows or the columns?
Odds Ratio becomes its reciprocal: 1/x, where x is the original Odds Ratio
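A short sketch of the odds ratio for a 2 x 2 table, computed as (A/C) / (B/D), which simplifies to AD / (BC). The counts and the cell layout (A, B in the top row; C, D in the bottom row) are assumptions made here for illustration.

```python
def odds_ratio(table):
    """Odds ratio of a 2 x 2 table [[A, B], [C, D]]: (A/C) / (B/D) = AD / (BC)."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)

table = [[20, 10],
         [5, 15]]                                  # hypothetical counts

swapped_rows = [table[1], table[0]]                # exchange the two rows
swapped_cols = [row[::-1] for row in table]        # exchange the two columns

print(odds_ratio(table))          # 6.0
print(odds_ratio(swapped_rows))   # 1/6: the reciprocal
print(odds_ratio(swapped_cols))   # 1/6: the reciprocal
```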
53
Formula to calculate the Chi-Squared statistic
∑ [ (observed - expected)^2 / expected ], summed over every cell of the table
54
How do you calculate degrees of freedom for a Chi-Squared statistic?
(number of rows - 1) x (number of columns - 1)
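The Chi-Squared statistic and its degrees of freedom can be sketched for a small table. The counts are hypothetical, and the expected count for each cell is taken as (row total × column total) / grand total, a standard rule that the cards do not state and is assumed here.

```python
def chi_squared(table):
    """Sum of (observed - expected)^2 / expected over every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under no relationship (assumed standard rule).
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic

def degrees_of_freedom(table):
    """(number of rows - 1) x (number of columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)

table = [[20, 10],
         [5, 15]]   # hypothetical observed counts

print(round(chi_squared(table), 3))   # 8.333
print(degrees_of_freedom(table))      # 1
```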
55
What is statistical inference?
Process of drawing conclusions or making predictions about a population based on data from a sample
56
What is a population?
The entire group or set of individuals, items, or data points that you are interested in studying
57
What is a sample?
A subset of the population that is actually observed or measured
58
Why do we gather data from a sample rather than from a population?
Studying a whole population is:
1) Too expensive
2) Time consuming
3) Limited by the fact that not everyone will want to participate in the study
4) Impractical
59
What would the population and sample be in this study? Determining the average refractive error in all adults in a city
Population: all adults in the city
Sample: a randomly selected sample of fifty adults; each has refractive error measured
60
How is the sample mean written?
x̄ (x with a line on top)
61
How is the population mean written?
μ (mu)
62
How is the sample Standard Deviation (SD) written?
s
63
How is the population Standard Deviation (SD) written?
σ (sigma)
64
What is the point estimate?
Gives a specific value for the population parameter of interest
65
What is the interval estimate?
Gives a range of values for the population parameter of interest
66
What does the Standard Error of The Mean represent?
Represents how well the sample mean represents the mean of the population
67
If you repeatedly draw samples from a population and calculate their means, what do these means have? What is this called?
These means have a distribution of their own; this is called the distribution of the sample means (the sampling distribution of the mean)
68
What is the mean of the distribution of the sample means?
μ
69
What is the SD of the distribution of the sample means?
σ (sigma) / √n
70
What is the SD of the distribution of the sample means known as?
Standard error of the mean (SEM)
71
What does the standard error of the mean estimate?
Estimates the variability of sample means from the true population mean
72
What does a large standard error of the mean (SEM) imply in terms of the population mean?
Large SEM implies the sample means are widely dispersed around the population mean
73
What does a small standard error of the mean (SEM) imply in terms of the population mean?
Small SEM implies the sample means are tightly clustered around the population mean
74
What effect does increasing the sample size have on the shape of the distribution of the sample means?
The distribution will become narrower and more sharply peaked around μ
75
What 2 things does the size of the standard error of the mean (SEM) depend on?
-Population standard deviation (σ) -Sample size (n)
76
If the population standard deviation (σ) is large, what does this mean for the standard error of the mean (SEM)?
Larger SEM
The larger the population standard deviation (σ), the larger the SEM
77
If the sample size (n) is large, what does this mean for the standard error of the mean (SEM)?
Smaller SEM
The larger the sample size (n), the smaller the SEM
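The relationship SEM = σ / √n, and its link to the distribution of the sample means, can be illustrated with a small simulation. This is a sketch: the population values (μ = 50, σ = 10), the sample size and the random seed are arbitrary choices made here.

```python
import random
from math import sqrt
from statistics import mean, pstdev

def sem(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

print(sem(10, 25))    # 2.0
print(sem(10, 100))   # 1.0: a larger sample gives a smaller SEM
print(sem(20, 25))    # 4.0: a larger population SD gives a larger SEM

# Simulate drawing many samples of size n and look at the spread of their means.
random.seed(1)
mu, sigma, n = 50, 10, 25
sample_means = [mean([random.gauss(mu, sigma) for _ in range(n)])
                for _ in range(2000)]
print(mean(sample_means), pstdev(sample_means))
```

The mean of the simulated sample means should land close to μ, and their SD close to σ / √n.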
78
What can we do to have a more reliable test?
Take a larger sample size (n)
79
What does a smaller standard error of the mean (SEM) indicate?
Indicates that the sample mean is a more accurate estimate of the population mean
80
What does a larger standard error of the mean (SEM) suggest?
Suggests greater uncertainty about the population mean, indicating more variability across sample means
81
What does the standard deviation measure?
Measures the spread of individual data points in a sample or population
82
What does the standard error of the mean (SEM) measure?
Measures how accurately a sample mean estimates the population mean
83
Why is the standard error of the mean (SEM) typically smaller than the SD?
SEM deals with sample means, not individual data points (it is easier to predict the mean of several data points, than it is to predict individual data points)
84
What is the confidence level?
The long-run proportion of confidence intervals (built from repeated samples) that will contain the true population parameter, e.g. 95%
85
What is the confidence interval?
A range of values within which the true population parameter is expected to fall
86
What is a critical value?
A value from a statistical distribution that corresponds to the desired confidence level
87
What does a confidence interval of (10, 20) mean?
The population parameter is believed to be between 10 and 20 with a certain level of confidence (e.g. 95%)
88
What statistical distribution is for large samples?
z-distribution
89
What statistical distribution is for small samples?
t-distribution
90
What does a 95% confidence interval mean? What does it not mean?
A 95% confidence interval means that 95% of the intervals you create from repeated sampling would contain the true population parameter
It does not mean that there is a 95% probability that the true parameter lies within a specific interval
91
Formula to calculate Confidence Interval (CI)
CI = x̄ ± tc × (s / √n)
x̄ - sample mean
tc - critical value of t
s / √n - SEM
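A sketch of this confidence interval calculation. The measurements are hypothetical, and the critical value 2.262 (two-tailed 95%, df = 9) is taken from a standard t-table and passed in by hand, because Python's standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, stdev

def confidence_interval(data, t_critical):
    """CI = sample mean +/- t_critical * (s / sqrt(n))."""
    n = len(data)
    centre = mean(data)
    sem = stdev(data) / sqrt(n)   # stdev() is the sample SD s (divides by n - 1)
    margin = t_critical * sem
    return centre - margin, centre + margin

# Hypothetical axial lengths in mm for a sample of 10 eyes.
data = [23.1, 23.5, 24.0, 23.8, 23.3, 24.2, 23.9, 23.6, 23.4, 24.2]

lower, upper = confidence_interval(data, t_critical=2.262)  # df = n - 1 = 9
print(round(lower, 2), round(upper, 2))
```

The interval is symmetric about the sample mean; a larger sample would shrink the SEM and so narrow the interval.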
92
When would we use an independent two-sample t-test?
When comparing the means of 2 separate groups of people whose measurements are independent of each other
93
If ⍺ = 0.05, what would each tail equal?
0.05/2 = 0.025 Each tail = 0.025
94
What would you do if the t statistic lies in the middle region?
Do not reject (fail to reject) the null hypothesis (H0)
95
What would you do if the t statistic lies in the critical region(s)?
Reject the null hypothesis (H0) and accept the alternative hypothesis (H1)
96
What is the significance level?
The probability of rejecting the null hypothesis when it is actually true (a Type I error)
97
What is the significance level often denoted as?
⍺ (alpha)
98
What does the significance level represent?
Represents the threshold for determining whether the observed data is sufficiently unusual to reject the null hypothesis
99
What are common significance levels?
0.05, 0.01 and 0.10
100
What does a significance level of 0.05 mean?
There’s a 5% risk of incorrectly rejecting the null hypothesis
101
What is the critical region?
The set of values for the test statistic that leads to rejection of the null hypothesis
102
What is the critical region determined by?
Significance level (⍺)
103
What is the p-value?
The probability of obtaining test results at least as extreme as the results actually observed, assuming that the null hypothesis is true
104
If the p-value ≤ ⍺ value, do you accept or reject the null hypothesis?
Reject the null hypothesis
105
If the p-value > ⍺ value, do you accept or reject the null hypothesis?
Do not reject (fail to reject) the null hypothesis
106
True or false: P-value ≠ Probability of Hypothesis. Explain why or why not
True
A p-value is the probability of the observed data (or more extreme data) under the null hypothesis, not the probability that the null hypothesis is true
107
True or false: P-value = Strength of Evidence. Explain why or why not
False
A p-value gives evidence against the null hypothesis, but it doesn’t provide a measure of the strength of the evidence in favour of the alternative hypothesis
108
True or false: P-value ≠ Certainty. Explain why or why not
True
A p-value, even if small, does not guarantee the result is true; it only tells you the likelihood of the data under the null hypothesis
109
What are the 2 types of error?
-Type I error (false positive)
-Type II error (false negative)
110
When does a Type I error occur?
Occurs when the null hypothesis is rejected even though it is true
111
What controls the probability of a Type I error?
Significance level (⍺)
112
When does a Type II error occur?
Occurs when the null hypothesis is not rejected even though it is false
113
What is the probability of a Type II error denoted as?
β
114
How do you minimise both Type I and Type II errors?
Increase the sample size (⍺ is set by the researcher, but a larger sample lowers β for that ⍺)
115
What does increasing the sample size do to the standard error and t statistic?
-Reduces the standard error
-Increases the size of the t statistic (when there is a real difference)
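A sketch of why this happens, using the one-sample t statistic t = (x̄ - μ0) / (s / √n). This formula is not given on the cards and is assumed here as the standard definition; the numbers are hypothetical.

```python
from math import sqrt

def t_statistic(sample_mean, mu0, s, n):
    """One-sample t statistic: (sample mean - mu0) / (s / sqrt(n))."""
    standard_error = s / sqrt(n)
    return (sample_mean - mu0) / standard_error

# Same observed difference (52 vs 50) and same SD, different sample sizes.
print(t_statistic(52, 50, 10, 25))    # 1.0
print(t_statistic(52, 50, 10, 100))   # 2.0: smaller standard error, larger t
```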
116
What does error analysis involve?
Involves understanding and quantifying the different types of potential errors when collecting, analysing or interpreting data.
117
What are the 2 ways in which results from a survey are analysed?
-Descriptive statistics
-Associations between potential causes and effects
118
True or false: A survey needs a defined population
True
119
What does a sample need to be?
Representative of the population (everybody in the population has an equal probability of being chosen to take part in the study)
120
What 2 things can biases occur in?
-Selection of study subjects
-The measurements actually taken
121
What is sampling error?
Difference between the sample estimate and the true population value
122
Ways to reduce sampling error
-Increase the sample size (reduces variability)
-Ensure a more random selection
-Use stratified sampling, where different subgroups (such as age or geographic location) are specifically included to ensure the sample matches the diversity of the population
123
When does a systematic sampling error occur?
Occurs when there is a consistent, non-random bias introduced in the selection of participants, leading to results that are not representative of the general population
124
How does a systematic sampling error often arise?
Often arises from flaws in the sampling method that systematically exclude certain groups or over-represent others (e.g. recruiting patients from a high-end optical store excludes low-income participants; solution – recruit from multiple locations)
125
What are the 2 types of measurement error?
-Random error
-Systematic error
126
Are random errors due to chance?
Yes
126
What do random errors lead to?
Leads to variability around the true value (slightly off)
127
What do random errors reduce?
Reduce the precision of the data
128
How do random errors affect variability?
Variability can be in both magnitude (small or big random error) and direction (overestimate or underestimate)
129
What are the types of systematic error?
-Zero or offset error
-Scale error
130
What is a zero or offset error?
A certain value is added to all of the measurements
Note: these errors are non-random and can be compensated for if known
131
What do systematic errors affect?
Affects the accuracy of the data
132
Define precision
Very little scatter in the data
133
Define accuracy
The average is close to the “true” value
133
How can measurement error be minimised?
-Repeating measurements: overestimates and underestimates partly cancel each other (end up with a value that is somewhat close to the true value)
-Frequently recalibrating the equipment
-Standardising the procedure for measuring
134
What is response bias?
-Self-reports can sometimes be very unreliable
-The Px might be inclined to say what they think you want to hear
135
How can response bias be reduced?
-Having an anonymous survey
-Having a third-party survey (neutral person who is not likely to bias the survey)
-Having questions with more options than just a yes/no response (subjects have to think more and give more thoughtful answers)
136
When does nonresponse bias occur?
Occurs when certain groups of people are less likely to respond to a survey, causing an unrepresentative sample
137
How can nonresponse bias be reduced?
Resending survey to non-responders and trying to get the maximum number of people to respond
138
When do modelling errors occur?
Occurs when the statistical model is too simple, too complex or totally incorrect
139
How can we make Standard Errors small?
Having larger sample sizes
140
What are the 2 reasons as to why a survey has not picked up anything significant?
1) There is actually nothing significant 2) There is something significant but it was not picked up because the sample size was too small
141
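Formula to calculate Confidence Interval (CI) for a proportion
CI = p ± z × √ [ p(1 - p) / n ]
p - sample proportion
z - critical value of z (1.96 for 95% confidence)
n - sample size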
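A sketch of a confidence interval for a proportion. It assumes the usual normal-approximation form, in which the standard error √[p(1 - p)/n] is multiplied by a z critical value (about 1.96 for 95% confidence); the proportion and sample size are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p, n, confidence=0.95):
    """Normal-approximation CI: p +/- z * sqrt(p * (1 - p) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# Hypothetical survey: 20% of 100 respondents report a symptom.
lower, upper = proportion_ci(0.2, 100)
print(round(lower, 3), round(upper, 3))   # 0.122 0.278
```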
142
What is power?
The ability of a statistical test to find a significant result when the effect is real (the probability of correctly rejecting a false null hypothesis)
143
How is power denoted?
1-β
144
How can you increase the power of a hypothesis test?
Increase the sample size
145
What 3 things is power affected by?
-Sample size
-Effect size
-Significance level (⍺)
146
What is power analysis? When must it be done?
-A calculation of the likelihood of detecting a significant effect, if one exists (used to choose the sample size)
-Must be done during the design of a study
147
What is the Bonferroni Correction used for?
-Used for multiple comparisons
-Used for correcting for the increased risk of false +ves
148
When can you make a correction/use the Bonferroni Correction?
When you know the number of comparisons
149
What does the correction tell you?
The significance level to use for each individual test so that the probability of making at least one false positive across all x tests stays at about ⍺
150
How do you use the Bonferroni Correction to correct for the increased risk of false +ves?
Adjust the significance level by dividing it by the number of tests (α/m), where m is the number of tests
151
Use the Bonferroni Correction to correct for 10 individual tests
For 10 independent tests, the corrected α = 0.05/10 = 0.005
The probability of at least one false positive is then: 1 - (1 - 0.005)^10 = 0.049 (about 5%)
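The same arithmetic as a short sketch, comparing the uncorrected and corrected cases. It assumes the tests are independent, and the function names are introduced here for illustration.

```python
def bonferroni_alpha(alpha, m):
    """Per-test significance level after the Bonferroni Correction: alpha / m."""
    return alpha / m

def prob_at_least_one_false_positive(alpha_per_test, m):
    """1 - (1 - alpha)^m for m independent tests."""
    return 1 - (1 - alpha_per_test) ** m

m = 10
uncorrected = prob_at_least_one_false_positive(0.05, m)
corrected = prob_at_least_one_false_positive(bonferroni_alpha(0.05, m), m)

print(round(uncorrected, 3))   # 0.401: about a 40% risk without correction
print(round(corrected, 3))     # 0.049: back to about 5% with correction
```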