Statistics/Regression and Correlation/Error Analysis Flashcards

Question

When would we use a t-distribution?

Answer 1

Used when the sample size is small or when the population standard deviation is unknown

Answer 2

Shape of the t-distribution depends on the degrees of freedom (df)
-For small values of df the t-distribution is wider than the Normal Distribution
-For df > 30 the two are almost the same

Answer 3

df is the number of independent values that can vary in a statistical calculation
For a single sample, df = n – 1, where n is sample size

Answer 4

Small Sample Sizes :
-Reduced statistical power
Confounding Variables :
-Variables that influence both the independent and dependent variables
Bias :
-Selection bias, measurement bias or reporting bias can affect results

Answer 5

Independent (Two-Sample) t-Test :
-Compares the means of two independent groups
Example : Comparing IOP between patients with and without glaucoma
Paired t-Test :
-Compares means within the same group at two different times or conditions
Example : Comparing VA before and after LASIK surgery in the same patients
One-Sample t-Test :
-Compares the mean of a single group to a known value or theoretical expectation
Example : Comparing the mean axial length in a study population to a standard axial length for a given age group

Answer 6

-Data is approximately normally distributed
-Scale of measurement is continuous
-Homogeneity of variance (equal variances in groups) for independent t-tests
-Observations are independent

Answer 7

H0 - null hypothesis : There is no difference in ...
H1 - alternative hypothesis : There is a difference in ...

Answer 8

Square each standard deviation to convert it to a variance, add the variances, then take the square root of the total to convert back to a standard deviation (this assumes the sources of error are independent)
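
The rule for combining standard deviations can be sketched in a few lines of Python. The function name `combine_sd` and the example values 3.0 and 4.0 are illustrative assumptions, and the calculation is only valid when the sources of error are independent.

```python
import math

def combine_sd(*sds):
    """Combine independent standard deviations: square, add, square-root."""
    variances = [sd ** 2 for sd in sds]   # SD -> variance
    return math.sqrt(sum(variances))      # total variance -> SD

# Hypothetical example: two independent error sources with SDs of 3 and 4.
combined = combine_sd(3.0, 4.0)
print(combined)  # 5.0
```

Note that the combined value (5) is smaller than the simple sum of the two standard deviations (7).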

Answer 9

Predicts/estimates one variable based on another

Answer 10

Y = a + bX
Y - dependent variable (what you measure)
X - independent variable (what you change)
a - intercept
b - slope (gradient)

Answer 11

x-axis - independent variable (what you change)
y-axis - dependent variable (what you measure)

Answer 12

The closest distance at which your eyes can turn inward (converge) to focus on a single object

Answer 13

A residual is the difference between a value and the line (the difference between a measured value and the corresponding predicted value)

Answer 14

The best fit line minimises the sum of the squared residuals
It is as close as possible to as many points as possible

Answer 15

-Residuals can be +ve or -ve, and for the best fit line they always cancel out to 0 when summed
-To avoid them cancelling out, square them before summing

Answer 16

SSR = Σ [ y_i − (a + b × x_i) ]^2
y_i - actual data value
a + b × x_i - predicted data value
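
As a rough illustration of residuals and the sum of squared residuals (SSR), the sketch below fits a least-squares line by hand. The helper names `fit_line` and `residuals` and the data values are made up for the example.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def residuals(xs, ys, a, b):
    """Measured value minus predicted value for every data point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)                # intercept about 0.05, slope about 1.99
res = residuals(xs, ys, a, b)
ssr = sum(r ** 2 for r in res)         # sum of squared residuals, about 0.107

print(f"a = {a:.2f}, b = {b:.2f}")
print(f"sum of residuals = {sum(res):.6f}")
print(f"SSR = {ssr:.3f}")
```

The plain sum of the residuals comes out as zero (up to rounding), which is why they are squared before being added.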

Answer 17

Correlation measures the strength and direction of a relationship between two variables (tells you how closely related variables are to each other) Correlation measures how close the data is to a straight line

Answer 18

-Means the data falls exactly on a line sloping upwards
-Perfect positive correlation
-Both variables increase together

Answer 19

-Means the data does not fit on a line (or the line is horizontal)
-Random association

Answer 20

-Means the data falls exactly on a line sloping downwards
-Perfect negative correlation
-When one variable increases, the other decreases

Answer 21

Swapping X and Y has no effect on the correlation as the correlation is symmetrical in x and y

Answer 22

Swapping X and Y produces a different line as a straight line is not symmetrical in x and y
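
A small sketch can show the contrast between correlation and regression when X and Y are swapped. The function names `pearson_r` and `slope` and the data are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def slope(xs, ys):
    """Least-squares slope when predicting ys from xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

print(pearson_r(xs, ys), pearson_r(ys, xs))  # identical values
print(slope(xs, ys))  # slope for predicting Y from X, about 0.6
print(slope(ys, xs))  # slope for predicting X from Y, about 1.0
```

If the two regression lines were the same line, the second slope would be 1/0.6 (about 1.67) rather than 1.0, so swapping the variables really does give a different line.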

Answer 23

Data when the information naturally falls into one of two categories

Answer 24

(A/C) / (B/D) = AD / BC

Answer 25

1; anything much bigger or smaller than 1 suggests a relationship

Answer 26

Odds Ratio becomes 1/x
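
The odds ratio arithmetic can be checked with a short sketch. It assumes a 2x2 table with cells A and B on the top row and C and D on the bottom row; the function name `odds_ratio` and the counts are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table laid out as [[a, b], [c, d]]."""
    return (a / c) / (b / d)   # equivalent to (a * d) / (b * c)

# Hypothetical counts, for illustration only.
a, b, c, d = 20, 10, 80, 90

ratio = odds_ratio(a, b, c, d)
swapped = odds_ratio(b, a, d, c)   # same table with the two columns swapped

print(ratio)    # about 2.25
print(swapped)  # about 1 / 2.25 = 0.444
```

Swapping the two columns turns an odds ratio of x into 1/x, and a value of 1 would mean the odds are the same in both groups.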

Answer 27

(observed - expected)^2 / expected

Answer 28

(number of rows - 1) x (number of columns - 1)
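
As a sketch of how the chi-squared pieces fit together, the code below works out the expected counts from the row and column totals, adds up (observed - expected)^2 / expected over every cell, and reports the degrees of freedom. The function name `chi_squared` and the table values are assumptions for illustration.

```python
def chi_squared(observed):
    """Return (chi-squared statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df

# Hypothetical 2x2 table of counts.
observed = [[20, 10],
            [80, 90]]

stat, df = chi_squared(observed)
print(f"chi-squared = {stat:.3f}, df = {df}")
```

For this table the expected counts are 15, 15, 85 and 85, giving a statistic of about 3.92 with 1 degree of freedom.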

Answer 29

Process of drawing conclusions or making predictions about a population based on data from a sample

Answer 30

The entire group or set of individuals, items, or data points that you are interested in studying

Answer 31

A subset of the population that is actually observed or measured

Answer 32

Using a population is : 1) Too expensive 2) Time consuming 3) Not everyone will want to participate in the study 4) Impractical

Answer 33

Population: all adults in the city
Sample: a randomly selected sample of fifty adults; each has refractive error measured

Answer 34

x̄ (x with a line on top, read as "x-bar")

Answer 35

σ (sigma)

Answer 36

Gives a specific value for the population parameter of interest

Answer 37

Gives a range of values for the population parameter of interest

Answer 38

Represents how well the sample mean represents the mean of the population

Answer 39

These means have a distribution; this is the distribution of the sample means

Answer 40

σ (sigma) / √n

Answer 41

Standard error of the mean (SEM)

Answer 42

Estimates the variability of sample means from the true population mean

Answer 43

Large SEM implies the sample means are widely dispersed around the population mean

Answer 44

Small SEM implies the sample means are tightly clustered around the population mean

Answer 45

Distribution will become sharper

Answer 46

-Population standard deviation (σ)
-Sample size (n)

Answer 47

Larger SEM
The larger the population standard deviation (σ), the larger the SEM

Answer 48

Smaller SEM
The larger the sample size (n), the smaller the SEM
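
The effect of σ and n on the standard error of the mean (SEM = σ / √n) can be seen with a few made-up numbers. The function name `sem` and the values below are illustrative assumptions.

```python
import math

def sem(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

print(sem(10, 25))    # 2.0  (baseline)
print(sem(20, 25))    # 4.0  (doubling sigma doubles the SEM)
print(sem(10, 100))   # 1.0  (four times the sample size halves the SEM)
```

Because n sits under a square root, halving the SEM needs four times as many measurements.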

Answer 49

Take a larger sample size (n)

Answer 50

Indicates that the sample mean is a more accurate estimate of the population mean

Answer 51

Suggests greater uncertainty about the population mean, indicating more variability across sample means

Answer 52

Measures the spread of individual data points in a sample or population

Answer 53

Measures how accurately a sample mean estimates the population mean

Answer 54

SEM deals with sample means, not individual data points (it is easier to predict the mean of several data points, than it is to predict individual data points)

Answer 55

The proportion of confidence intervals, constructed in the same way from repeated samples, that would contain the true population parameter (e.g. 95%)

Answer 56

A range of values within which the true population parameter is expected to fall

Answer 57

A value from a statistical distribution that corresponds to the desired confidence level

Answer 58

The population parameter is believed to be between 10 and 20 with a certain level of confidence (e.g. 95%)

Answer 59

z-distribution

Answer 60

t-distribution

Answer 61

A 95% confidence interval means that 95% of the intervals you create from repeated sampling would contain the true population parameter It does not mean that there is a 95% probability that the true parameter lies within a specific interval

Answer 62

CI = x̄ ± t_c × s / √n
x̄ - sample mean
t_c - critical value of t
s / √n - SEM
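
A worked example of this confidence interval is sketched below. The sample values are invented, and the critical value 2.262 is the standard two-sided 95% t-table entry for df = 9, hard-coded here because Python's standard library has no t-distribution.

```python
import math
import statistics

# Hypothetical sample of n = 10 measurements (illustrative values only).
sample = [12, 14, 15, 13, 16, 14, 15, 13, 14, 14]

n = len(sample)
mean = statistics.mean(sample)
s = statistics.stdev(sample)     # sample SD (divides by n - 1)
sem = s / math.sqrt(n)

# Two-sided 95% critical value of t for df = n - 1 = 9, from a t-table.
T_CRIT_DF9_95 = 2.262

margin = T_CRIT_DF9_95 * sem
lower, upper = mean - margin, mean + margin
print(f"mean = {mean}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

With these numbers the interval runs from roughly 13.2 to 14.8 around a sample mean of 14.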

Answer 63

When there are two separate groups of people and the measurements in one group are independent of those in the other

Answer 64

0.05/2 = 0.025
Each tail = 0.025
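
Splitting α across two tails can be illustrated with the Normal (z) distribution from Python's standard library. The variable names are illustrative; for a t-test the same split applies, but the critical value would come from the t-distribution instead.

```python
from statistics import NormalDist

alpha = 0.05
tail = alpha / 2                          # 0.025 in each tail

# z value that leaves `tail` probability in the upper tail.
z_crit = NormalDist().inv_cdf(1 - tail)
print(round(z_crit, 2))  # 1.96
```

Test statistics beyond +1.96 or below -1.96 fall in the rejection region for a two-tailed z-test at α = 0.05.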

Answer 65

Fail to reject the null hypothesis (H0) (often loosely described as "accepting" H0)

Answer 66

Reject the null hypothesis (H0) and accept the alternate hypothesis (H1)

Answer 67

The probability of rejecting the null hypothesis when it is actually true (a Type I error)

Answer 68

⍺ (alpha)

Answer 69

Represents the threshold for determining whether the observed data is sufficiently unusual to reject the null hypothesis

Answer 70

-0.05
-0.01
-0.10

Answer 71

There’s a 5% risk of incorrectly rejecting the null hypothesis

Answer 72

The set of values for the test statistic that leads to rejection of the null hypothesis

Answer 73

Significance level (⍺)

Answer 74

The probability of obtaining test results at least as extreme as the results actually observed, assuming that the null hypothesis is true

Answer 75

Reject the null hypothesis

Answer 76

Fail to reject the null hypothesis (often loosely described as "accepting" it)

Answer 77

True
A p-value is the probability of the observed data (or more extreme data) under the null hypothesis, not the probability that the null hypothesis is true

Answer 78

False
A p-value gives evidence against the null hypothesis, but it doesn’t provide a measure of the strength of the evidence in favour of the alternative hypothesis

Answer 79

True
A p-value, even if small, does not guarantee the result is true; it only tells you the likelihood of the data under the null hypothesis

Answer 80

-Type I error (false positive)
-Type II error (false negative)

Answer 81

Occurs when the null hypothesis is rejected even though it is true

Answer 82

Significance level (⍺)

Answer 83

Occurs when the null hypothesis is not rejected even though it is false

Answer 84

Increase the sample size

Answer 85

-Reduces the standard error
-Increases the t statistic
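
The link between sample size, standard error and the t statistic can be sketched with the one-sample formula t = (x̄ − μ0) / (s / √n). The function name `one_sample_t` and the numbers are illustrative assumptions.

```python
import math

def one_sample_t(sample_mean, mu0, s, n):
    """One-sample t statistic: (sample mean - reference value) / standard error."""
    standard_error = s / math.sqrt(n)
    return (sample_mean - mu0) / standard_error

# Same mean difference (2.0) and same sample SD (5.0), different sample sizes.
t_small = one_sample_t(16.0, 14.0, 5.0, 25)    # SE = 1.0
t_large = one_sample_t(16.0, 14.0, 5.0, 100)   # SE = 0.5

print(t_small)  # 2.0
print(t_large)  # 4.0
```

The difference between the means is unchanged, but the larger sample has a smaller standard error and therefore a larger t statistic.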

Answer 86

Involves understanding and quantifying the different types of potential errors when collecting, analysing or interpreting data.

Answer 87

-Descriptive statistics
-Associations between potential causes and effects

Answer 88

Representative of the population (everybody in the population has an equal probability of being chosen to take part in the study)

Answer 89

-Selection of study subjects
-The measurements actually taken

Answer 90

Difference between the sample estimate and the true population value

Answer 91

-Increase the sample size (reduces variability)
-Ensure a more random selection
-Use stratified sampling, where different subgroups (such as age or geographic location) are specifically included to ensure the sample matches the diversity of the population

Answer 92

Occurs when there is a consistent, non-random bias introduced in the selection of participants, leading to results that are not representative of the general population

Answer 93

Often arises from flaws in the sampling method that systematically exclude certain groups or over-represent others (e.g. recruiting patients from a high-end optical store excludes low-income participants; solution – recruit from multiple locations)

Answer 94

-Random error
-Systematic error

Answer 95

Leads to variability around the true value (slightly off)

Answer 96

Reduce the precision of the data

Answer 97

Variability can be in both magnitude (small or big random error) and direction (overestimate or underestimate)

Answer 98

-Zero or offset error
-Scale error

Answer 99

A certain value is added to all of the measurements
NOTE : These are non-random and can be compensated for if known

Answer 100

Affects the accuracy of the data

Answer 101

Very little scatter in the data

Answer 102

The average is close to the “true” value

Answer 103

-Repeating measurements: overestimates and underestimates partly cancel each other (you end up with a value that is somewhat close to the true value)
-Frequently recalibrating the equipment
-Standardising the procedure for measuring

Answer 104

-Self-reports can sometimes be very unreliable
-The patient might be inclined to say what they think you want to hear

Answer 105

-Having an anonymous survey
-Having a third-party survey (neutral person who is not likely to bias the survey)
-Having questions with more options than just a yes/no response (subjects have to think more and give more thoughtful answers)

Answer 106

Occurs when certain groups of people are less likely to respond to a survey, causing an unrepresentative sample

Answer 107

Resending survey to non-responders and trying to get the maximum number of people to respond

Answer 108

Occurs when the statistical model is too simple, too complex or totally incorrect

Answer 109

Having larger sample sizes

Answer 110

1) There is actually nothing significant
2) There is something significant but it was not picked up because the sample size was too small

Answer 111

CI = p ± z × √( p(1 − p) / n )
p - sample proportion
z - critical value (e.g. 1.96 for 95% confidence)
n - sample size
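
A confidence interval for a proportion can be sketched as below, using the usual Normal approximation in which the standard error √(p(1 − p)/n) is multiplied by a critical value z. The function name `proportion_ci`, the default z = 1.96 (95% confidence) and the example numbers are assumptions for illustration.

```python
import math

def proportion_ci(p, n, z=1.96):
    """Approximate confidence interval for a proportion: p +/- z * sqrt(p(1-p)/n)."""
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# Hypothetical example: 30 out of 100 people in a sample have the condition.
low, high = proportion_ci(0.30, 100)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

For p = 0.30 and n = 100 the margin is about 0.09, so the interval runs from roughly 0.21 to 0.39.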

Answer 112

The ability of a statistical test to find a significant result when it is real

Answer 113

Increase the sample size

Answer 114

-Sample size
-Effect size
-Significance level (⍺)

Answer 115

-Likelihood of detecting a significant effect, if one exists
-Must be done during the design of a study

Answer 116

-Used for multiple comparisons
-Used for correcting for the increased risk of false +ves

Answer 117

When you know the number of comparisons

Answer 118

Probability of making a false positive x times in a row

Answer 119

Adjust the significance level by dividing it by the number of tests (α/m), where m is the number of tests

Answer 120

For 10 independent tests, α = 0.05/10 = 0.005
The probability of at least one false positive is : 1 − (1 − 0.005)^10 = 0.049 (about 5%)
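
The calculation above can be reproduced, and compared with the uncorrected case, in a short sketch. The function name `familywise_error_rate` is made up for the example, and the formula assumes the tests are independent.

```python
def familywise_error_rate(alpha, m):
    """Probability of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m

m = 10
uncorrected = familywise_error_rate(0.05, m)        # each test at alpha = 0.05
corrected = familywise_error_rate(0.05 / m, m)      # each test at alpha / m

print(f"uncorrected: {uncorrected:.3f}")  # about 0.401
print(f"corrected:   {corrected:.3f}")    # about 0.049
```

Without any correction, ten tests at α = 0.05 carry roughly a 40% chance of at least one false positive; dividing α by the number of tests brings this back to about 5%.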

(153 cards)