1b Statistics Flashcards

(46 cards)

1
Q

Analysis of variance (ANOVA).

A

A form of linear model with a continuous outcome variable and categorical input variables.

2
Q

Bayesian methods.

A

Methods which allow parameters to have distributions. Initially the parameter θ is assigned a prior distribution P(θ), and after data, X, have been collected a posterior distribution P(θ|X) is obtained using Bayes’ Theorem, which links the two via the likelihood P(X|θ).
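
As a minimal sketch of the prior-to-posterior update, assume θ is a probability with a Beta(a, b) prior and the data X are counts of successes and failures. The function names and the numbers are hypothetical, chosen only for illustration.

```python
def update_beta_prior(a, b, successes, failures):
    """Return the Beta posterior parameters after observing binomial data.

    Assumes a Beta(a, b) prior on theta and a binomial likelihood, for which
    Bayes' Theorem gives a Beta(a + successes, b + failures) posterior.
    """
    return a + successes, b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical example: flat Beta(1, 1) prior, then 7 successes and 3 failures.
prior = (1.0, 1.0)
posterior = update_beta_prior(*prior, successes=7, failures=3)
print("prior mean:", beta_mean(*prior))
print("posterior parameters:", posterior)
print("posterior mean:", round(beta_mean(*posterior), 3))
```

The posterior mean sits between the prior mean and the observed proportion, and moves towards the observed proportion as more data are collected.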

3
Q

Binomial distribution.

A

The distribution of the number of events in a fixed number of independent trials, where each trial has two possible outcomes.

When the sample size is large, it approximates to a normal distribution.

Two parameters: n (sample size) and π (true probability of the event).
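
The two parameters and the normal approximation can be sketched as follows. The function names and the values n = 100, π = 0.3 are assumptions made for illustration; the approximation uses mean nπ, standard deviation sqrt(nπ(1 − π)) and a continuity correction.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_pmf(r, n, pi):
    """P(exactly r events in n independent trials, each with probability pi)."""
    return comb(n, r) * pi**r * (1 - pi) ** (n - r)


def normal_approx_cdf(r, n, pi):
    """Normal approximation to P(X <= r), with a continuity correction."""
    mean = n * pi
    sd = sqrt(n * pi * (1 - pi))
    return NormalDist(mean, sd).cdf(r + 0.5)


# Hypothetical values: n = 100 trials, true probability 0.3.
n, pi = 100, 0.3
exact = sum(binomial_pmf(r, n, pi) for r in range(0, 31))
approx = normal_approx_cdf(30, n, pi)
print(f"exact P(X <= 30) = {exact:.4f}")
print(f"normal approximation = {approx:.4f}")
```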

4
Q

Cluster randomised trial.

A
  • Involves randomisation of clusters (groups) of participants as opposed to individuals
  • Used to reduce contamination between control and intervention participants
  • Some interventions may only be delivered at population rather than individual level (e.g. health promotion campaigns)
  • Generally needs a larger sample size than an individually randomised RCT (often quoted as around 30% larger); the actual inflation depends on the cluster size and the ICC
  • The intracluster correlation coefficient (ICC) measures the homogeneity within clusters: the higher the ICC, the less variation within a cluster and the more power is “lost”.
  • Random (mixed) effects model assumes variation between clusters is random
  • Fixed effects model assumes that differences between clusters are not random, i.e. differences in attainment between schools are to do with systematic differences between schools and not due to chance.
  • Complex to plan - need to try and account for variation

Example
- randomising schools in a trial of an intervention to reduce childhood obesity that was administered at school level and would be hard to deliver to individual pupils without contamination of controls.

5
Q

Conditional logistic regression.

A

Used to analyse binary data from matched case-control studies, or from cross-over trials.

6
Q

Confidence Interval.

A

A 95% confidence interval displays the degree of uncertainty about the population parameter provided by the sample estimate.

  • Technically, if one conducted the same study, with the same sample size, 100 times, we would expect about 95 of the resulting 100 intervals to cover the true population parameter.
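
A minimal sketch of computing such an interval for a mean, assuming a large-sample normal approximation (a t distribution would normally be preferred for small samples). The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, level=0.95):
    """Large-sample (normal) confidence interval for a population mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    centre = mean(sample)
    se = stdev(sample) / sqrt(len(sample))  # standard error of the mean
    return centre - z * se, centre + z * se


# Hypothetical measurements, for illustration only.
data = [5.1, 4.8, 5.6, 5.0, 4.9, 5.3, 5.2, 4.7, 5.4, 5.0]
low, high = mean_confidence_interval(data)
print(f"95% CI for the mean: {low:.2f} to {high:.2f}")
```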
7
Q

What is design effect?

A

The amount by which the variance of an estimator has to be inflated to allow for clustering, or other aspects of the sampling scheme.
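
For clustering, a commonly used form of the design effect is 1 + (m − 1) × ICC, where m is the average cluster size and ICC is the intracluster correlation coefficient. The sketch below assumes that formula and roughly equal cluster sizes; the function names and numbers are hypothetical.

```python
from math import ceil


def design_effect(mean_cluster_size, icc):
    """Variance inflation due to clustering: 1 + (m - 1) * ICC."""
    return 1 + (mean_cluster_size - 1) * icc


def inflated_sample_size(n_individual, mean_cluster_size, icc):
    """Sample size for a cluster design, given the size an individually
    randomised design would need."""
    deff = design_effect(mean_cluster_size, icc)
    # Round before taking the ceiling to avoid floating-point noise.
    return ceil(round(n_individual * deff, 6))


# Hypothetical values: clusters of 21 participants, ICC of 0.05.
print("design effect:", design_effect(21, 0.05))
print("participants needed:", inflated_sample_size(200, 21, 0.05))
```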

8
Q

Effect modification

A

Given a model between an input variable and an outcome, effect modification occurs if the relationship between them differs markedly across levels of a third variable, i.e. there is an interaction between the input variable and the third variable.

9
Q

Fixed effect vs random effect

A

In meta-analysis:

Fixed effect: assumes treatment effect is the same (fixed) in all studies in the meta-analysis

Random effects model: allows the treatment effect to vary across studies.

FE and RE differ in the way studies are weighted and in the interpretation of the summary effect.

10
Q

Forest plot

A

A plot used in meta-analysis. Usually it comprises a series of estimates and confidence intervals from the component studies, and a summary estimate and confidence interval. Supposedly named because it looks like a set of trees.

11
Q

Funnel plot

A

A plot used in meta-analysis to try to detect publication bias. It plots the precision of the estimates of treatment effect from the component studies against the estimates themselves.

12
Q

Three methods to account for clustering in analysis

A
  • Calculating summary statistics for each cluster and then analysing these using standard techniques
  • Calculating robust standard errors that account for clustering
  • More sophisticated techniques, such as generalised estimating equations and multi-level models
13
Q

Hazard Rate:

A

The probability per unit time that a case that has survived to the beginning of an interval will fail in that interval. Specifically, it is computed as the number of failures per unit time in the interval, divided by the average number of surviving cases at the mid-point of the interval.

14
Q

Hazard Ratio (Relative Hazard):

A

Hazard ratio compares two groups differing in treatments or prognostic variables etc. If the hazard ratio is 2.0, then the rate of failure in one group is twice the rate in the other group. Can be interpreted as a relative risk.

15
Q

Kaplan-Meier plot.

A

A graphical plot of the probability of survival on the y-axis against survival time on the x-axis. Censored observations can be incorporated in the plot.

16
Q

Likelihood:

A

The probability of a set of observations given a model. If the model has a single parameter θ, it is denoted P(X|θ) where X denotes the data.

e.g. Probability of B occurring given that A occurred P(B|A)

17
Q

Logistic regression.

A

Used to analyse data where the outcome variable is binary. It models the logistic (logit) transform of the expected probability of ‘success’, log(p/(1 − p)), as a linear function of the input variables.
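
The logistic transform and its inverse can be sketched as below. The intercept and slope in the example are hypothetical values chosen for illustration, not estimates from any data.

```python
from math import exp, log


def logit(p):
    """Logistic transform: the log odds of a probability p (0 < p < 1)."""
    return log(p / (1 - p))


def inverse_logit(x):
    """Map a value on the log-odds scale back to a probability."""
    return 1 / (1 + exp(-x))


# Hypothetical model: log odds of 'success' = intercept + slope * exposure.
intercept, slope = -2.0, 0.8
for exposure in (0, 1, 2):
    p = inverse_logit(intercept + slope * exposure)
    print(f"exposure={exposure}: predicted probability {p:.3f}")
```

On the log-odds scale a one-unit increase in the input adds the slope, so exp(slope) is the odds ratio for that increase.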

18
Q

Meta-analysis.

A

A method of combining results from different studies to produce an overall summary statistic, usually in clinical trials.

19
Q

Multiple linear regression.

A

Often just known as multiple regression. Used to analyse data when the outcome is continuous and the model is linear.

20
Q

Null hypothesis

A

  • For comparing two samples, the assumption under the null hypothesis is that they both came from the same population
  • Type 1 error: occurs when the null hypothesis is true and wrongly rejected, i.e. concluding that a significant difference exists when in reality it does not
  • Type 2 error: occurs when the null hypothesis is wrongly accepted when false, i.e. missing a true difference

21
Q

Type 1 error

A

Type 1 error: occurs when the null hypothesis is true and wrongly rejected, i.e. concluding that a significant difference exists when in reality it does not

22
Q

Type 2 error

A

Type 2 error: occurs when the null hypothesis is wrongly accepted when false, i.e. missing a true difference

23
Q

NNT:

A

Number Needed to Treat. An estimate of the number of patients that would need to be treated under a new treatment for one more of them to achieve the desired outcome than under the standard treatment.

  • NNT = 1/ARR (absolute risk reduction)
  • A method of expressing trial outcomes
  • Possibly more easily applicable to clinical practice than OR/RR

However, be aware that it is dependent on the baseline incidence rate, and so cannot be interpreted without knowledge of the baseline incidence.
Example: NNT ≈ 1.2 for Helicobacter eradication; NNT ≈ 40 for preventing death with aspirin after MI
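
A short sketch of the calculation, which also illustrates the dependence on baseline incidence. The function names and the event rates are hypothetical; both scenarios assume the same 25% relative reduction.

```python
def absolute_risk_reduction(control_event_rate, treated_event_rate):
    """ARR: difference in event rates between control and treated groups."""
    return control_event_rate - treated_event_rate


def number_needed_to_treat(control_event_rate, treated_event_rate):
    """NNT = 1 / ARR. Only defined when the treatment reduces the event rate."""
    arr = absolute_risk_reduction(control_event_rate, treated_event_rate)
    if arr <= 0:
        raise ValueError("treatment does not reduce the event rate")
    return 1 / arr


# Same relative reduction (25%), different hypothetical baseline incidences.
high_baseline = number_needed_to_treat(0.20, 0.15)    # about 20
low_baseline = number_needed_to_treat(0.02, 0.015)    # about 200
print(f"NNT at 20% baseline: {high_baseline:.0f}")
print(f"NNT at 2% baseline: {low_baseline:.0f}")
```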

24
Q

NNH:

A

Number Needed to Harm.

Number of patients that a physician would have to treat with a new treatment to harm one extra patient who would otherwise have not been harmed.

Harm may be an adverse reaction, or treatment failure, death etc.

25
Odds.
If p1 is the probability of a success, the odds is the ratio of the probability of a success to the probability of a failure, p1/(1-p1).
26
Odds ratio.
Used as a summary measure for binary outcomes. If p1 is the probability of a success in one group and p2 the probability of success in another, then the odds ratio is {p1/(1-p1)}/{p2/(1-p2)}.
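
The two formulas above translate directly into code. This is a minimal sketch; the function names and the probabilities 0.8 and 0.5 are hypothetical.

```python
def odds(p):
    """Odds of a success with probability p: p / (1 - p)."""
    return p / (1 - p)


def odds_ratio(p1, p2):
    """Odds ratio comparing group 1 with group 2."""
    return odds(p1) / odds(p2)


# Hypothetical probabilities of success in two groups.
p1, p2 = 0.8, 0.5
print(f"odds in group 1: {odds(p1):.2f}")
print(f"odds in group 2: {odds(p2):.2f}")
print(f"odds ratio: {odds_ratio(p1, p2):.2f}")
```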
27
Proportional hazards model (also known as the Cox model).
Used to analyse survival data. The main assumption is that hazard ratios are constant over time; for example, if an explanatory variable is binary, then the hazard ratio for this variable is constant over time.
28
Poisson distribution.
The distribution of a count variable when the probability of an event is constant. It is used to describe discrete quantitative data such as counts, where the population size n is large and the probability of an individual event is small.
Typical examples:
  • number of deaths in a town from a disease per day
  • number of admissions to a particular hospital
The Poisson distribution can be seen as the limiting distribution of binary data from an infinitely large sample. Thus it gives the probability of getting r events in a population.
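
The probability of r events can be sketched as below, together with a check that a binomial with large n and small probability gives nearly the same answer. The function names, the mean of 3 events and the binomial parameters are assumptions for illustration.

```python
from math import comb, exp, factorial


def poisson_pmf(r, mu):
    """P(exactly r events) when the expected number of events is mu."""
    return exp(-mu) * mu**r / factorial(r)


def binomial_pmf(r, n, pi):
    """P(exactly r events in n trials, each with probability pi)."""
    return comb(n, r) * pi**r * (1 - pi) ** (n - r)


# Hypothetical setting: 10,000 people, each with event probability 0.0003,
# so the expected number of events is 3.
n, pi = 10_000, 0.0003
for r in range(0, 6):
    print(r, round(poisson_pmf(r, n * pi), 4), round(binomial_pmf(r, n, pi), 4))
```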
29
Poisson regression.
Used to analyse data when the outcome variable is a count.
30
Define power
The power of a test is the probability of declaring a result significant when the null hypothesis is false. It is denoted by 1 − β.
31
Positive predictive value (PPV) of a test
PPV is the probability that a subject who tests positive is a true positive, i.e. has the disease and is correctly classified. It shows how good a positive result is at identifying people with the disease. In a screening situation, prevalence is small and so PPV is low. PPV and NPV both depend on the prevalence of the disease, sensitivity and specificity.
PPV = A/(A+B), where A is the number of true positives and B the number of false positives.
32
Negative predictive value (NPV) of a test
NPV is the probability that a subject who tests negative is a true negative, i.e. does not have the disease and is correctly classified. It shows how good a negative result is at identifying people without the disease.
NPV = D/(C+D), where D is the number of true negatives and C the number of false negatives.
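
The dependence of PPV and NPV on prevalence can be sketched as follows. The labels A to D follow the 2×2 table implied by the formulas above (A true positives, B false positives, C false negatives, D true negatives); the function name and the test characteristics are hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) as proportions of a notional population."""
    a = sensitivity * prevalence                # true positives
    b = (1 - specificity) * (1 - prevalence)    # false positives
    c = (1 - sensitivity) * prevalence          # false negatives
    d = specificity * (1 - prevalence)          # true negatives
    return a / (a + b), d / (c + d)


# Hypothetical test with 90% sensitivity and 90% specificity.
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.9, 0.9, prevalence)
    print(f"prevalence {prevalence:.2f}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

With the same test, PPV falls sharply as prevalence falls, which is why PPV tends to be low in screening.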
33
Publication bias & reason
A phenomenon in which some studies that have been conducted fail to be published. It usually occurs because studies that have positive findings are more likely to be written up and submitted for publication, and editors are more likely to accept them.
34
p value (probability value (p) or significance value)
Assuming that the null hypothesis is true, the p-value is the probability of obtaining, by chance alone, a result at least as extreme as the observed result.
  • Calculated using a significance test
  • P < 0.05: considered statistically significant
  • P > 0.05: result not statistically significant. The two groups are not significantly different and chance cannot be excluded as a potential explanation of the association
35
Random effects model.
A model with more than one random (or error) term. The assumption is that if the study was done again, the terms would estimate different population parameters, in contrast to a fixed effects model. Thus in a longitudinal study, the effect of a patient on the intervention effect is assumed random.
36
Relative risk.
Used as a summary measure for binary outcomes for prospective studies. If p1 is the probability of success in one group and p2 the probability of success in another, the relative risk is p1/p2. If p1 and p2 are the incidences of an event, then the relative risk is also the incidence rate ratio
37
Type I, type II errors and Power
  • A type I error occurs when the null hypothesis is rejected when it is true. The type I error rate is the expected probability of making a type I error, and this should be decided before collecting data. It is essentially the expected **false positive rate (significance level)** of the test and is often denoted by α (usually set at 0.05).
  • A type II error occurs when a study fails to reject a null hypothesis when it is false, i.e. the alternative hypothesis is true. The type II error rate is essentially a **false negative** rate, and is often denoted by β.
  • Power is the probability of declaring a result significant when the null hypothesis is false, i.e. 1 − β.
38
What is standard error?
Standard error measures how precisely a population measure (e.g. mean, proportion or rate) is estimated by a sample measure (i.e. the amount of variability in the sample measure).
39
How to calculate Likelihood ratio for screening (and interpret)
Likelihood ratio (for a positive test) = sensitivity/(1 − specificity). It is the number of times more likely a positive result is in someone with the disease than in someone without it.
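
A minimal sketch of the calculation and one way of interpreting it: the likelihood ratio converts pre-test odds of disease into post-test odds. The function names and the test characteristics are hypothetical.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1 - specificity)


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Post-test odds = pre-test odds * likelihood ratio, converted
    back to a probability."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Hypothetical test: 90% sensitivity, 90% specificity, 1% prevalence.
lr = positive_likelihood_ratio(0.9, 0.9)
print(f"LR+ = {lr:.1f}")
print(f"probability of disease after a positive test: "
      f"{post_test_probability(0.01, lr):.3f}")
```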
40
Define sensitivity analysis
A sensitivity analysis varies each input to see which are the most important drivers of the final result
41
How does effect size affect power and sample size? How does a smaller effect size than anticipated affect power?
  • Smaller effects require a **larger sample size** to achieve adequate power, holding all other study components constant.
  • More participants are needed in order to detect a small effect.
  • If the trial effect size is smaller than expected, this would be expected to **decrease the power**.
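
Both points can be illustrated with the usual normal-approximation formula for comparing two means, n per group = 2 × ((z for α/2 + z for power) / d)², where d is the standardised effect size. This is a sketch under that assumption; the function names and the effect sizes 0.5 and 0.25 are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def n_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate sample size per group for a two-sided, two-sample test."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


def approximate_power(effect_size, n, alpha=0.05):
    """Approximate power with n participants per group (ignores the
    negligible chance of significance in the wrong direction)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(effect_size * sqrt(n / 2) - z_alpha)


planned = n_per_group(0.5)
print("n per group for d = 0.5:", planned)
print("n per group for d = 0.25:", n_per_group(0.25))
print(f"power if the true effect is 0.25: {approximate_power(0.25, planned):.2f}")
```

Halving the effect size roughly quadruples the required sample size, and a trial sized for the larger effect has much lower power if the true effect is the smaller one.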
42
What is Kruskal Wallis test?
Non-parametric rank-based test for comparing more than 2 independent groups. For a continuous outcome and an unordered categorical exposure. Parametric equivalent: one-way ANOVA.
43
What is Mann Whitney U test?
Non-parametric rank-based test for comparing 2 independent groups. AKA Wilcoxon rank sum test. Parametric equivalent: unpaired t test.
44
What is Wilcoxon signed rank test?
Non-parametric rank-based test for paired samples, often used when samples are small. Parametric equivalent: paired t test.
45
What are 3 disadvantages of non-parametric tests?
  • Low power (type 2 errors are more likely).
  • Calculating confidence intervals is more difficult.
  • Can generally only be used for simple bivariate analysis (i.e. unable to adjust for confounding or test for interaction).
46
What are 5 assumptions of t-tests?
1. Independence: observations in one sample are independent of those in the other sample.
2. Normality: data are approximately normally distributed.
3. Homogeneity of variances: both samples have approximately the same variance.
4. Random sampling: both samples are obtained using a random sampling method.
5. Measurement level: data are measured at the interval (or ratio) level.