Independent & Dependent Variables
• Independent variables
– The characteristic being observed or measured that is hypothesized to influence an event or manifestation. E.g., Risk factors
• Dependent variables
– A variable whose value depends on the effect of other variable(s). A manifestation or outcome whose variation we seek to explain or account for by the influence of independent variables. E.g., Disease outcome
Continuous vs. Discrete Data
• Continuous: Quantitative with potentially infinite number of values along continuum. Can be measured to as many decimal places as measuring instrument allows. E.g., Weight, height
• Discrete:
– Count – quantitative data that can be arranged into discrete, naturally occurring or arbitrarily selected groups or sets of values, e.g., pulse rate
– Categorical
-> Nominal–qualitative, named category; the order of the categories is irrelevant to statistical analyses e.g., gender, reproductive status
-> Ordinal–ordered categories, qualitative e.g., disease staging in cancer, education level
Descriptive vs. Inferential statistics
• Descriptive statistics
– Communicate results without attempting to generalize
– Important first step in epidemiologic studies
• Inferential statistics
– Used to infer the likelihood that the observed results can be generalized to other samples of individuals
Measures of central tendency
• Mean – The average, determined by adding all values and dividing by total number of subjects
• Mode – The most common value in the data
• Median – Value in dataset where 1⁄2 subjects are smaller and 1⁄2 are larger
– List data in ascending order
– Find the median location as (n+1) / 2
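The three measures above can be sketched in Python as follows. The sample values and the helper name median_by_location are hypothetical, chosen only for illustration. The helper follows the (n+1) / 2 location rule and assumes the usual convention that, when the location falls between two positions (even n), the median is the average of the two neighbouring values.

```python
import statistics

def median_by_location(values):
    """Median via the (n + 1) / 2 location rule (hypothetical helper)."""
    ordered = sorted(values)        # list data in ascending order
    n = len(ordered)
    location = (n + 1) / 2          # 1-based position of the median
    if location.is_integer():
        # Odd n: the location points at a single middle value
        return ordered[int(location) - 1]
    # Even n: the location falls halfway between two values; average them
    lower = ordered[int(location) - 1]
    upper = ordered[int(location)]
    return (lower + upper) / 2

# Hypothetical sample: systolic blood pressure (mmHg) for 7 subjects
values = [118, 122, 122, 130, 135, 141, 160]

mean = statistics.mean(values)      # sum of values / number of subjects
mode = statistics.mode(values)      # most common value
median = median_by_location(values)

print("mean:", round(mean, 1))
print("mode:", mode)
print("median:", median)
```

The standard library's statistics.median applies the same convention, so it can serve as a cross-check on the hand-written rule.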
Measures of Dispersion (Variation)
• Need to be able to measure the extent to which individual values differ from mean:
• Range: The difference between the highest and lowest values
• Variance: Average squared deviation of each value from the mean
Σ(Individual value – mean value)^2 / (n - 1)
Because variance is reported in squared units, take square root of the variance and report standard deviation
• Standard deviation (SD): Average measure of how individual values differ from the mean
– The smaller the SD, the less each score varies from the mean
– The larger the spread of scores, the larger the SD
SD = √[ Σ(Individual value – mean value)^2 / (n - 1) ]
• When reporting estimates of central tendency, report measure of dispersion, e.g., mean ± SD
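As a rough illustration of the formulas above, the range, variance and SD can be computed step by step. The data are hypothetical, and the variable names are assumptions made for this sketch. The n - 1 denominator matches the formulas in this section.

```python
import math
import statistics

# Hypothetical sample: systolic blood pressure (mmHg) for 7 subjects
values = [118, 122, 122, 130, 135, 141, 160]

n = len(values)
mean = sum(values) / n

# Range: difference between the highest and lowest values
value_range = max(values) - min(values)

# Variance: sum of squared deviations from the mean, divided by (n - 1)
squared_deviations = [(x - mean) ** 2 for x in values]
variance = sum(squared_deviations) / (n - 1)

# SD: square root of the variance, back in the original units
sd = math.sqrt(variance)

# Report central tendency together with dispersion, as mean ± SD
print(f"range = {value_range}")
print(f"mean ± SD = {mean:.1f} ± {sd:.1f}")
```

The standard library functions statistics.variance and statistics.stdev use the same n - 1 denominator, so they should agree with the manual calculation.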
Inference & Assessing the Role of Chance
How can we quantify the degree to which chance variability may account for the results observed in any individual study?
– By performing an appropriate test of statistical significance and determining the p-value
How to determine likelihood that sampling variability (chance) explains the observed results?
Hypothesis Testing
• Performing a test of statistical significance to determine likelihood that sampling variability (chance) explains the observed results
• Make explicit statement of hypothesis to be tested:
– Null hypothesis (H0): Always the hypothesis of no difference. The assertion that there is no association between exposure and disease, e.g., RR = 1, OR = 1
– Alternative hypothesis (H1 or HA): The assertion that there is some association between exposure and disease, e.g., RR ≠ 1, OR ≠ 1
The Appropriate Test of Statistical Significance
• Will vary by study design, data type and situation
• Generates a test statistic that is a function of:
– The difference between observed values in the study and expected values if null hypothesis were true, and
– The variability in the sample
• Will lead to a probability statement (p-value)
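p-value
• The probability of obtaining a result at least as extreme as the one observed, by chance alone, if the null hypothesis were true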
t Test
• Parametric test for differences between means of independent samples
– Continuous data
- H0: mean1 = mean2
- HA: mean1 ≠ mean2
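A minimal sketch of how the test statistic for two independent samples might be computed. The notes do not say which form of the t test is intended, so this assumes the classic pooled-variance (Student's) version, which treats the two groups as having equal variances. The function name and the data are hypothetical. Only the t statistic and its degrees of freedom are computed; the p-value would then come from a t table or a statistics package.

```python
import math
import statistics

def pooled_t_statistic(sample1, sample2):
    """Two-sample t statistic, assuming equal variances (hypothetical helper)."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)

    # Degrees of freedom for the pooled test
    df = n1 + n2 - 2

    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df

    # Standard error of the difference between the two means
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

    # Observed difference relative to the variability in the sample
    return (mean1 - mean2) / se, df

# Hypothetical continuous measurements in two independent groups
exposed = [5.1, 4.9, 5.6, 5.8, 6.0]
unexposed = [4.4, 4.8, 4.6, 5.0, 4.7]

t_stat, df = pooled_t_statistic(exposed, unexposed)
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```

Under H0 (mean1 = mean2) the statistic is expected to be near 0; large absolute values are less compatible with H0.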
Chi-square test
• Test whether observed differences in proportions between study groups are statistically significant
– I.e., Whether there is an association between exposure and outcome
– Categorical data
H0: proportions are equal; no association
HA: proportions are different; there is an association
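For a 2x2 table of exposure by outcome, the chi-square calculation can be sketched as below. The function name, cell labels and counts are hypothetical. The sketch assumes the plain Pearson chi-square without a continuity correction, and that every expected count is greater than zero. The p-value shortcut using math.erfc applies only to 1 degree of freedom, which is the 2x2 case.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 table (hypothetical helper).

    a, b = exposed with / without the outcome
    c, d = unexposed with / without the outcome
    """
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if H0 (no association) were true
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # Upper-tail probability for a chi-square with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed have the outcome
chi2, p_value = chi_square_2x2(20, 80, 10, 90)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```

When the proportions in the two groups are identical, the observed and expected counts coincide and the statistic is 0.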