Mode
The value that occurs most frequently in a given data set.
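A minimal sketch of finding the mode in Python. The helper name `modes` and the sample data are illustrative assumptions; it returns every value tied for the highest count, since a data set can have more than one mode.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency, in first-seen order."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

data = [2, 3, 3, 5, 7, 7, 7, 9]   # made-up data for illustration
print(modes(data))                # [7]
print(modes([1, 1, 2, 2]))        # [1, 2] (two modes)
```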
Interquartile Range (IQR)
The difference between the third and first quartiles: Q3 - Q1
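A short sketch of computing the IQR with Python's standard library. The data are made up, and the choice of `method="inclusive"` is an assumption: quartile conventions differ between textbooks and tools, so other methods can give slightly different Q1 and Q3.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # made-up data for illustration

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q3, iqr)   # 3.0 7.0 4.0
```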
Standard Deviation (SD)
The square root of the variance: SD = sqrt(variance)
Variance
Population: sum((xi - mean)^2) / N, where xi is each element of the set
Sample: divide by n - 1 instead of N
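The variance and standard deviation formulas can be sketched as follows. The function name `variance`, its `sample` flag, and the data are assumptions made for illustration; the standard library's `statistics.pvariance` and `statistics.variance` compute the same two quantities.

```python
import math

def variance(values, sample=False):
    """Mean squared deviation from the mean; divides by n - 1 when sample=True."""
    n = len(values)
    mean = sum(values) / n
    squared_devs = sum((x - mean) ** 2 for x in values)
    return squared_devs / (n - 1 if sample else n)

data = [2, 4, 4, 4, 5, 5, 7, 9]            # made-up data, mean = 5

pop_var = variance(data)                   # 32 / 8 = 4.0
pop_sd = math.sqrt(pop_var)                # 2.0
sample_var = variance(data, sample=True)   # 32 / 7, about 4.571

print(pop_var, pop_sd, round(sample_var, 3))
```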
How to describe a histogram
4 Main Aspects:
Study design and types of study
Encompasses everything done in preparation for a data-driven research process.
Types:
Dependent (example when) vs. Independent Data
i.i.d.
i = independent
id = identically distributed
Simple Random Samples (SRS)
Each sampling unit of a population has an equal chance of being included in the sample.
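A minimal sketch of drawing an SRS without replacement. The sampling frame of 100 unit IDs, the sample size, and the fixed seed are all assumptions chosen so the example is repeatable.

```python
import random

population = list(range(1, 101))   # hypothetical sampling frame: unit IDs 1..100
rng = random.Random(42)            # fixed seed only so the sketch is repeatable

# random.sample draws without replacement; every unit has the same
# chance (10 / 100 here) of ending up in the sample.
sample = rng.sample(population, k=10)
print(sorted(sample))
```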
Longitudinal Data
Repeated measures of the same variable, collected from the same unit over time → measurements are likely correlated.
Repeated Measures Data: Wide and Long
Wide format: one row per subject, each measure in separate column.
Long format: one row per measurement.
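The two layouts can be illustrated with a small conversion from wide to long. The column names (`subject`, `week1`, ...), the values, and the helper `wide_to_long` are hypothetical; in practice a data-frame library would usually do this reshaping.

```python
# Wide format: one row per subject, one column per measurement occasion.
wide = [
    {"subject": "A", "week1": 5.1, "week2": 5.4, "week3": 5.9},
    {"subject": "B", "week1": 4.8, "week2": 4.7, "week3": 5.0},
]

def wide_to_long(rows, id_key, measure_keys):
    """Emit one row per (subject, occasion) pair."""
    long_rows = []
    for row in rows:
        for key in measure_keys:
            long_rows.append({id_key: row[id_key], "time": key, "value": row[key]})
    return long_rows

# Long format: one row per measurement (2 subjects x 3 occasions = 6 rows).
long_rows = wide_to_long(wide, "subject", ["week1", "week2", "week3"])
for row in long_rows:
    print(row)
```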
Quantitative Variables types
Categorical (or Qualitative) Variables
Conducting a Population Census
Gather data from the whole population.
Probability Sampling
Selection of a sample from a population based on the principle of randomization, that is, random selection or chance.
Probability of selection for each unit is known.
Types: SRS, Complex (anything besides SRS: clustering, stratification, etc.)
Stratification
Population is divided into different strata, and part of the sample is allocated to each stratum → ensures sample representation from each stratum and reduces the variance of survey estimates.
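A sketch of stratified sampling, assuming proportional allocation (each stratum's share of the sample matches its share of the population). That allocation rule, the helper `stratified_sample`, and the urban/rural strata are illustrative assumptions; other allocation schemes exist.

```python
import random

def stratified_sample(units, stratum_of, total_n, rng):
    """Draw an SRS inside each stratum, sized in proportion to the stratum."""
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)

    sample = {}
    for name, members in strata.items():
        # Proportional allocation; rounding means the sizes may not add up
        # to total_n exactly for less convenient numbers than these.
        n_h = max(1, round(total_n * len(members) / len(units)))
        sample[name] = rng.sample(members, min(n_h, len(members)))
    return sample

# Hypothetical population: 60 urban units and 40 rural units.
units = [("urban", i) for i in range(60)] + [("rural", i) for i in range(40)]
picked = stratified_sample(units, lambda u: u[0], total_n=10, rng=random.Random(0))

print({name: len(members) for name, members in picked.items()})
# {'urban': 6, 'rural': 4}
```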
Clustering
Clusters of population units (e.g., counties) are randomly sampled first (with known probability) within strata, to save data collection costs (data are collected from cases that are geographically close to each other).
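A simplified sketch of the first stage of cluster sampling. It assumes a single stratum and takes every unit in each selected cluster (one-stage clustering); the county names and unit labels are hypothetical.

```python
import random

# Hypothetical clusters: counties and the units that live in them.
clusters = {
    "county_a": ["a1", "a2", "a3"],
    "county_b": ["b1", "b2"],
    "county_c": ["c1", "c2", "c3", "c4"],
    "county_d": ["d1", "d2"],
}

rng = random.Random(1)   # fixed seed only so the sketch is repeatable
n_selected = 2

# Stage 1: an SRS of clusters, so each cluster's selection
# probability is known: n_selected / number of clusters.
chosen = rng.sample(sorted(clusters), k=n_selected)
selection_prob = n_selected / len(clusters)

# Data are then collected only inside the chosen clusters.
sampled_units = [unit for name in chosen for unit in clusters[name]]

print(chosen, selection_prob, sampled_units)
```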
Non-Probability Sampling
Pseudo-Randomization
Combine a non-probability sample with a probability sample, then estimate the probability of being included in the non-probability sample as a function of auxiliary information available in both samples.
Non-Probability Sampling Calibration
Compute weights for responding units in the non-probability sample that allow the weighted sample to mirror a known population.
Example: If we got more responses from females than males (but population is 50/50), then down-weight females and up-weight males.
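The female/male example can be sketched as a simple weight calculation: each group's weight is its population share divided by its share of the sample. The 70/30 split of respondents is an assumed figure for illustration; only the 50/50 population split comes from the notes.

```python
from collections import Counter

# Assumed non-probability sample: 70 female and 30 male respondents.
respondents = ["F"] * 70 + ["M"] * 30
population_share = {"F": 0.5, "M": 0.5}   # known population distribution

counts = Counter(respondents)
n = len(respondents)

# Weight = population share / sample share.
# Over-represented groups get weights below 1, under-represented above 1.
weights = {g: population_share[g] / (counts[g] / n) for g in counts}

# Check: after weighting, the sample shares match the population shares.
weighted_share = {g: counts[g] * weights[g] / n for g in counts}

print({g: round(w, 3) for g, w in weights.items()})         # {'F': 0.714, 'M': 1.667}
print({g: round(s, 3) for g, s in weighted_share.items()})  # {'F': 0.5, 'M': 0.5}
```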