week 2 - data wrangling Flashcards

Question 1

Q

Statistics & Samples

Answer

A

Types of Data
- Data = measurements of 1 or more variables made on a sample of individuals.

Question 2

Q

Categorical Variables

Answer

A

Describe membership in a category/group.
Describe qualitative characteristics of individuals that do not correspond to a degree of diff. on a numerical scale.
Categorical variables = attribute/qualitative variables.
E.g. survival (alive or dead).
Nominal if the diff. categories = no inherent order.
Nominal = name.
Values of an ordinal categorical variable = ordered.
Magnitude of the diff. betw. consecutive values = not known.
Ordinal = having an order.

Question 3

Q

Numerical Variables

Answer

A

When measurements of individuals = quantitative & have magnitude.
Numbers.
E.g. core body temp. (in degrees Celsius [°C])
Either continuous or discrete.
Continuous numerical data
- Take on any real-number value within some range.
- Betw. any 2 values of a continuous variable, an infinite number of other values = possible.
- In practice, continuous data = rounded to a predetermined number of digits, set for convenience.
Discrete numerical data
- Data come in indivisible units.
- Often analyzed as though they = continuous, if large # of possible values.
Numbers might also be used to name categories (the variable = still categorical).
Numerical data can be reduced to categorical data by grouping → result contains less info.
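
A minimal sketch of reducing numerical data to categories by grouping. The temps, the cutoffs, and the name `temp_category` are made up for illustration and do not come from the card.

```python
# Hypothetical core body temps in °C (made-up values)
temps_c = [36.4, 36.9, 37.2, 38.1, 39.0]

def temp_category(t):
    """Map a continuous temp to an ordinal category (cutoffs are assumptions)."""
    if t < 36.5:
        return "low"
    if t < 37.5:
        return "normal"
    return "high"

categories = [temp_category(t) for t in temps_c]
print(categories)  # ['low', 'normal', 'normal', 'high', 'high']
```

After grouping, 38.1 and 39.0 both read "high", so the size of the diff. betw. them is lost. That is the "less info" in the last line of the card.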

Question 4

Q

Explanatory & Response Variables

Answer

A

Stats = often used to relate 1 variable to another by examining associations betw. variables & differences betw. groups.
Measuring an association = equivalent to measuring a difference betw. groups.
Goal = to assess how well 1 of the variables (explanatory variable) predicts/affects the other variable (response variable).
Treatment variable = manipulated by the researcher → explanatory variable.
Measured effect of the treatment → response variable.
Neither variable = manipulated by the researcher → association may still be described as the “effect” of 1 of the variables on the other, but it = not direct evidence for causation.
IV (independent variable) = explanatory
DV (dependent variable) = response

Question 5

Q

Frequency Distributions

Answer

A

Diff. individuals in a sample = diff. measurements.
This variability → seen w/ a freq. dist.
Freq. of a specific measurement in a sample = # of observations having that particular value of the measurement.
Freq. dist. shows how often each value of the variable occurs in the sample.
Informs us about the dist. of the variable in the pop. the sample came from.
Gives intuitive understanding of the variable.
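
A freq. dist. can be tabulated by counting how often each value occurs. This is a rough sketch; the sample values are made up for illustration.

```python
from collections import Counter

# Hypothetical sample: one discrete measurement per individual (made-up values)
sample = [2, 3, 3, 1, 2, 3, 4, 2, 3]

# Counter maps each observed value to the number of observations with that value
freq = Counter(sample)

# Print a small text histogram, one row per value
for value in sorted(freq):
    print(value, freq[value], "#" * freq[value])
```

The counts add up to the sample size (9 here), and the value 3 is the most frequent with 4 observations.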

Question 6

Q

Probability Distribution

Answer

A

Distribution of a variable in the whole pop. = prob. dist.
Real prob. dist. of a pop. in nature = almost never known.
Researchers use theoretical prob. dists. to approx. the real prob. dist.

Question 7

Q

Normal Distribution

Answer

A

Normal dist = “bell curve.”
Most important prob. dist. in stats.
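
One way to get a feel for the bell curve is to ask what share of values lies within 1 and 2 SD of the mean. The sketch below uses a standard normal (mean 0, SD 1) from Python's `statistics.NormalDist`; choosing the standard normal is an assumption made for convenience.

```python
from statistics import NormalDist

# Standard normal dist: mean 0, SD 1 (chosen for convenience)
z = NormalDist(mu=0.0, sigma=1.0)

# Share of the dist. within 1 SD and within 2 SD of the mean
within_1sd = z.cdf(1.0) - z.cdf(-1.0)
within_2sd = z.cdf(2.0) - z.cdf(-2.0)

print(round(within_1sd, 3))  # 0.683, roughly 2/3
print(round(within_2sd, 3))  # 0.954, roughly 95%
```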

Question 8

Q

Describing Data - Sample Mean

Answer

A

Avg. of the measurements in the sample
Sum of all observations divided by # of observations
Symbol = x̄
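
As a quick worked example (sample values made up for illustration):

```python
# Hypothetical sample of 5 measurements (made-up values)
sample = [2.0, 4.0, 4.0, 6.0, 9.0]

# Sample mean = sum of all observations / number of observations
x_bar = sum(sample) / len(sample)
print(x_bar)  # 5.0
```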

Question 9

Q

Describing Data - Variance & Standard deviation (SD)

Answer

A

Used to measure the spread of a dist.
SD = how far the observations typically = from the mean
SD = large → most observations = far from mean
SD = small → most observations = close to mean
SD = calc. from variance
SD = square root of variance
SD = better for describing spread b/c same units as the variable (variance = squared units)
Deviation from mean = diff. betw. measurement & mean
-ve deviations cancel +ve deviations → avg of raw deviations = always 0
Need to avg squared deviations instead → variance
Squaring → deviations above & below the mean both contribute +ly to variance
SD = never -ve & same units as OG observations
SD = connected to freq. dist.: if freq. dist. = bell-shaped, then about ⅔ of observations lie within 1 SD of the mean & about 95% lie within 2 SD
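
The steps on this card can be sketched as follows. The sample values are made up. Dividing by n - 1 is the usual convention for a sample variance; the card only says "avg", so treat that divisor as an assumption here.

```python
import math
import statistics

# Hypothetical sample (made-up values)
sample = [2.0, 4.0, 4.0, 6.0, 9.0]
n = len(sample)
x_bar = sum(sample) / n  # 5.0

# Deviation from mean = measurement - mean; +ve and -ve deviations cancel
deviations = [x - x_bar for x in sample]
print(sum(deviations))  # 0.0

# Squared deviations are never -ve, so they do not cancel
squared = [d ** 2 for d in deviations]

# Sample variance: sum of squared deviations / (n - 1)  (assumed convention)
variance = sum(squared) / (n - 1)
sd = math.sqrt(variance)  # back in the same units as the observations

print(variance)      # 7.0
print(round(sd, 3))  # 2.646

# Cross-check against the standard library's sample SD
print(math.isclose(sd, statistics.stdev(sample)))  # True
```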

Question 10

Q

Describing Data - Coefficient of Variation (CV)

Answer

A

Expresses the SD as a % of the mean: CV = 100% × SD / mean
High CV = more variability relative to mean
Low CV = individuals = more consistently similar relative to mean
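
A small sketch of why the CV is useful when comparing groups w/ very diff. means. The three samples and the helper name `cv_percent` are made up for illustration.

```python
import statistics

def cv_percent(values):
    """Coefficient of variation: sample SD as a % of the mean."""
    return 100 * statistics.stdev(values) / statistics.mean(values)

# Hypothetical samples (made-up values)
small = [9.0, 10.0, 11.0]     # mean 10, SD 1
large = [90.0, 100.0, 110.0]  # mean 100, SD 10
spread = [5.0, 10.0, 15.0]    # mean 10, SD 5

print(round(cv_percent(small), 1))   # 10.0
print(round(cv_percent(large), 1))   # 10.0
print(round(cv_percent(spread), 1))  # 50.0
```

`small` and `large` have SDs that differ 10-fold but the same CV, so their variability relative to the mean is the same. `spread` has the same mean as `small` but a higher CV.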

Question 11

Q

Median

Answer

A

Middle observation in the data set
Found by sorting from smallest to largest → splits data set into 2 halves
Even # of observations → avg of the 2 middle values
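
The rule above, written out as a short sketch. The function name `median_of` and the inputs are made up for illustration; Python's own `statistics.median` does the same job.

```python
def median_of(values):
    """Middle value of the sorted data; avg of the 2 middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: single middle observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: avg of the middle pair

print(median_of([7, 1, 5]))     # 5
print(median_of([7, 1, 5, 3]))  # 4.0
```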

Question 12

Q

IQR

Answer

A

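Quartiles = values dividing the sorted data into quarters
Q1 (1st quartile) = middle value of the measurements lying below the median
Q2 = median
Q3 (3rd quartile) = middle value of the measurements lying above the median
IQR (interquartile range) = Q3 - Q1 → span of the middle half of the data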
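
A rough sketch of the definition on this card: Q1 and Q3 are taken as the medians of the lower and upper halves of the sorted data. This is only one of several conventions, and stats software may return slightly diff. quartiles. The function name `quartiles_by_halves` and the data are made up.

```python
import statistics

def quartiles_by_halves(values):
    """Return (Q1, Q2, Q3) using medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # measurements below the median
    upper = ordered[half + (n % 2):]  # skip the median itself when n is odd
    return (
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
    )

# Hypothetical data (made-up values), needs at least 2 observations
q1, q2, q3 = quartiles_by_halves([1, 3, 5, 7, 9, 11, 13, 15])
iqr = q3 - q1
print(q1, q2, q3)  # 4.0 8.0 12.0
print(iqr)         # 8.0
```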

Question 13

Q

Box Plots

Answer

A

Shows median & IQR
Lower & upper edges of the box = Q1 & Q3 → span of the box = IQR
Line inside the box = median

Question 14

Q

Measuring Spread & Location Comparison - Mean vs median

Answer

A

Median = middle measurement of a dist.
Mean = center of gravity of all points, including outliers → balance point
Mean = sensitive to extreme observations (outliers)
Median = largely unaffected
Mean = displaced from the location of the typical measurement when freq. dist. = strongly skewed or has extreme values
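
A quick illustration w/ made-up numbers: replacing one value by an extreme outlier moves the mean a lot but leaves the median where it was.

```python
import statistics

# Hypothetical samples (made-up values); only the last value differs
typical = [10, 11, 12, 13, 14]
with_outlier = [10, 11, 12, 13, 100]

mean_typical = statistics.mean(typical)          # 12
mean_outlier = statistics.mean(with_outlier)     # 29.2
median_typical = statistics.median(typical)      # 12
median_outlier = statistics.median(with_outlier) # 12

print(mean_typical, median_typical)
print(mean_outlier, median_outlier)
```

W/ the outlier, the mean (29.2) is larger than 4 of the 5 observations, so it no longer describes a typical measurement.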

Question 15

Q

Measuring Spread & Location Comparison - SD vs. IQR

Answer

A

SD = calc. from squared deviations → more sensitive to extreme observations than IQR
IQR = better indicator of the spread of the main part of the dist. when data = strongly skewed or has extreme values
SD reflects variation among all data points

Question 16

Q

Estimating w/ Uncertainty

Answer

A

Estimation = process of inferring a pop. parameter from sample data.
All estimates have a sampling dist. = prob. dist. of all the possible values of the estimate that might be obtained under random sampling w/ a given sample size.
Standard error (SE) of an estimate = SD of its sampling dist.
SE measures precision.
Smaller SE = more precise estimate
SE & CI assume that sampling = random
SE of an estimate declines w/ increase in sample size
Confidence interval (CI) = range of values calc. from sample data → likely to contain within its span the value of the target parameter.
95% CIs calc. from independent random samples = include the value of the parameter about 19 times out of 20
2SE rule = rough approx. to the 95% CI for a mean: mean ± 2 SE
Error bars on graphs often rep. SEs/CIs
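
A minimal sketch of the SE of a mean and the 2SE rule. The card does not give a formula; the sketch assumes the standard one, SE of the mean = SD / √n, and uses made-up sample values. The helper name `se_of_mean` is hypothetical.

```python
import math
import statistics

def se_of_mean(values):
    """Standard error of the mean: sample SD / sqrt(n) (assumed formula)."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Hypothetical random sample (made-up values)
sample = [2.0, 4.0, 4.0, 6.0, 9.0]
mean = statistics.mean(sample)  # 5.0
se = se_of_mean(sample)

# 2SE rule: rough approx. to the 95% CI for the mean
ci_low = mean - 2 * se
ci_high = mean + 2 * se

print(round(se, 2))                         # 1.18
print(round(ci_low, 2), round(ci_high, 2))  # 2.63 7.37
```

B/c n sits under a square root, a sample 4× larger w/ the same SD would have half the SE, which is why the SE declines w/ increasing sample size.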

(16 cards)