Why can you think of any statistic as a random variable?
Its value will vary from one sample to another drawn from the same population; there are many alternative samples from which it could have been calculated
If you got an average age of 1 for a sample of 100 CT residents, is that a weird value?
Not weird, since there are millions of combinations of 100 residents that give you that average, BUT relatively speaking it might be a rare value
What are the population distribution, the sample distribution, and the sampling distribution all?
Types of distributions
What is the probability distribution of statistics called?
Sampling distributions
Sampling distribution
a probability distribution that determines probabilities of the possible values of a sample statistic (such as a sample mean)
How would you define the sampling distribution of the sample mean?
the collection of sample means for all possible random samples of a particular size (n) that can be obtained from a population
Sampling distribution in JT terms
For whatever statistic I calculate (mean, IQR, etc.), don't just think about our one sample; think about every possible sample (combination) of the same size. Each of those samples would have its own value of the statistic; the list of all those resulting values is the sampling distribution
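A minimal Python sketch of that idea, assuming a hypothetical four-value population and samples drawn with replacement (so every ordered sample of size n is counted once). It lists every possible sample of size 2 and the mean of each; that list of means is the sampling distribution of the sample mean.

```python
from itertools import product
from statistics import mean

# Hypothetical tiny population, chosen only for illustration
population = [2, 4, 6, 8]
n = 2  # sample size

# Every ordered sample of size n drawn WITH replacement (an assumption that
# keeps the arithmetic exact and matches the sigma^2/n formula used later)
samples = list(product(population, repeat=n))

# One sample mean per possible sample: this list IS the sampling distribution
sample_means = [mean(s) for s in samples]

print(len(samples))               # 16 possible samples (4 ** 2)
print(sorted(set(sample_means)))  # [2, 3, 4, 5, 6, 7, 8]
```

Some means (like 5) come from several different samples, which is why some values of a statistic are common and others rare.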
Sampling distribution is all combos, with…(2 things)
the same sample size (n), drawn from the same population
Why do we care about a distribution of samples? (3)
We will use theories about how populations produce samples to make backward inferences from our sample to the population
We can’t typically study a population, but we can study a sample
We can’t know how well a given sample reflects the population, but we can use probability theory to study how samples would tend to come out if we did know the characteristics of the population
(The sampling distribution helps us estimate the likelihood of our sample statistics)
The CLT tells us what we can expect to be true about what three things?
Central tendency, variability, and functional form (shape) of the sampling distribution
What does the CLT say about central tendency?
If the mean of the population is μ, then the mean of the sampling distribution (the mean of the billion sample means) is also μ
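A quick exact check of this card, assuming the same kind of hypothetical toy population and with-replacement sampling. `Fraction` keeps the arithmetic exact, so the average of all sample means comes out equal to μ for every sample size tried.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8]  # hypothetical population
mu = Fraction(sum(population), len(population))  # population mean

def mean_of_sampling_distribution(pop, n):
    """Average of the sample means over every ordered sample of size n (with replacement)."""
    means = [Fraction(sum(s), n) for s in product(pop, repeat=n)]
    return sum(means) / len(means)

for n in (1, 2, 3):
    # Prints the sample size and the mean of its sampling distribution (equal to mu)
    print(n, mean_of_sampling_distribution(population, n))
```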
What does σ²/n tell us?
The spread (variance) of the sampling distribution
Why is the numerator of σ²/n important?
It is the population variance, which dictates how spread out the values are; it matters because it tells you how far off the sample statistics tend to be from the parameter
What does σ²/n say?
The larger the sample size, the smaller the error (the less the sample means vary around μ)
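To see σ²/n directly, here is a sketch that computes the exact variance of all sample means for a hypothetical four-value population (sampling with replacement assumed) and compares it with σ²/n. Note that σ² here is the population variance, dividing by N rather than N − 1.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8]  # hypothetical population
N = len(population)
mu = Fraction(sum(population), N)
# Population variance sigma^2 (divide by N, not N - 1)
sigma2 = sum((x - mu) ** 2 for x in population) / N

def sampling_variance(pop, n):
    """Variance of the sample means over every ordered sample of size n, with replacement."""
    means = [Fraction(sum(s), n) for s in product(pop, repeat=n)]
    m = sum(means) / len(means)
    return sum((x - m) ** 2 for x in means) / len(means)

for n in (1, 2, 4):
    # sample size, actual variance of the sample means, sigma^2 / n
    print(n, sampling_variance(population, n), sigma2 / n)
```

Each doubling of n halves the variance of the sample means, which is the "larger sample, smaller error" card in numbers.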
CLT: if you pull every value but 1…
it doesn't matter how noisy the values are; your calculations will be great
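This card is about sampling nearly the whole population without replacement. A sketch with a hypothetical, deliberately noisy population of 100 values: every sample that leaves out exactly one value has a mean within 5 of μ, even though individual values sit up to 495 away from μ.

```python
from itertools import combinations

# Hypothetical noisy population: 100 values spread from 0 to 990
population = list(range(0, 1000, 10))
N = len(population)
mu = sum(population) / N  # 495.0

# Every sample that leaves out exactly one value (n = N - 1, without replacement)
means = [sum(s) / (N - 1) for s in combinations(population, N - 1)]

worst_error = max(abs(m - mu) for m in means)
print(len(means), worst_error)  # 100 samples; worst miss is 5.0
```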
CLT: if all the info you pull is identical (everyone is 40)…
it doesn't matter how many values you pull; your calculations will be great (every sample mean is exactly 40)
CLT: the more spread out the list is/the noisier it is…
the noisier the statistic you calculate
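The two cards above can be checked together. This sketch assumes three hypothetical age populations that share a mean of 40 but differ in spread, and sampling with replacement; it reports each population's variance next to the variance of its sample means for n = 2.

```python
from itertools import product
from statistics import pvariance

def sampling_variance(pop, n):
    """Variance of the sample means over every ordered sample of size n, with replacement."""
    means = [sum(s) / n for s in product(pop, repeat=n)]
    return pvariance(means)

# Hypothetical age populations with the same mean (40) but different spread
identical = [40, 40, 40, 40]  # everyone is 40
moderate = [35, 38, 42, 45]
wide = [5, 20, 60, 75]

for name, pop in [("identical", identical), ("moderate", moderate), ("wide", wide)]:
    # population variance vs. variance of the sample means for n = 2
    print(name, pvariance(pop), sampling_variance(pop, n=2))
```

The identical population gives zero spread for any n, while the wider population produces noisier sample means at the same n.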
What does CLT say about functional form?
As sample size gets bigger, the sampling distribution (its picture) approaches normality (a common rule of thumb is n ≈ 30)
What does the CLT say about variability?
The variance of the sampling distribution is σ²/n
When does the sampling distribution become normal?
When the sample size is approximately 30 or more (the list of sample means is then approximately normal, so Table A applies)
Central Limit Theorem
If repeated random samples of size n are drawn from any population (of whatever form) having a mean μ and a variance σ², then as n becomes large, the sampling distribution of the sample mean approaches normality, with mean μ and variance σ²/n
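The same statement in symbols, for independent draws X₁, …, Xₙ from the population. The mean and variance results hold for any n; the CLT adds that the shape becomes approximately normal for large n, which is what allows the standardized value Z to be looked up in a standard normal table such as Table A.

```latex
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
E[\bar{X}] = \mu,
\qquad
\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n}

\text{and, for large } n:\qquad
Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \;\approx\; N(0, 1)
```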
What determines the variance of the sampling distribution's list?
σ²/n: the population variance divided by the sample size (how much of the population we grab)
What does the CLT tell us about the graph?
CLT: as sample size gets bigger, the graph of the sampling distribution's list will more and more closely approximate the normal curve, EVEN IF the population distribution is highly skewed or U-shaped
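A simulation sketch of this card, assuming a hypothetical right-skewed population of exponential draws and a fixed random seed. Skewness (the average cubed z-score) is 0 for a symmetric, normal-looking list; the sketch shows it shrinking for the sample means as n grows from 1 to 5 to 30.

```python
import random
from statistics import fmean, pstdev

def skewness(xs):
    """Population skewness: average cubed z-score (0 for a symmetric list)."""
    m, s = fmean(xs), pstdev(xs)
    return fmean(((x - m) / s) ** 3 for x in xs)

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical highly right-skewed population: exponential draws (skewness near 2)
population = [rng.expovariate(1.0) for _ in range(100_000)]

def sample_means(pop, n, reps=5_000):
    """Simulate `reps` random samples of size n (with replacement) and return their means."""
    return [fmean(rng.choices(pop, k=n)) for _ in range(reps)]

for n in (1, 5, 30):
    # sample size and the skewness of its simulated sampling distribution
    print(n, round(skewness(sample_means(population, n)), 2))
```

Simulation stands in for listing every possible sample here, because a population this large has far too many samples to enumerate.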
What does the CLT tell us in simple terms?
It tells us the mean of all the sample means in that list is exactly the parameter (the population mean μ)