Quiz II Flashcards

(79 cards)

1
Q

Why can you think of any statistic as a random variable?

A

Its value will vary from one sample to another in the same population; there are many alternative sets from which it could have been calculated

2
Q

If you got average age of 1 for 100 CT residents, is that a weird value?

A

Not weird since there are millions of combos of 100 that give you that average, BUT relatively speaking might be rare value

3
Q

What are population, sample, and sampling all?

A

Types of distributions

4
Q

What is the probability distribution of statistics called?

A

Sampling distributions

5
Q

Sampling distribution

A

a probability distribution that determines probabilities of the possible values of a sample statistic (such as a sample mean)

6
Q

How would you define the sampling distribution of the sample mean?

A

the collection of sample means for all possible random samples of a particular size (n) that can be obtained from a population

7
Q

Sampling distribution in JT terms

A

for whatever statistic I calculate (mean, IQR, etc.) let’s not just think about our sample, but every possible sample/combination of the same size. Each of those combos would have a corresponding statistic; that list of all possible resulting statistics is the sampling distribution

8
Q

Sampling distribution is all combos, with…(2 things)

A
  1. Without consideration of order (doesn’t matter who interviewed when)
  2. No replacement (once one person selected, can only be in the same sample one time)
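
A minimal sketch of "all combos, no order, no replacement", assuming a made-up toy population of five ages. The helper name `sampling_distribution_of_mean` is hypothetical and only for illustration:

```python
from itertools import combinations
from statistics import mean


def sampling_distribution_of_mean(population, n):
    """Return the mean of every possible sample of size n.

    combinations() ignores order and never reuses an element,
    matching the two conditions on the card above.
    """
    return [mean(sample) for sample in combinations(population, n)]


# Hypothetical toy population (illustrative ages, not real data).
ages = [20, 30, 40, 50, 60]
sample_means = sampling_distribution_of_mean(ages, 2)

print(len(sample_means))               # number of possible samples of size 2
print(sorted(sample_means))            # the sampling distribution as a list
print(mean(sample_means), mean(ages))  # mean of sample means vs. population mean
```

With 5 people and n = 2 there are 10 combos, and the mean of those 10 sample means equals the population mean.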
9
Q

Why do we care about a distribution of samples? (3)

A

We will use theories about how populations produce samples to make backward inferences from our sample to the population

We can’t typically study a population, but we can study a sample

We can’t know how well a given sample reflects the population, but we can use probability theory to study how samples would tend to come out if we did know the characteristics of the population
(The sampling distribution helps us estimate the likelihood of our sample statistics)

10
Q

CLT informs us what we can expect to be true about what three things?

A
  1. central tendency
  2. variability
  3. functional form
11
Q

What does the CLT say about central tendency?

A

The mean of the sampling distribution (the mean of the billion sample means) equals the population mean: if one is μ, the other is μ

12
Q

What does σ²/n tell us?

A

The spread of the sampling distribution

13
Q

Why is the numerator of σ²/n important?

A

Population variance - dictates how spread out the values are, important so you can know how far off the sample statistics are from parameter

14
Q

What does σ²/n say?

A

Larger the sample size, smaller the error

15
Q

CLT: if you pull every value but 1…

A

doesn’t matter how noisy the value is, your calculations will be great

16
Q

CLT: if all the info you pull is identical (everyone is 40)…

A

doesn’t matter how many values you pull, your calculations will be great

17
Q

CLT: the more spread out the list is/the noisier it is…

A

the noisier the statistic you calculate is

18
Q

What does CLT say about functional form?

A

as sample size gets bigger, sampling distribution (picture) starts to approach normality (at about 30)

18
Q

What does the CLT say about variability?

A

The variance of the sampling distribution is σ²/n

19
Q

When does sampling distribution become normal?

A

When sample size is approximately 30 (list should be normal, table A applies)

20
Q

Central Limit Theorem

A

If repeated random samples of size n are drawn from any population (of whatever form) having a mean μ and a variance σ², then as n becomes large, the sampling distribution approaches normality, with mean μ and variance σ²/n
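
A rough simulation sketch of the theorem's mean and variance claims. The exponential population, the seed, and the values of `n` and `num_samples` are all assumptions picked for illustration, not part of the cards:

```python
import random
from statistics import fmean, pvariance

rng = random.Random(0)  # fixed seed so the sketch is repeatable

n = 40              # size of each sample (assumed for illustration)
num_samples = 5000  # how many repeated samples to draw (assumed)

# Assumed population: exponential with rate 1, which is strongly
# right-skewed and has mean mu = 1 and variance sigma^2 = 1.
mu, sigma2 = 1.0, 1.0

# One sample mean per repeated sample of size n.
sample_means = [
    fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = fmean(sample_means)
var_of_means = pvariance(sample_means)

print("mean of sample means:", mean_of_means, "vs mu =", mu)
print("variance of sample means:", var_of_means, "vs sigma^2/n =", sigma2 / n)
```

The simulated values should land close to μ and σ²/n, though a simulation only approximates the full list of every possible sample.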

21
Q

What elements determine the variance of the sampling distribution's list?

A

σ²/n (variance of the population divided by the sample size, i.e. how much of the population we grab)

22
Q

What does the CLT tell us about the graph?

A

CLT: as sample size gets bigger, graph for sampling distribution list will more and more approximate the normal curve, EVEN IF the population distribution is highly skewed or U-shaped

23
Q

What does the CLT tell us in simple terms?

A

tells us that the mean of all the sample statistics in that list is exactly the parameter/population value

24
Does the CLT still work even if the population distribution is highly skewed or U-shaped?
Yes! Though will need more data
25
When might you need around 100, not just 30, observations for the distribution to approach normality with the CLT?
Highly skewed data
26
With the CLT, what does having 30+ observations practically allow you to do?
  * can expect the sampling distribution to be approximately normal in shape
  * table A therefore applies
  * can use z-scores and standardization
27
When will sampling distribution (with CLT) be normal regardless of the sample size?
If the pdf of X in the population is normal
28
What is the mean of sampling distribution called?
The expected value of sample means
29
What is the expected value of sample means exactly equal to?
The population mean
30
What is the standard error defined as?
The standard deviation of the sampling distribution of the sample mean (how far a value in the sampling distribution list is from the mean of the list/the population mean)
31
On average, how different the sample mean (of the sampling distribution) is from the population mean, is called...
standard error
32
What does the CLT tell us about how the standard error is related to the population standard deviation?
standard error = σ/√n, so the larger the sample size, the smaller the standard error (the less spread there is among the sample means)
33
What does the σ in σ/√n tell us?
The smaller the value of σ (the less variability there is in the population), the smaller the standard error
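
A small sketch of how σ and n each move the standard error σ/√n. The σ and n values are hypothetical:

```python
import math


def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


# Hypothetical population standard deviations and sample sizes.
for sigma in (15, 5):
    for n in (25, 100):
        print(f"sigma={sigma}, n={n}: SE={standard_error(sigma, n)}")
```

Quadrupling n only halves the standard error, because n sits under a square root.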
34
What is the standard error measuring?
The standard error is the average sampling error
35
What is the standard error?
the standard error specifies how much error to expect, on average, between the sample mean and the population mean; now thinking about sampling distribution as a whole, mean of every possible sampling error
36
Law of Large Numbers
The larger the sample size (n), the more probable it is that the sample mean will be close to the population mean (sample will look more like population)
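
A simulation sketch of this card, assuming a uniform population on [0, 10] and an arbitrary definition of "close" (within 0.5 of the population mean). The helper name `share_of_close_means` is made up for illustration:

```python
import random
from statistics import fmean

rng = random.Random(1)  # fixed seed so the sketch is repeatable

MU = 5.0         # mean of the assumed population: uniform on [0, 10]
TOLERANCE = 0.5  # "close" means within 0.5 of MU (arbitrary choice)


def share_of_close_means(n, repetitions=2000):
    """Fraction of samples of size n whose mean lands within TOLERANCE of MU."""
    close = 0
    for _ in range(repetitions):
        sample_mean = fmean([rng.uniform(0, 10) for _ in range(n)])
        if abs(sample_mean - MU) <= TOLERANCE:
            close += 1
    return close / repetitions


share_small = share_of_close_means(10)   # small samples
share_large = share_of_close_means(200)  # large samples
print(share_small, share_large)
```

The larger sample size should give a clearly higher share of sample means near the population mean.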
37
How does the LLN relate to standard error?
The standard error IS the LLN; σ/√n (as sample size goes up, sampling error goes down)
38
LLN vs. CLT. As sample size n increases...
CLT: the distribution of sample means will become more normal in shape. LLN: the more probable it is that the sample mean will be close to the population mean.
39
Difference between CLT and LLN?
LLN is about the relationship between the sample distribution and the population distribution; CLT is about the relationship between the distribution of sample means and the population distribution
40
What does it mean if a sample has 30 or more observations?
This means the sampling distribution of the statistic is approximately normal, no matter what the form of the population distribution
41
population and sample distribution vs. sampling distribution of a statistic
First two: similar, just different responses, and the bigger the sample gets, the more similar they are. Third: doing calculations; as samples get bigger here, the standard deviation shrinks, so the difference between ȳ and μ will be smaller
42
T/F: the CLT says that when sample size increases from 30 to 30,000, the sampling distribution is more precise
False - the normal form is already observed at about 30 observations, so there is no difference in terms of the form, and table A applies regardless (the standard error σ/√n does shrink, but that is about spread, not form)
43
What does the z-score tell you in terms of the sampling distribution of the sample mean?
position of a specific sample within the distribution of sample means; tells you exactly where a specific sample is located in relation to all other possible samples that could have been obtained
44
When taking z-score of a sample mean, what must you use/think about differently?
In the denominator, you now use the standard error (σ/√n) instead of the standard deviation as before
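
A minimal sketch of a z-score for a sample mean, with the standard error in the denominator. All numbers are hypothetical, and `NormalDist().cdf` from Python's `statistics` module stands in for a standard normal table lookup:

```python
import math
from statistics import NormalDist


def z_score_of_sample_mean(sample_mean, mu, sigma, n):
    """z = (sample mean - mu) / (sigma / sqrt(n))."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - mu) / standard_error


# Hypothetical numbers chosen only for illustration.
z = z_score_of_sample_mean(sample_mean=52, mu=50, sigma=10, n=25)

# Proportion of possible sample means at or below this one,
# assuming the sampling distribution is approximately normal.
proportion_below = NormalDist().cdf(z)
print(z, round(proportion_below, 4))
```

Here the standard error is 10/√25 = 2, so a sample mean 2 units above μ sits at z = 1.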
45
What accounts for the difference between our sample mean and the true population parameter?
sampling error (fact that we took a sample)
46
What is the sampling error?
difference between a sample statistic and the population parameter (due to fact that we took a sample) - just for one sample, not all the possibilities
47
If ȳ = 81 while μ = 81.85, what is the sampling error?
0.85 (ȳ is 0.85 below μ)
48
Two possible types of sampling error
Random (you hope for) or systematic (affects some people more than others)
49
Example of systematic sampling error?
If doing CT age survey, and took sample from a daycare
50
Why is systematic sampling error worse than random sampling error?
The methods we're learning will not adjust for it
51
Common sources of sampling error (2)
1. undercoverage 2. nonresponse
52
meaning/example of undercoverage
some groups in the population left out of the selection process (mail survey, homeless people left out)
53
meaning/example of nonresponse
subjects chosen for the sample can't be contacted or don't cooperate (rich people when asking about income)
54
4 types of samples
1. simple random sample 2. probability sample 3. stratified random sample 4. multistage sampling design
55
Simple random sample
Every set of n individuals has an equal probability of being selected
56
Probability sample
Gives each member of the population a known chance (greater than zero) of being selected
57
Point of probability sample
idea is usually to make sure you get enough members of under-represented groups to support an analysis
58
Difference between probability sample and simple random sample?
Simple random sample is one specific, fundamental type of probability sampling
59
Stratified random sample
First divide the population into relevant groups (strata), then choose a separate SRS in each stratum and combine these SRSs to form full sample
60
PROPORTIONATE stratified random sample
Proportionate: the size of the sample selected from each subgroup is proportional to the size of that subgroup in the entire population
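
A sketch of proportionate allocation, assuming a made-up population of 1000 IDs in three invented strata. The function name and the rounding rule are illustrative choices, not a standard recipe:

```python
import random


def proportionate_stratified_sample(strata, total_n, rng):
    """Draw an SRS from each stratum, sized in proportion to the stratum.

    strata maps a stratum name to the list of its members.
    Rounding can make the sizes sum to slightly more or less than
    total_n; this sketch does not correct for that.
    """
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        size = round(total_n * len(members) / population_size)
        sample[name] = rng.sample(members, size)  # SRS without replacement
    return sample


# Hypothetical population of 1000 IDs split into three made-up strata.
strata = {
    "A": list(range(0, 600)),
    "B": list(range(600, 900)),
    "C": list(range(900, 1000)),
}
sample = proportionate_stratified_sample(strata, total_n=50, rng=random.Random(2))
print({name: len(members) for name, members in sample.items()})
```

A stratum holding 60% of the population supplies 60% of the sample; a disproportionate design would instead fix the per-stratum sizes directly.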
61
DISPROPORTIONATE stratified random sample
the size of the sample selected from each stratum is disproportional to the size of that stratum in the population
62
Example of when might use stratified random sample
Looking for Vietnamese population in the U.S. - random selection won't work, so might define Vietnamese and Cambodian strata, then within those groups randomly select observations
63
example of disproportionate stratified random sample?
collect 1000 people, find Vietnamese population in US, randomly select 500 Vietnamese, and 500 non-Vietnamese
64
Implications of disproportionate stratified random sample
Will have to think about error differently
65
Multistage sampling design
Select successively smaller groups within the population in stages and do SRS at each stage
66
Example of multistage sampling design
First identify large geographical units (states): randomly select a few states instead of letting all be represented. Within each state, randomly select counties. Within each county, randomly select towns. Within each town, randomly select households.
67
Possible problem with multistage sampling design
Representativeness: if you randomly select N. Dakota, S. Dakota, and Wyoming, and rural counties within each, it might not be super representative
68
Why might someone do a multistage sampling design?
Concentrating the sample geographically like this reduces costs
69
Given the equation σ²/n for the variance of the sampling distribution, as the number of observations you pull (n) goes up, what can you say?
The sampling error will get reduced
70
What is the probability density function?
Drawing the curve, and the y-axis is probability/proportion
71
CLT trimmed down
As n gets larger, the sampling distribution approaches normality, even if the pdf of X in the population is not normal
72
If the picture/pdf of the population is normal, then you could have a sample size of 2, and what would be true?
the sampling distribution will be normal
73
Relationship between sampling error and standard error?
Standard error looks at every possible sample; it is the mean of every possible sampling error
74
As you go from 30 to 300 samples, will sampling distribution get pointier? What will change?
Level of clustering will NOT change - instead, picture will look the same, the units on the x-axis (average difference between a sample mean and a population mean) might change
75
What is kind of true for all three types of distributions?
The difference between μ and ȳ will be smaller as your sample gets bigger
76
What does LLN say about relationship between sample and population? (2)
1. The bigger the sample, the closer the functional form of the sample is to that of the population (the more alike the pictures will look) 2. If you calculate a statistic, then as the sample gets bigger it will be closer to the population value
77
How does LLN play into sampling distributions?
As samples get closer to the population, the average level of sampling error goes down, so the standard deviation of the sampling distribution (the standard error) goes down
78