Lecture 6: Supplemental Material Flashcards by Max O

What will most studies do before data collection begins?

Attempt to determine the sample size needed to achieve a certain degree of accuracy in estimation

Two reasons why estimating minimum sample size is commonly done with population proportions?

With population proportions, you do not need to make separate guesses about the population mean and standard deviation
With population proportions, it is easy to identify a conservative value (0.5), and the resulting sample size does not vary much across plausible proportions

Why is estimating minimum sample size less commonly done with population means?

With population means, you need to make separate guesses about the population mean and standard deviation
We generally have a hard time making a good guess about a population standard deviation without measuring it

Three things that must be set to calculate the desired sample size

Precision: decide on the maximum error B around the proportion (i.e. how wide is the desired confidence interval)
Confidence: set the probability with which the specified precision/this interval is achieved (i.e. what is the desired confidence coefficient)
Proportion: select a value of the population proportion pi (either using best available data or a conservative estimate)

Why must we select a value of the population proportion π (most conservative is 0.5)?

Because the spread of the sampling distribution depends on the value of π

Why is the most conservative guess of pi 0.5?

This value produces the largest standard error (most spread/variance), so the sample size it gives is at least as large as the one any other value of pi would require
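As a rough check on this card, the sketch below tabulates π(1 − π), the part of the standard error that depends on π. The grid of candidate values and the function name are illustrative assumptions, not part of the lecture.

```python
def variance_term(pi):
    """The pi * (1 - pi) factor that drives the standard error of a proportion."""
    return pi * (1 - pi)

# Hypothetical grid of candidate proportions: 0.01, 0.02, ..., 0.99
grid = [i / 100 for i in range(1, 100)]
most_conservative = max(grid, key=variance_term)

for pi in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"pi = {pi:.1f}  pi(1 - pi) = {variance_term(pi):.2f}")
print("largest at pi =", most_conservative)
```

The factor falls off symmetrically on either side of 0.5, which is why any other guess leads to a smaller required sample.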

What if you guessed that pi is 0.5, but it turns out that it was actually 0.8?

Using 0.8 would have called for a smaller sample size, i.e. fewer observations than you originally planned for. With the larger sample you collected, your result will be even more precise than required
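A small numeric sketch of this scenario, assuming a 95% confidence level (z = 1.96) and a maximum error of B = 0.04. Both values are chosen for illustration only and do not come from the card.

```python
import math

z = 1.96   # assumed z-score for 95% confidence
B = 0.04   # assumed maximum error

# Sample size planned with the conservative guess pi = 0.5
n_planned = math.ceil(0.5 * (1 - 0.5) * (z / B) ** 2)
# Sample size that pi = 0.8 would actually have required
n_needed = math.ceil(0.8 * (1 - 0.8) * (z / B) ** 2)

# Margin of error achieved with the planned sample if pi is really 0.8
achieved_B = z * math.sqrt(0.8 * (1 - 0.8) / n_planned)

print(n_planned, n_needed, round(achieved_B, 4))
```

Under these assumptions the planned sample is larger than needed, and the achieved margin of error comes out below the 0.04 target.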

What equation do you use to answer the question “for a given level of precision, how many observations do I need to ensure the +/- value that I want (e.g. on election night)?”

n = π(1 − π)(z/B)^2
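The formula on this card can be sketched as a one-line function. The function name and the example inputs (a conservative π = 0.5, z = 1.96 for 95% confidence, B = 0.03) are assumptions for illustration.

```python
def sample_size_proportion(pi, z, B):
    """Unrounded n = pi * (1 - pi) * (z / B)^2."""
    return pi * (1 - pi) * (z / B) ** 2

# Hypothetical poll: estimate a proportion to within +/- 3 points at 95% confidence
n = sample_size_proportion(pi=0.5, z=1.96, B=0.03)
print(n)  # round up to a whole number of observations before sampling
```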

In the equation n = π(1 − π)(z/B)^2, what does B denote?

the maximum error around the proportion

In the equation n = π(1 − π)(z/B)^2, what does n denote?

the sample size ensuring that, with fixed probability, the error of estimation of π by the sample proportion is no greater than B

In the equation n = π(1 − π)(z/B)^2, what does z denote?

the z-score corresponding to a confidence interval with a confidence coefficient equal to the fixed probability
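One way to look up this z-score is the inverse normal CDF in Python's standard library. This is a sketch, assuming a two-sided interval; the helper name is made up for illustration.

```python
from statistics import NormalDist

def z_for_confidence(confidence):
    """Two-sided z-score: leaves (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf((1 + confidence) / 2)

for confidence in (0.90, 0.95, 0.99):
    print(confidence, round(z_for_confidence(confidence), 3))
```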

In the equation n = π(1 − π)(z/B)^2, can you plug in a known value for pi?

NO - that is the true population proportion (you don’t know it, and you can’t estimate it by plugging in a sample value)

Two solutions to not knowing pi in the equation n = π(1 − π)(z/B)^2

If another researcher collected data before, you may be able to use their value as a best guess.
Most of the time, researchers use a specific value (0.5) to figure out how many people they need to sample

What equation do you use to calculate the necessary sample size for estimating means?

n = σ^2(z/B)^2
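A minimal sketch of this formula, assuming a two-sided interval and rounding up to a whole observation. The function name and the inputs (σ = 15, B = 2) are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_mean(sigma, B, confidence=0.95):
    """n = sigma^2 * (z / B)^2, rounded up to a whole observation."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return math.ceil(sigma ** 2 * (z / B) ** 2)

# Hypothetical inputs: guessed standard deviation of 15, maximum error of 2
n = sample_size_mean(sigma=15, B=2)
# Doubling the guessed standard deviation roughly quadruples the requirement
n_double = sample_size_mean(sigma=30, B=2)
print(n, n_double)
```

Because σ enters squared, a poor guess about the standard deviation moves the answer a lot, which is the practical difficulty with sample size planning for means.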

What equation do you use to calculate the necessary sample size for estimating proportions?

n = π(1 − π)(z/B)^2

What must you specify when calculating sample size for estimating means? n = σ^2(z/B)^2

2 things (in addition to the maximum error B)

Desired confidence coefficient
population standard deviation

When calculating sample size for estimating means with n = σ^2(z/B)^2, how might you specify the standard deviation?

Use the standard deviation from past research, or from related population data if possible

When calculating sample size for estimating means: the __ the spread of the population distribution, as measured by the __, the __ the sample size needed to achieve a certain accuracy

greater
standard deviation
larger

Two types of estimators?

Biased
Unbiased

Unbiased estimator

The average of the sample statistic (over an indefinitely large number of samples) equals the population parameter in the long run, i.e. the mean of the statistic's sampling distribution equals the parameter
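To make the definition concrete, the sketch below enumerates every possible sample instead of simulating. The four-value population is hypothetical, and sampling is assumed to be with replacement.

```python
from itertools import product
from statistics import mean

population = [1.0, 3.0, 5.0, 11.0]   # hypothetical tiny population

# Every ordered sample of size 2, drawn with replacement (16 samples)
samples = list(product(population, repeat=2))
sample_means = [mean(s) for s in samples]

# The average of all sample means equals the population mean
print(mean(population), mean(sample_means))
```

Individual sample means range from 1 to 11, but their average lands exactly on the population mean, which is what makes the sample mean unbiased.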

T/F: When you have an unbiased estimator, for some samples a statistic may underestimate the parameter of interest and for others it may overestimate the parameter

True - but in the long run the estimates will “average” themselves out

Biased estimator/statistic

In the long run, the statistic consistently over or underestimates the parameter it is estimating (the average of all possible statistics is not equal to the parameter)

Why is range a biased estimator?

If you find all possible sample ranges and take their average, it doesn’t equal the population range: almost all sample ranges might be something like 4 or 7, and only samples that include the outlier of 10 million capture the true range
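The same point can be checked by brute force on a small made-up population. The values below are hypothetical, and samples are assumed to be drawn without replacement.

```python
from itertools import combinations
from statistics import mean

population = [1.0, 3.0, 5.0, 11.0]   # hypothetical population
population_range = max(population) - min(population)

# Range of every possible sample of size 2 (without replacement)
sample_ranges = [max(s) - min(s) for s in combinations(population, 2)]
average_sample_range = mean(sample_ranges)

print(population_range, average_sample_range)
```

A sample range can never exceed the population range, and only the sample containing both extremes reaches it, so the average is pulled below the true value.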

Why is standard deviation a biased estimator?

The sample standard deviation tends to underestimate, since most samples don’t contain the population's max and min values; the true population parameter tends to be a bit larger. The sample version also uses a different equation
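A brute-force sketch of this bias on a hypothetical population. It assumes the "different equation" is the usual n − 1 divisor, which is what `statistics.stdev` and `statistics.variance` use, and that samples are drawn with replacement.

```python
from itertools import product
from statistics import mean, pstdev, stdev, variance

population = [1.0, 3.0, 5.0, 11.0]   # hypothetical population
sigma = pstdev(population)           # population standard deviation (divides by N)

# Every ordered sample of size 2, drawn with replacement
samples = list(product(population, repeat=2))

# stdev/variance use the n - 1 divisor (assumed to be the card's "different equation")
avg_sample_sd = mean(stdev(s) for s in samples)
avg_sample_var = mean(variance(s) for s in samples)

print(sigma, avg_sample_sd)          # the average sample SD falls below sigma
print(sigma ** 2, avg_sample_var)    # the average sample variance matches sigma^2
```

In this setup the n − 1 divisor makes the sample variance unbiased, yet taking the square root still leaves the sample standard deviation too small on average.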

Does CLT still apply when you have biased estimators? and what's the caveat?

Yes - the sampling distribution still clusters (its mode, median, and mean will be equal); it just won't be centered at the population parameter

Examples (2) of biased estimators

Range
Standard deviation

Why do we care if estimator is biased or unbiased?

Bias has implications for how to calculate an interval: the logic of an interval estimate requires that the parameter sit in the middle of the sampling distribution, i.e. that the estimator be unbiased

What makes a statistic POSITIVELY biased?

If it tends to overestimate the parameter

What makes a statistic NEGATIVELY biased?

If it tends to underestimate the parameter

T/F: An unbiased statistic is always an accurate statistic

FALSE - If a statistic is sometimes much too high and sometimes much too low, it can still be unbiased. It would be very imprecise, however.

What is one possible saving situation for biased statistics?

A statistic can be slightly biased but still be efficient if it systematically results in a very small over- or underestimate of a parameter

T/F: If the population distribution is positively skewed (skewed to the right), then the expected value of the sampling distribution of the sample mean might not be equal to the population mean

FALSE - it still will be!

T/F: If the population distribution is positively skewed (skewed to the right), then the expected value of the sampling distribution of the sample median might not be equal to the population mean

TRUE - value is consistently less than the true population mean, so it's a biased estimator of the population parameter when the population distribution is positively skewed
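Both T/F cards about skewed populations can be checked by enumerating every sample from a small, positively skewed population. The five values are hypothetical, and sampling is assumed to be with replacement.

```python
from itertools import product
from statistics import mean, median

population = [1.0, 2.0, 3.0, 4.0, 20.0]   # hypothetical, skewed to the right
population_mean = mean(population)

# Every ordered sample of size 3, drawn with replacement (125 samples)
samples = list(product(population, repeat=3))
avg_sample_mean = mean(mean(s) for s in samples)
avg_sample_median = mean(median(s) for s in samples)

print(population_mean, avg_sample_mean, avg_sample_median)
```

The sample mean still averages out to the population mean despite the skew, while the sample median averages out to something smaller.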

What is efficiency/precision? (2 definitions)

A statistic is more efficient/precise if its standard error is smaller relative to others
The idea that a statistic obtained from any single sample from a population is close to the value of the parameter being estimated (not much error), i.e. how stable a statistic is from sample to sample

The less subject to sampling fluctuation a statistic is, is it more or less efficient?

more efficient!

Why is the efficiency of statistics often called the relative efficiency?

It's measured relative to the efficiency of other statistics

What might cause us to say that statistic A is more efficient than statistic B?

If statistic A has a smaller standard error than statistic B

T/F: If an estimator is biased, it is automatically inconsistent and inefficient

FALSE - an estimator could be highly efficient and biased

How do you decide which of two equations is the more efficient/precise estimator?

Whichever one produces the smaller standard error

Most important thing about efficiency/precision?

They are RELATIVE concepts - for two equations, which one has the smaller standard error?

Area where efficiency/precision might be applied (and you can see its relativity shine)

For a given parameter, might have multiple equations for how to get the interval (and some of those equations might have less error, give you tighter interval)

T/F: The mean is always more efficient than the median

FALSE - relative efficiency of the two statistics may depend on distribution involved. For instance, the mean is more efficient than the median for normal distributions but not for some extremely skewed distributions
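The normal-distribution half of this claim can be illustrated with a small simulation. The seed, sample size, and number of replications are arbitrary choices, and the standard errors are estimated from the simulation rather than derived exactly.

```python
import random
from statistics import mean, median, stdev

rng = random.Random(0)        # fixed seed so the run is repeatable
n, replications = 25, 2000    # arbitrary simulation settings

means, medians = [], []
for _ in range(replications):
    sample = [rng.gauss(0, 1) for _ in range(n)]   # standard normal population
    means.append(mean(sample))
    medians.append(median(sample))

# Spread of each statistic across samples = its estimated standard error
se_mean = stdev(means)
se_median = stdev(medians)
print(se_mean, se_median)
```

With normal data the sample means vary less from sample to sample than the sample medians, so the mean is the more efficient of the two here. Swapping in a heavily skewed or heavy-tailed population can change which one wins.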

What key factor does the relative efficiency of two statistics depend on?

the distribution involved (normal or skewed)

The more __ the statistic, the more __ the statistic is as an estimator of the parameter

efficient
precise

Three things to keep in mind when selecting a point estimate

Unbiased vs. biased
Efficiency/precision
Consistency

Consistency of an estimator

An estimator is consistent if the estimator tends to get closer to the parameter it is estimating as the sample size increases/error goes down
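As an illustration of consistency, the standard error of a sample proportion, sqrt(π(1 − π)/n), shrinks as n grows. The value π = 0.5 and the sample sizes below are arbitrary choices for the sketch.

```python
import math

pi = 0.5   # assumed population proportion
standard_errors = []
for n in (100, 400, 1600):
    se = math.sqrt(pi * (1 - pi) / n)   # standard error of the sample proportion
    standard_errors.append(se)
    print(n, se)
```

Each fourfold increase in sample size halves the standard error, so the sample proportion tends to sit closer to π as n increases.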

Difference between consistent and unbiased estimators?

consistent estimators get better as the size of the sample increases, whereas unbiased estimators get better as you take the average of the statistic from a large number of samples

What might cause you to label an estimator as inconsistent?

If, as you increased sample size, the error went up

Why does consistency matter?

In general you want the biggest possible sample, but with an inconsistent estimator a bigger sample can sometimes hurt

What important concept does consistency (as sample size gets larger, a statistic gets closer and better at estimating the parameter) connect to?

The law of large numbers (LLN)

What does B entail?

“Beta” term, aka the maximum error around the proportion

If a confidence interval is NOT about where the parameter falls, what IS it about?

It's about having a good/reasonable guess of what the parameter is. Then it's the classic interpretation of "we are x% confident our parameter falls within the (a, b) c.i." There is a trade-off between precision and confidence level.

What are the tradeoffs between precision and confidence levels (for intervals?) - careful, different type of precision

as precision increases, confidence level decreases
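The trade-off can be sketched for a proportion by holding the sample fixed and varying the confidence level. The values π = 0.5 and n = 600 are assumptions for illustration.

```python
import math
from statistics import NormalDist

pi, n = 0.5, 600   # assumed proportion and fixed sample size
margins = {}
for confidence in (0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf((1 + confidence) / 2)          # two-sided z-score
    margins[confidence] = z * math.sqrt(pi * (1 - pi) / n)  # half-width of the interval
    print(confidence, round(margins[confidence], 4))
```

For the same data, asking for more confidence widens the interval (less precision), and a tighter interval is only available at a lower confidence level.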

How do you select the value of the population proportion pi when estimating B (2 options)? - ## Footnote what do I mean here for estimating B??

1. using best available data
2. conservative estimate

Example of interpretation for estimating sample size of proportion, where n = 600.25?

n = 600.25, so we should select a minimum of 601 houses for our sample
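The rounding step can be sketched as follows. The inputs (π = 0.5, z = 1.96, B = 0.04) are an assumption: they are one set of values that happens to give n = 600.25, and the card does not say which inputs were actually used.

```python
import math

# Assumed inputs that reproduce n = 600.25
n = 0.5 * (1 - 0.5) * (1.96 / 0.04) ** 2

# Always round up: rounding to the nearest integer (600) would fall short of the requirement
minimum_sample = math.ceil(n)
print(n, minimum_sample)
```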

(56 cards)