Population and Samples
Population and Samples
What are some points on samples you have to remember?
Population and Samples
What is error?
The difference between the predicted value of the DV at a certain level of the IV vs the observed value of the DV at the same value of the IV
error = observed - model
- When referring to error in sample models, we use e
- When referring to error in population models, we use ε
!!! e and ε are the same concept (error = observed - model), they’re just used in different circumstances !!!
(See image 2; notice the difference in hats as well: with a sample model we estimate the population parameters (we can’t get the population values from our data), so the estimates wear hats. In the population model the data give us the actual population values, so nothing is estimated and there are no hats. Also note the difference between e and ε as mentioned above)
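The error = observed − model idea can be sketched in a few lines. All numbers below are made up: b0 and b1 stand for hypothetical estimated (hatted) parameters of a simple sample model.

```python
# Sketch of e = observed - model for a sample model y-hat = b0 + b1*x.
# b0, b1 and the data are illustrative, not from the notes.
b0, b1 = 2.0, 0.5          # hypothetical estimated parameters (hats)
xs = [1.0, 2.0, 3.0, 4.0]  # values of the IV
ys = [2.6, 3.1, 3.4, 4.1]  # observed values of the DV

predictions = [b0 + b1 * x for x in xs]                       # model values
residuals = [y - y_hat for y, y_hat in zip(ys, predictions)]  # e = observed - model
print(residuals)
```

In a population model the same subtraction gives ε; only the context (sample vs population) changes.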
Population and Samples
What are some general notes on error?
Errors vs Residuals
Errors vs Residuals
What is a Residual
Residual = observed value - value predicted by the model = error, BUT SPECIFICALLY FOR A SAMPLE MODEL (simply, residual is error for a sample model)
- Since we use a sample model to estimate the population model, it’s likely that the residuals from the sample model are a good approximation of the population errors
- If we plot the distribution of all the residuals, it is approximately normal with a mean of 0
Errors vs Residuals
What do we use Residuals for?
We use residuals to infer properties of the errors in the population model
Errors vs Residuals
What is the equation for Total error?
See image 3
Errors vs Residuals
What is the ordinary least squares (OLS) regression?
It’s a method that uses least squares to estimate the parameters (b-values) for which the total (squared) error is at its minimum (i.e., a method for minimizing total error)
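For simple regression, the least-squares estimates have a closed form; a minimal sketch with made-up data:

```python
# OLS sketch for y = b0 + b1*x: the closed-form least-squares
# estimates minimize the sum of squared residuals (total error).
# Data are illustrative, not from the notes.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n
# Slope: covariance of x and y divided by variance of x
b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b0 = my - b1 * mx  # intercept: line passes through (mean x, mean y)

# Minimized total squared error at these estimates
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
print(b0, b1, sse)
```

Nudging b0 or b1 away from these values can only increase the sum of squared residuals, which is what "least squares" means.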
Errors vs Residuals
How do you estimate the variance of the model errors?
See image 4
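Image 4 isn’t reproduced here; as a hedged sketch, the conventional estimator for simple regression divides the sum of squared residuals by n − 2 (the residual degrees of freedom, since two b-values were estimated). The residuals below are made up.

```python
# Assumed conventional estimator: sigma-hat^2 = SSE / (n - 2)
# for a simple regression with two estimated parameters (b0, b1).
residuals = [0.2, -0.1, 0.3, -0.25, -0.15]  # made-up sample residuals
n = len(residuals)
sse = sum(e ** 2 for e in residuals)  # sum of squared residuals
sigma2_hat = sse / (n - 2)            # estimated variance of the model errors
print(sigma2_hat)
```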
Confidence Intervals and Significance Testing
CI & Significance Testing
General notes on Sampling Distribution
CI & Significance Testing
What is the general equation for any test-statistic?
effect/error
This also equals: size of parameter / sampling variation in the parameter
- Sampling variation in the parameter reflects how much the parameter estimates differ from sample to sample (e.g., the differences between the means of different samples)
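A tiny sketch of effect/error with made-up numbers: for a regression parameter, the test statistic is the estimate divided by its standard error.

```python
# Sketch of test statistic = effect / error, i.e. t = b-hat / SE(b-hat).
# Numbers are illustrative, not from the notes.
b_hat = 0.8      # estimated parameter (size of the effect)
se_b_hat = 0.25  # standard error (sampling variation in the parameter)
t = b_hat / se_b_hat
print(t)  # 3.2
```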
CI & Significance Testing
What is the Central Limit Theorem?
!!! If the model errors are normally distributed, the sampling distribution of b^ is also normal !!!
Therefore, we can estimate the SE of b^ and construct the CI and the test statistic
CI & Significance Testing
What is true about the relationship between sample size and sampling distribution?
As sample size increases, the sampling distribution approximates a normal distribution with a mean equal to the population mean and a variance equal to σ^2/n
(specifically, when model errors are normally distributed, the sampling distribution for b^ is normal)
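The σ²/n claim can be checked by simulation; the population (normal with σ = 2) and the sample size below are illustrative choices, not from the notes.

```python
import random
import statistics

# Simulation sketch: means of samples of size n drawn from a population
# with standard deviation sigma should have variance close to sigma^2 / n.
random.seed(0)
sigma, n, reps = 2.0, 25, 20000

means = [statistics.fmean(random.gauss(0.0, sigma) for _ in range(n))
         for _ in range(reps)]

print(statistics.variance(means))  # should be near sigma**2 / n = 0.16
```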
CI & Significance Testing
Based on the above flashcard, what are the steps for conducting NHST and constructing a CI?
CI & Significance Testing
What is the Gauss-Markov Theorem?
When certain conditions are met, OLS is the best way to estimate parameters. The conditions that need to be met are:
- Model errors are on average 0
- Homoscedasticity
- Independence of observations
The last two conditions together are called spherical errors. See image 5
Bias
Bias
What is an unbiased estimator?
An estimator whose expected value equals the quantity it is trying to estimate (in other words: on average, the estimate from the sample matches the population value)
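A quick simulation sketch of bias using a made-up population: the variance estimator that divides by n is biased (it underestimates on average), while the n − 1 version is unbiased.

```python
import random
import statistics

# Contrast a biased and an unbiased estimator of the population variance.
# Population (normal, sd = 3, so true variance = 9) is illustrative.
random.seed(1)
n, reps = 5, 40000

biased, unbiased = [], []
for _ in range(reps):
    sample = [random.gauss(0.0, 3.0) for _ in range(n)]
    m = statistics.fmean(sample)
    ss = sum((x - m) ** 2 for x in sample)
    biased.append(ss / n)          # expected value = (n-1)/n * sigma^2 < sigma^2
    unbiased.append(ss / (n - 1))  # expected value = sigma^2

print(statistics.fmean(biased), statistics.fmean(unbiased))
```

On average the biased version lands near 7.2 rather than 9, while the unbiased version lands near the true value of 9.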
Bias
What is a consistent estimator?
An estimator that produces estimates which tend to the population value as the sample size increases
Bias
What is an efficient estimator?
An estimator that produces estimates that are in a sense “the best” of the available methods of estimation
(best = lowest variance; the estimates are distributed more tightly around the population value)
Bias
What is the optimal estimate for any data set?
The mean
(If the dataset contains extreme values, the mean is pulled up and to the right, toward them)
Bias
What are outliers and why are they problematic?
Data points that differ markedly from the rest of the data
- They bias parameter estimates
- They increase SSR a lot
If SSR is affected by outliers, the following happens:
1. SSR is biased
2. SD is biased
3. SE is biased
4. CI and test-statistic are biased
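The SSR inflation can be illustrated with made-up data: one extreme point shifts the mean and blows up the sum of squared deviations around it.

```python
import statistics

# Sketch: a single outlier biases the mean and sharply inflates the
# sum of squared residuals around it. Data are illustrative.
clean = [4.8, 5.1, 4.9, 5.2, 5.0]
with_outlier = clean + [50.0]

def ssr_around_mean(data):
    """Sum of squared deviations from the mean (a simple SSR stand-in)."""
    m = statistics.fmean(data)
    return sum((x - m) ** 2 for x in data)

print(statistics.fmean(clean), ssr_around_mean(clean))
print(statistics.fmean(with_outlier), ssr_around_mean(with_outlier))
```

The biased SSR then propagates to the SD, the SE, and finally the CI and test statistic, as listed above.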
Bias
What should we do with outliers?
Keep them, unless you know they’re not representative of the population