Population and Samples
Population and Samples
What are some points on samples you have to remember?
Population and Samples
What is error?
The difference between the predicted value of the DV at a certain level of the IV vs the observed value of the DV at the same value of the IV
error = observed - model
- When referring to error in sample models, we use e
- When referring to error in population models, we use ε
!!! e and ε are the same concept (error = observed - model), they’re just used in different circumstances !!!
(See image 2; notice the difference in hats as well: with a sample model we estimate the population parameters (we can’t get the population values from our data), so the estimates wear hats. In the population model the data give us the actual population values, so nothing is estimated and there are no hats. Also note the difference between e and ε as mentioned above)
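The error = observed − model idea can be sketched in a few lines. All numbers below are made up: b0 and b1 stand for hypothetical estimated (hatted) parameters of a simple sample model.

```python
# Sketch of e = observed - model for a sample model y-hat = b0 + b1*x.
# b0, b1 and the data are illustrative, not from the notes.
b0, b1 = 2.0, 0.5          # hypothetical estimated parameters (hats)
xs = [1.0, 2.0, 3.0, 4.0]  # values of the IV
ys = [2.6, 3.1, 3.4, 4.1]  # observed values of the DV

predictions = [b0 + b1 * x for x in xs]                       # model values
residuals = [y - y_hat for y, y_hat in zip(ys, predictions)]  # e = observed - model
print(residuals)
```

In a population model the same subtraction gives ε; only the context (sample vs population) changes.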
Population and Samples
What are some general notes on error?
Errors vs Residuals
Errors vs Residuals
What is a Residual
Residual = observed value - value predicted by the model = error, BUT SPECIFICALLY FOR A SAMPLE MODEL (simply, residual is error for a sample model)
- Since we use a sample model to estimate the population model, it’s likely that the residuals from the sample model are a good approximation of the population errors
- If we plot the distribution of all the residuals, it is approximately normal with a mean of 0
Errors vs Residuals
What do we use Residuals for?
We use residuals to infer properties of the errors in the population model
Errors vs Residuals
What is the equation for Total error?
See image 3
Errors vs Residuals
What is the ordinary least squares (OLS) regression?
It’s a method that uses least squares to estimate the parameters (b-values) for which the total (squared) error is at its minimum (i.e., a method for minimizing total error)
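For simple regression, the least-squares estimates have a closed form; a minimal sketch with made-up data:

```python
# OLS sketch for y = b0 + b1*x: the closed-form least-squares
# estimates minimize the sum of squared residuals (total error).
# Data are illustrative, not from the notes.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n
# Slope: covariance of x and y divided by variance of x
b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b0 = my - b1 * mx  # intercept: line passes through (mean x, mean y)

# Minimized total squared error at these estimates
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
print(b0, b1, sse)
```

Nudging b0 or b1 away from these values can only increase the sum of squared residuals, which is what "least squares" means.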
Errors vs Residuals
How do you estimate the variance of the model errors?
See image 4
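Image 4 isn’t reproduced here; as a hedged sketch, the conventional estimator for simple regression divides the sum of squared residuals by n − 2 (the residual degrees of freedom, since two b-values were estimated). The residuals below are made up.

```python
# Assumed conventional estimator: sigma-hat^2 = SSE / (n - 2)
# for a simple regression with two estimated parameters (b0, b1).
residuals = [0.2, -0.1, 0.3, -0.25, -0.15]  # made-up sample residuals
n = len(residuals)
sse = sum(e ** 2 for e in residuals)  # sum of squared residuals
sigma2_hat = sse / (n - 2)            # estimated variance of the model errors
print(sigma2_hat)
```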
Confidence Intervals and Significance Testing
CI & Significance Testing
General notes on Sampling Distribution
CI & Significance Testing
What is the general equation for any test-statistic?
effect/error
This also equals: size of parameter / sampling variation in the parameter
- Sampling variation in the parameter reflects how much the parameter estimates differ from sample to sample (e.g., the differences between the means of different samples)
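A tiny sketch of effect/error with made-up numbers: for a regression parameter, the test statistic is the estimate divided by its standard error.

```python
# Sketch of test statistic = effect / error, i.e. t = b-hat / SE(b-hat).
# Numbers are illustrative, not from the notes.
b_hat = 0.8      # estimated parameter (size of the effect)
se_b_hat = 0.25  # standard error (sampling variation in the parameter)
t = b_hat / se_b_hat
print(t)  # 3.2
```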
CI & Significance Testing
What is the Central Limit Theorem?
!!! If the model errors are normally distributed, the sampling distribution of b^ is also normal !!!
Therefore, we can estimate the SE of b^ and construct the CI and the test statistic
CI & Significance Testing
What is true about the relationship between sample size and sampling distribution?
As sample size increases, the sampling distribution approximates a normal distribution with a mean equal to the population mean and a variance equal to σ^2/n
(specifically, when model errors are normally distributed, the sampling distribution for b^ is normal)
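The σ²/n claim can be checked by simulation; the population (normal with σ = 2) and the sample size below are illustrative choices, not from the notes.

```python
import random
import statistics

# Simulation sketch: means of samples of size n drawn from a population
# with standard deviation sigma should have variance close to sigma^2 / n.
random.seed(0)
sigma, n, reps = 2.0, 25, 20000

means = [statistics.fmean(random.gauss(0.0, sigma) for _ in range(n))
         for _ in range(reps)]

print(statistics.variance(means))  # should be near sigma**2 / n = 0.16
```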
CI & Significance Testing
Based on the above flashcard, what are the steps for conducting NHST and constructing a CI?
CI & Significance Testing
What is the Gauss-Markov Theorem?
When certain conditions are met, OLS is the best way to estimate parameters. The conditions that need to be met are:
- Model errors are on average 0
- Homoscedasticity
- Independence of observations
The last two conditions together are called spherical errors. See image 5
Bias
Bias
What is an unbiased estimator?
An estimator whose expected value equals the quantity it is trying to estimate (in other words: on average, the estimate from the sample matches the population value)
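A quick simulation sketch of bias using a made-up population: the variance estimator that divides by n is biased (it underestimates on average), while the n − 1 version is unbiased.

```python
import random
import statistics

# Contrast a biased and an unbiased estimator of the population variance.
# Population (normal, sd = 3, so true variance = 9) is illustrative.
random.seed(1)
n, reps = 5, 40000

biased, unbiased = [], []
for _ in range(reps):
    sample = [random.gauss(0.0, 3.0) for _ in range(n)]
    m = statistics.fmean(sample)
    ss = sum((x - m) ** 2 for x in sample)
    biased.append(ss / n)          # expected value = (n-1)/n * sigma^2 < sigma^2
    unbiased.append(ss / (n - 1))  # expected value = sigma^2

print(statistics.fmean(biased), statistics.fmean(unbiased))
```

On average the biased version lands near 7.2 rather than 9, while the unbiased version lands near the true value of 9.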
Bias
What is a consistent estimator?
An estimator that produces estimates which tend to the population value as the sample size increases
Bias
What is an efficient estimator?
An estimator that produces estimates that are in a sense “the best” of the available methods of estimation
(best = lowest variance; the estimates are distributed more tightly around the population value)
Bias
What is the optimal estimate for any data set?
The mean
(If the dataset contains extreme values, the mean is pulled up and to the right, toward them)
Bias
What are outliers and why are they problematic?
Data points that differ markedly from the rest of the data
- They bias parameter estimates
- They increase SSR a lot
If SSR is affected by outliers, the following happens:
1. SSR is biased
2. SD is biased
3. SE is biased
4. CI and test-statistic are biased
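The SSR inflation can be illustrated with made-up data: one extreme point shifts the mean and blows up the sum of squared deviations around it.

```python
import statistics

# Sketch: a single outlier biases the mean and sharply inflates the
# sum of squared residuals around it. Data are illustrative.
clean = [4.8, 5.1, 4.9, 5.2, 5.0]
with_outlier = clean + [50.0]

def ssr_around_mean(data):
    """Sum of squared deviations from the mean (a simple SSR stand-in)."""
    m = statistics.fmean(data)
    return sum((x - m) ** 2 for x in data)

print(statistics.fmean(clean), ssr_around_mean(clean))
print(statistics.fmean(with_outlier), ssr_around_mean(with_outlier))
```

The biased SSR then propagates to the SD, the SE, and finally the CI and test statistic, as listed above.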
Bias
What should we do with outliers?
Keep them, unless you know they’re not representative of the population