What is a point estimate? How is it computed? Provide an example.
A point estimate is a single value, computed from sample data, that is used to estimate a population parameter.
Computation:
sample mean = (sum of the sample values) / (sample size)
The resulting value is the point estimate of the population mean.
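As a small illustration of the computation, the sketch below takes five made-up observations (the values are hypothetical, not from any real data set) and uses their sample mean as the point estimate of the population mean.

```python
from statistics import mean

# Hypothetical sample of five observed returns (in percent); illustrative only.
sample = [4.0, 6.0, 5.0, 7.0, 3.0]

# Sample mean = sum of the sample values / sample size.
point_estimate = sum(sample) / len(sample)

# statistics.mean performs the same calculation.
assert point_estimate == mean(sample)
print(point_estimate)  # 5.0
```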
What is Student’s t-distribution and when is it used? How does it compare to the normal distribution?
Student’s t-distribution is a bell-shaped probability distribution that is symmetrical about its mean. It is used when constructing confidence intervals based on small samples (where n < 30) from a normally distributed population with unknown variance.
Compared to the normal distribution, the t-distribution is flatter with fatter tails.
What are the properties of student’s t-distribution?
What happens to the t-distribution when the degrees of freedom increase? What happens when the degrees of freedom increase without bound?
As the degrees of freedom increase, the distribution becomes more peaked at the centre and its tails become thinner.
As the degrees of freedom increase without bound, the t-distribution converges to the standard normal distribution (z-distribution).
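The behaviour described above can be checked numerically. The sketch below evaluates the t density at the centre (x = 0) and in the tail (x = 3) for several degrees of freedom and compares it with the standard normal density. The helper name `t_pdf` and the chosen evaluation points are illustrative assumptions.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # lgamma is used instead of gamma to avoid overflow for large df.
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Centre height rises and tail height falls as df grows.
for df in (2, 5, 30, 1000):
    print(f"df={df:>4}: pdf(0)={t_pdf(0.0, df):.4f}  pdf(3)={t_pdf(3.0, df):.4f}")
print(f"normal : pdf(0)={z.pdf(0.0):.4f}  pdf(3)={z.pdf(3.0):.4f}")
```

With more degrees of freedom the printed centre value moves up toward the normal value and the tail value moves down toward it, which is the convergence to the z-distribution.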
What is degrees of freedom?
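Degrees of freedom is the number of observations that are free to vary once the sample mean has been fixed. It is calculated as n - 1, where n is the number of observations in the sample.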
What are fat tails an indication of?
Fat tails indicate a higher probability of outliers (observations far from the centre of the distribution).
How are confidence intervals for a random variable that follows a t-distribution related to degrees of freedom?
For a given significance level, confidence intervals for a random variable that follows a t-distribution must be wider when there are fewer degrees of freedom (fatter tails) and narrower when there are more degrees of freedom (thinner tails).
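To make the relationship concrete, the sketch below compares interval widths at the 95% level for a few degrees of freedom. The reliability factors are the usual two-tailed values printed in standard t-tables (to three decimals), and the standard error of 1.0 is a hypothetical value held fixed so that only the degrees of freedom change.

```python
# Two-tailed 95% reliability factors as printed in standard t-tables (3 d.p.).
t_95 = {5: 2.571, 10: 2.228, 30: 2.042}
z_95 = 1.960  # limiting value as the degrees of freedom grow without bound

standard_error = 1.0  # hypothetical; held fixed to isolate the effect of df

# Width of the interval: point estimate +/- (reliability factor * standard error).
widths = {df: 2 * t * standard_error for df, t in t_95.items()}

for df, width in widths.items():
    print(f"df={df:>2}: interval width = {width:.3f}")
print(f"normal: interval width = {2 * z_95 * standard_error:.3f}")
```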
What is a confidence interval?
A confidence interval is a range of values within which the actual value of a parameter is expected to lie with probability 1 - alpha. This probability is referred to as the degree of confidence.
What is alpha?
Alpha is the level of significance for the confidence interval: the probability that the interval does not contain the actual value of the parameter.
How are confidence intervals constructed?
CIs are constructed by adding and subtracting an appropriate value from the point estimate:
point estimate ± (reliability factor × standard error)
How is the confidence interval for the population mean calculated, given that the population has a normal distribution with a known variance?
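With known variance and a normal distribution, the CI is calculated as:
sample mean ± z-reliability factor × (population standard deviation / square root of sample size)
Here the sample mean is the point estimate of the population mean, and the standard deviation divided by the square root of the sample size is the standard error.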
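A minimal sketch of this calculation, using only the Python standard library. The function name `z_confidence_interval` and the input numbers (sample mean 80, population standard deviation 15, sample size 36) are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """CI for the population mean when the population variance is known."""
    alpha = 1 - confidence
    # Reliability factor: z-value that leaves alpha / 2 in each tail.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sigma / math.sqrt(n)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin

# Hypothetical inputs: standard error = 15 / sqrt(36) = 2.5.
low, high = z_confidence_interval(sample_mean=80.0, sigma=15.0, n=36)
print(round(low, 2), round(high, 2))  # 75.1 84.9
```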
What is the reliability factor for 90% CI?
What is the reliability factor for 95% CI?
What is the reliability factor for 99% CI?
Reliability factor for 90% CI = 1.645 (significance level is 10%, 5% in each tail)
Reliability factor for 95% CI = 1.960 (significance level is 5%, 2.5% in each tail)
Reliability factor for 99% CI = 2.575 (significance level is 1%, 0.5% in each tail)
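These reliability factors are quantiles of the standard normal distribution and can be reproduced rather than memorised. The sketch below is one way to do that with the standard library; the function name `z_reliability_factor` is an illustrative choice.

```python
from statistics import NormalDist

def z_reliability_factor(confidence):
    """z-value that leaves (1 - confidence) / 2 in each tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for confidence in (0.90, 0.95, 0.99):
    print(f"{confidence:.0%}: {z_reliability_factor(confidence):.3f}")
# 90%: 1.645
# 95%: 1.960
# 99%: 2.576
```

The computed 99% factor rounds to 2.576. The value 2.575 often appears in tables, and the difference is immaterial for most practical work.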
How is the confidence interval for the population mean calculated, given that the population has a normal distribution with an unknown variance?
With unknown variance and a normal distribution, the CI is calculated as:
sample mean ± t-reliability factor × (sample standard deviation / square root of sample size)
The t-reliability factor corresponds to n - 1 degrees of freedom, and the sample mean is the point estimate of the population mean.
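The sketch below works through the same calculation with a t-reliability factor. To stay within the Python standard library, the factor is found numerically (Simpson's rule on the t density plus bisection); this is a swapped-in convenience, and in practice the value would normally be read from a t-table or a statistics library. All function names and the inputs (sample mean 80, sample standard deviation 15, n = 16) are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x > 0: 0.5 plus the Simpson's-rule area on [0, x]."""
    h = x / steps
    area = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + area * h / 3

def t_reliability_factor(confidence, df):
    """Two-tailed critical value with (1 - confidence) / 2 in each tail."""
    target = 1 - (1 - confidence) / 2
    # Assumption: the answer lies below 100, which holds for typical df and levels.
    lo, hi = 0.0, 100.0
    for _ in range(60):  # bisection on the increasing CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_confidence_interval(sample_mean, s, n, confidence=0.95):
    """CI for the population mean when the population variance is unknown."""
    df = n - 1  # degrees of freedom
    margin = t_reliability_factor(confidence, df) * s / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

t_crit = t_reliability_factor(0.95, 15)
low, high = t_confidence_interval(sample_mean=80.0, s=15.0, n=16)
print(f"t reliability factor (df=15): {t_crit:.3f}")
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

The resulting interval is wider than the one a z-reliability factor of 1.960 would give for the same standard error, reflecting the fatter tails of the t-distribution.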
How is the confidence interval created for a non-normal distribution?
If the sample size is less than 30 (n < 30), confidence intervals cannot be constructed.
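If the sample size is greater than 30 (n > 30), a confidence interval can be constructed, because the central limit theorem implies that the sample mean is approximately normally distributed. The z-reliability factor is used when the population variance is known and the t-reliability factor when it is unknown.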
What are the two limitations to using a larger sample size?
What is data mining? What are the warning signs of data mining?
Data mining occurs when analysts repeatedly use the same database to search for patterns or trading rules until one that works is discovered.
Warning signs:
What is data-mining bias?
Data-mining bias refers to results in which the statistical significance of a pattern is overestimated because the pattern was found through data mining.
What is the best way to avoid data mining?
The best way to avoid data mining is to test a potentially profitable trading rule on a data set different from the one used to develop the rule.
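The value of an out-of-sample test can be illustrated with a small simulation. In the sketch below the returns are pure noise and every "trading rule" is a random long/short signal, so no rule has genuine predictive power. All names, sizes and the seed are hypothetical choices for illustration.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so repeated runs give the same numbers

n_days, n_rules = 500, 200

# Pure-noise daily returns: by construction there is nothing to discover.
returns = [random.gauss(0.0, 1.0) for _ in range(n_days)]

# Each hypothetical rule is a random long (+1) or short (-1) position per day.
rules = [[random.choice((-1, 1)) for _ in range(n_days)] for _ in range(n_rules)]

split = n_days // 2  # first half develops the rule, second half tests it

def rule_mean_return(rule, start, stop):
    """Average daily return earned by following the rule on days [start, stop)."""
    return mean(rule[t] * returns[t] for t in range(start, stop))

# Data mining: search all rules on the same data and keep the best performer.
in_sample = [rule_mean_return(rule, 0, split) for rule in rules]
best = max(range(n_rules), key=in_sample.__getitem__)

# Out-of-sample check: evaluate that rule on data not used to select it.
out_of_sample = rule_mean_return(rules[best], split, n_days)

print(f"best rule, in-sample mean return:     {in_sample[best]:.3f}")
print(f"same rule, out-of-sample mean return: {out_of_sample:.3f}")
```

The in-sample winner looks profitable only because it was the best of many tries. On the held-out data its performance will typically fall back toward zero.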
What is sample selection bias?
Sample selection bias occurs when some data is systematically excluded from the analysis, usually because it is unavailable. The observed sample is then non-random, and conclusions drawn from it cannot be reliably applied to the population.
What is survivorship bias? What is an example of survivorship bias? What is the solution?
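Survivorship bias results from excluding from the sample entities that no longer exist, so that the sample contains only survivors and the result is an overestimation.
Example: mutual fund databases that include only funds currently in existence and omit funds that have closed or merged.
Solution: use a sample of funds that all started at the same time, and do not drop the funds that have since been removed.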
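The overestimation can be seen in a toy example. The five funds and their returns below are entirely hypothetical. They are assumed to have started at the same time, and two of them have since closed.

```python
from statistics import mean

# Hypothetical annual returns (%) for five funds that started together.
funds = {
    "Fund A": {"return": 12.0, "survived": True},
    "Fund B": {"return": 8.0, "survived": True},
    "Fund C": {"return": 10.0, "survived": True},
    "Fund D": {"return": -6.0, "survived": False},  # closed
    "Fund E": {"return": -9.0, "survived": False},  # closed
}

# Biased estimate: only funds that still exist are in the sample.
survivors_only = mean(f["return"] for f in funds.values() if f["survived"])

# Unbiased estimate: the full starting cohort, including closed funds.
full_cohort = mean(f["return"] for f in funds.values())

print(survivors_only)  # 10.0
print(full_cohort)     # 3.0
```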
What is look-ahead bias?
Look-ahead bias occurs when a study tests a relationship using sample data that was not available on the test date.
What is time-period bias?
Time-period bias can result if the time period over which the data is gathered is either too short or too long.