Module 8 - Regression Analysis Flashcards

(76 cards)

1
Q

What is the definition of regression analysis?

A

Describes a statistical relationship between variables

The dependent variable responds to the independent variable(s) with measurable uncertainty.

2
Q

What is the equation for the most basic regression model?

A

y = a + bx + ϵ

Here, a is the y-intercept, b is the slope, and ϵ is the error term.

3
Q

What does homoscedasticity refer to in regression analysis?

A

Constant variance of normal errors independent of X

This property is essential for valid regression results.

4
Q

What are the two crucial results of regression analysis for CER development?

A
  • Establishing statistical significance of CERs
  • Quantifying uncertainty in a risk model

These results help in making reliable cost estimates.

5
Q

What does Ordinary Least Squares (OLS) regression aim to minimize?

A

Sum of Squared Errors (SSE)

OLS regression is a foundational method for determining the best-fit line.
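
As a minimal sketch of how OLS arrives at the best-fit line, the Python below computes the slope and intercept in closed form for a one-driver model and then evaluates the SSE that the fit minimizes. The data values and the helper names `fit_ols` and `sse` are hypothetical, chosen only for illustration.

```python
def fit_ols(xs, ys):
    """Return (a_hat, b_hat) for the line y = a + b*x that minimizes SSE."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b_hat = s_xy / s_xx              # fitted slope
    a_hat = y_bar - b_hat * x_bar    # fitted y-intercept
    return a_hat, b_hat


def sse(xs, ys, a, b):
    """Sum of squared errors of the line y = a + b*x against the data."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical cost-driver (x) and cost (y) observations
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a_hat, b_hat = fit_ols(xs, ys)
best = sse(xs, ys, a_hat, b_hat)
# Nudging the slope away from the OLS value can only increase SSE
nudged = sse(xs, ys, a_hat, b_hat + 0.1)
print(a_hat, b_hat, best, nudged)
```

For this data the fitted line is roughly y = 2.2 + 0.6x, and the nudged line has a visibly larger SSE than the OLS line.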

6
Q

What does R² indicate in regression analysis?

A

Percent of overall variation in cost explained by the regression equation

The complement, 1 − R², indicates unexplained variation.

7
Q

What is the Standard Error of the Estimate (SEE) used for?

A

Measures accuracy of predictions made by the regression line

It estimates the standard deviation of the normally distributed error term.

8
Q

What does the F-test determine in regression analysis?

A

Whether the regression model is statistically significant

This test assesses the overall fit of the model.

9
Q

What is the purpose of the t-test in regression analysis?

A

Determines whether individual cost driver variables are statistically significant

It helps in validating the relevance of each predictor.

10
Q

What is a proxy variable in cost estimating?

A

An independent variable that stands in for one which drives cost

Example: Number of firemen as a proxy for fire size.

11
Q

What is the role of scatter plots in regression analysis?

A

Visualize patterns and detect correlation between variables

They help in determining the appropriate regression model.

12
Q

What does the correlation coefficient (ρ) indicate?

A

Strength and direction of the relationship between variables

Ranges from -1 (perfect negative) to +1 (perfect positive).
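
To make the range concrete, here is a small sketch that computes the sample correlation coefficient for three hypothetical data sets: one rising, one falling, and one lying exactly on a downward line. The function name `pearson_r` and all data values are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [2, 4, 5, 4, 5])        # y tends to rise with x
r_down = pearson_r(xs, [5, 4, 5, 4, 2])      # y tends to fall with x
r_perfect = pearson_r(xs, [10, 8, 6, 4, 2])  # points lie exactly on a falling line
print(r_up, r_down, r_perfect)
```

The first two results have the same magnitude but opposite signs, and the third sits at the lower end of the range.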

13
Q

What is the difference between correlation and causation?

A

Correlation indicates a relationship; causation indicates one variable drives another

Causation cannot be statistically verified.

14
Q

What is the first step in regression analysis?

A

Create a scatter plot of the data

This helps to visualize correlation and determine the model type.

15
Q

What are the two primary practical applications of regression in cost estimating?

A
  • CER development
  • Learning curve analysis

These applications leverage historical data for future cost predictions.

16
Q

What is the purpose of confidence intervals (CIs) in regression?

A

Quantify uncertainty regarding the estimate of mean cost

They provide error bounds for the mean of the estimate.

17
Q

What does multicollinearity refer to in OLS multivariate regression?

A

Risk introduced when multiple independent variables are correlated with one another

It can distort the regression results.

18
Q

What is the general guideline when using statistical software for regression?

A

Always plot the data and fit a trend line first

This helps guide and check the regression analysis.

19
Q

What is the best estimate for any value of the cost driver variable?

A

Result of plugging that x-value into the best-fit regression equation

This provides the predicted cost based on the model.

20
Q

What is a non-OLS model?

A

Models fit by a criterion other than OLS, such as Weighted Least Squares (WLS) or minimizing Mean Absolute Percentage Error (MAPE)

These models minimize errors other than SSE.

21
Q

What is the learning curve analysis in cost estimating?

A

Application of regression to understand cost reductions over time with increased production

It helps in forecasting future costs based on experience.
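
One common way to apply regression here is to fit a straight line in log-log space. The sketch below assumes a power-law form, unit cost = a · (quantity)^b, which these cards do not state explicitly; the data and the names `fit_line` and `learning_rate` are hypothetical.

```python
import math


def fit_line(xs, ys):
    """OLS intercept and slope for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Hypothetical unit costs at cumulative quantities 1, 2, 4, 8
quantities = [1, 2, 4, 8]
unit_costs = [100.0, 80.0, 64.0, 51.2]

# Taking logs turns cost = a * q**b into ln(cost) = ln(a) + b * ln(q)
log_q = [math.log(q) for q in quantities]
log_c = [math.log(c) for c in unit_costs]
ln_a, b = fit_line(log_q, log_c)

first_unit_cost = math.exp(ln_a)
learning_rate = 2 ** b  # cost multiplier each time quantity doubles
print(first_unit_cost, b, learning_rate)
```

Because the hypothetical data fall exactly on an 80% curve, the fitted learning rate comes out at about 0.8; real data would scatter around the line.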

22
Q

What is the first step of regression analysis?

A

Making a scatter plot

A scatter plot provides insight into the correlation’s strength and helps determine the best model type.

23
Q

List the basics of linear regression analysis.

A
  • Finding the regression equation
  • Determining the goodness of fit
  • Calculating confidence intervals

These basics are essential for understanding the model’s performance.

24
Q

The key concept of regression is to use the method of least squares for determining what?

A

The best estimates for equation parameters

This method minimizes the sum of the squared differences between the data points and the estimated line.

25
What does **SSE** stand for in regression analysis?
Sum of Squared Errors ## Footnote SSE provides a metric for the total amount of uncertainty surrounding the regression line.
26
True or false: In OLS, residuals are squared before addition to avoid cancellation.
TRUE ## Footnote Residuals can have both positive and negative values, so squaring them ensures they contribute positively to the sum.
27
What does the **y-intercept** represent in a regression equation?
The value of Y where the fitted line crosses the y-axis (X = 0) ## Footnote It is calculated using the formula \( \hat{a} = \bar{Y} - \hat{b}\bar{X} \), which forces the line to pass through the mean of the X and Y data.
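The intercept formula on this card can be checked by substituting the mean of X into the fitted line. The closed form for the slope shown first is the standard simple-regression result and is included here as an assumption, since the cards do not state it.

```latex
\[
\hat{b} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
\hat{a} = \bar{Y} - \hat{b}\,\bar{X}
\]
\[
\hat{Y}(\bar{X}) = \hat{a} + \hat{b}\,\bar{X}
= \left(\bar{Y} - \hat{b}\,\bar{X}\right) + \hat{b}\,\bar{X}
= \bar{Y}
\]
```

So the fitted line always passes through the point of means, while the intercept itself is the fitted value at X = 0.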
28
What does the **error term** \( \epsilon \) describe in regression analysis?
Statistical uncertainty surrounding the equation ## Footnote It represents the residuals and indicates that the relationship is statistical rather than purely functional.
29
List the **two key OLS assumptions** regarding the error term.
* It is independent of X * It has a probability distribution ## Footnote These assumptions are critical for the validity of the regression model.
30
What is the purpose of calculating **confidence intervals** in regression analysis?
To estimate the range within which the true parameter values lie ## Footnote Confidence intervals provide insight into the reliability of the predictions made by the model.
31
What is an **irreducible component** in the context of data analysis?
An error term that will always persist despite improvements in data collection and normalization ## Footnote It indicates that some level of error is inherent in the data.
32
What are the two key **OLS assumptions** regarding the error term?
* Independent of X, with constant variance (homoscedasticity) * Normally distributed with a mean of zero ## Footnote These assumptions are crucial for the validity of Ordinary Least Squares regression.
33
What does **homoscedasticity** refer to in OLS regression?
The assumption that the error term has a constant variance that does not depend on X ## Footnote This means that the spread of the residuals should be constant across all levels of the independent variable.
34
What is the purpose of **residual analysis** in OLS regression?
To check the OLS assumptions by examining a residual plot ## Footnote It helps determine if the regression model is appropriate for the data.
35
What does the regression equation consist of in OLS regression?
* Fitted slope \( \hat{b} \) * y-intercept \( \hat{a} \) ## Footnote These coefficients are derived from minimizing the sum of squared errors (SSE).
36
True or false: The error term in regression makes it more than just **curve fitting**.
TRUE ## Footnote The error term accounts for the variability in the data that cannot be explained by the model.
37
What should a residual plot ideally portray in terms of data distribution?
Symmetry about the x-axis with constant variability ## Footnote This indicates that a normal distribution with a mean of 0 is a reasonable assumption.
38
What are the three basic **measures of variation** calculated in ANOVA?
* Sum of Squares Total (SST) * Sum of Squared Errors (SSE) * Sum of Squares Regression (SSR) ## Footnote These measures help assess the goodness of fit of the regression model.
39
The relationship between SST, SSE, and SSR can be expressed as __________.
SST = SSE + SSR ## Footnote This equation shows how total variation is partitioned into explained and unexplained variation.
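The partition can be checked numerically. The sketch below fits an OLS line to a small hypothetical data set and computes the three sums of squares directly from their definitions; all variable names and data values are illustrative.

```python
# Hypothetical cost-driver (x) and cost (y) observations
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Closed-form OLS fit
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
b_hat = s_xy / s_xx
a_hat = y_bar - b_hat * x_bar
fitted = [a_hat + b_hat * x for x in xs]

sst = sum((y - y_bar) ** 2 for y in ys)              # total variation
sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # unexplained variation
ssr = sum((f - y_bar) ** 2 for f in fitted)          # explained variation
print(sst, sse, ssr, sse + ssr)
```

For this data SST is 6, split into roughly 2.4 unexplained and 3.6 explained. The identity holds for an OLS fit that includes an intercept.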
40
What does **SST** represent in ANOVA?
Total variation in Y ## Footnote It is calculated as the sum of the squared differences between each data point and the mean of the y-values.
41
What does **SSE** represent in ANOVA?
Unexplained variation around the regression line ## Footnote It is calculated as the sum of the squared differences between each observed y-value and its corresponding predicted y-value.
42
What does **SSR** represent in ANOVA?
Explained variation from the average to the regression line ## Footnote It is calculated as the sum of the squared differences between predicted y-values and the mean of the y-values.
43
What is the significance of **degrees of freedom** in the context of SST, SSE, and SSR?
* SST: n-1 d.f. * SSR: k-1 d.f. * SSE: n-k d.f. ## Footnote Degrees of freedom are crucial for determining the statistical significance of the regression model.
44
What is the **penalty** imposed when introducing new parameters in regression analysis?
Larger average variation due to loss of degrees of freedom ## Footnote This reflects the trade-off between model complexity and explanatory power.
45
What is the formula for **Mean Squared Error (MSE)**?
MSE = SSE / (n - k) ## Footnote Where SSE is the sum of squared errors, n is the number of data points, and k is the number of parameters.
46
What does **degrees of freedom (d.f.)** represent in regression analysis?
The value of the denominator for each mean measure of variation ## Footnote It imposes a penalty for each new parameter introduced in the model.
47
In the context of regression, what does **Standard Error (s)** quantify?
Uncertainty in the estimated regression coefficients and the regression equation ## Footnote It indicates how much variability exists in the estimated values.
48
What distribution do the standard errors of estimated coefficients follow?
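t-distribution ## Footnote More precisely, each estimated coefficient, standardized by its standard error, follows a t-distribution. The same situation arises when estimating the mean of a normally distributed population with a small sample size and unknown population standard deviation.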
49
What is the **Standard Error of Estimate (SEE)**?
The square root of the Mean Squared Error (MSE) ## Footnote It provides a quantitative measure of the variability of an estimate or forecast made using the regression equation.
50
How is the **Coefficient of Variation (CV)** calculated?
CV = SEE / mean of y-values ## Footnote It expresses the SEE as a percentage of the mean and indicates the percent error associated with an estimate.
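The chain from SSE to MSE, SEE, and CV is short enough to sketch in a few lines. The inputs below (SSE = 2.4, n = 5, k = 2, mean cost of 4.0) are hypothetical values assumed for illustration, not figures from the module.

```python
import math

# Hypothetical fit summary: a one-driver model has k = 2 parameters (slope and intercept)
sse = 2.4      # sum of squared errors
n = 5          # number of data points
k = 2          # number of estimated parameters
y_mean = 4.0   # mean of the observed y-values

mse = sse / (n - k)   # mean squared error, using n - k degrees of freedom
see = math.sqrt(mse)  # standard error of the estimate
cv = see / y_mean     # coefficient of variation, as a fraction of mean cost
print(mse, see, cv)
```

Here the SEE is about 0.89 cost units, which is roughly 22% of the mean cost.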
51
What significance level is commonly used in regression analysis?
0.05 (5%) ## Footnote This means a cost driver is wrongly judged significant when it actually is not only 5% of the time.
52
What are the null and alternative hypotheses in a **t-test** for regression coefficients?
H0: b = 0; H1: b ≠ 0 ## Footnote The null hypothesis states that there is no relationship, while the alternative suggests there is a significant relationship.
53
What is the formula for calculating the **t statistic**?
t = Estimated Coefficient / Standard Error ## Footnote This statistic is used to determine the statistical significance of the regression coefficients.
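As a rough sketch of this calculation for the slope of a one-driver model, the code below assumes the standard simple-regression result that the standard error of the slope is SEE divided by the square root of the sum of squared x deviations, which the cards do not spell out. The data are hypothetical, and the critical value is the two-tailed 5% figure for 3 degrees of freedom taken from a standard t table.

```python
import math

# Hypothetical cost-driver (x) and cost (y) observations
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n, k = len(xs), 2  # k = 2 parameters: intercept and slope
x_bar = sum(xs) / n
y_bar = sum(ys) / n

s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b_hat = s_xy / s_xx
a_hat = y_bar - b_hat * x_bar

sse = sum((y - (a_hat + b_hat * x)) ** 2 for x, y in zip(xs, ys))
see = math.sqrt(sse / (n - k))

se_b = see / math.sqrt(s_xx)  # standard error of the slope (assumed formula)
t_stat = b_hat / se_b         # estimated coefficient / standard error

T_CRIT = 3.182  # two-tailed critical t for alpha = 0.05 with n - k = 3 d.f.
significant = abs(t_stat) > T_CRIT
print(t_stat, significant)
```

With only five points the t statistic is about 2.12, below the critical value, so this hypothetical slope would not be judged significant at the 0.05 level.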
54
What does a **p-value** indicate in regression analysis?
The probability that a correlation between x and y at least as strong as the one observed would arise if H0 is true ## Footnote A small p-value (less than the significance level) indicates statistical significance.
55
What is the **probability** that the correlation between x and y arises if they are not related?
p-value ## Footnote A significance level of α = 0.05 is used to determine statistical significance.
56
True or false: A **p-value** smaller than 0.05 indicates statistical significance.
TRUE ## Footnote If the p-value is greater than 0.05, the coefficient is not statistically significant.
57
The **t** and **F** tests are important in what type of analysis?
Regression analysis ## Footnote They help determine whether to accept the relationship in question.
58
What do **t statistics** indicate in regression analysis?
Whether an independent variable is a good predictor ## Footnote They assess the significance of each independent variable.
59
What do **F statistics** indicate in regression analysis?
Whether the regression as a whole is a good model ## Footnote They evaluate the overall significance of the regression model.
60
In a regression with one independent variable, the **t** and **F** tests yield what?
Same result ## Footnote This indicates the relationship is assessed consistently.
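The agreement can be illustrated numerically: with one independent variable, the F statistic equals the square of the slope's t statistic. The sketch assumes the usual definition of F as the ratio of mean squares (SSR over its degrees of freedom, divided by SSE over its degrees of freedom) and the standard slope standard error; the data are hypothetical.

```python
import math

# Hypothetical cost-driver (x) and cost (y) observations
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n, k = len(xs), 2  # k = 2 parameters: intercept and slope
x_bar = sum(xs) / n
y_bar = sum(ys) / n

s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b_hat = s_xy / s_xx
a_hat = y_bar - b_hat * x_bar
fitted = [a_hat + b_hat * x for x in xs]

sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
ssr = sum((f - y_bar) ** 2 for f in fitted)

mse = sse / (n - k)  # mean square error, n - k d.f.
msr = ssr / (k - 1)  # mean square regression, k - 1 d.f.
f_stat = msr / mse

t_stat = b_hat / (math.sqrt(mse) / math.sqrt(s_xx))
print(f_stat, t_stat ** 2)
```

Both printed values are about 4.5, so the two tests lead to the same conclusion about the single cost driver.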
61
What is the **R²** value used for in regression analysis?
Indicator for goodness of fit ## Footnote It shows how much variability in the data is accounted for by the regression.
62
Higher values of **R²** indicate what about the regression?
Better fit of the regression ## Footnote Values closer to 1.0 are preferred.
63
What does a **p-value** of 0.11 indicate regarding statistical significance?
Not statistically significant ## Footnote It is greater than the significance level of 0.05.
64
What should an analyst do if there is a lack of statistical significance?
Check the functional form ## Footnote This may involve analyzing residual plots for better models.
65
The equation for **R²** is: R² = ?
R² = explained variation / total variation ## Footnote It can also be expressed as R² = 1 - SSE/SST.
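The two expressions give the same number, which a quick check shows. The sketch uses SSR = 30 and SST = 48, the sums quoted for the toy problem on a later card, and derives SSE from the partition SST = SSE + SSR; the variable names are illustrative.

```python
ssr = 30.0       # explained variation (toy problem)
sst = 48.0       # total variation (toy problem)
sse = sst - ssr  # unexplained variation, from SST = SSE + SSR

r2_explained = ssr / sst       # explained variation / total variation
r2_complement = 1 - sse / sst  # one minus the unexplained share
print(r2_explained, r2_complement)
```

Both forms give 0.625, meaning 62.5% of the variation is explained and 37.5% is left unexplained.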
66
What does **SSR** stand for in regression analysis?
Sum of squares due to regression ## Footnote It measures the explained variation in the dependent variable.
67
What does **SST** stand for in regression analysis?
Total sum of squares ## Footnote It measures the total variation in the dependent variable.
68
What does **SSE** stand for in regression analysis?
Sum of squares due to error ## Footnote It measures the unexplained variation in the dependent variable.
69
The **R²** value for the toy problem is calculated as follows: R² = ?
R² = SSR / SST ## Footnote For the toy problem, R² = 30/48 = 0.625.
70
What does a **high R²** value (greater than 0.9) indicate?
Strong relationship ## Footnote It suggests a good fit for the regression model.
71
In regression analysis, the **total variability** is represented by which arrow in Figure 8.20?
Larger blue arrow ## Footnote It represents the total variability in the Y data.
72
What does the **smaller orange arrow** in Figure 8.20 represent?
Variability remaining after fitting the regression line ## Footnote It shows the variability in Y data after accounting for the regression.
73
What does **SSR** stand for in regression analysis?
Sum of Squares due to Regression ## Footnote SSR is a measure of the variation explained by the regression model.
74
What does **SST** represent in the context of regression?
Total Sum of Squares ## Footnote SST measures the total variation in the dependent variable.
75
The magnitude of the **correlation coefficient** is the square root of which statistic?
R² ## Footnote R² indicates the proportion of variance in the dependent variable that can be explained by the independent variable. The sign of the correlation coefficient matches the sign of the slope.
76
True or false: The **correlation coefficient** ( r ) can be negative.
TRUE ## Footnote The correlation coefficient ( r ) ranges from -1 to +1 and takes the sign of the regression slope; the square root of R² gives only its magnitude.