What is the definition of regression analysis?
Describes a statistical relationship between variables
The dependent variable responds to the independent variable(s) with measurable uncertainty.
What is the equation for the most basic regression model?
y = a + bx + ϵ
Here, a is the y-intercept, b is the slope, and ϵ is the error term.
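A minimal sketch of the model's structure: the parameters below (a = 2, b = 0.5, error standard deviation 1) and the helper `simulate` are hypothetical, chosen only to show that each observed cost is the line value a + bx plus a random error ϵ, and that ϵ is exactly what the line leaves unexplained.

```python
import random

# Hypothetical "true" parameters, for illustration only.
A_TRUE, B_TRUE, SIGMA = 2.0, 0.5, 1.0

def simulate(xs, a=A_TRUE, b=B_TRUE, sigma=SIGMA, seed=0):
    """Generate y = a + b*x + eps, with eps drawn from Normal(0, sigma)."""
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, sigma) for _ in xs]
    ys = [a + b * x + e for x, e in zip(xs, eps)]
    return ys, eps

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys, eps = simulate(xs)
# The error term is whatever the line a + b*x does not explain.
recovered = [y - (A_TRUE + B_TRUE * x) for x, y in zip(xs, ys)]
```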
What does homoscedasticity refer to in regression analysis?
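Constant variance of the normally distributed errors across all values of X
This assumption underpins the validity of the standard errors, confidence intervals, and significance tests built on the regression.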
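One informal way to probe this assumption is to compare the spread of residuals at low and high values of X. The sketch below is a hypothetical helper, loosely in the spirit of the Goldfeld-Quandt test but without its formal F comparison; the residual values are made up to show a constant-spread case and a fanning-out case.

```python
from statistics import variance

def spread_ratio(xs, residuals):
    """Informal homoscedasticity check (hypothetical helper).

    Sorts residuals by x, takes the low-x half and the high-x half
    (dropping the middle point when n is odd), and returns
    variance(high) / variance(low). Values near 1 are consistent with
    constant error variance; values far from 1 suggest it changes with x.
    """
    pairs = sorted(zip(xs, residuals))
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]
    return variance(high) / variance(low)

xs = [1, 2, 3, 4, 5, 6]
steady = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # same spread everywhere
fanning = [0.1, -0.1, 0.2, -1.0, 1.5, -2.0]  # spread grows with x
print(spread_ratio(xs, steady))   # close to 1
print(spread_ratio(xs, fanning))  # far above 1
```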
What are the two crucial results of regression analysis for CER development?
These results help in making reliable cost estimates.
What does Ordinary Least Squares (OLS) regression aim to minimize?
Sum of Squared Errors (SSE)
OLS regression is a foundational method for determining the best-fit line.
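For a single cost driver, the SSE-minimizing line has a closed form. This sketch uses small illustrative data (not from any real cost database) and a hypothetical helper name `fit_ols`.

```python
def fit_ols(xs, ys):
    """Closed-form simple OLS: returns (a, b, sse) for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope that minimizes SSE
    a = y_bar - b * x_bar    # the OLS line passes through (x_bar, y_bar)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse

xs = [1, 2, 3, 4, 5]   # illustrative cost-driver values
ys = [2, 4, 5, 4, 5]   # illustrative costs
a, b, sse = fit_ols(xs, ys)
print(round(a, 3), round(b, 3), round(sse, 3))  # 2.2 0.6 2.4
```

Any other slope or intercept gives a larger SSE for these data.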
What does R² indicate in regression analysis?
Percent of overall variation in cost explained by the regression equation
The complement, 1 − R², indicates unexplained variation.
What is the Standard Error of the Estimate (SEE) used for?
Measures accuracy of predictions made by the regression line
It estimates the standard deviation of the normally distributed error term.
What does the F-test determine in regression analysis?
Whether the regression model is statistically significant
It tests whether the model as a whole explains a significant share of the variation in cost.
What is the purpose of the t-test in regression analysis?
Determines whether individual cost driver variables are statistically significant
It tests whether each predictor's coefficient differs significantly from zero.
What is a proxy variable in cost estimating?
An independent variable that stands in for the variable that actually drives cost
Example: Number of firemen as a proxy for fire size.
What is the role of scatter plots in regression analysis?
Visualize patterns and detect correlation between variables
They help in determining the appropriate regression model.
What does the correlation coefficient (ρ) indicate?
Strength and direction of the relationship between variables
Ranges from -1 (perfect negative) to +1 (perfect positive).
What is the difference between correlation and causation?
Correlation indicates a relationship; causation indicates one variable drives another
Causation cannot be statistically verified.
What is the first step in regression analysis?
Create a scatter plot of the data
This helps to visualize correlation and determine the model type.
What are the two primary practical applications of regression in cost estimating?
These applications leverage historical data for future cost predictions.
What is the purpose of confidence intervals (CIs) in regression?
Quantify uncertainty regarding the estimate of mean cost
They provide error bounds for the mean of the estimate.
What does multicollinearity refer to in OLS multivariate regression?
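High correlation among two or more independent variables, a risk introduced when a model uses multiple cost drivers
It can make coefficient estimates unstable and inflate their standard errors, distorting the regression results.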
What is the general guideline when using statistical software for regression?
Always plot the data and fit a trend line first
This helps guide and check the regression analysis.
What is the best estimate for any value of the cost driver variable?
Result of plugging that x-value into the best-fit regression equation
This provides the predicted cost based on the model.
What is a non-OLS model?
Models such as Weighted Least Squares (WLS) and minimum Mean Absolute Percentage Error (MAPE) regression
These models minimize an error measure other than the ordinary (unweighted) SSE.
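WLS still has a closed form for one cost driver: it minimizes Σ w(y − a − bx)², using weighted means in place of ordinary ones. In this sketch the data are illustrative and the weights 1/y² are an assumed choice that turns the criterion into a sum of squared percentage errors relative to the observed costs. Minimizing MAPE itself generally requires an iterative optimizer and is not shown.

```python
def fit_wls(xs, ys, ws):
    """Weighted least squares for y = a + b*x: minimizes sum(w * (y - a - b*x)**2)."""
    sw = sum(ws)
    x_w = sum(w * x for w, x in zip(ws, xs)) / sw  # weighted mean of x
    y_w = sum(w * y for w, y in zip(ws, ys)) / sw  # weighted mean of y
    sxy = sum(w * (x - x_w) * (y - y_w) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - x_w) ** 2 for w, x in zip(ws, xs))
    b = sxy / sxx
    a = y_w - b * x_w
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(fit_wls(xs, ys, [1] * 5))                   # equal weights reproduce OLS
print(fit_wls(xs, ys, [1 / y ** 2 for y in ys]))  # squared-percentage-error weights
```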
What is the learning curve analysis in cost estimating?
Application of regression to understand cost reductions as cumulative production quantity increases
It helps in forecasting future costs based on experience.
List the basics of linear regression analysis.
These basics are essential for understanding the model’s performance.
The key concept of regression is to use the method of least squares for determining what?
The best estimates for equation parameters
This method minimizes the sum of the squared vertical differences between the observed data points and the estimated line.