Unit 1 - Module 3 - Parametric Estimating Flashcards

(53 cards)

1
Q

When using a particular estimating method, the analyst should

A. Use that method throughout the life of the program for consistency

B. Incorporate cost actuals only in the latter stage of the program because they are hard to obtain and often inaccurate

C. Re-evaluate the estimating method used at every milestone as the program matures

D. Always use parametric estimating because it applies at every stage (milestone) in the program

A

C. Re-evaluate the estimating method used at every milestone as the program matures

It is important to re-evaluate the cost estimating methodology at every milestone because
different methods may be better suited than others at certain points in the life cycle.

2
Q

The cost analyst can identify cost drivers by:

A. Talking to subject matter experts

B. Reading requirements specification documents

C. Obtaining and understanding system architecture designs

D. Scatter plotting the data

E. All of the above

F. A and D Only

A

E. All of the above

All of the methods mentioned are ways to identify potential cost drivers. It is important to keep in
mind, however, that parameters that drive design (which will be discovered when exploring choices B
and C) are not always the parameters that drive cost.

3
Q

CERs in parametric estimating are:

A. Cost Effectiveness Ratios

B. Cost Earned Relationships

C. Component Engineering Requests

D. Cost Estimating Relationships

E. Complete Engineering Releases

A

D. Cost Estimating Relationships

Parametric cost estimating uses Cost Estimating Relationships (CERs), which are based on historical
data, to predict the cost of a new project or system. Cost drivers, such as weight and size, are used to estimate cost and production schedules.

Cost drivers are the parameters or
independent variables in the CERs which can be shown to drive cost: cost, the dependent variable
in the equation, changes as the input parameters change.

4
Q

The term rate is best defined as which of the following:

A. Best Fit Equation

B. Cost on Cost

C. Cost on Parameter

D. Parameter on Parameter

A

C. Cost on Parameter

A rate uses a parameter to predict cost via a simple multiplicative relationship.

One of the most common rates is the labor rate, expressed in dollars per hour. Total labor cost
is then estimated labor hours times the project labor rate.

5
Q

The term factor is best defined as which of the following:

A. Best Fit Equation

B. Cost on Cost

C. Cost on Parameter

D. Parameter on Parameter

A

B. Cost on Cost

A factor uses the cost of another element to predict cost via a simple multiplicative relationship.

Often “below-the-line” elements such as program management and systems engineering are
estimated as a factor of the prime mission equipment.

6
Q

The term ratio is best defined as which of the following:

A. Best Fit Equation

B. Cost on Cost

C. Cost on Parameter

D. Parameter on Parameter

A

D. Parameter on Parameter

7
Q

A regression-based CER is best defined as which of the following:

A. Best Fit Equation

B. Cost on Cost

C. Cost on Parameter

D. Parameter on Parameter

A

A. Best Fit Equation

A regression is the best fit equation of the data. The most common way of defining this “best fit”
is ordinary least squares (OLS) regression, wherein the sum squared error (SSE) is minimized.
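
As a rough illustration of the "best fit" idea, the sketch below fits a one-variable line by ordinary least squares using the closed-form formulas and checks that nudging the slope away from the fitted value increases the SSE. The data points and the helper names `ols_fit` and `sse` are hypothetical and are not part of the course material.

```python
def ols_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def sse(xs, ys, slope, intercept):
    """Sum of squared errors between the observed ys and the line's predictions."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: cost driver values (xs) and observed costs (ys).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = ols_fit(xs, ys)
best_sse = sse(xs, ys, slope, intercept)
# Any other line, such as one with a slightly steeper slope, has a larger SSE.
worse_sse = sse(xs, ys, slope + 0.1, intercept)
print(f"y = {slope:.2f}x + {intercept:.2f}, SSE = {best_sse:.3f}")
```

A spreadsheet trend line or a statistics package would normally do this work; the point of the sketch is only that the fitted line is the one that minimizes the SSE.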

8
Q

True or False. Because parametric relationships are statistically verified for significance, the cost analyst can apply this relationship for all values of the cost driver.

A

False.

A relationship does not necessarily apply beyond a “reasonable” range. It is possible to apply CERs outside the range of the data, and the Prediction Interval (PI) captures the appropriate
uncertainty, but one would not estimate, for example, the cost of an object of zero weight.

Further discussion of this concern is addressed in Module 8 Regression Analysis and Module 9 Cost
and Schedule Risk Analysis.

9
Q

Given the hypothesis “Weight is a significant cost driver at a significance level of 0.05,” which of the following statistics would you use to test for this?

A. R-squared = 0.867

B. CV = 15%

C. P-value = 0.022

D. All of the above

E. A and B only

F. A and C only

G. B and C only

A

C. P-value = 0.022

The test for significance of a parameter is the p-value corresponding to the t statistic.
The R-squared value is the ratio of the explained variation to the total variation in the data set.
The CV (coefficient of variation) is the ratio of the standard error to the mean, and is a measure of
variability. In this case, the cost driver would be statistically significant, since the p-value
of 0.022 is less than the alpha value of 0.05.
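
The three statistics in this card can be sketched numerically. The example below computes R-squared and CV for a one-variable fit and applies the p-value decision rule with the card's numbers. The data set and the function names are hypothetical, and the standard error is assumed here to be the standard error of the estimate with n - 2 degrees of freedom.

```python
import math


def fit_statistics(xs, ys):
    """Return (r_squared, cv) for a simple least-squares line fit to the data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    r_squared = 1 - sse / sst            # explained variation / total variation
    see = math.sqrt(sse / (n - 2))       # assumed: standard error of the estimate
    cv = see / y_bar                     # standard error relative to mean cost
    return r_squared, cv


def is_significant(p_value, alpha=0.05):
    """Decision rule: the driver is significant when the p-value is below alpha."""
    return p_value < alpha


# Hypothetical weights (xs) and costs (ys).
weights = [1.0, 2.0, 3.0, 4.0, 5.0]
costs = [2.1, 3.9, 6.2, 7.8, 10.1]
r_squared, cv = fit_statistics(weights, costs)
print(f"R-squared = {r_squared:.3f}, CV = {cv:.1%}")
print("Weight significant at 0.05:", is_significant(0.022))
```

Only the last line tests the hypothesis in the question; R-squared and CV describe fit quality and variability rather than the significance of an individual driver.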

10
Q

Parametric estimating is a valid approach when creating cost estimates because:

A. It includes a detailed build-up of all applicable costs

B. It models the current system on the model of a similar system or sub-system

C. It uses tested relationships to estimate costs using predefined parameters

D. It can be used early before detailed requirements are known

A

C. It uses tested relationships to estimate costs using predefined parameters

Parametric estimating uses relationships between costs and cost drivers (predefined parameters) to develop an estimate. Though the data can be difficult to find, once developed,
CERs can be adjusted for requirements changes. While it is true that parametrics can be used early on in the life
cycle, this is not the basis for the validity of the technique.

11
Q

True or False. If, when in the data collection stage in the parametric estimating process, the contractor provides a total estimate at complete for the program, the cost analyst can skip the steps of identifying cost drivers and developing CERs and go straight to building the parametric model.

A

False.

A parametric estimate is based on identified cost drivers and developed CERs. A contractor-provided estimate at complete does not serve as the basis for a parametric
estimating methodology.

12
Q

If a CER for Site Development was developed giving the relationship, y (in $K) = 26.635x + 105.16 (where x is the number of workstations) for a data set cost driver that had a range minimum of 7 workstations to 47 workstations, and the independent variable has tested positively for significance, the predicted cost for a site that had 36 workstations would be:

A. $1,064.02
B. $1,064,020
C. $958.86
D. $958,860
E. CER may not be applicable. Further data collection would be advisable.

A

B. $1,064,020

y = 26.635(36) + 105.16 = 1064.02 $K, which is $1,064,020

13
Q

Using the same example in question 12, what would be the predicted cost for a site that had 10 workstations?

Site Development: y (in $K) = 26.635x + 105.16 (where x is the number of workstations)

A. $371.51
B. $266.35
C. $371,510
D. $266,350
E. CER may not be applicable. Further data collection would be advisable.

A

C. $371,510

y = 26.635(10) + 105.16 = 371.51 $K, which is $371,510

14
Q

Using the same example in question 12, what would be the predicted cost for a site that had 60 workstations?

Site Development: y (in $K) = 26.635x + 105.16 (where x is the number of workstations)

A. $1,598,100
B. $1,703.26
C. $1,742,350
D. $7,907.7
E. CER may not be applicable. Further data collection would be advisable.

A

E. CER may not be applicable. Further data collection would be advisable.

The CER was developed using sites with between 7 and 47 workstations. Since 60 is above our maximum,
the CER may not be applicable. There is probably not a problem with applying the CER for this value,
as long as the appropriate prediction interval (PI) is used to characterize the increased uncertainty.
We’d be much more nervous about applying the CER for a site with, say, a thousand workstations.
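
The arithmetic in cards 12 through 14 can be sketched as a small function that evaluates the Site Development CER and flags inputs outside the 7 to 47 workstation range of the source data. The function name and the returned flag are illustrative choices, not part of the course material.

```python
def site_development_cost_k(workstations, x_min=7, x_max=47):
    """Evaluate y = 26.635x + 105.16 (in $K) and report whether x is in range."""
    cost_k = 26.635 * workstations + 105.16
    in_range = x_min <= workstations <= x_max
    return cost_k, in_range


for n in (36, 10, 60):
    cost_k, in_range = site_development_cost_k(n)
    note = "within data range" if in_range else "outside data range, use with caution"
    print(f"{n} workstations: {cost_k:.2f} $K ({note})")
```

The flag does not forbid extrapolation; it only signals that the prediction interval should be examined before the result is relied on.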

15
Q

If you were developing a multivariate CER to predict the payroll of a Major League Baseball (MLB) team, which of the following would be good candidate cost drivers?

A. Population of the team’s home city
B. Number of players on the roster
C. Number of free agents signed at the beginning of the current season
D. Whether or not the team is owned by George Steinbrenner or Ted Turner
E. A and B
F. A and C
G. B and C

A

F. A and C

Bigger cities tend to have higher payrolls (due to larger fan bases).

If a team signs a large number of free agents before the season, they may have a higher payroll
than a team that has a lot of players from their farm system. Free agents are often won in
bidding wars. While it makes sense that the more players a team has, the higher its payroll
will be, the reason this is not a good cost driver is that all teams have the same number of players,
so you’d be trying to fit a sloping line through a vertical cloud of points! While teams owned by
George Steinbrenner (the New York Yankees) and Ted Turner (formerly the Atlanta Braves) may
indeed have higher payrolls, as might be shown with an appropriate dummy variable, this is too
restrictive to be of much value as a predictive variable. It might be better to try to develop an
objective (yet non-circular) method for characterizing ownership groups as extravagant or
parsimonious.

16
Q

True or False. CERs are always linear.

A

False. Though CERs are certainly often linear, the relationship could be of a non-linear
functional form.

17
Q

True or False. A good way to identify cost drivers is to use comb charts or Pareto charts to identify WBS elements with the highest cost.

A

False. The highest cost WBS items, or the “big ticket items” can be termed “Cost
Passengers.” These high cost WBS elements are not necessarily the elements with the greatest
potential for cost savings. Instead, it is important to look for the elements that drive the costs.

18
Q

Which of the following statements is true regarding calibration of parametric CERs?
I. The calibration point must be a part of the original data set.
II. A calibrated CER is mathematically equivalent to an adjusted analogy.
III. Calibration of a CER changes the Y-intercept.

A. I
B. II
C. III
D. I and II only
E. I and III only
F. II and III only
G. All of the above.

A

F. II and III only

Statements II and III are correct. Statement I is false: the calibration point must not be part of the
original data set. While there are valid reasons for calibrating a CER, it is important to calibrate carefully
and with good reason, as incorrect and unjustified calibration can lead to suspicions that an analyst is
“cooking” the data.

19
Q

True or False. A parametric model can be updated with program-specific actuals.

A

This statement is true, though the use of actuals depends on the situation. When new data is available,
the analyst has the option to update the CER, recalibrate the CER so that it passes through the new
data point, change the methodology altogether, or leave the model unchanged. Whatever the analyst
decides, it is important that the rationale for this decision is defensible.

20
Q

Which of the following statements is accurate?
I. Forcing a CER through the origin is not possible.
II. Forcing a CER through the origin is necessary, because if something has 0 mass (for example), it should also cost nothing.
III. Forcing a CER through the origin is not advised.

A. I
B. II
C. III

A

C. III

The y-intercept of a CER should not artificially be forced through zero. General practice is to
accept the y-intercept, even if it is not statistically significant. Though it is possible to force the
y-intercept through zero in Excel, this practice is not advised. Just like you would not force your
regression through any other data point, you should not artificially force your regression through
zero. The y-intercept should not, however, be interpreted as a fixed cost.

21
Q

PARAMETRIC TECHNIQUES

A

Ratio
Estimates effort, defined as parameter on parameter

Factor
Uses the cost of one element to predict the cost of another with
a simple, multiplicative relationship

Rate
Uses a parameter to predict cost via a simple multiplicative relationship (cost on parameter)

Arithmetic Mean
Sum of all items divided by the number of items
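
The four techniques above can be sketched as one-line calculations. The 20% SE/PM factor and the 1,200 lines of code per integrated package are the examples this deck uses elsewhere; every other number and all variable names are hypothetical.

```python
from statistics import mean

# Rate: cost on parameter (dollars per hour times hours). Hypothetical values.
labor_hours = 1_200
labor_rate = 95.0
labor_cost = labor_hours * labor_rate

# Factor: cost on cost (unitless). SE/PM taken as 20% of prime mission equipment.
pme_cost = 5_000_000.0            # hypothetical prime mission equipment cost
sepm_factor = 0.20
sepm_cost = pme_cost * sepm_factor

# Ratio: parameter on parameter (lines of code per integrated COTS package).
cots_packages = 4                 # hypothetical package count
loc_per_package = 1_200
integration_loc = cots_packages * loc_per_package

# Arithmetic mean: sum of all items divided by the number of items.
historical_rates = [90.0, 95.0, 100.0]   # hypothetical past labor rates
average_rate = mean(historical_rates)

print(labor_cost, sepm_cost, integration_loc, average_rate)
```

Note that only the rate and the factor produce a cost directly; the ratio produces another parameter (here, lines of code) that would still need a rate or CER to become a cost.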

22
Q

ESTIMATING THROUGHOUT THE PROGRAM LIFE CYCLE

A

Analysts rely heavily on analogy and parametrics during concept and design stages.

As more design detail becomes available, analysts begin to use build-up, although parametric estimating continues to play an important role.

Actual costs are incorporated into cost estimates as Low-Rate Initial Production (LRIP) or full-rate production begins. Actual cost experience on prototype units and early engineering development hardware can enlighten the estimate before production actuals become available. Manufacturers directly incorporate their own costs. Government analysts use data available via the Contractor Cost Data Reporting (CCDR) system.

23
Q

PARAMETRIC ESTIMATING PROCESS

A

1. Data Collection: Analysts search for data sources. Cost, schedule, and technical data are the raw
materials for parametric relationships.

2. Cost Drivers: Analysts identify cost drivers. A hypothesis based on analogy, organization history,
or expert judgment may provide insight into cost drivers with the highest confidence interval. Scatter
plots are a visualization tool for uncovering underlying relationships in the data. Cost drivers are
explored via correlation of each independent element with cost. Analysts use cost drivers to capture
the underlying engineering/physical causality in a complex system.

3. Cost Estimating Relationships (CERs): Analysts develop equations to capture CERs using linear
models with one parameter or with more complex mathematical functions.

4. Parametric Model: Analysts build an inclusive parametric model containing all the parametric
inputs and CERs for all the cost elements in an entirely parametric cost model approach. Analysts
use common software programs or more sophisticated tools. Methodologies will often change
throughout the life of the program; an exclusively parametric model may seem stochastic.

24
Q

y = ax + b (Linear)

y = ax^b (Power)

y = a + b*ln(x) (Logarithmic)

y = ae^(bx) (Exponential)

y = a + b1*x + b2*x^2 + … (Polynomial)
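
The functional forms on this card can be written as small functions for experimentation. This is a sketch only: the parameter names follow the card, and the polynomial is assumed to mean successive powers of a single driver x, which is an interpretation of the card rather than something it states outright.

```python
import math


def linear(x, a, b):
    """y = ax + b"""
    return a * x + b


def power(x, a, b):
    """y = ax^b"""
    return a * x ** b


def logarithmic(x, a, b):
    """y = a + b*ln(x), defined for x > 0"""
    return a + b * math.log(x)


def exponential(x, a, b):
    """y = ae^(bx)"""
    return a * math.exp(b * x)


def polynomial(x, a, coeffs):
    """y = a + b1*x + b2*x^2 + ..., with coeffs = [b1, b2, ...] (assumed form)"""
    return a + sum(b * x ** k for k, b in enumerate(coeffs, start=1))


print(linear(2, 3, 1), power(2, 3, 2), polynomial(2, 1, [1, 1]))
```

Note that a and b play different roles from one form to the next (for example, a is the slope in the linear form but the constant term in the logarithmic form), so coefficients are not interchangeable across forms.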

25
Calibration of a CER is defined as resetting the y-intercept (or equivalently the constant term) of the CER to force it to pass through a desired point (pair of coordinates). It may be used for: correcting an analogy; increasing the applicability of a CER to less robust data sets; in commercial models, inserting input values corresponding to historical programs and adjusting various model parameters (e.g., complexity) to match historical actual costs; and improving response to a certain performance parameter for the purposes of conducting a trade-off for Cost as an Independent Variable (CAIV) or Target Costing.
Calibration does not mean force-fitting a source data point in a regression-based CER. Calibration should always be performed carefully and be clearly documented. With a calibrated CER, analysts must be even more cautious than usual about the range over which it is applied. It is generally acceptable to be within one standard deviation of x from the point of departure.
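
A minimal sketch of calibration, assuming a linear CER: the slope is kept and only the intercept is reset so the line passes through the calibration point. The slope and original intercept are borrowed from the Site Development CER in card 12; the calibration point (30 workstations, 950 $K) and the function names are hypothetical.

```python
def calibrate_intercept(slope, x0, y0):
    """Return the intercept that makes y = slope*x + intercept pass through (x0, y0)."""
    return y0 - slope * x0


slope, original_intercept = 26.635, 105.16   # CER from card 12, in $K
x0, y0 = 30, 950.0                           # hypothetical calibration point

new_intercept = calibrate_intercept(slope, x0, y0)


def calibrated_cer(x):
    """Calibrated CER: same slope, reset intercept."""
    return slope * x + new_intercept


def adjusted_analogy(x):
    """Start from the analogous point's cost and adjust by slope times the change in x."""
    return y0 + slope * (x - x0)


print(f"new intercept = {new_intercept:.2f} $K")
print(calibrated_cer(40), adjusted_analogy(40))
```

The two functions return the same value for any x, which is one way to see why card 18 treats a calibrated CER as mathematically equivalent to an adjusted analogy.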
26
Consider the following hypothesis: Site activation cost can be calculated as a function of the number of workstations per site. A single independent variable will provide a well-fitting regression model for estimating cost. We have data for eleven sites, including the number of workstations installed at each site and the site activation costs. If we graph this data on a scatter plot, we can fit a trend line to the data to view the regression equation and the R-squared and establish whether this predicted CER is a good fit. We can see from the R-squared of 0.91 that it is; in fact, workstation counts can account for 91% of the variance in the data.
y = 26.491x + 82.756, R-squared = 0.9098. From the trend line we found the following equation: site activation cost can be calculated as $82,756 plus the number of workstations times the coefficient of $26,491. Running the regression gives us the following information. We see the R-squared, which matches the information from the graph, and also the coefficient and intercept that form the equation. We can assess significance by looking at the p-value for the coefficient as well as for the F-statistic. In both cases, the p-value is less than the alpha of 0.05, so our model is significant. We can also calculate the CV by dividing the standard error by the mean cost, which gives us 20.4%.
27
Hypothesis: The operating costs of a cruise ship can be better predicted by crew size and ship port time than either variable individually. The longer a ship is in port and the larger its crew, the more expensive its operating costs will be
Consider the hypothesis: The operating costs of a cruise ship can be better predicted by crew size and ship port time than either variable individually. The longer a ship is in port and the larger its crew, the more expensive its operating costs will be. In the site activation example, a single independent variable provided a well-fitting regression model for estimating cost. In other cases, analysts need to investigate multiple independent variables or cost drivers at the same time. The analyst must use multiple regression analysis to create the CER when multiple independent variables are used together to predict cost. This example is for the estimate of operating costs for a new cruise ship. It is hypothesized that the operating costs will be positively correlated to both port hours (i.e., the hours of the year that the ship is not steaming, or not running its engines, while in port) and the ship’s crew size. The assumptions are that the longer a ship stays in port, and the more crew it has on board, the more it will have to pay in operating costs. The table provides data for seven ships. The scatter plot shows that port hours is positively correlated with the cost of purchased services, explaining 39% of the total variation. Crew size is also positively correlated and explains 62% of the variation in the data. It is apparent when examining the scatter plots that whenever port hours is a poor explanatory variable for a data point, crew size generally is a good one, and vice versa. With the data point for the luxury liner, where operating costs equal $387.56K, port hours is a poor explanatory variable (the point is well below the regression line), but crew size is an excellent explanatory variable (the point is almost directly on the regression line). This indicates that although the luxury liner spends a small amount of time in port, it is the smaller crew size that is driving the lower cost of purchased services.
Now we will plot the two cost drivers against each other to visualize the lack of significant correlation between them. Multicollinearity occurs when there is a significant correlation between two cost drivers. When there is multicollinearity, it is unlikely that both cost drivers will be good explanatory variables at the same time. This graph indicates no multicollinearity between port hours and crew size. All these considerations indicate an outstanding multivariate CER, which is confirmed by the regression statistics shown here. Note that the negative intercept is irrelevant here, as the CER only applies within a certain range. Inserting reasonable values for port hours and crew size will ensure a positive cost. Never interpret the intercept value/constant term of a CER as a fixed cost when x = 0. Note that 87% of variation is now explained, compared to a maximum of 62% with either cost driver individually. Both independent variables are statistically significant although, again, the intercept is not. The regression has an excellent CV of only 9.1%.
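
The multicollinearity check described above can be sketched by computing the Pearson correlation between the two candidate cost drivers. The card's data table is not reproduced in this deck, so the seven port-hour and crew-size values below are hypothetical placeholders, and the function name is illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical driver values for seven ships (not the course data).
port_hours = [1000, 2000, 3000, 4000, 5000, 6000, 7000]
crew_size = [50, 20, 60, 30, 70, 25, 55]

r = pearson_r(port_hours, crew_size)
print(f"correlation between port hours and crew size: r = {r:.2f}")
```

A value of r near zero suggests the two drivers carry largely independent information. How large r must be before multicollinearity becomes a concern is a judgment call that this sketch does not settle.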
28
A combination of expert opinion, organizational history, and data analysis determines whether to use multivariate regression models. There must be compelling reasons why two or more variables are judged to explain cost significantly better than a single variable.
29
Module 3 - Parametric Estimating
This module provides an overview of parametrics. It covers the process of building parametric models and describes how those models are used. The module discusses cost estimating relationships (CERs) and introduces the topics of data use, CER development, and development of complex models. The proper development and application of CERs depend on understanding the associated mathematical and statistical techniques. This module provides general guidance for use in developing and employing valid CERs, including differences between simple and complex CERs, techniques for developing and implementing CERs, including linear regression ordinary least squares (OLS) "best-fit" models.
30
This module describes predicting cost through parametric estimating based on one or more independent variables in the Cost Estimating Relationship (CER) equation. Building a parametric model includes collecting data, identifying cost drivers, and developing CERs. This module contains two parametric estimation examples to reinforce these concepts. The key ideas discussed in this module are:
cost drivers and cost passengers, parametric analysis inputs and outputs, and parametric cost models. The Advanced Topics section describes practical applications of parametric analysis including CER development, Cost Response Curves (CRCs), sensitivity analysis, Schedule Estimating Relationships (SERs), and Weight Estimating Relationships (WERs).
31
Lines form the simplest functional form of a CER and serve as the starting point in parametric analysis. This linear form can be extended to multiple independent variables to become a plane in three-space or a hyper-plane in n + 1 dimensions.
This module will also examine non-linear functional forms, such as power, exponential, logarithmic, and even polynomial functions. Non-linear functional forms provide more versatility in the parametric modeling toolkit but are also more challenging. In general, analysts use curve fitting, the process of determining the functional equation whose graph best fits a scatter plot of data, to turn normalized data into CERs.
32
Basics of Parametric Estimating Module 2 Costing Techniques demonstrates different costing techniques used during different stages of the acquisition cycle or product development process, as shown in figure 3.2. Cost estimators typically use three primary estimating techniques:
Analogy: This approach is a comparative analysis of similar systems. Analogy uses elements of similar projects as a basis for a future/new system’s cost. It can be used early in programs before detailed requirements are known. Analogous estimates provide no objective test of validity.
Parametric Estimating: This approach uses a mathematical relationship between a parameter and a cost. The historical data needed to support parametric estimating can be difficult to obtain, but the relationships derived from this technique can be easily adjusted for requirements changes and can provide statistical results to assess quality.
Engineering Build-Up: This estimating approach builds up cost by beginning at lower levels and rolling results up to produce higher-level estimates. A high level of data detail is required, and the approach may involve deep knowledge of the project and extensive collaboration with stakeholders. Missing or incorrect data may increase the risk of omitting costs. Engineering build-up produces details that are easily traceable, though this technique can be expensive.
33
Basics of Parametric Estimating A parametric model is composed of multiple CER combinations and other algorithms.
Cost drivers that support parametric analysis are identified through examination of historical costs as well as technical and programmatic data. The Parametric Estimating Handbook (PEH) provides the following definition[2]: Parametric estimating is a technique that develops cost estimates based upon the examination and validation of the relationships which exist between a project's technical, programmatic, and cost characteristics as well as the resources consumed during its development, manufacture, maintenance, and/or modification.
34
Basics of Parametric Estimating Parametric cost estimating uses CERs to predict the cost of a new project or system. Cost drivers (e.g., weight and size), preferably from an organization's own history, are used to estimate cost and production schedules.
Cost drivers are the parameters, or independent variables, in the CERs that influence the amount of money spent on an item (i.e., cost), the dependent variable. These independent variables then become required inputs. Inputs are values that must be accurately determined for the CERs to produce the cost. Parametric relationships are used to estimate cost, schedule, weight, and other parameters as outputs.
35
Basics of Parametric Estimating The parametric approach provides several advantages, including:
versatility, objective quantitative inputs, easily traceable equations that clearly link input and output variables, and easily adjustable input parameters.
36
Basics of Parametric Estimating Several statistical measures associated with CERs are derived by regression analysis, including the t and F statistics, the R-squared value, and the Coefficient of Variation (CV). The t and F statistics are objective measures of the validity of the model. The R-squared value indicates its explanatory power. The CV is used to derive error bands in cost risk analysis. Module 8 Regression Analysis and Module 9 Cost Risk Analysis explore these measures and their application in more detail.
Analysts rely heavily on analogy and parametrics during concept and design stages. As more design detail becomes available, analysts begin to use build-up, although parametric estimating continues to play an important role. Actual costs are incorporated into cost estimates as Low-Rate Initial Production (LRIP) or full-rate production begins. Actual cost experience on prototype units and early engineering development hardware can enlighten the estimate before production actuals become available. Manufacturers directly incorporate their own costs. Government analysts use data available via the Contractor Cost Data Reporting (CCDR) system.
37
Parametric Estimating Process The basic parametric estimating approach consists of four activities. These activities are derived from the overall cost estimating process discussed in Module 1 Cost Estimating Basics:
1. Data Collection: Analysts search for data sources. Cost, schedule, and technical data are the raw materials for parametric relationships.
2. Cost Drivers: Analysts identify cost drivers. A hypothesis based on analogy, organization history, or expert judgment may provide insight into cost drivers with the highest confidence interval. Scatter plots are a visualization tool for uncovering underlying relationships in the data. Cost drivers are explored via correlation of each independent element with cost. Analysts use cost drivers to capture the underlying engineering/physical causality in a complex system.
3. Cost Estimating Relationships (CERs): Analysts develop equations to capture CERs using linear models with one parameter or with more complex mathematical functions.
4. Parametric Model: Analysts build an inclusive parametric model containing all the parametric inputs and CERs for all the cost elements in an entirely parametric cost model approach. Analysts use common software programs or more sophisticated tools. Methodologies will often change throughout the life of the program; an exclusively parametric model may seem stochastic.
38
Parametric Estimating Process 1. Data Collection: Data collection is an iterative process. Analysts continually add data as new sources are identified. Parametric estimates include historical cost, programmatic, schedule, technical, and operational data from systems similar to the system being estimated (e.g., skills required for a software development project; missiles for a missile program; logistics data for supply chain optimization; cruisers, destroyers, and frigates for a new surface ship of similar class). The cost analyst gathers any program data that exists to date. Data can be found in the program’s cost reports, the schedule, and requirements documents (i.e., the Cost Analysis Requirements Description (CARD) described in Module 1 Cost Estimating Basics). Analysts look for potential relationships within the data attributes to facilitate cost driver identification. Some examples of data types are
Lines Of Code (LOC) for software development, site deployment plans for IT installations, procurement of sub-systems or other materials from subcontractors or vendors, Test and Evaluation (T&E) schedules, physical system characteristics (e.g., weight or power), Technical Performance Measures (TPMs) (e.g., speed), resource staffing, and other operational parameters that reflect a concept of operations (CONOPs). Weight, power, and LOC are common cost drivers and are a matter of practice for most data collection efforts. For traceability, analysts document the sources, reasoning, assumptions, and raw data behind the values used to develop their CERs. Module 4 Data Collection and Normalization provides additional information.
39
Parametric Estimating Process 2. Cost Drivers: Data analysis begins after sufficient data is collected to begin analyzing relationships. Data analysis identifies cost drivers to test hypotheses concerning the potential relationships between various parameters and cost. If relationships are confirmed, these parameters can become the independent variables in CERs.
Subject Matter Experts (SMEs) who have a deep understanding of the particular system or type of system under consideration can provide insight into cost drivers. Cost analysts should make every effort to understand the technical and operational parameters of the system and search for those parameters that logically drive cost. Scatter plotting the data can visually demonstrate the relationships and make them easier to identify than numerical analysis alone. Scatter plots are discussed in greater detail in Module 6 Basic Data Analysis Principles. Some cost drivers may affect more than one cost element or may have ripple effects on the design or cost. However, parameters that drive design are not always the parameters that drive cost. As in data collection, identifying cost drivers is an iterative process. Analysts cannot identify cost drivers without data, but they may predict potential cost drivers and influence the data collection plan.
40
Parametric Estimating Process 2. Cost Drivers: Cost drivers are not to be confused with cost passengers. Cost passengers are major system components that represent the majority of the system's cost. They are the cost elements with the highest dollar values in the Work Breakdown Structure (WBS) and can be identified using Pareto charts. Analysts often misuse the terms when looking for cost reduction opportunities, as in the Reduction of Total Ownership Cost (RTOC) initiative, but cost passengers do not always represent the biggest potential for cost savings. Cost drivers represent those design decisions and requirements at a system-level that truly influence the amount of money spent on an item. The parametric approach captures these design decisions or requirements with numerical parameters. While cost passengers indicate which items cost the most, cost drivers predict how those costs can be influenced. Collecting data on all possible parameters and plotting them against cost is both prohibitively expensive and inefficient. Cost driver identification requires a degree of understanding and expertise. System-level cost drivers may be hard to quantify, but identification is the value of the parametric approach. Some examples of top-level cost drivers include:
hardware or equipment, resource requirements, level of automation, technical requirements such as speed and accuracy, and life cycle support concerns such as reliability or maintenance philosophy. To illustrate cost drivers versus cost passengers, imagine a pea pod. A pea pod wraps around the peas and its design is determined by the number, shape, and size of the peas. Likewise, the hull of a ship or the airframe of an airplane is designed to carry the crew and the mission equipment of the vessel. Although a hull or airframe represents a significant portion of the system’s cost, it cannot easily be changed to influence that cost without dramatically affecting the requirements of the system. In these examples, the pea pod, hull, or airframe represents a cost passenger and the number, shape, and size of peas, crew members, and mission equipment represents the cost driver.
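As a rough illustration of how cost passengers can be picked out of a WBS with a Pareto-style ranking, the sketch below sorts elements by cost and accumulates their share of the total. The element names, dollar values, and the 80% cutoff are hypothetical assumptions, not data from this module.

```python
# Hypothetical WBS element costs in $K (illustrative values only).
wbs_costs = {
    "Airframe": 450.0,
    "Propulsion": 250.0,
    "Avionics": 150.0,
    "SE/PM": 100.0,
    "Training": 50.0,
}

def pareto_table(costs):
    """Return (element, cost, cumulative share) rows sorted by descending cost."""
    total = sum(costs.values())
    rows, running = [], 0.0
    for name, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        running += cost
        rows.append((name, cost, running / total))
    return rows

def cost_passengers(costs, threshold=0.8):
    """Elements that together reach the (assumed) cumulative-share threshold."""
    selected = []
    for name, _cost, cum_share in pareto_table(costs):
        selected.append(name)
        if cum_share >= threshold:
            break
    return selected

passengers = cost_passengers(wbs_costs)
```

A ranking like this only shows where the money sits. It says nothing about which design decisions or requirements would change those costs, which is the role of the cost drivers.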
41
Parametric Estimating Process 3. Cost Estimating Relationship Development After collecting the available historical and industry data for the system and identifying cost drivers, the relationships between the cost drivers and cost may be analyzed. The resulting CERs are equations which can range from the simple arithmetic mean of a few data points to multivariate regressions complete with a host of related statistics. Figure 3.3 shows cost scatter plotted against a sample cost driver (variable 1). The relationship is calculated using the linear equation y = 0.3075x + 66.337, where y represents cost and x represents variable 1. Tips for scatter plots are found in Module 6 Basic Data Analysis Principles. Some CERs take the form of a simple multiplier of one of three types:
A rate predicts cost via a simple multiplicative relationship. Rates are always expressed in dollars per parameter. The most common rate is a labor rate, expressed in dollars per hour, for a certain labor category or type of work. Cost per function point and cost per LOC are other common rates used in software estimating.
A factor uses the cost of one element to predict the cost of another via a simple multiplicative relationship. Factors are cost on cost, without units, often expressed as a percentage. An example of a factor is System Engineering/Program Management (SE/PM) calculated as 20% of the program’s prime mission equipment (hardware and software). Analysts typically employ factors when incorporating industry data into the program estimate. Many organizations have a set of standard factors that can be used as guidelines or benchmarks. As a reference, the factors used by the FAA can be found at https://www.faa.gov/regulations_policies/policy_guidance/benefit_cost/.
A ratio estimates effort and is defined as parameter on parameter. For example, if a program plans to perform Commercial Off-The-Shelf (COTS) software integration within the software baseline, a good ratio to use is the industry average of 1,200 lines of code per integrated package.
These three terms are sometimes improperly interchanged. When encountered, ensure they are used in a manner consistent with these definitions. Rates, factors, and ratios are often the result of very simple calculations such as the arithmetic mean (average).
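The three multiplier types differ only in their units, which a short sketch can make explicit. Only the 20% SE/PM factor and the 1,200 LOC per package ratio come from the text; the labor rate, hours, equipment cost, and package count are hypothetical.

```python
def cost_from_rate(rate_dollars_per_unit, quantity):
    """Rate: dollars per parameter (e.g., dollars per labor hour)."""
    return rate_dollars_per_unit * quantity

def cost_from_factor(factor, base_cost):
    """Factor: cost on cost, unitless (e.g., SE/PM as 20% of PME)."""
    return factor * base_cost

def effort_from_ratio(ratio, quantity):
    """Ratio: parameter on parameter (e.g., LOC per integrated COTS package)."""
    return ratio * quantity

# Hypothetical inputs, chosen only to show the units involved.
labor_cost = cost_from_rate(100.0, 2_000)         # $100/hour * 2,000 hours
sepm_cost = cost_from_factor(0.20, 5_000_000.0)   # 20% of a $5M PME cost
integration_loc = effort_from_ratio(1_200, 8)     # 1,200 LOC/package * 8 packages
```

Note that the ratio returns effort (lines of code), not dollars. A rate such as cost per LOC would still be needed to turn that effort into cost.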
42
Parametric Estimating Process 3. Cost Estimating Relationship Development However, similar to CERs, rates, factors, and ratios can also be derived via a rigorous regression process. This regression process is the preferred process for developing CERs and can be summarized with five top-level steps:
Select the Variables: Determine the independent (source) variable (e.g., LOC, number of deployments, COTS procurements, etc.), data collection methodology, and data type corresponding to the desired analysis. For example, for a cost per deployment CER or a cost per LOC CER, determine which costs are applicable to these elements. This collected cost data will be the dependent variable used to develop the CER.
Test Relationships: After selecting the variables, test for associations that may exist between each variable and cost. Remove representative data from the working data set for validation later in the analysis. Scatter plot the working data set. Scatter plots display the strength, form, and direction of the relationship and may reveal trends, anomalies, or outliers. Determine the type of relationship for each variable. For more information on these techniques, refer to Module 6 Basic Data Analysis Principles.
Perform Regression: After initial inspection and testing of the data, perform regression analysis to develop the equation to use in the model. This module uses the Ordinary Least Squares (OLS) approach.
Select CERs: Select the CER that best represents the data.
Validate CERs: Finally, validate the CER on the representative data that was not used to develop the equation. It may be useful to perform additional graphical exploration and regression analysis. Use the Prediction Interval (PI) methodology to accurately capture uncertainty. Analysts often apply CERs outside the range of collected data since many programs attempt to build systems that are faster, bigger (or smaller), more dense, more accurate, etc., than their predecessors. The PI methodology[3] provides one method to capture uncertainty in OLS estimates, which increases as estimates deviate from the core data set. Refer to Module 6 Basic Data Analysis Principles and Module 8 Regression Analysis for more detail on PIs.
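The steps above can be sketched for a single cost driver as follows. This is a minimal illustration under stated assumptions: the data points are hypothetical, the choice of holdout points is arbitrary, and `pi_width_factor` is an illustrative helper that computes only the geometric term of the standard simple-regression prediction interval (the full interval also multiplies by a t value and the standard error of the regression).

```python
from math import sqrt

# Hypothetical (driver, cost) pairs; not taken from the module's figures.
data = [(10, 95), (20, 150), (30, 190), (40, 260), (50, 290),
        (60, 350), (70, 410), (80, 445)]

# Test Relationships: set aside representative points for later validation.
holdout = [data[2], data[5]]
working = [p for p in data if p not in holdout]

def fit_ols(points):
    """Perform Regression: simple OLS, returns (intercept, slope)."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

intercept, slope = fit_ols(working)

def predict(x):
    """Point estimate from the fitted CER."""
    return intercept + slope * x

# Validate CERs: relative error on the points withheld from the fit.
validation_errors = [(predict(x) - y) / y for x, y in holdout]

def pi_width_factor(x0, points):
    """Term that scales the prediction interval half-width at x0."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    return sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
```

Comparing `pi_width_factor(45, working)` with `pi_width_factor(150, working)` shows the interval widening as the estimate moves away from the center of the data, which is why extrapolated CER estimates carry more uncertainty.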
43
Parametric Estimating Process 3. Cost Estimating Relationship Development Calibration of a CER is defined as resetting the y intercept (or equivalently the constant term) of the CER to force it to pass through a desired point (pair of coordinates). One reason for doing this may be to correct an analogy. If a CER applies to systems typified by the basis of the analogy but not to the target system, the analyst may calibrate the CER to intersect a point determined by the analogy. Calibration is also performed to increase the applicability of CER based estimates on broad robust data sets, adjusting them to a smaller set of data specific to a particular experience (e.g., company, commodity, agency, etc.).
Calibration has a particular meaning in the context of commercial cost models. In that case, calibration means to insert input values corresponding to historical programs and to adjust various model parameters (e.g., complexity) to match historical actual costs. Calibration in this context makes these models more applicable to the system. Finally, a CER might have good statistics but not be responsive to a certain performance parameter needed to conduct a trade-off for Cost as an Independent Variable (CAIV) or Target Costing, while a second CER that does respond to that parameter has worse statistics. To support the trade-off, calibrate the latter CER, which has worse statistics, to intersect a point generated by the better CER as shown in figure 3.4. The solid blue line represents the original CER, and the dotted yellow line represents the calibrated CER. In this graphic, the original CER passing through Calibration does not mean force-fitting a source data point in a regression-based CER. Calibration should always be performed carefully and be clearly documented. With a calibrated CER, analysts must be even more cautious than usual about the range over which it is applied. It is generally acceptable to be within one standard deviation of the point of departure.
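Resetting the constant term so that a linear CER passes through a chosen point takes one line of arithmetic. The sketch below assumes a linear CER and reuses the slope of the sample equation y = 0.3075x + 66.337 from earlier in this module; the calibration point is hypothetical.

```python
def calibrate_intercept(slope, point):
    """Return the intercept that makes y = intercept + slope * x pass through point."""
    x0, y0 = point
    return y0 - slope * x0

# Slope from the sample linear CER in the text (y = 0.3075x + 66.337).
slope = 0.3075
# Hypothetical calibration point, e.g., taken from an analogous program.
calibration_point = (200.0, 140.0)

new_intercept = calibrate_intercept(slope, calibration_point)

def calibrated_cer(x):
    """Calibrated CER: same slope as the original, shifted intercept."""
    return new_intercept + slope * x
```

Only the intercept moves. The slope, and therefore the CER's response to the cost driver, is unchanged, which is one reason to keep estimates close to the calibration point.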
44
Parametric Estimating Process 4. Parametric Model Development After developing CERs for all the cost elements, these relationships are gathered into an integrated, automated structure that comprises the cost model. Cost elements in the cost model, though estimated separately, are most likely dependent. For consideration of correlation between elements of the model, without which the overall uncertainty will be understated, see Module 9 Cost and Schedule Risk Analysis. When program-specific data is available, the analyst can use actual data rather than estimates to update the CERs in the model. For instance, the analyst may have estimated 1,200 LOC per package for COTS software integration based on industry data. In the program development cycle, when a full COTS integration effort for a suite of packages has been completed, the analyst collects actuals that demonstrate the effort was 1,500 LOC per package. The analyst can update the model with this new ratio. Analysts must be judicious when evaluating new data. The new average of 1,500 lines could be within expected statistical variation and therefore not require an updated model. In general, the use of actuals can vary depending on the situation. Choices include:
re-running the CER with the new augmented data set, re-calibrating the CER to pass through the new data point, changing the methodology altogether, or leaving the model unchanged.
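One simple way to judge whether a new actual falls within expected statistical variation is to compare it with the spread of the data behind the original ratio. The sketch below is only an illustration: the industry sample and the two-standard-deviation rule are assumptions, not values or criteria from this module.

```python
from statistics import mean, stdev

# Hypothetical LOC-per-package values behind a 1,200 industry average.
industry_loc_per_package = [900, 1_000, 1_100, 1_200, 1_300, 1_400, 1_500]

def within_expected_variation(new_value, history, n_sigma=2.0):
    """True if new_value lies within n_sigma sample standard deviations of the mean."""
    return abs(new_value - mean(history)) <= n_sigma * stdev(history)

program_actual = 1_500  # actual LOC per package quoted in the text
update_needed = not within_expected_variation(program_actual, industry_loc_per_package)
```

With this assumed sample, 1,500 LOC per package sits inside the band, so the check would favor leaving the model unchanged. A tighter industry data set could lead to the opposite conclusion.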
45
Parametric Estimating Process 4. Parametric Model Development Complex parametric models are typically comprised of multiple CERs and incorporate other cost estimating techniques. Complex parametric models are often used for Life Cycle Cost (LCC) or Total Ownership Cost (TOC) estimates.
These estimates include all costs from project conception, to maintenance, and ultimately disposal. A complex parametric model may also include math and/or logic to account for time phasing, inflation, allocations, Monte Carlo simulation, schedule estimating, learning curve analysis, earned value management, and/or other adjustments. The costs are also allocated or mapped to the program’s WBS. The model builder can create or build in logic to enable Design-to-Cost or CAIV trades. The WBS forms the basis for the complex parametric model structure. Like all other models, the complex parametric model has inputs and outputs. The outputs of a complex parametric model may provide a range of cost estimates that can include risk and uncertainty, project team size, spend profile, activity schedules, facility requirements, etc. The outputs of the model should be specific to the goal of the analysis[2].
46
Parametric Estimating Examples Site Activation Example Hypothesis: Site activation cost can be calculated as a function of the number of workstations per site. A single independent variable will provide a well-fitting regression model for estimating cost.
This example of the parametric estimating process develops a site activation CER in which the cost includes both the site survey and site installation costs for an Automated Information System (AIS). The CER will be based on the number of workstations per site. Assume the following cost driver relationship: the bigger an installation is, the more workstations it has. Therefore, site activation costs should be greater. The table in figure 3.5 provides the data set for eleven historical installations, with the number of workstations as the independent variable x and the normalized cost of site activation for each one of those installations as the dependent variable y. Assume that the cost data is normalized to ensure that the costs represent similar content in all programs and that they are all in the same constant year dollars. These issues will be discussed more fully in Unit II Cost Analysis Techniques. Figure 3.6 shows the scatter plot of the data which appears to confirm that the number of workstations is a cost driver for site activation cost. A trend line, the equation, and the R² value are displayed on the chart. These will be used for formal regression analysis. Figure 3.7 shows the resulting regression statistics for the example. All of these statistics will be discussed in much greater detail in Module 8 Regression Analysis. The equation parameters, the slope, and the constant term are the coefficients in the summary output and are shown in red. This gives us our CER: site activation = $82,756 + ($26,491 × number of workstations). Note that while the equation shows costs to the nearest dollar, this equation is only accurate to the thousands of dollars based on the granularity of the data collected. The R² value represents the ratio of explained variation to total variation in the data set. It provides an estimate of how explanatory the independent variable, or cost driver, is.
In this case, the number of workstations explains almost 91% of the variation in site activation costs. Therefore, it is an excellent cost driver. No matter how good the R² value is, the analyst must investigate whether the regression and its coefficients are statistically significant. The model as a whole is judged to be valid if the F statistic is significant, which is to say its corresponding p-value (labeled here as “Significance F”) is less than the customary significance level α = 0.05. The individual coefficients are statistically significant based on their t statistics (larger in absolute value is better) and corresponding p-values. These are all shown in green. The mathematics behind these statistics is detailed in Module 10 Probability and Statistics. In this case, from their minuscule p-values, the model as a whole and the explanatory variable are both highly significant. The constant term (intercept), however, is not statistically significant. This is often the case, but analysts usually accept the predicted constant term (y-intercept). By contrast, if one of the variables is not statistically significant, the model should be rejected. Note that for a single-variable regression, the p-value of the variable's t statistic and the p-value of the model's F statistic will always be identical, because the F statistic equals the square of the t statistic. To update and to re-run the CER, collect any actuals created or discovered since the original creation of the CER. These are shown in the last four rows of the updated table in figure 3.8. Now the model includes data for additional sites. The four new data points are shown in orange and the original eleven in blue on the scatter plot in figure 3.9. The change in the equation of the best fit line is also shown in green. Notice that neither the coefficient of x, which captures the cost driving effect of the number of workstations on the site activation cost, nor the R² value has changed much. The constant term, however, has changed quite a bit.
The constant term is not statistically significant, though, and therefore does not have much effect on the final predicted cost. The updated regression results are shown in figure 3.10. Most of the regression values are similar to the previous ones. If the new data had differed significantly from the previous data set, some investigation would be required to find out whether this was due to different data normalization procedures or a different data source, a change in technology or other driving factor, a simple mistake, or some other reason.
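Applying the site activation CER is straightforward arithmetic. The sketch below uses the coefficients quoted in the text and rounds the result to the nearest thousand dollars, reflecting the stated granularity of the data. The workstation count of 10 is a hypothetical input, and whether it falls inside the range of the historical data in figure 3.5 is not known here.

```python
INTERCEPT = 82_756   # constant term of the CER in the text, dollars
SLOPE = 26_491       # dollars per workstation, from the text

def site_activation_cost(workstations):
    """Point estimate from the CER, in dollars."""
    return INTERCEPT + SLOPE * workstations

def to_nearest_thousand(dollars):
    """Report at the granularity the text says the data supports."""
    return round(dollars / 1_000) * 1_000

estimate = site_activation_cost(10)        # 82,756 + 264,910 = 347,666
reported = to_nearest_thousand(estimate)   # 348,000
```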
47
Parametric Estimating Examples Multivariate CER Example Hypothesis: Cold iron hours and crew size will predict cost together better than either variable will do independently. In the site activation example, a single independent variable provided a well-fitting regression model for estimating cost. In other cases, analysts need to investigate multiple independent variables or cost drivers at the same time. A combination of expert opinion, organizational history, and data analysis determine whether or not to use multivariate regression models. There must be compelling reasons why two or more variables are judged to explain cost significantly better than a single variable. The analyst must use multiple regression analysis to create the CER when multiple independent variables are used together to predict cost.
This example is for the purchased services cost for the Operating and Support (O&S) estimate for a new Navy surface ship class. Purchased services are defined as the cost of all non-maintenance services purchased by the ship including printing, automatic data processing (ADP), laundry services, rental of boats and other port services, rent, utilities, and communications. It is hypothesized that the cost of purchased services will be positively correlated to both cold iron hours (i.e., the hours of the year that the ship is not steaming, or not running its engines) and the ship’s complement (i.e., the number of its crew). The assumptions are that the longer a ship stays in port, and the more sailors it has on board, the more it will have to pay in purchased services. Utilities are assumed to be the biggest cost driver. The data are shown in figure 3.11 for seven different historical ship classes. Figure 3.12 illustrates the scatter plot of the data which confirms that cold iron hours (blue) and complement (orange) are each positively correlated with purchased services. However, each leaves a significant part of the variation in purchased services costs unexplained (61% and 38%, respectively, or 1 − R²). It is apparent when examining the scatter plots that whenever cold iron hours is a poor explanatory variable for a data point, complement generally is a good one, and vice versa. For Frigate A, where purchased services equal $387.56K, cold iron hours is a poor explanatory variable (the point is well below the regression line in figure 3.12(b)), but complement is an excellent explanatory variable (the point is almost directly on the regression line in figure 3.12(a)). This indicates that although the frigate spends a small amount of time in port, it is the smaller complement that is driving the lower cost of purchased services. Conversely, for Destroyer C, purchased services equal $582.75K.
The small complement is a poor explanatory variable, and the higher cold iron hours drives the higher cost. Figure 3.12(c) plots the two cost drivers against each other to visualize the lack of significant correlation between them. Multicollinearity occurs when there is a significant correlation between two cost drivers. When there is multicollinearity, it is unlikely that both cost drivers will be good explanatory variables at the same time. Multicollinearity, and how to deal with it, is discussed in Module 8 Regression Analysis. All these considerations indicate an outstanding multivariate CER which is confirmed by the regression statistics in figure 3.13. The negative intercept is irrelevant here, as the CER only applies within a certain range. Inserting reasonable values for cold iron hours and complement will ensure a positive cost. Never interpret the intercept value/constant term of a CER as a fixed cost when x=0. Note that 87% of variation is now explained compared to a maximum of 62% with either cost driver individually. The regression has an excellent CV of only 9.1%, and both independent variables are statistically significant although, again, the intercept is not.
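A two-driver CER of the form cost = b0 + b1 × cold iron hours + b2 × complement can be fitted with OLS by solving the normal equations. The sketch below is a minimal pure-Python illustration; the seven data points are hypothetical and are not the values in figure 3.11, and the helper names are illustrative.

```python
def solve(a, b):
    """Solve a small linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_two_driver_cer(x1, x2, y):
    """OLS for y = b0 + b1*x1 + b2*x2 via the normal equations."""
    rows = [[1.0, u, v] for u, v in zip(x1, x2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    return solve(xtx, xty)

def r_squared(x1, x2, y, coeffs):
    """Share of the variation in y explained by the fitted equation."""
    b0, b1, b2 = coeffs
    y_bar = sum(y) / len(y)
    ss_res = sum((t - (b0 + b1 * u + b2 * v)) ** 2 for u, v, t in zip(x1, x2, y))
    ss_tot = sum((t - y_bar) ** 2 for t in y)
    return 1 - ss_res / ss_tot

# Hypothetical data for seven ship classes (not the values in figure 3.11).
cold_iron_hours = [4000, 4500, 5000, 5500, 6000, 6500, 7000]
complement = [220, 350, 180, 300, 260, 330, 200]
cost_k = [420, 520, 450, 560, 580, 660, 608]

coeffs = fit_two_driver_cer(cold_iron_hours, complement, cost_k)
r2 = r_squared(cold_iron_hours, complement, cost_k, coeffs)
```

In practice a statistics package would also report the t statistics, p-values, and CV discussed above. This sketch stops at the coefficients and R².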
48
In summary, different costing techniques are used during different stages of the acquisition cycle. Until actual cost data is available, the use of parametric costing techniques is the preferred approach for the development of cost estimates. Parametric estimating is often considered the hallmark of cost estimators. Parametric estimating can be viewed as a process consisting of four steps:
collecting data (Module 4 Data Collection and Normalization), identifying cost drivers (Module 6 Basic Data Analysis Principles), developing CERs (Module 8 Regression Analysis), and building a parametric model. Parametric estimating uses CERs based on historical data from similar programs to predict a project’s cost. The independent variables that influence the amount of money spent on a project in these equations are known as cost drivers. COTS cost models are generally quicker to build and execute than those developed from scratch, but they suffer the disadvantages of lack of visibility into the underlying methodology and encouragement of an over-reliance on the models.
49
Related and Advanced Topics Cost Response Curves Cost Response Curves (CRCs) encapsulate an entire cost model in a single equation. They relate total or phase costs to some specific attribute or decision variable. CRCs are portable, easy to use, and they yield the same results within a very small margin of error, similar to a cost model. CRCs provide engineers with quick insight to parameter changes and cost model impact.
CRCs aren’t any more accurate than the cost model, but they provide the faster response needed by decision-makers and designers who don’t have formal cost training or easy access to the cost model. Figure 3.14 illustrates an example of a CRC. As with CERs, the x-axis is a parameter, and the y-axis is cost (in this case, the costs for the O&S phase). To develop the CRC, the cost model is run with several different values of the desired parameter (usually a performance parameter; in this case, maximum ship speed) over the desired range (e.g., in one-knot increments from 21 knots to 27 knots). The output values are then plotted on a graph as shown, and a curve is fitted to the points. That curve is the CRC.
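The CRC procedure described above (run the model over a range, then fit a curve to the outputs) can be sketched as follows. The function `os_cost_model` is a hypothetical stand-in for a full cost model run, and a straight line is assumed as the fitted curve. Only the 21 to 27 knot range in one-knot steps comes from the text.

```python
def os_cost_model(max_speed_knots):
    """Hypothetical stand-in for a full cost model run (O&S cost in $M)."""
    return 300.0 + 12.0 * max_speed_knots + 0.5 * (max_speed_knots - 24) ** 2

def fit_line(xs, ys):
    """Least-squares straight line, returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Run the cost model in one-knot increments from 21 to 27 knots.
speeds = list(range(21, 28))
model_costs = [os_cost_model(s) for s in speeds]

# Fit the curve; the fitted equation is the CRC.
crc_intercept, crc_slope = fit_line(speeds, model_costs)

def crc(max_speed_knots):
    """Single-equation approximation of the cost model over 21-27 knots."""
    return crc_intercept + crc_slope * max_speed_knots

# Largest relative gap between the CRC and the model at the sampled points.
max_rel_error = max(abs(crc(s) - c) / c for s, c in zip(speeds, model_costs))
```

The CRC is only as good as the fit over the sampled range. Outside 21 to 27 knots it inherits no validity from the cost model.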
50
Related and Advanced Topics Risk versus Uncertainty Analysts should understand the difference between uncertainty and risk. Uncertainty describes the variance a data point might experience in relation to the mean, median, or regression line. Risk involves measuring the uncertainty of a future event as it pertains to specific cost and schedule constraints. Risk analysis seeks to identify the future event, the likelihood that it will happen, and the consequence if the risk is realized, and then assesses how to account for the risk in the program cost estimate.
Consider as an example the growth over time of modal SLOC count as shown in figure 3.15. The graphic in figure 3.15 shows software development cost as a function of SLOC, with the SLOC count from the CARD shown as a most likely value. Uncertainty is portrayed as the distributions which radiate outward from the regression line. To properly account for the risk of software cost growth, demonstrated by the upward curve of the regression line, the analyst should update the risk bounds of the estimate instead of updating the inputs. However, assuming SLOC is more likely to grow than shrink, the analyst should put a right-skew triangular distribution on SLOC to shift the mean upward. This properly accounts for the risk of software cost growth.
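The effect of a right-skewed triangular distribution on the SLOC input can be checked numerically: with the most likely value as the mode and more room on the high side, the mean lands above the mode. The bounds and the mode below are hypothetical, and `random.triangular` from the Python standard library is used for sampling.

```python
import random

def triangular_mean(low, mode, high):
    """Analytic mean of a triangular distribution."""
    return (low + mode + high) / 3

# Hypothetical values; the CARD SLOC count is treated as the mode (most likely).
card_sloc = 100_000
low, high = 90_000, 150_000   # more room to grow than to shrink

analytic_mean = triangular_mean(low, card_sloc, high)

# Monte Carlo check with a fixed seed for repeatability.
rng = random.Random(0)
samples = [rng.triangular(low, high, card_sloc) for _ in range(20_000)]
simulated_mean = sum(samples) / len(samples)
```

Feeding the sampled SLOC values, rather than the single most likely value, through the CER is what shifts the expected software cost upward.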
51
Related and Advanced Topics Sensitivity Analysis Sensitivity analysis addresses uncertainty in the overall variability of the cost output. Some inputs will have more influence on the cost output than others. Sensitivity analysis can be used to determine which inputs have the most impact on the cost. Only one input variable is changed at a time and all other inputs are held at their base case values.
Monte Carlo simulations change two or more inputs simultaneously, calculating a correlation coefficient for each. The correlation coefficient represents the statistical relationship between an input and the cost output as a number between -1 and +1. This range runs from a perfectly negative correlation (-1) to a perfectly positive correlation (+1). A zero correlation indicates no correlation between the input and the cost output. A chart is often used to visualize relative importance by means of rank correlation.
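The one-at-a-time form of sensitivity analysis described above can be sketched as a loop that perturbs a single input while holding the others at their base case values. The cost model, its inputs, and the 10% perturbation are all hypothetical assumptions chosen for illustration.

```python
def cost_model(inputs):
    """Hypothetical cost model: hardware cost plus labor cost, in dollars."""
    hardware = inputs["units"] * inputs["unit_cost"]
    labor = inputs["labor_hours"] * inputs["labor_rate"]
    return hardware + labor

base_case = {
    "units": 10,
    "unit_cost": 50_000.0,
    "labor_hours": 2_000,
    "labor_rate": 100.0,
}

def one_at_a_time(model, base, pct=0.10):
    """Change in cost when each input alone is raised by pct."""
    base_cost = model(base)
    deltas = {}
    for name in base:
        varied = dict(base)              # all other inputs stay at base values
        varied[name] = base[name] * (1 + pct)
        deltas[name] = model(varied) - base_cost
    return deltas

deltas = one_at_a_time(cost_model, base_case)
ranked = sorted(deltas, key=lambda k: abs(deltas[k]), reverse=True)
```

Sorting the deltas by magnitude gives the ordering that a tornado-style chart would display.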
52
Related and Advanced Topics Parametric Analysis of Technical Parameters The parametric estimating approach can also be applied to technical parameters.
A Weight Estimating Relationship (WER) estimates the weight of an item by establishing a relationship using variables which drive the weight of an item. For example, the crew size of a ship may drive the amount of outfitting and furnishings (e.g., beds and chairs) required on a ship. One way to measure the amount of outfitting and furnishings is by weight. In this case, the estimating relationship would be weight (as the dependent variable) as a function of the crew size of the ship. Neither the dependent nor the independent variable is cost, but the steps taken to estimate the weight are the same. A Schedule Estimating Relationship (SER) predicts the amount of time it will take to complete a task[6]. See the Related and Advanced topics section of Module 2 Costing Techniques for more information about SERs.
53