Unit 1 - Module 3 - Parametric Estimating Flashcards

Question

Calibration of a CER is defined as resetting the y-intercept (or, equivalently, the constant term) of the CER to force it to pass through a desired point (pair of coordinates). It may be used to: correct an analogy; increase the applicability of a CER to less robust data sets; in commercial models, insert input values corresponding to historical programs and adjust various model parameters (e.g., complexity) to match historical actual costs; and improve response to a certain performance parameter for the purposes of conducting a trade-off for Cost as an Independent Variable (CAIV) or Target Costing.

Answer 1

Calibration does not mean force-fitting a source data point in a regression-based CER. Calibration should always be performed carefully and be clearly documented. With a calibrated CER, analysts must be even more cautious than usual about the range over which it is applied. It is generally acceptable to be within one standard deviation of x from the point of departure.
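
As a minimal sketch of calibration as defined on this card, the snippet below keeps a CER's slope and resets only the constant term so that the line passes through a chosen point. The slope is borrowed from the site activation example; the calibration point (x = 10, cost = $400K) and the function names are hypothetical and used only for illustration.

```python
def calibrate_intercept(slope, x_point, y_point):
    """Return the constant term that forces y = slope * x + b through (x_point, y_point)."""
    return y_point - slope * x_point


# Slope in $K per unit of x, taken from the site activation CER.
slope = 26.491

# Hypothetical calibration point: an analogous program with x = 10 and an actual cost of $400K.
x0, y0 = 10.0, 400.0

calibrated_intercept = calibrate_intercept(slope, x0, y0)


def calibrated_cer(x):
    """Calibrated CER: same slope as the original, new constant term."""
    return slope * x + calibrated_intercept


print(round(calibrated_intercept, 3))
print(round(calibrated_cer(x0), 3))
```

Only the intercept moves, so the calibrated line stays parallel to the original CER. That is one reason to stay close to the point of departure when applying it.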

Answer 2

y = 26.491x + 82.756, R² = 0.9098. From the trendline we found this equation, with cost expressed in thousands of dollars. Site activation cost can be calculated as $82,756 plus the number of workstations times the coefficient $26,491. Running the regression gives us the following information. We see the R-squared, which matches the value on the graph, and the coefficient and intercept that form the equation. We can assess significance by looking at the p-value for the coefficient as well as the significance of the F-statistic. In both cases, the p-value is less than alpha = 0.05, so our model is significant. We can also calculate the CV by dividing the standard error by the mean cost, which gives us 20.4%.
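
The following sketch shows how the equation on this card might be applied, assuming cost is in thousands of dollars. The 20-workstation site and the standard error and mean cost passed to the CV helper are hypothetical inputs, not values from the regression output.

```python
def site_activation_cost_k(workstations):
    """Site activation cost in $K from the CER y = 26.491x + 82.756."""
    return 26.491 * workstations + 82.756


def coefficient_of_variation(standard_error, mean_cost):
    """CV = standard error of the regression divided by the mean of the actual costs."""
    return standard_error / mean_cost


# Hypothetical input: a site with 20 workstations.
estimate_k = site_activation_cost_k(20)
print(f"Estimated site activation cost: ${estimate_k:,.1f}K")

# Hypothetical standard error ($51K) and mean cost ($250K), chosen only to show the arithmetic.
cv = coefficient_of_variation(51.0, 250.0)
print(f"CV: {cv:.1%}")
```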

Answer 3

Consider the hypothesis: The operating costs of a cruise ship can be better predicted by crew size and ship port time together than by either variable individually. The longer a ship is in port and the larger its crew, the more expensive its operating costs will be. In the site activation example, a single independent variable provided a well-fitting regression model for estimating cost. In other cases, analysts need to investigate multiple independent variables or cost drivers at the same time. The analyst must use multiple regression analysis to create the CER when multiple independent variables are used together to predict cost. This example is for the estimate of operating costs for a new cruise ship. It is hypothesized that the operating costs will be positively correlated to both port hours (i.e., the hours of the year that the ship is not steaming, or not running its engines, while in port) and the ship’s crew size. The assumptions are that the longer a ship stays in port, and the more crew it has on board, the more it will have to pay in operating costs. The table provides data for seven ships. The scatter plot of operating cost against port hours shows a positive correlation, with port hours explaining 39% of the total variation. Crew size is also positively correlated with operating cost and explains 62% of the variation in the data. It is apparent when examining the scatter plots that whenever port hours is a poor explanatory variable for a data point, crew size generally is a good one, and vice versa. With the data point for the luxury liner, where operating costs equal $387.56K, port hours is a poor explanatory variable (the point is well below the regression line), but crew size is an excellent explanatory variable (the point is almost directly on the regression line). This indicates that although the luxury liner spends a small amount of time in port, it is the smaller crew size that is driving the lower operating cost.
Now we will plot the two cost drivers against each other to visualize the lack of significant correlation between them. Multicollinearity occurs when there is a significant correlation between two cost drivers. When there is multicollinearity, it is unlikely that both cost drivers will be good explanatory variables at the same time. This graph indicates no multicollinearity between port hours and crew size. All these considerations indicate an outstanding multivariate CER, which is confirmed by the regression statistics shown here. Note that the negative intercept is irrelevant here, as the CER only applies within a certain range. Inserting reasonable values for port hours and crew size will ensure a positive cost. Never interpret the intercept value/constant term of a CER as a fixed cost when x = 0. Note that 87% of variation is now explained compared to a maximum of 62% with either cost driver individually. Both independent variables are statistically significant although, again, the intercept is not. The regression has an excellent CV of only 9.1%.

Answer 4

A combination of expert opinion, organizational history, and data analysis determine whether to use multivariate regression models. There must be compelling reasons why two or more variables are judged to explain cost significantly better than a single variable.

Answer 5

This module provides an overview of parametrics. It covers the process of building parametric models and describes how those models are used. The module discusses cost estimating relationships (CERs) and introduces the topics of data use, CER development, and development of complex models. The proper development and application of CERs depend on understanding the associated mathematical and statistical techniques. This module provides general guidance for use in developing and employing valid CERs, including differences between simple and complex CERs, techniques for developing and implementing CERs, including linear regression ordinary least squares (OLS) "best-fit" models.

Answer 6

cost drivers and cost passengers, parametric analysis inputs and outputs, and parametric cost models. The Advanced Topics section describes practical applications of parametric analysis including CER development, Cost Response Curves (CRCs), sensitivity analysis, Schedule Estimating Relationships (SERs), and Weight Estimating Relationships (WERs).

Answer 7

This module will also examine non-linear functional forms, such as power, exponential, logarithmic, and even polynomial functions. Non-linear functional forms provide more versatility in the parametric modeling toolkit but are also more challenging. In general, analysts use curve fitting, the process of determining the functional equation whose graph best fits a scatter plot of data, to turn normalized data into CERs.
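
As a brief illustration of why these forms can still be fitted with linear regression tools, the standard transformations below rewrite each form so that it is linear in its coefficients. The symbols a, b, and c are generic coefficients introduced here for illustration; the card itself does not define them.

```latex
\begin{aligned}
\text{Power:}       \quad & y = a x^{b}        &&\Rightarrow\quad \ln y = \ln a + b \ln x \\
\text{Exponential:} \quad & y = a e^{b x}      &&\Rightarrow\quad \ln y = \ln a + b x \\
\text{Logarithmic:} \quad & y = a + b \ln x    &&\text{(already linear in } \ln x\text{)} \\
\text{Polynomial:}  \quad & y = a + b x + c x^{2} &&\text{(linear in } a, b, c \text{ with regressors } x,\ x^{2}\text{)}
\end{aligned}
```

Note that fitting in log space minimizes error in ln y rather than in y, which is one reason the non-linear forms are more challenging to use.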

Answer 8

Analogy: This approach is a comparative analysis of similar systems. Analogy uses elements of similar projects as a basis for a future/new system’s cost. It can be used early in programs before detailed requirements are known. Analogous estimates provide no objective test of validity. Parametric Estimating: This approach uses a mathematical relationship between a parameter and a cost. The historical data needed to support parametric estimating can be difficult to obtain, but the relationships derived from this technique can be easily adjusted for requirements changes and can provide statistical results to assess quality. Engineering Build-Up: This estimating approach builds up cost by beginning at lower levels and rolling results up to produce higher-level estimates. A high-level of data detail is required and may involve deep knowledge of the project and extensive collaboration with stakeholders. Missing or incorrect data may increase the risk of omitting costs. Engineering build-up produces details that are easily traceable, though this technique can be expensive.

Answer 9

Cost drivers that support parametric analysis are identified through examination of historical costs as well as technical and programmatic data. The Parametric Estimating Handbook (PEH) provides the following definition[2]: Parametric estimating is a technique that develops cost estimates based upon the examination and validation of the relationships which exist between a project's technical, programmatic, and cost characteristics as well as the resources consumed during its development, manufacture, maintenance, and/or modification.

Answer 10

Cost drivers are the parameters, or independent variables, in the CERs that influence the amount of money spent on an item (i.e., cost), the dependent variable. These independent variables then become required inputs. Inputs are values that must be accurately determined for the CERs to produce the cost. Parametric relationships are used to estimate cost, schedule, weight, and other parameters as outputs.

Answer 11

versatility, objective quantitative inputs, easily traceable equations that clearly link input and output variables, and easily adjustable input parameters.

Answer 12

Analysts rely heavily on analogy and parametrics during concept and design stages. As more design detail becomes available, analysts begin to use build-up, although parametric estimating continues to play an important role. Actual costs are incorporated into cost estimates as Low-Rate Initial Production (LRIP) or full-rate production begins. Actual cost experience on prototype units and early engineering development hardware can enlighten the estimate before production actuals become available. Manufacturers directly incorporate their own costs. Government analysts use data available via the Contractor Cost Data Reporting (CCDR) system.

Answer 13

1. Data Collection: Analysts search for data sources. Cost, schedule, and technical data are the raw materials for parametric relationships. 2. Cost Drivers: Analysts identify cost drivers. A hypothesis based on analogy, organization history, or expert judgment may provide insight to cost drivers with the highest confidence interval. Scatter plots are a visualization tool for uncovering underlying relationships in the data. Cost drivers are explored via correlation of each independent element with cost. Analysts use cost drivers to capture the underlying engineering/physical causality in a complex system. 3. Cost Estimating Relationships (CERs): Analysts develop equations to capture CERs using linear models with one parameter or with more complex mathematical functions. 4. Parametric Model: Analysts build an inclusive parametric model containing all the parametric inputs and CERs for all the cost elements in an entirely parametric cost model approach. Analysts use common software programs or more sophisticated tools. Methodologies will often change throughout the life of the program; an exclusively parametric model may seem stochastic.

Answer 14

Lines Of Code (LOC) for software development, site deployment plans for IT installations, procurement of sub-systems or other materials from subcontractors or vendors, Test and Evaluation (T&E) schedules, physical system characteristics (e.g., weight or power), Technical Performance Measures (TPMs) (e.g., speed), resource staffing, and other operational parameters that reflect a concept of operations (CONOPs). Weight, power, and LOC are common cost drivers and are a matter of practice for most data collection efforts. For traceability, analysts document the sources, reasoning, assumptions, and raw data behind the values used to develop their CERs. Module 4 Data Collection and Normalization provides additional information.

Answer 15

Subject Matter Experts (SMEs) who have a deep understanding of the particular system or type of system under consideration can provide insight into cost drivers. Cost analysts should make every effort to understand the technical and operational parameters of the system and search for those parameters that logically drive cost. Scatter plotting the data can visually demonstrate the relationships and make them easier to identify than numerical analysis alone. Scatter plots are discussed in greater detail in Module 6 Basic Data Analysis Principles. Some cost drivers may affect more than one cost element or may have ripple effects on the design or cost. However, parameters that drive design are not always the parameters that drive cost. As in data collection, identifying cost drivers is an iterative process. Analysts cannot identify cost drivers without data, but they may predict potential cost drivers and influence the data collection plan.

Answer 16

hardware or equipment, resource requirements, level of automation, technical requirements such as speed and accuracy, and life cycle support concerns such as reliability or maintenance philosophy. To illustrate cost drivers versus cost passengers, imagine a pea pod. A pea pod wraps around the peas and its design is determined by the number, shape, and size of the peas. Likewise, the hull of a ship or the airframe of an airplane is designed to carry the crew and the mission equipment of the vessel. Although a hull or airframe represents a significant portion of the system’s cost, it cannot easily be changed to influence that cost without dramatically affecting the requirements of the system. In these examples, the pea pod, hull, or airframe represents a cost passenger and the number, shape, and size of peas, crew members, and mission equipment represents the cost driver.

Answer 17

A rate predicts cost via a simple multiplicative relationship. Rates are always expressed in dollars per parameter. The most common rate is a labor rate, expressed in dollars per hour, for a certain labor category or type of work. Cost per function point or LOC are other common rates used in software estimating. A factor uses the cost of one element to predict the cost of another via a simple multiplicative relationship. Factors are cost on cost, without units, often expressed as a percentage. An example of a factor is System Engineering/Program Management (SE/PM) calculated as 20% of the program’s prime mission equipment (hardware and software). Analysts typically employ factors when incorporating industry data into the program estimate. Many organizations have a set of standard factors that can be used as guidelines or benchmarks. As a reference, the factors used by the FAA can be found at https://www.faa.gov/regulations_policies/policy_guidance/benefit_cost/. A ratio estimates effort and is defined as parameter on parameter. For example, if a program plans to perform Commercial Off-The-Shelf (COTS) software integration within the software baseline, a good ratio to use is the industry average of 1,200 lines of code per integrated package. These three terms are sometimes improperly interchanged. When encountered, ensure they are used in a manner consistent with these definitions. Rates, factors, and ratios are often the result of very simple calculations such as the arithmetic mean (average)
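
The distinction between the three terms can be sketched with simple arithmetic. The 20% SE/PM factor and the 1,200 LOC per package ratio come from the card; every other number (labor rate, hours, PME cost, package count, cost per LOC) is a hypothetical value chosen for illustration.

```python
# Rate: dollars per parameter (hypothetical $150 per labor hour over 2,000 hours).
labor_rate_per_hour = 150.0
labor_hours = 2_000
labor_cost = labor_rate_per_hour * labor_hours

# Factor: cost on cost, unitless (SE/PM at 20% of prime mission equipment, per the card).
pme_cost = 5_000_000.0  # hypothetical PME cost in dollars
sepm_cost = 0.20 * pme_cost

# Ratio: parameter on parameter (1,200 LOC per integrated COTS package, per the card).
cots_packages = 8  # hypothetical number of packages
integration_loc = 1_200 * cots_packages

# A ratio yields effort, not dollars; a rate (hypothetical $25 per LOC) converts it to cost.
cost_per_loc = 25.0
integration_cost = cost_per_loc * integration_loc

print(labor_cost, sepm_cost, integration_loc, integration_cost)
```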

Answer 18

Select the Variables: Determine the independent (source) variable (e.g., LOC, number of deployments, COTS procurements, etc.), data collection methodology, and data type corresponding to the desired analysis. For example, for a cost per deployment CER or a cost per LOC CER, determine which costs are applicable to these elements. This collected cost data will be the dependent variable used to develop the CER. Test Relationships: After selecting the variables, test for associations that may exist between each variable and cost. Remove representative data from the working data set for validation later in the analysis. Scatter plot the working data set. Scatter plots display the strength, form, and direction of the relationship and may reveal trends, anomalies, or outliers. Determine the type of relationship for each variable. For more information on these techniques, refer to Module 6 Basic Data Analysis Principles. Perform Regression: After initial inspection and testing of the data, perform regression analysis to develop the equation to use in the model. This module uses the Ordinary Least Squares (OLS) approach. Select CERs: Select the CER that best represents the data. Validate CERs: Finally, validate the CER on the representative data that was not used to develop the equation. It may be useful to perform additional graphical exploration and regression analysis. Use the Prediction Interval (PI) methodology to accurately capture uncertainty. Analysts often apply CERs outside the range of collected data since many programs attempt to build systems that are faster, bigger (or smaller), more dense, more accurate, etc., than their predecessors. The PI methodology[3] provides one method to capture uncertainty in OLS estimates, which increases as estimates deviate from the core data set. Refer to Module 6 Basic Data Analysis Principles and Module 8 Regression Analysis for more detail on PIs.
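
For reference, the standard textbook prediction interval for a single-variable OLS CER is shown below; the notation is assumed here and may differ from that used in Module 8. The last term under the square root grows as the new input x_0 moves away from the mean of the historical data, which is why uncertainty widens outside the core data set.

```latex
\hat{y}_0 \;\pm\; t_{\alpha/2,\;n-2}\; s\,
\sqrt{\,1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,},
\qquad
s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}}
```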

Answer 19

Calibration has a particular meaning in the context of commercial cost models. In that case, calibration means to insert input values corresponding to historical programs and to adjust various model parameters (e.g., complexity) to match historical actual costs. Calibration in this context makes these models more applicable to the system. Finally, a CER might have good statistics but may not be responsive to a certain performance parameter for the purposes of conducting a trade-off for Cost as an Independent Variable (CAIV) or Target Costing. To do this, calibrate the latter CER, which has worse statistics, to intersect a point generated by the better CER as shown in figure 3.4. The solid blue line represents the original CER, and the dotted yellow line represents the calibrated CER. In this graphic, the original CER passing through ... Calibration does not mean force-fitting a source data point in a regression-based CER. Calibration should always be performed carefully and be clearly documented. With a calibrated CER, analysts must be even more cautious than usual about the range over which it is applied. It is generally acceptable to be within one standard deviation of x from the point of departure.

Answer 20

re-running the CER with the new augmented data set, re-calibrating the CER to pass through the new data point, changing the methodology altogether, or leaving the model unchanged.

Answer 21

These estimates include all costs from project conception, to maintenance, and ultimately disposal. A complex parametric model may also include math and/or logic to account for time phasing, inflation, allocations, Monte Carlo simulation, schedule estimating, learning curve analysis, earned value management, and/or other adjustments. The costs are also allocated or mapped to the program’s WBS. The model builder can create or build in logic to enable Design-to-Cost or CAIV trades. The WBS forms the basis for the complex parametric model structure. Like all other models, the complex parametric model has inputs and outputs. The outputs of a complex parametric model may provide a range of cost estimates that can include risk and uncertainty, project team size, spend profile, activity schedules, facility requirements, etc. The outputs of the model should be specific to the goal of the analysis[2].

Answer 22

This example of the parametric estimating process develops a site activation CER in which the cost includes both the site survey and site installation costs for an Automated Information System (AIS). The CER will be based on the number of workstations per site. Assume the following cost driver relationship: the bigger an installation is, the more workstations it has. Therefore, site activation costs should be greater. The table in figure 3.5 provides the data set for eleven historical installations, with the number of workstations as the independent variable x and the normalized cost of site activation for each one of those installations as our dependent variable y. Assume that the cost data is normalized to ensure that the costs represent similar content in all programs and that they are all in the same constant year dollars. These issues will be discussed more fully in Unit II Cost Analysis Techniques. Figure 3.6 shows the scatter plot of the data, which appears to confirm that the number of workstations is a cost driver for site activation cost. A trend line, the equation, and the R² value are displayed on the chart. These will be used for formal regression analysis. Figure 3.7 shows the resulting regression statistics for the example. All of these statistics will be discussed in much greater detail in Module 8 Regression Analysis. The equation parameters (the slope and the constant term) are the coefficients in the summary output and are shown in red. This gives us our CER: site activation = $82,756 + ($26,491 × number of workstations). Note that while the equation shows costs to the nearest dollar, this equation is only accurate to the thousands of dollars based on the granularity of the data collected. The R² value represents the ratio of explained variation to total variation in the data set. It provides an estimate of how explanatory the independent variable (the cost driver) is.
In this case, the number of workstations explains almost 91% of the variation in site activation costs. Therefore, it is an excellent cost driver. No matter how good the R² value is, the analyst must investigate whether the regression and its coefficients are statistically significant. The model as a whole is judged to be valid if the F statistic is significant, which is to say its corresponding p-value (labeled here as “Significance F”) is less than the significance level, using the customary α = 0.05. The individual coefficients are statistically significant based on their t statistics (bigger in absolute value is better) and corresponding p-values. These are all shown in green. The mathematics behind these statistics is detailed in Module 10 Probability and Statistics. In this case, from their minuscule p-values, the model as a whole and the explanatory variable are both highly significant. The constant term (intercept), however, is not statistically significant. This is often the case, but analysts usually accept the predicted constant term (y). By contrast, if one of the variables is not statistically significant, the model should be rejected. Note that for a single-variable regression, the F statistic of the model equals the square of the variable's t statistic, so their p-values will always be identical. To update and to re-run the CER, collect any actuals created or discovered since the original creation of the CER. These are shown in the last four rows of the updated table in figure 3.8. Now the model includes data for additional sites. The four new data points are shown in orange and the original eleven in blue on the scatter plot in figure 3.9. The change in the equation of the best fit line is also shown in green. Notice that neither the coefficient of x, which captures the cost driving effect of the number of workstations on the site activation cost, nor the R² value has changed much. The constant term, however, has changed quite a bit.
The constant term is not statistically significant, though, and therefore does not have much effect on the final predicted cost. The updated regression results are shown in figure 3.10. Most of the regression values are similar to the previous ones. If the new data had differed significantly from the previous data set, some investigation would be required to find out whether this was due to different data normalization procedures or a different data source, a change in technology or other driving factor, a simple mistake, or some other reason.
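
The regression statistics discussed in this example can be reproduced with a short ordinary least squares sketch. The six data points below are hypothetical stand-ins, not the historical installations from figure 3.5, and the function name `ols_fit` is an assumption made for illustration.

```python
import math


def ols_fit(xs, ys):
    """Single-variable OLS: returns slope, intercept, R^2, standard error, t, F, and CV."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    std_error = math.sqrt(sse / (n - 2))          # standard error of the regression
    t_slope = slope / (std_error / math.sqrt(sxx))  # t statistic of the slope
    f_stat = (sst - sse) / (sse / (n - 2))          # F statistic of the model
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": 1 - sse / sst,
        "std_error": std_error,
        "t_slope": t_slope,
        "f_stat": f_stat,
        "cv": std_error / y_bar,
    }


# Hypothetical data: workstations per site and normalized site activation cost ($K).
workstations = [5, 8, 10, 12, 15, 20]
cost_k = [220, 290, 350, 400, 480, 610]

fit = ols_fit(workstations, cost_k)
for name, value in fit.items():
    print(f"{name}: {value:.4f}")
```

With a single cost driver, the printed F statistic equals the square of the slope's t statistic, which is why the two tests always agree in a single-variable regression.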

Answer 23

This example is for the purchased services cost for the Operating and Support (O&S) estimate for a new Navy surface ship class. Purchased services are defined as the cost of all non-maintenance services purchased by the ship including printing, automatic data processing (ADP), laundry services, rental of boats and other port services, rent, utilities, and communications. It is hypothesized that the cost of purchased services will be positively correlated to both cold iron hours (i.e., the hours of the year that the ship is not steaming, or not running its engines) and the ship’s complement (i.e., the number of its crew). The assumptions are that the longer a ship stays in port, and the more sailors it has on board, the more it will have to pay in purchased services. Utilities are assumed to be the biggest cost driver. The data are shown in figure 3.11 for seven different historical ship classes. Figure 3.12 illustrates the scatter plot of the data, which confirms that cold iron hours (blue) and complement (orange) are each positively correlated with purchased services. However, each leaves a significant part of the variation in purchased services costs unexplained (61% and 38%, respectively, or 1 − R²). It is apparent when examining the scatter plots that whenever cold iron hours is a poor explanatory variable for a data point, complement generally is a good one, and vice versa. For the data point for Frigate A, where purchased services equal $387.56K, cold iron hours is a poor explanatory variable (the point is well below the regression line in figure 3.12(b)), but complement is an excellent explanatory variable (the point is almost directly on the regression line in figure 3.12(a)). This indicates that although the frigate spends a small amount of time in port, it is the smaller complement that is driving the lower cost of purchased services. Conversely, for Destroyer C, purchased services equal $582.75K.
The small complement is a poor explanatory variable, and the higher cold iron hours drives the higher cost. Figure 3.12(c) plots the two cost drivers against each other to visualize the lack of significant correlation between them. Multicollinearity occurs when there is a significant correlation between two cost drivers. When there is multicollinearity, it is unlikely that both cost drivers will be good explanatory variables at the same time. Multicollinearity, and how to deal with it, is discussed in Module 8 Regression Analysis. All these considerations indicate an outstanding multivariate CER which is confirmed by the regression statistics in figure 3.13. The negative intercept is irrelevant here, as the CER only applies within a certain range. Inserting reasonable values for cold iron hours and complement will ensure a positive cost. Never interpret the intercept value/constant term of a CER as a fixed cost when x=0. Note that 87% of variation is now explained compared to a maximum of 62% with either cost driver individually. The regression has an excellent CV of only 9.1%, and both independent variables are statistically significant although, again, the intercept is not.
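
A two-driver CER of this kind can be sketched by solving the OLS normal equations directly and checking the correlation between the drivers. All of the data below is hypothetical: the seven rows are not the ship classes from figure 3.11, and the costs are generated without noise from an assumed equation with a negative intercept so that the fit can be checked exactly.

```python
import math


def fit_two_driver_cer(x1, x2, y):
    """Fit y = b0 + b1*x1 + b2*x2 by ordinary least squares (normal equations)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Centered sums of squares and cross-products.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # approaches zero as the drivers become collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    driver_correlation = s12 / math.sqrt(s11 * s22)
    return b0, b1, b2, driver_correlation


# Hypothetical drivers for seven ships.
cold_iron_hours = [2000, 3500, 5000, 2800, 4200, 6000, 3100]
complement = [200, 150, 300, 350, 180, 260, 320]

# Hypothetical noiseless costs ($K) generated from an assumed CER with a negative intercept.
cost_k = [-100 + 0.05 * h + 1.5 * c for h, c in zip(cold_iron_hours, complement)]

b0, b1, b2, r12 = fit_two_driver_cer(cold_iron_hours, complement, cost_k)
print(f"intercept={b0:.3f}, hours coef={b1:.4f}, complement coef={b2:.4f}")
print(f"correlation between drivers: {r12:.3f}")

# Within the range of the data, the negative intercept still yields a positive cost.
in_range_estimate = b0 + b1 * 4000 + b2 * 250
print(f"estimate at 4000 hours, 250 crew: {in_range_estimate:.1f}")
```

A driver correlation near zero, as in this made-up data, is the condition the card describes as no multicollinearity; a value near +1 or -1 would make the denominator `det` small and the coefficients unstable.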

Answer 24

collecting data (Module 4 Data Collection and Normalization), identifying cost drivers (Module 6 Basic Data Analysis Principles), developing CERs (Module 8 Regression Analysis), and building a parametric model. Parametric estimating uses CERs based on historical data from similar programs to predict a project’s cost. The independent variables that influence the amount of money spent on a project in these equations are known as cost drivers. COTS cost models are generally quicker to build and execute than those developed from scratch, but they suffer the disadvantages of lack of visibility into the underlying methodology and encouragement of an over-reliance on the models.

Answer 25

CRCs aren’t any more accurate than the cost model, but they provide the faster response needed by decision-makers and designers who don’t have formal cost training or easy access to the cost model. Figure 3.14 illustrates an example of a CRC. As with CERs, the x-axis is a parameter, and the y-axis is cost (in this case, the costs for the O&S phase). To develop the CRC, the cost model is run with several different values of the desired parameter (usually a performance parameter; in this case, maximum ship speed) over the desired range (e.g., in one-knot increments from 21 knots to 27 knots). The output values are then plotted on a graph as shown, and a curve is fitted to the points. That curve is the CRC.
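
The procedure described on this card can be sketched as follows. The function `os_cost_model` is a hypothetical stand-in for a full cost model (a made-up power relationship between maximum speed and O&S cost), and the choice of a power curve fitted by log-log least squares is an assumption; a real CRC could use whatever functional form fits the model output.

```python
import math


def os_cost_model(max_speed_knots):
    """Hypothetical stand-in for a full cost model: O&S cost ($M) versus maximum speed."""
    return 0.02 * max_speed_knots ** 3


# Step 1: run the cost model over the desired range in one-knot increments.
speeds = list(range(21, 28))  # 21 through 27 knots
costs = [os_cost_model(s) for s in speeds]

# Step 2: fit a power curve cost = a * speed^b using least squares on the logs.
log_x = [math.log(s) for s in speeds]
log_y = [math.log(c) for c in costs]
n = len(speeds)
x_bar = sum(log_x) / n
y_bar = sum(log_y) / n
sxx = sum((x - x_bar) ** 2 for x in log_x)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(log_x, log_y))
b = sxy / sxx
a = math.exp(y_bar - b * x_bar)


def crc(speed):
    """Cost Response Curve: quick approximation of the cost model output."""
    return a * speed ** b


print(f"CRC: cost = {a:.4f} * speed^{b:.3f}")
print(f"CRC at 24 knots: {crc(24):.1f}  (model: {os_cost_model(24):.1f})")
```

Because the CRC is fitted to the model's own output, it can only be as accurate as the model, which matches the caution at the start of this card.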

Answer 26

Consider as an example the growth over time of modal SLOC count as shown in figure 3.15. The graphic in figure 3.15 shows software development cost as a function of SLOC, with the SLOC count from the CARD shown as a most likely value. Uncertainty is portrayed as the distributions which radiate outward from the regression line. To properly account for the risk of software cost growth, demonstrated by the upward curve of the regression line, the analyst should update the risk bounds of the estimate instead of updating the inputs. However, assuming SLOC is more likely to grow than shrink, the analyst should put a right-skew triangular distribution on SLOC to shift the mean upward. This properly accounts for the risk of software cost growth.
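
The effect of a right-skewed triangular distribution can be shown numerically: its mean, (low + mode + high) / 3, sits above the most likely value. The SLOC bounds below are hypothetical and are not taken from figure 3.15 or any CARD.

```python
import random

# Hypothetical SLOC bounds: most likely value from the CARD, with more room to grow than shrink.
low, mode, high = 90_000, 100_000, 150_000

# Analytic mean of a triangular distribution.
analytic_mean = (low + mode + high) / 3

# Monte Carlo check using the standard library's triangular sampler.
random.seed(1)
samples = [random.triangular(low, high, mode) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)

print(f"most likely SLOC: {mode:,}")
print(f"analytic mean:    {analytic_mean:,.0f}")
print(f"sampled mean:     {sample_mean:,.0f}")
```

The input value stays at the most likely SLOC count; the skewed bounds are what shift the expected cost upward.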

Answer 27

Monte Carlo simulations change two or more inputs simultaneously, calculating a correlation coefficient for each. The correlation coefficient represents the statistical relationship between an input and the cost output as a number between -1 and +1. This range runs from a perfectly negative correlation (-1) to a perfectly positive correlation (+1). A zero correlation indicates no correlation between the input and the cost output. A chart is often used to visualize relative importance by means of rank correlation.
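
A minimal sketch of this kind of sensitivity analysis is shown below. The two inputs, their triangular bounds, and the simple cost equation are all hypothetical; rank (Spearman) correlation is computed by ranking the values and applying the usual Pearson formula to the ranks.

```python
import math
import random


def ranks(values):
    """Rank values from 1..n (ties are not expected with continuous samples)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rank_correlation(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


random.seed(42)
trials = 5_000

# Hypothetical uncertain inputs, varied simultaneously in each trial.
weights = [random.triangular(900, 1300, 1000) for _ in range(trials)]
labor_rates = [random.triangular(95, 110, 100) for _ in range(trials)]

# Hypothetical cost model combining both inputs.
costs = [5.0 * w + 20.0 * r for w, r in zip(weights, labor_rates)]

rho_weight = rank_correlation(weights, costs)
rho_rate = rank_correlation(labor_rates, costs)
print(f"rank correlation, weight vs cost:     {rho_weight:.2f}")
print(f"rank correlation, labor rate vs cost: {rho_rate:.2f}")
```

Sorting the inputs by the size of these coefficients gives the ranking that a sensitivity chart typically displays.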

Answer 28

A Weight Estimating Relationship (WER) estimates the weight of an item by establishing a relationship using variables which drive the weight of an item. For example, the crew size of a ship may drive the amount of outfitting and furnishings (i.e. beds, chairs, etc.) required on a ship. A technique to measure the amount of outfitting and furnishings is by weight. In this case, the estimating relationship would be weight (as the dependent variable) as a function of the crew size of the ship. Neither the dependent nor independent variables are cost, but the steps to take to estimate the weight of the ship are the same. A Schedule Estimating Relationship (SER) predicts the amount of time it will take to complete a task[6]. See the Related and Advanced topics section of Module 2 Costing Techniques for more information about SERs.


(53 cards)