The Pricing Handbook
19.10 Software Project Tracking And Measurement

Once an estimate has been completed and a project started, it is imperative that a reporting and measurement system be established to compare estimated and actual cost, schedule, and performance. Such a system is not only invaluable for project management, but can also provide the data necessary for model calibration. SEI suggests that an organization have data collection and feedback processes that foster capturing and correctly interpreting data from work performed and entering them into a historical database in order to develop reliable cost estimates. Unfortunately, past endeavors in this area have not generally been successful; either the data was not collected, or the data collected was inadequate or misleading. To effectively track and manage software programs, suitable metrics must be selected and adequate data collection procedures must be established. This section discusses which metrics may be suitable for software management and the requirements for an effective data collection procedure.

The purpose of this section is to provide a short overview of the subject and to highlight the importance of this area. There are many handbooks and manuals that cover cost, performance, and quality tracking metrics in depth. One excellent reference is DoD's Joint Logistics Commanders handbook Practical Software Measurement: A Guide to Objective Program Insight (available for download at www.psmsc.com).
19.10.1 Cost and Schedule Tracking

A basic system for tracking costs is necessary for effective software management. One such system, used widely in Government agencies (DoD, IRS, NASA), is an earned value management system (EVMS). Figure 19-14 illustrates the basic terms used in an EVMS.

Figure 19-14. Basic Earned Value Terminology
In the beginning of a program, a manager determines the total program dollars available, or budget at completion (BAC), and a time-phased expenditure profile, the budgeted cost of work scheduled (BCWS), which considers both the amount to be spent during each period (e.g., monthly) and the amount of work to be accomplished during that period. The time-phased plan is expressed in the common denominator of dollars, but represents the work scheduled to be accomplished by those dollars. This plan can be determined with the aid of one of the cost models discussed in Appendix 19D; most of the software cost estimating models generate monthly schedule and resource usage profiles. The BAC is not usually equal to the total allocated budget (TAB); the manager usually sets aside some money as management reserve to cover risk and uncertainty.

As the program progresses, the manager also tracks the actual cost of work performed (ACWP), the actual amount spent on the program, and the budgeted cost of work performed (BCWP), the value of the work actually performed, or "earned value." To determine how well the program is progressing compared to what was planned, the manager computes a cost and schedule variance at the end of each reporting period using the following formulas:

% Cost Variance = 100 * (BCWP - ACWP) / BCWP
% Schedule Variance = 100 * (BCWP - BCWS) / BCWS

Negative variances are an indication of problems with the program, especially if they are large. In the DoD environment, the contractor is frequently required to explain variances greater than 10% to the managing DoD agency, including what action is planned to alleviate the cost and schedule problems.

There are two challenges to using earned value for software. One is that, to be effective, data must be reported down to at least the CSCI level! If a CSCI is an entity that is managed separately, then it must be tracked separately. Unfortunately, many government projects in the past did not track costs to the CSCI level; therefore, the actual cost of these CSCIs is still unknown. Furthermore, cost and schedule overruns have been the hallmark of government software programs. Perhaps this situation would have been alleviated if an earned value management system had been used at the CSCI level to reveal problems early in a program, when they are easier and much less costly to correct. The second challenge is in measuring work performed: how can a manager assess earned value for software programs? Most of the remainder of this section discusses metrics that can be used with performance measurement systems to help assess earned value.
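As a simple worked example of these formulas, the Python sketch below computes the percent variances for a single reporting period; the function names and the dollar figures are illustrative assumptions, not handbook values.

# A worked example of the variance formulas above. The function names and
# dollar figures are illustrative assumptions, not handbook values.

def cost_variance_pct(bcwp, acwp):
    """Percent cost variance: negative values indicate a cost overrun."""
    return 100.0 * (bcwp - acwp) / bcwp

def schedule_variance_pct(bcwp, bcws):
    """Percent schedule variance: negative values indicate work behind plan."""
    return 100.0 * (bcwp - bcws) / bcws

# One reporting period for a hypothetical CSCI (values in $K):
bcws, bcwp, acwp = 500.0, 450.0, 520.0
cv = cost_variance_pct(bcwp, acwp)      # about -15.6%
sv = schedule_variance_pct(bcwp, bcws)  # -10.0%

# In the DoD environment, variances worse than about -10% would typically
# require a written explanation and a corrective action plan.
if cv < -10.0 or sv < -10.0:
    print(f"Variance explanation required: CV = {cv:.1f}%, SV = {sv:.1f}%")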
19.10.2 Performance Measurement Indicators

Performance measurement indicators are metrics which help measure performance, or work performed, in contrast to the quality metrics discussed later. For all metrics, there is currently a plethora of books and articles; however, there is not an abundance of standardization or agreement among these writings. Some of these articles are listed in the chapter's list of references. The Software Engineering Institute has also published a Software Measurement Guidebook [CMU/SEI-97-SR-019] that provides tracking and monitoring methods to evaluate status and earned value for projects. It is up to the responsible manager to independently determine a set of appropriate metrics for his or her program.

In selecting metrics, a manager should first consider what constitutes a "good" metric. Several writers have proposed general criteria for "good" metrics. For example, Tom DeMarco, in his book Controlling Software Projects, states that metrics should be measurable, or quantifiable; independent from influence by project personnel; accountable, in that data can be collected; and precise, in that the degree of exactness can be specified. Watts S. Humphrey, in Managing the Software Process, states that metrics should be objective (versus subjective), explicit (versus derived), absolute (versus relative), and dynamic (versus static). Capers Jones categorizes metrics as hard, soft, and normalized. Hard metrics can be quantified with little subjectivity; soft metrics require human judgment; and normalized metrics are hard metrics which should only be used for comparison and not as absolute measures. Hard metrics, of course, are most desirable, but programs may require some soft or normalized metrics.

In their article "Using Earned Value for Performance Measurement on Software Development Projects," David S. Christensen and Daniel V. Ferens have proposed a list of metrics criteria, partly based on surveys of program managers at the Air Force's Aeronautical Systems Center. Their research was based on the thesis of Bradley J. Ayres and William M. Rock entitled Development of a Standard Set of Software Indicators for Aeronautical Systems Center. The criteria for metrics include explicit, absolute, and objective, as discussed above. Two other criteria are that the metric be timely, or available early in a program, and relevant, or appropriate for the specific program being measured. The surveys by Ayres and Rock identified relevance as the key criterion for a metric.

The next challenge for a manager, therefore, is to determine which metrics are most relevant for his or her program. Again, the 1992 study by Ayres and Rock identified a set of metrics which were most relevant for Aeronautical Systems Center programs. Christensen and Ferens refined the list of metrics and determined their suitability for use in earned value programs. The seven metrics and their roles in earned value management are explained in the text that follows; Table 19-12 summarizes how well each meets the criteria for a good metric.

Table 19-12. Summary of Metrics and Qualities
Requirements and Design Progress
This metric tracks the number of CSCI requirements delineated during requirements analysis and preliminary design. The requirements are described in two documents: the Software Requirements Specification (SRS) and the Software Design Document (SDD). The planned requirements can be used for laying out a software development plan, and as the requirements are defined and designed they can be used to show progress in the development effort. For instance, if a CSCI has 100 requirements of approximately the same size, the CSCI's budget can be divided among those requirements, and progress is measured by earning the value of each requirement as it is completed (a simple sketch of this budget-allocation approach appears at the end of this subsection). This metric can be complicated because the total requirements may change during a program, and counting "completed" requirements can be difficult. If these limitations can be overcome, this metric is very valuable in assessing progress and demonstrating any problems early in a program.

Code and Test Progress
This metric tracks the number of computer software units (CSUs) that have successfully been designed, coded, and tested. This information is available from software development files or similar documentation. As in the requirements and design progress metric, planned and actual CSUs can be used to plan and measure progress. This metric is usually easier to assess than the requirements and design progress metric, but is not available until later in the program.

Person-months of Effort
This metric compares planned and actual person-months expended for a project and is useful in computing actual costs. It can also be useful in assessing overall program health, since using too many people on a project can result in cost overruns, while using too few can result in schedule slippage.

Program Size
This metric tracks the total size of the program (in either SLOC or function points) and the percentages of code that are new, reused, or modified. Since there is a direct relationship between size and effort, this metric may be helpful in assessing cost variances. For instance, if the SLOC count at a given point is higher than initially projected, this could help explain a cost overrun. However, this metric has drawbacks as a progress measurement tool: it lends itself to misuse because programmers can write additional code to feign progress, giving a false impression that progress is occurring. Another problem with using program size is that, early in a program, size must be estimated with one of the techniques discussed in section 19.4, which may not be very accurate; using it to measure progress or even plan work is therefore a questionable practice. For these reasons, it is best to use program size as a technical parameter to investigate cost variances identified by other metrics. On the other hand, for pricing purposes, tracking program size is important to ensure that current and projected size are consistent with the proposed level of effort.

Computer Resources Utilization
This metric measures the computer memory, timing, and input/output (I/O) channel capacity consumed by the software. It is closely related to the software size metric in that increases in size will result in increases in the capacities used. Like size, this metric can be useful throughout the program to analyze cost and schedule variances. Additionally, it can help a manager perform hardware-software trade-off studies early in a program.

Requirements Stability
This metric is related to requirements and design progress.
It tracks total requirements and the number of changes (additions, deletions, and modifications) to requirements. Numerous or frequent changes will result in additional effort consumed with no indication of progress; therefore, this metric can help explain unfavorable cost and schedule variances.

Design Stability
This metric is similar to requirements stability and is related to code and test progress. It tracks total CSUs and the number of changes to them. Frequent changes result in additional effort consumed; therefore, this metric, like requirements stability, can help explain unfavorable variances. Like code and test progress, a limitation of this metric is that it is not useful until later in a program.

These seven metrics are not the only metrics available for performance measurement. According to Lloyd K. Mosemann in his paper Guidelines for Successful Acquisition and Management of Software Intensive Systems, document completion can be used in determining whether milestones have been met. According to Ferens and Mark T. Hunter in their article "Use of Cost plus Award Fee Contracts for Software Development Efforts," some performance metrics can also be useful in assessing award fees for cost-plus-award-fee contracts; among them are the metrics discussed above, along with compliance with development plans and the number of change reports successfully resolved. As stated earlier, a manager must determine the best metrics for his or her program; however, the set of seven discussed above is a useful starting point.
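As a simple sketch of the budget-allocation approach described under requirements and design progress, the Python fragment below computes earned value (BCWP) when a CSCI's budget is spread evenly over equally sized requirements. The budget figure and counts are illustrative assumptions, not handbook values; the same arithmetic applies to CSUs for the code and test progress metric.

# Earned value for a CSCI whose budget is divided evenly among equally
# sized requirements (or CSUs during code and test). All values are
# hypothetical and for illustration only.

def earned_value(budget, units_planned, units_completed):
    """BCWP when each completed unit earns an equal share of the budget."""
    return budget * units_completed / units_planned

csci_budget = 2_000_000.0    # dollars allocated to this CSCI
planned_requirements = 100   # requirements of approximately equal size
completed_requirements = 42  # requirements analyzed and designed to date

bcwp = earned_value(csci_budget, planned_requirements, completed_requirements)
print(f"Earned value to date: ${bcwp:,.0f}")   # Earned value to date: $840,000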
19.10.3 Software Quality Metrics

In addition to measuring performance, a manager frequently wants to measure the quality of a program. There are many metrics that can help a manager in these endeavors. Quality metrics can also be used with performance indicators to assess causes of cost and schedule variances. Several general categories of quality metrics, and examples in each category, are now discussed.
Defect Metrics
Several cost models, including PRICE-S, SEER-SEM, SLIM, and CHECKPOINT, estimate defects in addition to software cost and schedule. SLIM and CHECKPOINT are especially sophisticated in this area; they provide defect estimates for the entire life cycle of a software program. If the predictions of these models can be trusted, a manager can use defect metrics to compare estimated and actual defects. Also, some organizations, such as the Air Force, require that defect information be collected. Although often associated with coded products, defect metrics can also be used during requirements and design to measure errors in documentation or other products generated during those phases. One method of collecting defect information, as described by Mosemann, is to track not only the total number of defects, but also the number of defects open and closed, in order to assess how well an organization is correcting problems. According to Jones, it is also useful to track the "bad fix rate," the portion of fixes that either do not correct the problem or cause a problem somewhere else in the program or system. Other useful defect measures, according to Jones and Putnam, include defect severity levels and causes, defect distribution among modules (CSCs and CSUs), and defect removal efficiency, which may be assessed from the number of total and closed defects. As mentioned in section 19.7.5, defect metrics can be useful during software cost analysis to help the analyst determine whether the contractor's estimated effort for identifying and fixing defects is consistent with the contractor's and industry's historical averages.

Complexity Metrics
These metrics measure the relative complexity of a program and can be useful in discovering problem areas. An example of complexity metrics, developed by Thomas J. McCabe and G. Gordon Schulmeyer, is discussed in their paper "System Testing Aided By Structural Analysis." Such metrics measure the number of independent paths in a program or module, with "essential complexity" adjusted for the use of structured coding techniques such as case or do-while statements. Lower McCabe values indicate a simpler design and, hence, less effort. Other complexity metrics, according to Jones, include Halstead's metrics, logical complexity, and design entropy, which is the tendency of the design to deteriorate when changes are made. One advantage of complexity metrics is that they are sometimes available relatively early in a program.

Module Metrics
These metrics measure properties of individual modules, such as CSCs or CSUs, and are useful during the design phases of a program. According to Roger S. Pressman in his book Software Engineering: A Practitioner's Approach, two useful module metrics are degree of cohesion and degree of coupling. Cohesion measures the internal strength of a module, and higher ratings (such as functional cohesion) are better. Coupling measures the degree of dependence on other modules, and lower ratings (such as no coupling or simple data coupling) are better. Another useful measure is span of control, or the number of modules directly under the control of one module; values ranging from five to nine are usually desirable.

Testing Metrics
In addition to the defect measures collected during testing, some other testing measures can be useful. Some testing metrics are also useful in performance measurement.
Breadth of testing, or the percentage of requirements demonstrated, and depth of testing can help a manager determine the confidence level he or she can place in a product. Test sufficiency, an estimate of the number of errors remaining in a program, can be useful but is often difficult to estimate. Seeded error discovery is a technique in which an engineer independently inserts errors into a program that are supposedly representative of the types of errors expected. A measure is taken of the number of unseeded and seeded errors discovered; this measure can be used to estimate total errors and testing efficiency. It can be problematic, however, in that seeded errors do not always match the types of errors that actually occur. Also, the developer must ensure the final product is not released with seeded errors still present!

Product and Operational Metrics
Although they are not available until the end of program development, product metrics are useful to a customer in assessing the quality of the developed product. One example is adequacy of documentation, especially for final product documents such as the Software Product Specification or the User's Manual. Another is the number of known problems remaining; software is often delivered with known problems (although this is highly undesirable). A related metric is product reliability, measured either as an error count (e.g., number of defects per thousand SLOC) or as mean-time-to-defect (MTTD), a software equivalent of the hardware mean-time-between-failure measure. When the software is operational, defects and MTTD should be continually tracked for quality assessment. Time to correct errors and availability can also be useful measures. Finally, according to David P. Youll in his book Making Software Development Visible, the number of user problem reports can be a beneficial indicator of software quality. Youll recommends tracking the number and frequency of problem reports as well as the cause, severity, and time to correct problems.
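One common way to turn the seeded and unseeded error counts described above into an estimate is the classic error-seeding calculation, sketched below together with a simple defect removal efficiency ratio based on total and closed defects. The specific formulas, function names, and counts are illustrative assumptions rather than handbook prescriptions.

# Error seeding estimate: if testing finds a fraction of the seeded errors,
# assume it has found roughly the same fraction of the real (indigenous)
# errors. All counts below are hypothetical.

def estimated_total_errors(seeded_inserted, seeded_found, indigenous_found):
    """Estimated total indigenous errors in the product."""
    return seeded_inserted * indigenous_found / seeded_found

def defect_removal_efficiency(defects_closed, defects_known):
    """Fraction of known defects that have been removed (closed)."""
    return defects_closed / defects_known

seeded_inserted = 50     # errors deliberately seeded before testing
seeded_found = 40        # seeded errors discovered during testing
indigenous_found = 120   # real errors discovered during the same testing

total = estimated_total_errors(seeded_inserted, seeded_found, indigenous_found)
print(f"Estimated real errors: {total:.0f}, still latent: {total - indigenous_found:.0f}")
# Estimated real errors: 150, still latent: 30

print(f"Defect removal efficiency: {defect_removal_efficiency(130, 150):.0%}")  # 87%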
19.10.4 Software Data Collection

Selection of metrics for a program is of little use if there is no way to collect the data for them; therefore, an effective data collection system must be established. According to Humphrey, an effective data collection system, like software development itself, must begin with a plan and have specific objectives. The objectives of data collection can include understanding of a process, evaluation of a program, control of a program, or prediction of future trends and events. The data collection plan must have management support, and the data collection procedures developed must consider the impact on the entire organization. Since data collection is an added expense in dollars and personnel time, the support of management is essential. Likewise, support by the people actually collecting the data is also necessary, or they will make only a minimal effort and the data will probably be of questionable validity. According to Putnam, effective data collection requires that data be collected at regular intervals, be used to measure "actuals," and be compared with estimates to assess deviations (in other words, that a performance measurement system be employed). Also essential is a commitment to investigate deviations and act upon the results. The data collection itself can be aided by some of the myriad automated tools currently available; taking action, however, is still a personnel issue for which there is no automation.

Data collection requirements can be placed in the contract. Before this is done, the agency must determine what metrics are to be used. For the Air Force, there is currently a policy that requires certain "core" metrics to be collected and recommends several others; sometimes these can be negotiated with the developing contractor. Again, it is paramount that software data be collected to at least the CSCI level! One additional benefit of data collection is that information for model calibration can be obtained. Currently, the Air Force's Space and Missiles Center maintains a large database of over 2,500 programs with historical size, effort, and other data that can be used in model calibration and validation. Since all model developers recommend calibration of their cost models to an organization's particular environment, efforts to collect calibration data should be actively pursued.
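To make the preceding points concrete, the sketch below shows one hypothetical shape for a CSCI-level, per-period collection record that captures both planned and actual values, so that deviations can be computed and the history later reused for model calibration. The field names and the 10% investigation threshold are assumptions for illustration, not handbook requirements.

# One possible shape for a CSCI-level data collection record, captured each
# reporting period. Field names and the deviation threshold are illustrative.

from dataclasses import dataclass

@dataclass
class CsciPeriodRecord:
    csci: str              # CSCI identifier
    period: str            # reporting period label (e.g., a month)
    bcws: float            # planned value for work scheduled ($)
    bcwp: float            # earned value of work performed ($)
    acwp: float            # actual cost of work performed ($)
    size_sloc: int         # current estimated or actual size
    defects_open: int      # defects reported but not yet closed
    defects_closed: int    # defects corrected and verified

def needs_investigation(rec: CsciPeriodRecord, threshold_pct: float = 10.0) -> bool:
    """Flag the record if cost or schedule variance is worse than the threshold."""
    cost_var = 100.0 * (rec.bcwp - rec.acwp) / rec.bcwp
    sched_var = 100.0 * (rec.bcwp - rec.bcws) / rec.bcws
    return cost_var < -threshold_pct or sched_var < -threshold_pct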
19.10.5 Software Tracking and Measurement Summary

To effectively manage a software effort, a manager must collect relevant data down to at least the CSCI level. Fortunately, there is a profusion of metrics available for software tracking and measurement; unfortunately, there are few standardized lists of metrics that cover a broad range of applications. The manager must select, from the many available metrics, those that are most suitable for his or her program. Additionally, a manager must establish a viable data collection system that is supported by management and by the personnel responsible for collecting the data. Data for model calibration should also be considered in software data collection efforts.