Reliability Growth

Carnegie Mellon University
18-849b Dependable Embedded Systems
Spring 1998

Ying Shi

Abstract:

Reliability growth is a widely accepted concept used as the basis for planning equipment reliability tests and for assessing reliability improvement as equipment configurations change. Its particular benefit, compared with the traditional reliability demonstration test method, is savings in cost and time; future attention is expected to focus on simpler models and higher-coverage tests. For embedded systems, the concept benefits the development process by supporting validation decisions, particularly for critical systems such as medical and nuclear systems.

Introduction

The origin of the reliability growth concept is generally attributed to J. T. Duane, an engineer at the Aerospace Electronics Department of the General Electric Company. Duane developed the most commonly used model, now called the Duane model, and published a paper on the subject in 1964. In that paper, Duane proposed a learning curve approach to reliability monitoring. He observed that different complex electromechanical and mechanical systems showed similar rates of improvement during system development. He further observed that the equipment development process could be characterized by continuing improvements in design and refinements in both operation and maintenance. Based on these observations, Duane concluded that the equipment development process was a learning process and could be described by a classical learning model equation.

Learning curve theory is based on the observation that the cost of an item is a function of where in the manufacturing sequence the item was produced. The theory is usually attributed to T. P. Wright, who introduced a mathematical model describing a learning curve in a 1936 article in the Journal of the Aeronautical Sciences titled "Factors Affecting the Cost of Airplanes." Wright showed that the cumulative average direct labor input for an aircraft manufactured on a production line decreased in a predictable pattern. The decrease was related to the increased proficiency (i.e., learning) of the manufacturing people on the line as they continued to perform the various repetitive tasks. The published model described the learning as a power function of cumulative output. One feature of a power function is that the relationship is linear when plotted on log-log graph paper.

Duane observed that the process associated with improving equipment reliability during the early stages of development "is one of learning through failures. Knowledge of the applicable learning curve would provide a means of measuring and predicting reliability during this period of change." He also suggested that this learning could be described mathematically by a power function, which appears as a straight line when plotted on log-log paper. The learning comes about through the correction of unexpected failure modes. It is accomplished through a "test, analyze, and fix" (TAAF) process, and the term TAAF became associated with, and almost synonymous with, reliability growth.
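
As an illustration of the straight-line behavior on log-log paper, the following minimal sketch fits a Duane-style curve to failure data by least squares on logarithmic axes. It assumes the commonly used form in which the cumulative MTBF (cumulative test time divided by the number of failures so far) equals scale * t^alpha. The function names, the parameter names alpha and scale, and the sample data are illustrative assumptions and are not taken from Duane's paper.

```python
import math


def fit_duane(failure_times):
    """Fit cumulative MTBF = scale * t**alpha by least squares on log-log axes.

    failure_times: cumulative test time at each failure, in increasing order.
    Returns (alpha, scale); alpha is the slope of the log-log line.
    """
    if len(failure_times) < 2:
        raise ValueError("need at least two failure times")
    if any(t <= 0 for t in failure_times):
        raise ValueError("failure times must be positive")

    # x = log(time), y = log(cumulative MTBF), where the cumulative MTBF
    # at the i-th failure is (time so far) / (failures so far).
    xs = [math.log(t) for t in failure_times]
    ys = [math.log(t / (i + 1)) for i, t in enumerate(failure_times)]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    alpha = sxy / sxx
    scale = math.exp(mean_y - alpha * mean_x)
    return alpha, scale


def cumulative_mtbf(t, alpha, scale):
    """Cumulative MTBF predicted by the fitted line at test time t."""
    return scale * t ** alpha


def instantaneous_mtbf(t, alpha, scale):
    """Instantaneous MTBF implied by the fitted line (requires alpha < 1)."""
    return cumulative_mtbf(t, alpha, scale) / (1.0 - alpha)


if __name__ == "__main__":
    # Hypothetical failure log (cumulative test hours), for illustration only.
    times = [35.0, 110.0, 260.0, 480.0, 900.0, 1500.0]
    alpha, scale = fit_duane(times)
    print("growth slope alpha:", alpha)
    print("instantaneous MTBF at last failure:",
          instantaneous_mtbf(times[-1], alpha, scale))
```

Under these assumptions, a fitted alpha near zero suggests little or no growth, while a larger positive alpha suggests that corrective actions are taking effect. The instantaneous MTBF follows from differentiating the expected number of failures with respect to time and is meaningful only for alpha below 1.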

Key Concepts

Because reliability growth and its closely associated concepts have been jargon in the reliability research community for years, some researchers have felt obliged to clarify these definitions and their use from their own point of view.

Reliability Growth Testing [1]

Most often, when the subject of reliability growth is discussed, reliability growth testing is the focus of the discussion. This focus on testing is neither surprising nor unreasonable. In general, testing to prove the merit of a design, and the validity of the models and analytical tools used to develop it, is a necessary and standard part of development. For reliability growth testing in particular, much work has gone into developing statistical models for planning and tracking the reliability growth achieved through testing. Given the high cost of testing, the extensive effort to develop good models and the attention paid to the reliability growth test process are natural.

This raises the question of whether reliability growth can be achieved without testing. Generally speaking, the design process is one of iteration. Iterations of the design are needed because the various performance requirements often conflict: optimizing the design to meet one requirement can cause it to fail another. Balancing the requirements is a demanding task. Iteration is also needed because not all analyses can be done simultaneously. Consequently, the design may be changed as the result of a particular analysis, only to be changed again when the results of a subsequent analysis become available. As these iterations take place, the design is refined, so each revised design is an improvement over its predecessor. Some of the analyses conducted during the design process directly address the reliability of the design. The reliability of the design therefore improves as successive design changes are made on the basis of analytical evaluation.

Using the line of reasoning just presented, a broader definition of reliability growth can be developed: the process by which the reliability of an initial design is improved. Improvement can result as the design is iterated on the basis of either analytical evaluation and assessment or test results. Ideally, by the time the product enters testing, all deficiencies have been eliminated through the design changes made as a result of analyses. Under this view, reliability growth does not have to be achieved through testing.

However, this ideal is seldom completely realized, and some design changes will be required as the result of design deficiencies discovered during development testing. This is why a specific type of development testing is often dedicated to the reliability growth process: the Reliability Growth Test.

Open issues in reliability growth testing include the coverage achieved by test generation tools and the analysis of test results, which is specific to each type of testing.

Reliability Growth Model

Because system reliability is expected to grow during development, reliability growth models, each of which tries to represent that growth trend, serve as a guide for measuring and achieving the reliability growth that results from improved software reliability and recovery algorithms. Many mathematical models exist today. Their basic working principle is to apply the test results (the data points) to the model; based on how closely the data match or deviate from the model, a judgment is made as to whether the system's development is moving toward reliability growth and how much refinement work remains at that point in time.
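
One simple way to make this kind of judgment from test data is a trend test. The sketch below uses the Laplace trend test, a standard statistical technique that is not named in the sources discussed here; it is swapped in purely as an illustration. It assumes a single test of fixed total duration and compares the mean failure time with the midpoint of the test. The function names and the 1.645 threshold (a one-sided 5% level of the standard normal distribution) are assumptions chosen for the example.

```python
import math


def laplace_trend_statistic(failure_times, total_test_time):
    """Laplace statistic for a test that ran for a fixed total_test_time.

    With no trend, failure times are spread evenly over the test, so their
    mean lies near total_test_time / 2 and the statistic is approximately
    standard normal. Failures crowded early in the test give a negative value.
    """
    n = len(failure_times)
    if n == 0:
        raise ValueError("need at least one failure time")
    if total_test_time <= 0:
        raise ValueError("total_test_time must be positive")
    mean_time = sum(failure_times) / n
    spread = total_test_time * math.sqrt(1.0 / (12.0 * n))
    return (mean_time - total_test_time / 2.0) / spread


def classify_trend(statistic, threshold=1.645):
    """Turn the statistic into a coarse verdict (threshold is an assumption)."""
    if statistic < -threshold:
        return "growth"
    if statistic > threshold:
        return "decay"
    return "no significant trend"


if __name__ == "__main__":
    # Hypothetical failure times (hours) from a 1000-hour test.
    times = [20.0, 55.0, 130.0, 240.0, 410.0, 700.0]
    u = laplace_trend_statistic(times, 1000.0)
    print(u, classify_trend(u))
```

A strongly negative statistic means failures are arriving less often as the test proceeds, which is consistent with reliability growth. A value near zero means the data give no evidence that the refinement work has changed the failure rate.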

The key question for a reliability growth model is how well it represents the real system development process, so that the corresponding refinement work improves system reliability effectively and efficiently.

Duane showed that a reliability engineering test process follows a predictable pattern. The underlying basis of this process is learning, in this case redesign where there are unexpected failures. As with any physical process, randomness is to be expected. The key issue for a reliability growth model is therefore how well it represents the trend of system reliability growth. More and more mathematical models continue to appear, and they tend to become more and more complex, which raises the question of whether this diversity is necessary. As the discussion continues, practitioners have shown a preference for simpler models.

Reliability Growth and Reliability Prediction [2]

Because the concept of reliability growth is inherently simple and easy to understand, it has become very widely used. This popularity has also led to its improper application in a number of cases and conditions, which has prompted some researchers to clarify how the concept should be used.

Conditions where the reliability learning curve concept seems to fit:
It models a single reliability development test activity; it requires failure analysis and corrective actions as part of the test activity; and it applies to equipment that operates continuously.

Reasonable applications of the reliability learning curve:
Determining approximate reliability test time requirements; monitoring the rate of reliability improvement in test.

Unreasonable uses of the curve:
Predicting equipment reliability, either current or future; combining different types of reliability tests.
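
The first reasonable application listed above, estimating approximate test time requirements, can be sketched as follows. This is a rough planning aid under stated assumptions, not a reliability prediction: it assumes a Duane-style curve in which the cumulative MTBF grows as initial_mtbf * (t / initial_time)^alpha and the instantaneous MTBF equals the cumulative MTBF divided by (1 - alpha). The function name, the parameter names, and the numbers in the example are hypothetical.

```python
def required_test_time(initial_mtbf, initial_time, target_mtbf, alpha):
    """Approximate cumulative test time needed to reach target_mtbf.

    initial_mtbf: cumulative MTBF observed after initial_time of testing.
    target_mtbf:  desired instantaneous MTBF.
    alpha:        assumed growth slope of the log-log line (0 < alpha < 1).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if initial_mtbf <= 0 or initial_time <= 0 or target_mtbf <= 0:
        raise ValueError("MTBF and time values must be positive")

    # The instantaneous MTBF reaches the target when the cumulative MTBF
    # reaches target_mtbf * (1 - alpha).
    needed_cumulative_mtbf = target_mtbf * (1.0 - alpha)

    # Invert cumulative MTBF = initial_mtbf * (t / initial_time) ** alpha.
    ratio = needed_cumulative_mtbf / initial_mtbf
    return initial_time * ratio ** (1.0 / alpha)


if __name__ == "__main__":
    # Hypothetical planning figures: 50 h cumulative MTBF after 100 h of test,
    # a 200 h target, and an assumed growth slope of 0.4.
    print(required_test_time(50.0, 100.0, 200.0, 0.4))
```

Because the result depends on alpha raised to a reciprocal power, small changes in the assumed growth slope produce large changes in the estimated test time, which is one reason to treat the result as approximate.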

The Objectives of Reliability Growth

The process of reliability growth has one primary objective: to improve the reliability of the design through analysis and test. The degree of improvement possible depends on the available resources, the underlying technology of the parts, components, and subsystems, and the knowledge of the design team.

Resources are always limited, so the reliability growth process must be as efficient as possible. A collateral objective of reliability testing, indeed, of all development testing, is to validate the models and tools used in creating the design.

The underlying technology is an obvious limiting factor in the degree of improvement possible in a design. Rather than relying on a continuous series of technological breakthroughs, design engineers must focus on the fundamentals and thoroughly understand the technologies at hand.

The understanding of the design team is another constraint on the degree of improvement possible in the reliability, or any other performance characteristic, of a product. The models and tools used in creating a design reflect the current level of understanding of the technical community. To some extent, the models and tools are always inexact. By using test results to validate the models and tools, and to revise or update them when they prove invalid, we increase our knowledge and, with it, the potential for improving the design.

Available tools, techniques, and metrics

A review of the last decade of proceedings of the Annual Reliability and Maintainability Symposium readily turns up a vast number of reliability growth models and testing methods and tools.

Relationship to other topics

Software Reliability. Today the concept of reliability growth is applied mostly to software. Because software reliability tends to be stable once the software enters its operational life, the pre-operational stage is the main opportunity to improve software reliability, that is, to achieve reliability growth.
Software Testing. Reliability growth testing is one type of software testing, since reliability growth is largely a concept applied to software reliability improvement during system development.
Profit/Business Model. As mentioned in the previous sections, a reliability growth model provides information for deciding how far the system has progressed in its development, and for predicting when the development process should end given the tradeoff between the system reliability requirement and business profit.

Conclusions

Reliability growth is a process that begins with the first efforts to conceptualize a design. As the design evolves, undergoing a series of analyses and tradeoffs, the overall performance of the design should improve. Reliability, one very important aspect of performance, will also improve. At some point, analytical methods, without additional information, are insufficient for continued improvement. Testing of a model or prototype product, in its entirety or some part thereof, then begins, and it is the information gained through testing that allows further improvements to be made.

Although improving a design's reliability is the primary objective of the reliability growth process, the process is also an important means of improving the models and tools used in creating a design. Reliability growth testing (RGT), one aspect of the reliability growth process, is also used to assess the level of product reliability being achieved. This use of RGT creates a dilemma for the developer. From the perspective of validating and understanding the design, failures are welcome events: they present an opportunity for learning and improvement. From the perspective of assessment, failures are not welcome. To help deal with this dichotomy of purpose, the ground rules of all testing must be well defined long before testing begins.

Annotated Reference List

"Reliability Growth"
Martin A. Meth, "Reliability-Growth Myths and Methodologies: A Critical View", Proceedings of the Annual Reliability and Maintainability Symposium, 1992
J. T. Duane, "Learning Curve Approach to Reliability Monitoring", IEEE Transactions on Aerospace, vol. 2, no. 2, April 1964

Loose Ends
