End Of Life Wearout & Replacement

Carnegie Mellon University
18-849b Dependable Embedded Systems
Spring 1998

Authors: Michael Collins

Abstract:

As systems approach the end of their usable lifetime, individual components may fail without the integrity of the system being compromised. In some cases, it is possible to continue to use a system even as its component subsystems fall apart. For various reasons, operators may elect to operate systems after their expected lifespan, and designers should be aware of the factors that lead to these decisions.



Introduction

Mortality calculations are an important part of a product's life cycle. Ideally, an engineer should know how long a system will last, so that the company knows how long it will have to provide maintenance services for that system. From the customer's perspective, the lifespan calculation serves as a useful life estimate: the system should be replaced at or before the end of its lifespan, and customers, for a multitude of reasons, generally do so. However, there are extenuating circumstances that may keep a system operating past its estimated lifespan. These circumstances range from customer loyalty to a product line to spectacularly botched system upgrades, and despite the best intentions and loudest warnings of system designers, systems will be used past the end of their lifespan. It is consequently important to understand not only that this happens, but why.

The Bathtub Curve

The bathtub curve illustrates the expected failure rate of a system over its life. Systems start out with a high failure rate (the infant mortality period), then settle down to a life of fairly stable operation. However, as the system reaches the end of its lifespan, its failure rate increases once again as various physical failures accumulate: lubricants go dry, metal rusts, rubber becomes brittle. All the various processes of wear and tear eventually cause a system to fail, regardless of how well it is designed. End of lifespan wearout is concerned with how systems behave once they reach this far end of the curve.
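One common way to sketch the shape described above is to add three failure-rate terms: a decreasing one for infant mortality, a constant one for random failures, and an increasing one for wearout. The example below is a minimal illustration only. The use of Weibull hazard functions, the time unit (years), and every numeric parameter are assumptions chosen to produce the bathtub shape, not values taken from this report.

```python
def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate of a Weibull distribution at age t > 0.

    shape < 1 gives a falling rate, shape > 1 gives a rising rate.
    """
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t):
    """Hypothetical bathtub curve: failures per year at age t (years)."""
    infant = weibull_hazard(t, shape=0.5, scale=20.0)   # assumed: early defects
    random_failures = 0.01                              # assumed: constant background rate
    wearout = weibull_hazard(t, shape=5.0, scale=12.0)  # assumed: wear and tear
    return infant + random_failures + wearout


if __name__ == "__main__":
    for age in (0.1, 1, 5, 10, 15):
        print(f"age {age:>4} years: {bathtub_hazard(age):.4f} failures/year")
```

With these assumed parameters the rate is high for a brand-new system, lowest in mid-life, and climbs steeply again past roughly ten years, which is the region this report is concerned with.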

End of lifespan behavior is a somewhat thorny issue when dealing with safety critical systems. Systems should fail in a safe fashion, but the unpredictable nature of failure modes means that the safest option is to completely shut off and remove the system. Unfortunately, there are a variety of systems, such as Air Traffic Control systems, which require continuous operation. Frustratingly, these systems are also inordinately difficult to replace.

In other cases, shutting off a segment of a system may have political implications. Cars tend to be sold and resold, passing down one or more income brackets with each sale. As cars become more sophisticated, certain expensive (and sophisticated) systems are becoming commonplace. Cruise control is a good example. A car without cruise control can still operate, but a faulty cruise control system can easily cost lives. While the obvious solution is to completely eliminate the cruise control system once it reaches the end of its safe lifespan, this raises unpleasant legal issues. Systems tend to be passed down in hierarchies - in the case of cars, the hierarchy is economic. Shutting down cruise control could be considered the equivalent of limiting safety to those who can afford it.

Safety regulations and technological advances usually make the complete replacement of a system preferable to continually repairing it. When dealing with electronic systems, the impetus to replace is further strengthened by Moore's Law: there is little reason to repair a ten year old system when present systems are at least one hundred times as powerful. In the case of consumer electronics and office equipment, systems rarely reach the end of their lifespan because the external pressures to replace are just too great.
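The "one hundred times" figure can be checked with a short calculation. The sketch below assumes the popular reading of Moore's Law as a doubling every 18 months; that doubling period is an assumption, and the function name is hypothetical.

```python
def performance_ratio(age_years, doubling_period_years=1.5):
    """How many times more powerful a current system is than one built
    age_years ago, assuming capability doubles every doubling_period_years."""
    return 2 ** (age_years / doubling_period_years)


if __name__ == "__main__":
    # Assumed 18-month doubling: a ten year old system is roughly 100x behind.
    print(round(performance_ratio(10), 1))
    # A slower, 24-month doubling gives a much smaller gap.
    print(performance_ratio(10, doubling_period_years=2.0))
```

The result is sensitive to the assumed doubling period: at 18 months the ten year gap is about a factor of 100, while at 24 months it is a factor of 32.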

However, while eliminating a system may be a technically sound decision, there are valid reasons for keeping a system up to or past its lifespan. Economically, repairing a system costs less in the short term than outright replacement, and when an organization is operating on thin profit margins, repair costs become more attractive. Although the aggregate cost of repairs may exceed the cost of outright replacement, there are organizations which cannot raise enough capital to replace their systems outright. The former Soviet Union is running into this problem with certain infrastructural systems.
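The trade-off between a small yearly repair bill and a large one-time replacement cost can be made concrete with a toy model. Everything in the sketch below is hypothetical: the cost figures, the assumption that repair costs grow by a fixed percentage each year, and the function names.

```python
def cumulative_repair_cost(years, first_year_cost, growth):
    """Total repair spend over `years`, with the yearly bill growing geometrically."""
    return sum(first_year_cost * growth ** y for y in range(years))


def break_even_year(replacement_cost, first_year_cost, growth, horizon=50):
    """First year in which cumulative repair spend exceeds the replacement cost.

    Returns None if that never happens within `horizon` years.
    """
    for year in range(1, horizon + 1):
        if cumulative_repair_cost(year, first_year_cost, growth) > replacement_cost:
            return year
    return None


if __name__ == "__main__":
    # Assumed figures: replacement costs 100,000; repairs start at 10,000
    # per year and grow 20% annually as the system wears out.
    print(break_even_year(100_000, 10_000, 1.2))
```

Under these assumed figures no single year's repair bill comes close to the replacement cost, yet the running total overtakes it in the seventh year. An organization that can only fund one year at a time will keep repairing even though the aggregate is higher.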

Beyond the economic factors, the single most powerful motivator for system maintenance is operator conservatism. Retraining is expensive, and in the case of critical systems, operators can't afford the learning curves and mistakes associated with learning everything from scratch [D'yakov96]. Consequently, video editors will continue to use Amiga computers as their primary editing platform long after the manufacturer has filed for bankruptcy.


Key Concepts


Available tools, techniques, and metrics

There are few tools available for end of life maintenance. As noted above, by the time a system reaches the end of its lifespan, the preferred solution is to junk it. I have included several hyperlinks to repair societies and organizations which maintain antique systems as examples.

The Year 2000 problem has introduced several remediation tools which have grown more sophisticated as the deadline approaches. Arguably, the majority of repair decisions in end of life are managerial and economic, not necessarily technical.


Relationship to other topics


Conclusions

Systems will be used past their expected lifespan, and engineers should recognize this when designing them. In particular, a system consists not only of its components, but also of the replacement and support chain that maintains it. If a system is seen as valuable, it will outlast the official support structure and build new support systems. The existence of societies and companies dedicated to maintaining antique systems indicates that operators find value in these systems beyond the calculated lifespan.

The wearout and replacement requirements surrounding electronic and mechanical systems differ. Although certain common factors (such as emissions quality standards) can motivate replacement for both kinds of system, electronic systems usually reach obsolescence long before they reach end of life. Consequently, electronic systems are usually replaced before they need repair, while mechanical systems are usually repaired before they are replaced. Embedded systems are best considered as two distinct entities: the electronic/control component and the mechanical portion.


Annotated Reference List

There are few papers covering end-of-life wearout and replacement; the best ones I have found focus on the impending collapse of the former Soviet Union's infrastructure.

The following links provide a feeling for the repair and maintenance organizations and subcultures in existence.

Loose Ends

(Un)fortunately, this isn't a heavily researched topic, and most of the information I have gathered came from interviews. Looking at resale patterns would be intriguing, especially as they relate to military hardware.

