Safe and Reliable
- Systems must be safe to protect people & property
- "Mission-critical" systems -- if the electronics fail, someone could die or lose a lot of money
- Software & hardware must anticipate electronic & non-electronic failure modes so that the system at least fails "safe"
- Traditional fault-tolerant techniques work but are expensive
- Replicated hardware (e.g., triple modular redundancy, TMR)
- Distributed consensus
- High availability ("up-time") gained through redundancy may come at the cost of poorer long-term reliability (more components that can break)
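The TMR bullet and the availability/reliability trade-off can be made concrete with the textbook TMR model. The following is a minimal Python sketch. It assumes independent module failures, a constant failure rate λ (exponential model), and a perfect voter. The function names and the failure rate of 1e-4 per hour are hypothetical. A bitwise 2-of-3 voter masks a single faulty replica. Under these assumptions, TMR reliability is R_TMR = 3R^2 - 2R^3. That beats a single module only while R > 0.5, i.e. for t < ln 2 / λ. The TMR mean time to failure, 5/(6λ), is shorter than the single-module 1/λ.

```python
import math

def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote over three replica outputs."""
    # Each output bit takes the value that at least two replicas agree on.
    return (a & b) | (b & c) | (a & c)

def simplex_reliability(lam: float, t: float) -> float:
    """Reliability of one module with constant failure rate lam (per hour)."""
    return math.exp(-lam * t)

def tmr_reliability(lam: float, t: float) -> float:
    """TMR reliability assuming a perfect voter and independent failures:
    the system works while at least 2 of the 3 modules still work."""
    r = simplex_reliability(lam, t)
    return 3 * r**2 - 2 * r**3

if __name__ == "__main__":
    # One replica has a flipped low-order bit; the voter masks it.
    print(hex(majority_vote(0b1010, 0b1010, 0b1011)))  # prints 0xa

    lam = 1e-4  # hypothetical failure rate: 1 failure per 10,000 hours
    crossover = math.log(2) / lam  # R = 0.5, where TMR and simplex are equal
    for t in (1_000, crossover, 20_000):
        print(f"t={t:8.0f} h  simplex={simplex_reliability(lam, t):.4f}  "
              f"TMR={tmr_reliability(lam, t):.4f}")
    # Mean time to failure: 1/lam for simplex versus 5/(6*lam) for TMR.
    print(f"MTTF simplex={1 / lam:.0f} h  TMR={5 / (6 * lam):.0f} h")
```

With these numbers, TMR is more reliable than a single module over a short mission (1,000 h). Past the crossover, it is less reliable, because three modules give more chances for a second failure to defeat the vote.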
Design challenges:
- Realistic reliability predictions with commercial components
- Low-cost reliability -- without brute-force redundancy (probably requires a system-level approach)
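Reliability predictions usually rest on simple arithmetic whose inputs are the hard part. This minimal sketch assumes a series system, where any part failure fails the system. It also assumes constant failure rates quoted in FIT (failures per 10^9 device-hours). Under those assumptions, the system failure rate is the sum of the part rates, and R(t) = e^(-λt). The part names and FIT values are hypothetical placeholders. Whether rates like these hold for commercial components is part of what makes a prediction "realistic" or not.

```python
import math

# Hypothetical parts list with failure rates in FIT (failures per 1e9 device-hours).
# Real values would come from vendor data or field experience, not from this sketch.
PARTS_FIT = {
    "microcontroller": 50.0,
    "voltage_regulator": 20.0,
    "crystal_oscillator": 10.0,
    "connector": 5.0,
}

def series_failure_rate(fits: dict[str, float]) -> float:
    """Failure rate per hour of a series system: any part failing fails the system."""
    return sum(fits.values()) * 1e-9

def reliability(lam_per_hour: float, hours: float) -> float:
    """Probability of surviving `hours` under a constant-failure-rate model."""
    return math.exp(-lam_per_hour * hours)

if __name__ == "__main__":
    lam = series_failure_rate(PARTS_FIT)  # 85 FIT = 8.5e-8 failures per hour
    one_year = 24 * 365
    print(f"system MTTF ~ {1 / lam:,.0f} h, "
          f"1-year reliability ~ {reliability(lam, one_year):.6f}")
```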