Notes on:
Reliability, Availability, Maintainability and Safety Assessment
Villemeur
Reliability, Availability, Maintainability and Safety Assessment, Alain Villemeur, John Wiley & Sons, Chichester, 1992. (two volume set; 746 pages+). ISBN 0 471 93048 2 (vol. 1) and 0 471 93049 0 (vol. 2).
This is a broad-ranging, multidisciplinary discussion of system-level dependability topics. It is an English translation of a French work originally published in 1991. The author works for the French electric utilities, so the book is motivated by a nuclear power point of view. Its scope and applicability are broader than that, however, and it includes examples from the aerospace (the Concorde aircraft) and petrochemical industries. This is perhaps the most comprehensive available treatment of methods, tools, and techniques within an interdisciplinary framework. Because of its breadth, though, it may not be definitive in all respects: it provides an excellent and comprehensive overview, but may not have all the latest details for any particular specialty area.
Volume 1 surveys dependability mathematics and nine analysis methods. Volume 2 discusses multiple disciplines, automated tools, and case studies.
Topic coverage: (*** = emphasized; ** = discussed in some detail; * = mentioned)
| Level | Topic | Level | Topic | Level | Topic |
| --- | --- | --- | --- | --- | --- |
| *** | Dependability | ** | Electronic Hardware |  | Requirements |
| ** | Safety | ** | Software | * | Design |
|  | Security | * | Electro-Mechanical Hardware | * | Manufacturing |
|  | Scalability |  | Control Algorithms | * | Deployment |
|  | Latency | ** | Humans |  | Logistics |
|  | Affordability | * | Society/Institutions |  | Retirement |
Other topics: dependability math, data, common-cause failures, dependability assessment
Publisher Comments:
"Presents methods and techniques for assessing the reliability, availability, maintainability or safety of industrial systems. Describes the history of dependability concepts and methods and also defines the main concepts and principles of predictive analysis used. The second section is a detailed description of principles and methods. The third deals with the specific methods used in the fields of human factors, mechanics, software and safety assessment. The last section lists the main computer programs developed to assess dependability and common cause failures."
Volume 1 Publisher Comments:
"Opening with a brief survey of the history of dependability in industry, this volume, the first of two, presents the methods and techniques used to assess and measure the reliability, availability, maintainability and safety of industrial systems. Incorporating much of the recent research in this area, the author evaluates the main concepts and principles of predictive analysis. Numerous illustrative examples allow the reader to build a fundamental understanding of the techniques employed for measuring dependability, including Preliminary Hazard Analysis, Failure Modes and Effect Analysis and the Truth Table Method. More specific methods and practical examples are detailed in Volume 2 of this reference text. The comprehensive and accessible approach of these volumes will appeal to practising engineers in a range of areas, including mechanical, electrical, electronic, safety and reliability engineering. Researchers and advanced students will also find this book invaluable."
Volume 2 Publisher Comments:
"A detailed guide to the developing area of dependability, this comprehensive work in two volumes presents the methods and techniques used to assess and measure the reliability, availability, maintainability and safety of industrial systems. Building on the fundamental principles and methods of predictive analysis explained in Volume 1, Volume 2 concentrates on the specific methods used to solve reliability problems. Readers are shown how to overcome a wide range of common cause failures, including human factors, mechanics and software. Practical examples are detailed and the main computer programs which have been developed to assess dependability are evaluated. Case studies, extensive appendices and bibliographies conclude this indispensable reference text. This dependability assessment will appeal to practising engineers, researchers and senior students in mechanical, electrical, electronic, safety and reliability engineering."
Volume 1 Contents:
Foreword by Paul Caseau xv
Foreword by Arnould d'Harcourt xvii
Acknowledgements xix
Preface xxi
Main Notations xxv
PART 1 INTRODUCTION TO DEPENDABILITY METHODS 1
1 Systems dependability history 3
1.1 From the beginning of the industrial age to
the 1930s 3
1.2 The 1940s 5
1.3 The 1950s 6
1.4 The 1960s 8
1.5 The 1970s and 1980s 10
References 13
2 Main concepts 15
2.1 Introduction 15
2.2 Reliability: definitions 15
2.3 Quality and reliability 17
2.4 Dependability 17
2.5 Systems and components 19
2.5.1 Definition 19
2.5.2 Nature of systems 21
2.5.3 Main characteristics of a system 21
2.6 Failures: definitions and classifications 22
2.6.1 Definition of failure 22
2.6.2 Classification as to suddenness 23
2.6.3 Classification as to degree 23
2.6.4 Classification of failures combining
suddenness and degree 23
2.6.5 Classification according to the date of
their occurrence in the system lifetime 23
2.6.6 Classification as to effects 25
2.6.7 Classification as to causes 26
2.7 Fault 27
2.8 Failure modes 28
2.9 Defect, failure, fault 30
References 31
3 The principle of predictive analysis 32
3.1 Predictive analysis of system dependability 32
3.1.1 System analysis 32
3.1.2 Dependability prediction 33
3.1.3 Methods of analysis 34
3.1.4 Inductive and deductive approaches 34
3.2 Principal stages 35
3.2.1 Functional and technical analysis 35
3.2.2 Qualitative analysis 35
3.2.3 Quantitative analysis 37
3.2.4 Synthesis and conclusions 38
3.3 Principal characteristics 38
3.3.1 Interactive nature 39
3.3.2 Iterative nature 39
References 40
4 Dependability mathematics 41
4.1 Event algebra 41
4.1.1 Main definitions 41
4.1.2 Boolean algebra 42
4.2 Event probabilities 44
4.2.1 Poincare's theorem 45
4.2.2 Conditional probabilities theorem 45
4.2.3 Total probability theorem 46
4.2.4 Bayes' theorem 46
4.2.5 Epistemology of probabilities 46
4.3 Random variables 47
4.3.1 Definition 47
4.3.2 Main distributions used 48
4.4 Stochastic processes 53
4.5 Fundamental relations for dependability 54
4.5.1 Definitions and main characteristics 54
4.5.2 Definitions of MTTF, MTTR, MUT, MDT and MTBF 55
4.5.3 Failure and repair densities, MTTF and MTTR 56
4.5.4 Failure and repair rates 57
4.5.5 Failure and repair intensities 58
4.5.6 Main relationships 59
4.6 Failure rate and MTTF for the main probability
distributions 61
4.7 Reliability and availability of an entity 63
4.7.1 Non-repairable entity 63
4.7.2 Repairable entity 63
4.8 Model for constant failure and repair rates 65
References 69
5 Dependability data 71
5.1 General considerations 71
5.1.1 Components 71
5.1.2 Collection of event data 72
5.2 Data processing 74
5.2.1 Dependability parameters 74
5.2.2 Parameter distributions 76
5.2.3 Estimator and confidence interval calculation 77
5.2.4 Bayesian parameter estimation 80
5.2.5 Assessment based on expert judgement 82
5.2.6 Failure rate modelling 84
5.3 Example of a data collection system 85
5.3.1 Information and data collection 86
5.3.2 Data processing 86
5.3.3 Data extraction 87
5.4 Data sources 87
5.4.1 Databases 87
5.4.2 Data source documents 90
References 95
PART 2 MAIN METHODS 99
6 Preliminary hazard analysis (PHA) 101
6.1 Introduction 101
6.2 Principles of the PHA 101
6.3 The preliminary hazard analysis in the
aeronautical industry 102
6.4 The preliminary hazard analysis in the chemical
industry 105
References 105
7 Failure modes and effects analysis (FMEA) 106
7.1 Introduction 106
7.2 Performance of an FMEA 107
7.2.1 Definition of the system, its functions
and components 107
7.2.2 Identification of the component failure
modes and their causes 108
7.2.3 Study of the effects of the component
failure modes 111
7.2.4 Conclusions and recommendations 112
7.3 Presentation of the analysis and its results 113
7.4 Illustrative exercise 115
7.4.1 Presentation of the exercise 115
7.4.2 Performance of the FMEA 117
7.5 Failure modes, effects and criticality
analysis (FMECA) 119
7.6 FMEA and defect analysis 120
7.6.1 'HAZOP'-type analysis 120
7.6.2 Safety study on fluid flow diagrams 121
7.7 FMEA application 122
7.8 Summary 123
References 123
8 Success diagram method (SDM) 125
8.1 Introduction 125
8.2 Principles for constructing a success diagram 125
8.3 Reliability assessment of a non-repairable system 127
8.3.1 Particular success diagrams 127
8.3.2 Complex success diagrams: general case 135
8.4 Dependability assessment of a repairable system 141
8.4.1 Availability calculation 142
8.4.2 The lambda-mu method 144
8.4.3 Remarks 144
8.5 Parts count method 145
8.6 Success diagram and cause tree method 146
8.7 Summary 146
References 147
9 Cause tree method (CTM) 149
9.1 Introduction 149
9.2 Description of the method 149
9.3 Basic concepts 153
9.3.1 Undesirable event 153
9.3.2 Logic gate representation 153
9.3.3 Event representation 155
9.3.4 Defects, failures and faults 155
9.3.5 Failure classes 157
9.3.6 Basic events 157
9.4 Cause tree construction: principles 159
9.4.1 Identification of immediate, necessary
and sufficient causes (first principle) 159
9.4.2 Classification of intermediate events
(second principle) 161
9.4.3 Analysis of component defects
(third principle) 162
9.4.4 Seeking the INS causes of intermediate
events until basic events are
obtained (fourth principle) 162
9.4.5 Iterative approach (fifth principle) 162
9.4.6 Other rules 164
9.5 Illustrative exercise: cause tree construction 164
9.5.1 Cause tree analysis 164
9.5.2 Comments 168
9.6 Minimal cut sets and prime implicants 169
9.6.1 Definition of minimal cut sets 169
9.6.2 Determination of minimal cut sets 170
9.6.3 Prime implicants 173
9.7 Illustrative exercise: determination of minimal
cut sets 174
9.7.1 Reduction of the cause tree 174
9.7.2 Comments 176
9.8 Quantitative analysis 177
9.8.1 Unrepairable system 178
9.8.2 Repairable system 183
9.8.3 Use of specific logic gates 188
9.8.4 Interdependencies of basic events 188
9.9 Analysis of phased-mission systems 189
9.9.1 Availability computation 190
9.9.2 Reliability computation 191
9.10 Method application 194
9.11 Summary 194
References 195
10 Truth table method (TTM) 197
10.1 Introduction 197
10.2 Principles 197
10.3 Illustrative exercise 199
10.4 Summary 200
11 Gathered fault combination method (GFCM) 202
11.1 Introduction 202
11.2 Presentation of the method 202
11.3 Main characteristics of the method 207
11.3.1 Grouping of faults with identical effects 207
11.3.2 Criteria for selecting failure combinations 208
11.3.3 Allowance for interactions between
elementary systems 209
11.3.4 Inductive and deductive nature of the method 209
11.3.5 Analysis simplification 210
11.4 Main characteristics of internal, external and
global gathered faults 210
11.4.1 Characteristics of internal gathered faults 210
11.4.2 Characteristics of external gathered faults 211
11.4.3 Characteristics of global gathered faults 211
11.5 Illustrative exercise 212
11.5.1 Determination of internal gathered faults 212
11.5.2 Determination of external and global
gathered faults 214
11.5.3 Conclusions 215
11.6 Theoretical foundations of the method 216
11.6.1 Advantages of the two criteria 216
11.6.2 Fault combination effects-effects base 218
11.6.3 External and global gathered faults 220
11.6.4 GFCM and truth table reduction 221
11.6.5 Global gathered faults and minimal cut sets 222
11.6.6 Direct and indirect effects 223
11.7 Algorithm for the implementation of the GFCM 224
11.7.1 FMEA of elementary systems 224
11.7.2 Identification of internal gathered faults 225
11.7.3 Identification of external gathered faults 227
11.7.4 Identification of global gathered faults 229
11.7.5 Result analysis 230
11.8 Quantitative analysis 230
11.9 Method implementation 231
11.9.1 Analysis of a system 231
11.9.2 Analysis of a group of interacting
elementary systems 232
11.10 Summary 233
References 233
12 Consequence tree method (CQTM) 234
12.1 Introduction 234
12.2 Description of the consequence tree method 235
12.3 Construction of consequence trees 237
12.3.1 Deductive approach 237
12.3.2 Inductive approach 243
12.4 Theoretical foundations of the consequence
tree methodology 246
12.4.1 Initiating event and sequences 246
12.4.2 Reduction and development of the
consequence tree 247
12.4.3 Boolean reduction 249
12.5 Illustrative exercise 251
12.5.1 Deductive approach 252
12.5.2 Inductive approach 255
12.6 Quantitative analysis 256
12.6.1 Independent events 257
12.6.2 Dependent events 258
12.7 Summary 262
References 262
13 Cause-consequence diagram method (CCDM) 264
13.1 Introduction 264
13.2 Description of the method 264
13.3 Construction of the cause-consequence diagram 267
13.4 Illustrative exercise 269
13.5 Connections with other methods 272
13.6 Use of the method 273
13.7 Summary 273
References 273
14 State-space method (SSM) 275
14.1 Introduction 275
14.2 Principles of the method 275
14.3 Availability of repairable systems 278
14.3.1 System state equations 278
14.3.2 Solution 279
14.3.3 Mean state duration and asymptotic frequency
of encountering given states 283
14.3.4 Minimal cut sets and states 286
14.3.5 State sequences 289
14.4 Reliability of repairable systems 291
14.4.1 System state equations and solution 291
14.4.2 Minimal operating states method 295
14.4.3 State sequence method 297
14.4.4 Minimal cut sets and states 301
14.5 Repairable system maintainability 304
14.6 Reliability, availability and maintainability of
simple repairable systems 306
14.6.1 Time-dependent availability and reliability 306
14.6.2 Measures for a series system 306
14.6.3 Measures for a parallel system 314
14.6.4 Measures for a parallel m/n system 319
14.7 Reliability, availability and maintainability
of large repairable systems 320
14.8 Non-Markovian processes 322
14.9 Semi-Markovian processes 323
14.9.1 Calculating availability 324
14.9.2 Calculating reliability 325
14.9.3 Calculating the number of passages
through the states 326
14.9.4 Examples 328
14.10 Consequence trees and semi-Markovian processes 330
14.11 Summary 334
References 334
Appendix 1 Main definitions 336
Index 1-1
Volume 2 Contents:
Foreword by Paul Caseau xv
Foreword by Arnould d'Harcourt xvii
Acknowledgements xix
Preface xxi
Main Notations xxv
PART 3 SPECIFIC METHODS 365
15 Dependent and common-cause failures 367
15.1 Introduction 367
15.2 Interdependencies between failures 368
15.3 Examples of critical dependencies among failures
in the aeronautical, oil and nuclear
industries 370
15.4 Common-cause and cascade failures 373
15.4.1 History of these concepts and their
introduction 373
15.4.2 Definitions 374
15.4.3 Dependent failures and primary, secondary
and command failures 375
15.5 Classification of common-cause failures
according to their nature 376
15.5.1 Environmental hazards 378
15.5.2 Design errors 379
15.5.3 Manufacturing errors 382
15.5.4 Assembly errors 383
15.5.5 Operating errors 383
15.6 Using event reports to study common-cause
failures 384
15.7 Dependent failure assessment methods 387
15.7.1 Main assessment methods 388
15.7.2 Other methods 391
15.8 Dependent and common-cause failure probabilities 394
15.8.1 Explicit methods 394
15.8.2 Parametric methods 395
15.9 Importance of these failures and means to
prevent them 400
References 402
16 Human factors 405
16.1 Introduction 405
16.2 Historical overview 405
16.3 Behaviour of the human operator 407
16.3.1 Functioning of the human operator 407
16.3.2 Some outstanding characteristics 411
16.3.3 Classifying tasks 413
16.3.4 Human errors 413
16.4 Major concepts 418
16.5 Phases of a human reliability assessment 421
16.5.1 Identification of potential human errors 422
16.5.2 Selecting significant errors 423
16.5.3 Detailed analysis of significant errors 424
16.5.4 Integration with system modelling 425
16.5.5 Quantification 426
16.5.6 Presentation of the approach used and
its results 426
16.5.7 Some remarks 426
16.6 Quantification models 427
16.6.1 TESEO 427
16.6.2 THERP 428
16.6.3 HCR 432
16.6.4 Simulation models 432
16.6.5 Conclusions 433
16.7 Data 433
16.7.1 Data collection 434
16.7.2 Data banks 437
16.7.3 Range of values 439
References 441
17 Mechanics 446
17.1 Introduction 446
17.2 General considerations 447
17.3 Stress and strength 448
17.4 Fatigue 449
17.4.1 Statistical approach 449
17.4.2 Probabilistic fracture mechanics 453
17.5 Dependability of mechanical systems 457
References 458
18 Software domain 461
18.1 Introduction 461
18.2 Concepts 462
18.3 Main features of the software system 466
18.3.1 Life-cycle stages 466
18.3.2 Software testing 467
18.3.3 Quality assurance 469
18.3.4 Reliability 470
18.4 First indications on software reliability 471
18.4.1 Complexity measurements 471
18.4.2 Test-related measurements 473
18.5 Software reliability prediction 474
18.5.1 Introduction to models 474
18.5.2 'Perfect debugging' model 475
18.5.3 'Imperfect debugging' model 480
18.5.4 'Random debugging' model 482
18.5.5 'Bugs with different occurrence rates' model 483
18.5.6 Parametric models 484
18.5.7 Model validation 484
18.5.8 Conclusions 487
18.6 Reliability of a software system 487
18.7 Availability and maintainability of a software
system 491
18.8 Fault-tolerant software systems 494
18.8.1 Architecture 494
18.8.2 Reliability 495
18.8.3 Examples 496
18.9 Computer system dependability 497
References 498
19 Assessing safety 504
19.1 Introduction 504
19.2 The risk concept 505
19.3 The actual risks 506
19.3.1 Risks from human and natural sources 506
19.3.2 Risk acceptance 508
19.3.3 Risk perception 511
19.4 Probabilistic risk assessment (PRA) 514
19.4.1 Nuclear industry 514
19.4.2 Petrochemistry 523
19.5 Quantified safety goals 529
19.5.1 Principles of the approach 529
19.5.2 Aircraft 531
19.5.3 Chemical plants 534
19.5.4 Off-shore oil rigs 534
19.5.5 Nuclear power plants 536
References 541
PART 4 COMPUTERIZED METHODS 547
20 Computer codes for dependability assessment 549
20.1 Introduction 549
20.2 Cause tree analysis codes 550
20.2.1 Qualitative analysis computer codes 551
20.2.2 Quantitative analysis computer codes 553
20.2.3 Computer codes for qualitative and
quantitative analysis 554
20.2.4 Computer codes for direct analysis 559
20.3 Codes for the consequence tree method 560
20.4 Codes for the state-space method 562
20.5 Codes for common-cause failure analysis 564
20.6 Codes for uncertainty evaluation 565
20.7 Monte Carlo simulation codes 566
20.7.1 Principle 567
20.7.2 Distribution generation 568
20.7.3 Simulation characteristics 569
20.7.4 Application 569
20.8 Miscellaneous programs 569
References 572
21 Automatic assessment of dependability 579
21.1 Introduction 579
21.2 Modelling by logical operators and inductive
analysis: the GO program 580
21.2.1 Introduction 580
21.2.2 Modelling by logic operators 580
21.2.3 The analysis principles 582
21.2.4 The GO program 583
21.3 Modelling by decision tables and cause tree
construction: the CAT program 583
21.3.1 Introduction 583
21.3.2 Modelling with decision tables 584
21.3.3 Cause tree construction 587
21.3.4 The CAT program 588
21.4 Electronic gate-based modelling and failure
simulation: ESCAF and S.ESCAF systems 589
21.4.1 Introduction 589
21.4.2 Modelling and analysis principles 590
21.4.3 ESCAF 591
21.4.4 S.ESCAF 592
21.5 Using expert systems for modelling: the EXPRESS
program 593
21.5.1 Introduction 593
21.5.2 How does an expert system work? 594
21.5.3 Example of a system reliability analysis 596
21.5.4 Prospects 600
21.6 Conclusion 600
References 600
PART 5 CONCLUSIONS 603
22 Dependability assessment approach 605
22.1 Introduction 605
22.2 Advantages and drawbacks of the various
analysis methods 605
22.3 Comparison of the various analysis methods 609
22.3.1 Inherent features 609
22.3.2 System-dependent features 611
22.4 Criteria for choosing the methods 612
22.5 Dependability assessment: approach definition 613
22.6 Limitations of dependability assessment 615
22.6.1 Limitations of the qualitative assessment 615
22.6.2 Limitations of the quantitative assessment 616
22.7 Validation of the dependability assessment 617
22.8 Organization and management of dependability
assessment 619
References 620
23 Dependability assessment application 621
23.1 Introduction 621
23.2 System design 622
23.2.1 Dependability assessment in design 622
23.2.2 Deterministic design and probabilistic
design 623
23.3 System operation 625
References 626
PART 6 CASE STUDIES 627
24 Analysis of a set of elementary systems by
different methods 629
24.1 Goals 629
24.2 Presentation of the system 629
24.2.1 General description 630
24.2.2 Detailed description of the elementary
systems 631
24.3 Failure modes and effects analysis 633
24.4 Cause tree method 645
24.4.1 Beginning of the construction 645
24.4.2 Cause tree 648
24.5 Gathered fault combination method 653
24.5.1 Identification of internal gathered faults 653
24.5.2 Identification of external and global
gathered faults 653
24.5.3 Undesirable event 662
24.5.4 Discussion 663
24.6 Consequence tree method 664
24.6.1 Initiating events 664
24.6.2 Consequence tree 665
24.6.3 Links with the gathered fault combination
method 666
24.7 Cause-consequence diagram method 667
24.8 Quantitative analysis 670
24.8.1 Occurrence probability of the undesirable
event 670
24.8.2 Conclusions; lessons 672
24.8.3 Quantitative analysis method: discussion 673
References 674
25 Human reliability assessment 676
25.1 Presentation of the example 676
25.2 Identification of potential human errors 676
25.3 Selecting significant errors 678
25.4 Detailed analysis of significant errors 678
25.5 Integration with system modelling 679
25.6 Quantification 680
25.7 Results 682
APPENDICES 685
Appendix 1 Main definitions 687
Appendix 2 Availability of a component and a system
on standby and periodically tested 715
A2.1 Introduction 715
A2.2 Availability of a component on standby and
periodically tested 715
A2.3 Mean availability 721
A2.4 Optimizing intervals between tests 723
A2.5 Availability of a system on standby and
periodically tested 724
References 724
Appendix 3 Importance factors 725
A3.1 Introduction 725
A3.2 Importance factors 725
References 727
Appendix 4 Assessment of uncertainties 728
A4.1 Introduction 728
A4.2 Processing of uncertainties 729
A4.3 Propagation of uncertainties 730
A4.3.1 Monte Carlo-type simulation 730
A4.3.2 Moments method 731
References 732
Appendix 5 Operating rules for cases where a safety
system is observed to be unavailable 733
A5.1 Introduction 733
A5.2 Method 734
A5.3 Examples of application 735
A5.3.1 Aeronautical applications 735
A5.3.2 Nuclear applications 736
References 737
Appendix 6 Semi-Markovian process: MUT, MTTR and MTTF 739
A6.1 MUT, MTTR and MTTF calculation 739
A6.2 Asymptotic frequency method 741
A6.3 Asymptotic duration method 742
A6.4 Examples 743
Reference 744
Appendix 7 International standards on dependability 745
Philip Koopman: koopman@cmu.edu