Notes on:
Reliability, Availability, Maintainability and Safety Assessment
Villemeur
Reliability, Availability, Maintainability and Safety Assessment, Alain Villemeur, John Wiley & Sons, Chichester, 1992. (two volume set; 746 pages+). ISBN 0 471 93048 2 (vol. 1) and 0 471 93049 0 (vol. 2).
This is a broad-ranging, multidisciplinary discussion of system-level dependability topics. It is translated from the French original, published in 1991. The author works for the French electric utilities, so the book is motivated by a nuclear power point of view. The scope and applicability are broader than that, and include some examples from the aerospace (the Concorde aircraft) and petrochemical industries. This is perhaps the most comprehensive available treatment of methods, tools, and techniques within an interdisciplinary framework. However, because of its breadth it may not be definitive in all respects. In other words, it provides an excellent and comprehensive overview, but may not have all the latest details for any particular specialty area.
Volume 1 surveys the mathematics and nine analysis methods. Volume 2 discusses multiple disciplines, automated tools, and case studies.
Topic coverage: (*** = emphasized; ** = discussed with some detail; * = mention)
| *** | Dependability | **  | Electronic Hardware         |     | Requirements  |
| **  | Safety        | **  | Software                    | *   | Design        |
|     | Security      | *   | Electro-Mechanical Hardware | *   | Manufacturing |
|     | Scalability   |     | Control Algorithms          | *   | Deployment    |
|     | Latency       | **  | Humans                      |     | Logistics     |
|     | Affordability | *   | Society/Institutions        |     | Retirement    |
Other topics: dependability math, data, common-cause failures, dependability assessment
Publisher Comments:
"Presents methods and techniques for assessing the reliability, availability, maintainability or safety of industrial systems. Describes the history of dependability concepts and methods and also defines the main concepts and principles of predictive analysis used. The second section is a detailed description of principles and methods. The third deals with the specific methods used in the fields of human factors, mechanics, software and safety assessment. The last section lists the main computer programs developed to assess dependability and common cause failures."
Volume 1 Publisher Comments:
"Opening with a brief survey of the history of dependability in industry, this volume, the first of two, presents the methods and techniques used to assess and measure the reliability, availability, maintainability and safety of industrial systems. Incorporating much of the recent research in this area, the author evaluates the main concepts and principles of predictive analysis. Numerous illustrative examples allow the reader to build a fundamental understanding of the techniques employed for measuring dependability, including Preliminary Hazard Analysis, Failure Modes and Effect Analysis and the Truth Table Method. More specific methods and practical examples are detailed in Volume 2 of this reference text. The comprehensive and accessible approach of these volumes will appeal to practising engineers in a range of areas, including mechanical, electrical, electronic, safety and reliability engineering. Researchers and advanced students will also find this book invaluable."
Volume 2 Publisher Comments:
"A detailed guide to the developing area of dependability, this comprehensive work in two volumes presents the methods and techniques used to assess and measure the reliability, availability, maintainability and safety of industrial systems. Building on the fundamental principles and methods of predictive analysis explained in Volume 1, Volume 2 concentrates on the specific methods used to solve reliability problems. Readers are shown how to overcome a wide range of common cause failures, including human factors, mechanics and software. Practical examples are detailed and the main computer programs which have been developed to assess dependability are evaluated. Case studies, extensive appendices and bibliographies conclude this indispensable reference text. This dependability assessment will appeal to practising engineers, researchers and senior students in mechanical, electrical, electronic, safety and reliability engineering."
Reviews:
Volume 1 Contents:
Foreword by Paul Caseau
Foreword by Arnould d'Harcourt
Acknowledgements
Preface
Main Notations

PART 1 INTRODUCTION TO DEPENDABILITY METHODS
1 Systems dependability history
1.1 From the beginning of the industrial age to the 1930s
1.2 The 1940s
1.3 The 1950s
1.4 The 1960s
1.5 The 1970s and 1980s
References
2 Main concepts
2.1 Introduction
2.2 Reliability: definitions
2.3 Quality and reliability
2.4 Dependability
2.5 Systems and components
2.5.1 Definition
2.5.2 Nature of systems
2.5.3 Main characteristics of a system
2.6 Failures: definitions and classifications
2.6.1 Definition of failure
2.6.2 Classification as to suddenness
2.6.3 Classification as to degree
2.6.4 Classification of failures combining suddenness and degree
2.6.5 Classification according to the date of their occurrence in the system lifetime
2.6.6 Classification as to effects
2.6.7 Classification as to causes
2.7 Fault
2.8 Failure modes
2.9 Defect, failure, fault
References
3 The principle of predictive analysis
3.1 Predictive analysis of system dependability
3.1.1 System analysis
3.1.2 Dependability prediction
3.1.3 Methods of analysis
3.1.4 Inductive and deductive approaches
3.2 Principal stages
3.2.1 Functional and technical analysis
3.2.2 Qualitative analysis
3.2.3 Quantitative analysis
3.2.4 Synthesis and conclusions
3.3 Principal characteristics
3.3.1 Interactive nature
3.3.2 Iterative nature
References
4 Dependability mathematics
4.1 Event algebra
4.1.1 Main definitions
4.1.2 Boolean algebra
4.2 Event probabilities
4.2.1 Poincare's theorem
4.2.2 Conditional probabilities theorem
4.2.3 Total probability theorem
4.2.4 Bayes' theorem
4.2.5 Epistemology of probabilities
4.3 Random variables
4.3.1 Definition
4.3.2 Main distributions used
4.4 Stochastic processes
4.5 Fundamental relations for dependability
4.5.1 Definitions and main characteristics
4.5.2 Definitions of MTTF, MTTR, MUT, MDT and MTBF
4.5.3 Failure and repair densities, MTTF and MTTR
4.5.4 Failure and repair rates
4.5.5 Failure and repair intensities
4.5.6 Main relationships
4.6 Failure rate and MTTF for the main probability distributions
4.7 Reliability and availability of an entity
4.7.1 Non-repairable entity
4.7.2 Repairable entity
4.8 Model for constant failure and repair rates
References
5 Dependability data
5.1 General considerations
5.1.1 Components
5.1.2 Collection of event data
5.2 Data processing
5.2.1 Dependability parameters
5.2.2 Parameter distributions
5.2.3 Estimator and confidence interval calculation
5.2.4 Bayesian parameter estimation
5.2.5 Assessment based on expert judgement
5.2.6 Failure rate modelling
5.3 Example of a data collection system
5.3.1 Information and data collection
5.3.2 Data processing
5.3.3 Data extraction
5.4 Data sources
5.4.1 Databases
5.4.2 Data source documents
References

PART 2 MAIN METHODS
6 Preliminary hazard analysis (PHA)
6.1 Introduction
6.2 Principles of the PHA
6.3 The preliminary hazard analysis in the aeronautical industry
6.4 The preliminary hazard analysis in the chemical industry
References
7 Failure modes and effects analysis (FMEA)
7.1 Introduction
7.2 Performance of an FMEA
7.2.1 Definition of the system, its functions and components
7.2.2 Identification of the component failure modes and their causes
7.2.3 Study of the effects of the component failure modes
7.2.4 Conclusions and recommendations
7.3 Presentation of the analysis and its results
7.4 Illustrative exercise
7.4.1 Presentation of the exercise
7.4.2 Performance of the FMEA
7.5 Failure modes, effects and criticality analysis (FMECA)
7.6 FMEA and defect analysis
7.6.1 'HAZOP'-type analysis
7.6.2 Safety study on fluid flow diagrams
7.7 FMEA application
7.8 Summary
References
8 Success diagram method (SDM)
8.1 Introduction
8.2 Principles for constructing a success diagram
8.3 Reliability assessment of a non-repairable system
8.3.1 Particular success diagrams
8.3.2 Complex success diagrams: general case
8.4 Dependability assessment of a repairable system
8.4.1 Availability calculation
8.4.2 The lambda-mu method
8.4.3 Remarks
8.5 Parts count method
8.6 Success diagram and cause tree method
8.7 Summary
References
9 Cause tree method (CTM)
9.1 Introduction
9.2 Description of the method
9.3 Basic concepts
9.3.1 Undesirable event
9.3.2 Logic gate representation
9.3.3 Event representation
9.3.4 Defects, failures and faults
9.3.5 Failure classes
9.3.6 Basic events
9.4 Cause tree construction: principles
9.4.1 Identification of immediate, necessary and sufficient causes (first principle)
9.4.2 Classification of intermediate events (second principle)
9.4.3 Analysis of component defects (third principle)
9.4.4 Seeking the INS causes of intermediate events until basic events are obtained (fourth principle)
9.4.5 Iterative approach (fifth principle)
9.4.6 Other rules
9.5 Illustrative exercise: cause tree construction
9.5.1 Cause tree analysis
9.5.2 Comments
9.6 Minimal cut sets and prime implicants
9.6.1 Definition of minimal cut sets
9.6.2 Determination of minimal cut sets
9.6.3 Prime implicants
9.7 Illustrative exercise: determination of minimal cut sets
9.7.1 Reduction of the cause tree
9.7.2 Comments
9.8 Quantitative analysis
9.8.1 Unrepairable system
9.8.2 Repairable system
9.8.3 Use of specific logic gates
9.8.4 Interdependencies of basic events
9.9 Analysis of phased-mission systems
9.9.1 Availability computation
9.9.2 Reliability computation
9.10 Method application
9.11 Summary
References
10 Truth table method (TTM)
10.1 Introduction
10.2 Principles
10.3 Illustrative exercise
10.4 Summary
11 Gathered fault combination method (GFCM)
11.1 Introduction
11.2 Presentation of the method
11.3 Main characteristics of the method
11.3.1 Grouping of faults with identical effects
11.3.2 Criteria for selecting failure combinations
11.3.3 Allowance for interactions between elementary systems
11.3.4 Inductive and deductive nature of the method
11.3.5 Analysis simplification
11.4 Main characteristics of internal, external and global gathered faults
11.4.1 Characteristics of internal gathered faults
11.4.2 Characteristics of external gathered faults
11.4.3 Characteristics of global gathered faults
11.5 Illustrative exercise
11.5.1 Determination of internal gathered faults
11.5.2 Determination of external and global gathered faults
11.5.3 Conclusions
11.6 Theoretical foundations of the method
11.6.1 Advantages of the two criteria
11.6.2 Fault combination effects-effects base
11.6.3 External and global gathered faults
11.6.4 GFCM and truth table reduction
11.6.5 Global gathered faults and minimal cut sets
11.6.6 Direct and indirect effects
11.7 Algorithm for the implementation of the GFCM
11.7.1 FMEA of elementary systems
11.7.2 Identification of internal gathered faults
11.7.3 Identification of external gathered faults
11.7.4 Identification of global gathered faults
11.7.5 Result analysis
11.8 Quantitative analysis
11.9 Method implementation
11.9.1 Analysis of a system
11.9.2 Analysis of a group of interacting elementary systems
11.10 Summary
References
12 Consequence tree method (CQTM)
12.1 Introduction
12.2 Description of the consequence tree method
12.3 Construction of consequence trees
12.3.1 Deductive approach
12.3.2 Inductive approach
12.4 Theoretical foundations of the consequence tree methodology
12.4.1 Initiating event and sequences
12.4.2 Reduction and development of the consequence tree
12.4.3 Boolean reduction
12.5 Illustrative exercise
12.5.1 Deductive approach
12.5.2 Inductive approach
12.6 Quantitative analysis
12.6.1 Independent events
12.6.2 Dependent events
12.7 Summary
References
13 Cause-consequence diagram method (CCDM)
13.1 Introduction
13.2 Description of the method
13.3 Construction of the cause-consequence diagram
13.4 Illustrative exercise
13.5 Connections with other methods
13.6 Use of the method
13.7 Summary
References
14 State-space method (SSM)
14.1 Introduction
14.2 Principles of the method
14.3 Availability of repairable systems
14.3.1 System state equations
14.3.2 Solution
14.3.3 Mean state duration and asymptotic frequency of encountering given states
14.3.4 Minimal cut sets and states
14.3.5 State sequences
14.4 Reliability of repairable systems
14.4.1 System state equations and solution
14.4.2 Minimal operating states method
14.4.3 State sequence method
14.4.4 Minimal cut sets and states
14.5 Repairable system maintainability
14.6 Reliability, availability and maintainability of simple repairable systems
14.6.1 Time-dependent availability and reliability
14.6.2 Measures for a series system
14.6.3 Measures for a parallel system
14.6.4 Measures for a parallel m/n system
14.7 Reliability, availability and maintainability of large repairable systems
14.8 Non-Markovian processes
14.9 Semi-Markovian processes
14.9.1 Calculating availability
14.9.2 Calculating reliability
14.9.3 Calculating the number of passages through the states
14.9.4 Examples
14.10 Consequence trees and semi-Markovian processes
14.11 Summary
References
Appendix 1 Main definitions
Index
Volume 2 Contents:
Foreword by Paul Caseau
Foreword by Arnould d'Harcourt
Acknowledgements
Preface
Main Notations

PART 3 SPECIFIC METHODS
15 Dependent and common-cause failures
15.1 Introduction
15.2 Interdependencies between failures
15.3 Examples of critical dependencies among failures in the aeronautical, oil and nuclear industries
15.4 Common-cause and cascade failures
15.4.1 History of these concepts and their introduction
15.4.2 Definitions
15.4.3 Dependent failures and primary, secondary and command failures
15.5 Classification of common-cause failures according to their nature
15.5.1 Environmental hazards
15.5.2 Design errors
15.5.3 Manufacturing errors
15.5.4 Assembly errors
15.5.5 Operating errors
15.6 Using event reports to study common-cause failures
15.7 Dependent failure assessment methods
15.7.1 Main assessment methods
15.7.2 Other methods
15.8 Dependent and common-cause failure probabilities
15.8.1 Explicit methods
15.8.2 Parametric methods
15.9 Importance of these failures and means to prevent them
References
16 Human factors
16.1 Introduction
16.2 Historical overview
16.3 Behaviour of the human operator
16.3.1 Functioning of the human operator
16.3.2 Some outstanding characteristics
16.3.3 Classifying tasks
16.3.4 Human errors
16.4 Major concepts
16.5 Phases of a human reliability assessment
16.5.1 Identification of potential human errors
16.5.2 Selecting significant errors
16.5.3 Detailed analysis of significant errors
16.5.4 Integration with system modelling
16.5.5 Quantification
16.5.6 Presentation of the approach used and its results
16.5.7 Some remarks
16.6 Quantification models
16.6.1 TESEO
16.6.2 THERP
16.6.3 HCR
16.6.4 Simulation models
16.6.5 Conclusions
16.7 Data
16.7.1 Data collection
16.7.2 Data banks
16.7.3 Range of values
References
17 Mechanics
17.1 Introduction
17.2 General considerations
17.3 Stress and strength
17.4 Fatigue
17.4.1 Statistical approach
17.4.2 Probabilistic fracture mechanics
17.5 Dependability of mechanical systems
References
18 Software domain
18.1 Introduction
18.2 Concepts
18.3 Main features of the software system
18.3.1 Life-cycle stages
18.3.2 Software testing
18.3.3 Quality assurance
18.3.4 Reliability
18.4 First indications on software reliability
18.4.1 Complexity measurements
18.4.2 Test-related measurements
18.5 Software reliability prediction
18.5.1 Introduction to models
18.5.2 'Perfect debugging' model
18.5.3 'Imperfect debugging' model
18.5.4 'Random debugging' model
18.5.5 'Bugs with different occurrence rates' model
18.5.6 Parametric models
18.5.7 Model validation
18.5.8 Conclusions
18.6 Reliability of a software system
18.7 Availability and maintainability of a software system
18.8 Fault-tolerant software systems
18.8.1 Architecture
18.8.2 Reliability
18.8.3 Examples
18.9 Computer system dependability
References
19 Assessing safety
19.1 Introduction
19.2 The risk concept
19.3 The actual risks
19.3.1 Risks from human and natural sources
19.3.2 Risk acceptance
19.3.3 Risk perception
19.4 Probabilistic risk assessment (PRA)
19.4.1 Nuclear industry
19.4.2 Petrochemistry
19.5 Quantified safety goals
19.5.1 Principles of the approach
19.5.2 Aircraft
19.5.3 Chemical plants
19.5.4 Off-shore oil rigs
19.5.5 Nuclear power plants
References

PART 4 COMPUTERIZED METHODS
20 Computer codes for dependability assessment
20.1 Introduction
20.2 Cause tree analysis codes
20.2.1 Qualitative analysis computer codes
20.2.2 Quantitative analysis computer codes
20.2.3 Computer codes for qualitative and quantitative analysis
20.2.4 Computer codes for direct analysis
20.3 Codes for the consequence tree method
20.4 Codes for the state-space method
20.5 Codes for common-cause failure analysis
20.6 Codes for uncertainty evaluation
20.7 Monte Carlo simulation codes
20.7.1 Principle
20.7.2 Distribution generation
20.7.3 Simulation characteristics
20.7.4 Application
20.8 Miscellaneous programs
References
21 Automatic assessment of dependability
21.1 Introduction
21.2 Modelling by logical operators and inductive analysis: the GO program
21.2.1 Introduction
21.2.2 Modelling by logic operators
21.2.3 The analysis principles
21.2.4 The GO program
21.3 Modelling by decision tables and cause tree construction: the CAT program
21.3.1 Introduction
21.3.2 Modelling with decision tables
21.3.3 Cause tree construction
21.3.4 The CAT program
21.4 Electronic gate-based modelling and failure simulation: ESCAF and S.ESCAF systems
21.4.1 Introduction
21.4.2 Modelling and analysis principles
21.4.3 ESCAF
21.4.4 S.ESCAF
21.5 Using expert systems for modelling: the EXPRESS program
21.5.1 Introduction
21.5.2 How does an expert system work?
21.5.3 Example of a system reliability analysis
21.5.4 Prospects
21.6 Conclusion
References

PART 5 CONCLUSIONS
22 Dependability assessment approach
22.1 Introduction
22.2 Advantages and drawbacks of the various analysis methods
22.3 Comparison of the various analysis methods
22.3.1 Inherent features
22.3.2 System-dependent features
22.4 Criteria for choosing the methods
22.5 Dependability assessment: approach definition
22.6 Limitations of dependability assessment
22.6.1 Limitations of the qualitative assessment
22.6.2 Limitations of the quantitative assessment
22.7 Validation of the dependability assessment
22.8 Organization and management of dependability assessment
References
23 Dependability assessment application
23.1 Introduction
23.2 System design
23.2.1 Dependability assessment in design
23.2.2 Deterministic design and probabilistic design
23.3 System operation
References

PART 6 CASE STUDIES
24 Analysis of a set of elementary systems by different methods
24.1 Goals
24.2 Presentation of the system
24.2.1 General description
24.2.2 Detailed description of the elementary systems
24.3 Failure modes and effects analysis
24.4 Cause tree method
24.4.1 Beginning of the construction
24.4.2 Cause tree
24.5 Gathered fault combination method
24.5.1 Identification of internal gathered faults
24.5.2 Identification of external and global gathered faults
24.5.3 Undesirable event
24.5.4 Discussion
24.6 Consequence tree method
24.6.1 Initiating events
24.6.2 Consequence tree
24.6.3 Links with the gathered fault combination method
24.7 Cause-consequence diagram method
24.8 Quantitative analysis
24.8.1 Occurrence probability of the undesirable event
24.8.2 Conclusions; lessons
24.8.3 Quantitative analysis method: discussion
References
25 Human reliability assessment
25.1 Presentation of the example
25.2 Identification of potential human errors
25.3 Selecting significant errors
25.4 Detailed analysis of significant errors
25.5 Integration with system modelling
25.6 Quantification
25.7 Results

APPENDICES
Appendix 1 Main definitions
Appendix 2 Availability of a component and a system on standby and periodically tested
A2.1 Introduction
A2.2 Availability of a component on standby and periodically tested
A2.3 Mean availability
A2.4 Optimizing intervals between tests
A2.5 Availability of a system on standby and periodically tested
References
Appendix 3 Importance factors
A3.1 Introduction
A3.2 Importance factors
References
Appendix 4 Assessment of uncertainties
A4.1 Introduction
A4.2 Processing of uncertainties
A4.3 Propagation of uncertainties
A4.3.1 Monte Carlo-type simulation
A4.3.2 Moments method
References
Appendix 5 Operating rules for cases where a safety system is observed to be unavailable
A5.1 Introduction
A5.2 Method
A5.3 Examples of application
A5.3.1 Aeronautical applications
A5.3.2 Nuclear applications
References
Appendix 6 Semi-Markovian process: MUT, MTTR and MTTF
A6.1 MUT, MTTR and MTTF calculation
A6.2 Asymptotic frequency method
A6.3 Asymptotic duration method
A6.4 Examples
Reference
Appendix 7 International standards on dependability
Philip Koopman: koopman@cmu.edu