OPTIMAL RELIABILITY MODELING
Principles and Applications
WAY KUO Texas A&M University
MING J. ZUO The University of Alberta
JOHN WILEY & SONS, INC.
This book is printed on acid-free paper.
Copyright © 2003 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, e-mail: [email protected].

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993, or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Library of Congress Cataloging-in-Publication Data:

Kuo, Way, 1951–
  Optimal reliability modeling : principles and applications / Way Kuo, Ming J. Zuo.
    p. cm.
  ISBN 0-471-39761-X (acid-free paper)
  1. Reliability (Engineering)—Mathematical models. I. Zuo, Ming J. II. Title.
  TA169 .K86 2002
  620′.00452—dc21    2002005287

Printed in the United States of America
10 9 8 7 6 5 4 3 2 1
CONTENTS

Preface

Acknowledgments

1 Introduction
   1.1 Needs for Reliability Modeling
   1.2 Optimal Design

2 Reliability Mathematics
   2.1 Probability and Distributions
       2.1.1 Events and Boolean Algebra
       2.1.2 Probabilities of Events
       2.1.3 Random Variables and Their Characteristics
       2.1.4 Multivariate Distributions
       2.1.5 Special Discrete Distributions
       2.1.6 Special Continuous Distributions
   2.2 Reliability Concepts
   2.3 Commonly Used Lifetime Distributions
   2.4 Stochastic Processes
       2.4.1 General Definitions
       2.4.2 Homogeneous Poisson Process
       2.4.3 Nonhomogeneous Poisson Process
       2.4.4 Renewal Process
       2.4.5 Discrete-Time Markov Chains
       2.4.6 Continuous-Time Markov Chains
   2.5 Complex System Reliability Assessment Using Fault Tree Analysis

3 Complexity Analysis
   3.1 Orders of Magnitude and Growth
   3.2 Evaluation of Summations
   3.3 Bounding Summations
   3.4 Recurrence Relations
       3.4.1 Expansion Method
       3.4.2 Guess-and-Prove Method
       3.4.3 Master Method
   3.5 Summary

4 Fundamental System Reliability Models
   4.1 Reliability Block Diagram
   4.2 Structure Functions
   4.3 Coherent Systems
   4.4 Minimal Paths and Minimal Cuts
   4.5 Logic Functions
   4.6 Modules within a Coherent System
   4.7 Measures of Performance
   4.8 One-Component System
   4.9 Series System Model
       4.9.1 System Reliability Function and MTTF
       4.9.2 System Availability
   4.10 Parallel System Model
       4.10.1 System Reliability Function and MTTF
       4.10.2 System Availability of Parallel System with Two i.i.d. Components
       4.10.3 System Availability of Parallel System with Two Different Components
       4.10.4 Parallel Systems with n i.i.d. Components
   4.11 Parallel–Series System Model
   4.12 Series–Parallel System Model
   4.13 Standby System Model
       4.13.1 Cold Standby Systems
       4.13.2 Warm Standby Systems

5 General Methods for System Reliability Evaluation
   5.1 Parallel and Series Reductions
   5.2 Pivotal Decomposition
   5.3 Generation of Minimal Paths and Minimal Cuts
       5.3.1 Connection Matrix
       5.3.2 Node Removal Method for Generation of Minimal Paths
       5.3.3 Generation of Minimal Cuts from Minimal Paths
   5.4 Inclusion–Exclusion Method
   5.5 Sum-of-Disjoint-Products Method
   5.6 Markov Chain Imbeddable Structures
       5.6.1 MIS Technique in Terms of System Failures
       5.6.2 MIS Technique in Terms of System Success
   5.7 Delta–Star and Star–Delta Transformations
       5.7.1 Star or Delta Structure with One Input Node and Two Output Nodes
       5.7.2 Delta Structure in Which Each Node May Be either an Input Node or an Output Node
   5.8 Bounds on System Reliability
       5.8.1 IE Method
       5.8.2 SDP Method
       5.8.3 Esary–Proschan (EP) Method
       5.8.4 Min–Max Bounds
       5.8.5 Modular Decompositions
       5.8.6 Notes

6 General Methodology for System Design
   6.1 Redundancy in System Design
   6.2 Measures of Component Importance
       6.2.1 Structural Importance
       6.2.2 Reliability Importance
       6.2.3 Criticality Importance
       6.2.4 Relative Criticality
   6.3 Majorization and Its Application in Reliability
       6.3.1 Definition of Majorization
       6.3.2 Schur Functions
       6.3.3 L-Additive Functions
   6.4 Reliability Importance in Optimal Design
   6.5 Pairwise Rearrangement in Optimal Design
   6.6 Optimal Arrangement for Series and Parallel Systems
   6.7 Optimal Arrangement for Series–Parallel Systems
   6.8 Optimal Arrangement for Parallel–Series Systems
   6.9 Two-Stage Systems
   6.10 Summary

7 The k-out-of-n System Model
   7.1 System Reliability Evaluation
       7.1.1 The k-out-of-n:G System with i.i.d. Components
       7.1.2 The k-out-of-n:G System with Independent Components
       7.1.3 Bounds on System Reliability
   7.2 Relationship between k-out-of-n G and F Systems
       7.2.1 Equivalence between k-out-of-n:G and (n − k + 1)-out-of-n:F Systems
       7.2.2 Dual Relationship between k-out-of-n G and F Systems
   7.3 Nonrepairable k-out-of-n Systems
       7.3.1 Systems with i.i.d. Components
       7.3.2 Systems with Nonidentical Components
       7.3.3 Systems with Load-Sharing Components Following Exponential Lifetime Distributions
       7.3.4 Systems with Load-Sharing Components Following Arbitrary Lifetime Distributions
       7.3.5 Systems with Standby Components
   7.4 Repairable k-out-of-n Systems
       7.4.1 General Repairable System Model
       7.4.2 Systems with Active Redundant Components
       7.4.3 Systems with Load-Sharing Components
       7.4.4 Systems with Both Active Redundant and Cold Standby Components
   7.5 Weighted k-out-of-n:G Systems

8 Design of k-out-of-n Systems
   8.1 Properties of k-out-of-n Systems
       8.1.1 Component Reliability Importance
       8.1.2 Effects of Redundancy in k-out-of-n Systems
   8.2 Optimal Design of k-out-of-n Systems
       8.2.1 Optimal System Size n
       8.2.2 Simultaneous Determination of n and k
       8.2.3 Optimal Replacement Time
   8.3 Fault Coverage
       8.3.1 Deterministic Analysis
       8.3.2 Stochastic Analysis
   8.4 Common-Cause Failures
       8.4.1 Repairable System with Lethal Common-Cause Failures
       8.4.2 System Design Considering Lethal Common-Cause Failures
       8.4.3 Optimal Replacement Policy with Lethal Common-Cause Failures
       8.4.4 Nonlethal Common-Cause Failures
   8.5 Dual Failure Modes
       8.5.1 Optimal k or n Value to Maximize System Reliability
       8.5.2 Optimal k or n Value to Maximize System Profit
       8.5.3 Optimal k and n Values to Minimize System Cost
   8.6 Other Issues
       8.6.1 Selective Replacement Optimization
       8.6.2 TMR and NMR Structures
       8.6.3 Installation Time of Repaired Components
       8.6.4 Combinations of Factors
       8.6.5 Partial Ordering

9 Consecutive-k-out-of-n Systems
   9.1 System Reliability Evaluation
       9.1.1 Systems with i.i.d. Components
       9.1.2 Systems with Independent Components
   9.2 Optimal System Design
       9.2.1 B-Importances of Components
       9.2.2 Invariant Optimal Design
       9.2.3 Variant Optimal Design
   9.3 Consecutive-k-out-of-n:G Systems
       9.3.1 System Reliability Evaluation
       9.3.2 Component Reliability Importance
       9.3.3 Invariant Optimal Design
       9.3.4 Variant Optimal Design
   9.4 System Lifetime Distribution
       9.4.1 Systems with i.i.d. Components
       9.4.2 System with Exchangeable Dependent Components
       9.4.3 System with (k − 1)-Step Markov-Dependent Components
       9.4.4 Repairable Consecutive-k-out-of-n Systems
   9.5 Summary

10 Multidimensional Consecutive-k-out-of-n Systems
   10.1 System Reliability Evaluation
       10.1.1 Special Multidimensional Systems
       10.1.2 General Two-Dimensional Systems
       10.1.3 Bounds and Approximations
   10.2 System Logic Functions
   10.3 Optimal System Design
   10.4 Summary

11 Other k-out-of-n and Consecutive-k-out-of-n Models
   11.1 The s-Stage k-out-of-n Systems
   11.2 Redundant Consecutive-k-out-of-n Systems
   11.3 Linear and Circular m-Consecutive-k-out-of-n Model
   11.4 The k-within-Consecutive-m-out-of-n Systems
       11.4.1 Systems with i.i.d. Components
       11.4.2 Systems with Independent Components
       11.4.3 The k-within-(r, s)/(m, n):F Systems
   11.5 Series Consecutive-k-out-of-n Systems
   11.6 Combined k-out-of-n:F and Consecutive-kc-out-of-n:F System
   11.7 Combined k-out-of-mn:F and Linear (r, s)/(m, n):F System
   11.8 Combined k-out-of-mn:F, One-Dimensional Con/kc/n:F, and Two-Dimensional Linear (r, s)/(m, n):F Model
   11.9 Application of Combined k-out-of-n and Consecutive-k-out-of-n Systems
   11.10 Consecutively Connected Systems
   11.11 Weighted Consecutive-k-out-of-n Systems
       11.11.1 Weighted Linear Consecutive-k-out-of-n:F Systems
       11.11.2 Weighted Circular Consecutive-k-out-of-n:F Systems

12 Multistate System Models
   12.1 Consecutively Connected Systems with Binary System State and Multistate Components
       12.1.1 Linear Multistate Consecutively Connected Systems
       12.1.2 Circular Multistate Consecutively Connected Systems
       12.1.3 Tree-Structured Consecutively Connected Systems
   12.2 Two-Way Consecutively Connected Systems
   12.3 Key Concepts in Multistate Reliability Theory
   12.4 Special Multistate Systems and Their Performance Evaluation
       12.4.1 Simple Multistate k-out-of-n:G Model
       12.4.2 Generalized Multistate k-out-of-n:G Model
       12.4.3 Generalized Multistate Consecutive-k-out-of-n:F System
   12.5 General Multistate Systems and Their Performance Evaluation
   12.6 Summary

Appendix: Laplace Transform

References

Bibliography

Index
PREFACE
Recent progress in science and technology has made today's engineering systems more powerful than ever. The increasing level of sophistication in high-tech industrial processes implies that reliability problems will not only continue to exist but are likely to require ever more complex solutions. Furthermore, system failures are having more significant effects on society as a whole than ever before. Consider, for example, the impact of the failure or mismanagement of a power distribution system in a major city, the malfunction of an air traffic control system at an international airport, failure of a nanosystem, miscommunication in today's Internet systems, or the breakdown of a nuclear power plant. As a consequence, the importance of reliability at all stages of modern engineering processes, including design, manufacture, distribution, and operation, can hardly be overstated.

Today's engineering systems are also complicated. For example, a space shuttle consists of hundreds of thousands of components. These components functioning together form a system. The reliable performance of the system depends on the reliable performance of its constituent components. In recent years, statistical and probabilistic models have been developed for evaluating system reliability based on the components' reliability, the system design, and the assembly of the components. At the same time, we should pay close attention to the usefulness of these models. Some models and published books are too abstract to understand, and others are too basic to address solutions for today's systems.

System reliability models are the focus of this book. We have attempted to include many of the system reliability models that have been reported in the literature, with emphasis on the more significant ones. The models extensively covered include parallel, series, standby, k-out-of-n, consecutive-k-out-of-n, multistate, and general system models, including some maintainable systems.
For each model, we discuss the evaluation of exact system reliability, the development of bounds for system reliability approximation, extensions to dual failure modes and/or multistates, and optimal system design in terms of the arrangement of components. Both static and dynamic
performance measures are discussed. Failure dependency among components within some systems is also addressed. In addition, we believe that this is the first time that multistate system reliability models are systematically introduced and discussed in a book. The result is a state-of-the-art manuscript for students, system designers, researchers, and teachers in reliability engineering.

We provide unique interpretations of the existing reliability evaluation methods. In addition to presenting physical explanations for k-out-of-n and consecutive-k-out-of-n models that have recently been developed, we also show how these evaluation methods for assessing system reliability can be applied to several other areas. These include (1) general network systems that have multistage failure modes (i.e., degradation), (2) tree and reverse tree structures that are widely found in computer software development, and (3) applications of stochastic processes where the concern is location instead of time (Markov chain imbeddable structures).

Design issues are also extensively addressed in this book. Given the same quantity of resources, an optimal system design can lead to much higher system reliability. Furthermore, with a thorough understanding of its design, we can seek better ways to diagnose, maintain, and improve an existing system. Optimal design by analytical and heuristic methodologies is thoroughly discussed here as well.

The book is organized as follows. An introduction is given in Chapter 1. Chapter 2 provides reliability mathematics: not only the traditional notions of probability and distributions but also the fundamental stochastic processes related to reliability evaluation. Chapter 3 briefly discusses complexity analysis because it is useful in the analysis of algorithm efficiency. In later chapters, complexity analysis serves as a basis for comparisons of different algorithms for system reliability evaluation.
Chapter 4 starts by introducing coherent reliability systems, covering the structure function, minimal cuts and paths, logic functions, and performance measures. It then introduces the fundamental system reliability models, including parallel, series, combinations of parallel and series, and standby systems.

General methodologies for system reliability evaluation are covered in Chapter 5. Commonly used system reliability evaluation techniques such as parallel–series reduction, pivotal decomposition, the inclusion–exclusion method, and the sum-of-disjoint-products method are introduced. Techniques for generation of minimal paths and/or minimal cuts are also discussed. The delta–star and star–delta transformations are analyzed as tools for system reliability evaluation. A technique recently reported in the literature for system reliability evaluation utilizes the so-called Markov chain imbeddable structures that exist in many system structures; this new technique is introduced here as well. Methods for system reliability approximation are also included in Chapter 5.

In Chapter 6, we introduce general methodologies for optimal system design. Various measures of component reliability importance are introduced. The concept of majorization, which is useful for optimal system design, is discussed. Applications of importance measures and majorization, along with pairwise rearrangements of components, are illustrated for the optimal design of series, parallel, and mixed series–parallel systems. One may be able to design an optimal reliability system by
examining a carefully selected importance measure. These general design methodologies will also be useful for the design of the more complicated system structures covered in later chapters.

Chapters 7 and 8 focus on reliability evaluation and optimal design of the k-out-of-n system model, respectively. Although the k-out-of-n system is a special structure, it has unique properties that make it well suited to demonstrating the efficiency of various reliability evaluation techniques. In Chapter 7, we introduce four different techniques that have been used in the development of reliability evaluation algorithms for k-out-of-n systems with independent components. The relationship between an F and a G system is thoroughly examined. Also extensively analyzed is the performance of nonrepairable and repairable systems. In Chapter 8, in the context of optimal design, we cover topics such as component reliability importance, imperfect fault coverage, common-cause failure, and dual failure modes.

In recent years, consecutive-k-out-of-n systems have been extensively studied. Chapter 9 covers the consecutive-k-out-of-n models and interpretations of these models when applied to a number of existing problems that would be difficult to handle otherwise. Specifically, we introduce both linear and circular systems and those with nonidentical components, as well as approximations and bounds with various lifetime distributions. In this chapter, we present the notions of optimal configuration and invariant optimal configuration for both the F and G systems. We believe that readers should gain a thorough understanding of the newly developed paradigms being applied to consecutive-k-out-of-n systems and their optimal design. Chapter 10 gives results on multidimensional consecutive-k-out-of-n models and optimal design for such systems, including some time-dependent situations.
Chapter 11 focuses on the combined k-out-of-n and consecutive-k-out-of-n models, including issues of both system reliability evaluation and optimal design. A case study on applying these combined models to remaining life estimation of a hydrogen furnace in a petrochemical company is included in this chapter. Other extended and related system models are briefly outlined in the general discussions presented in the previous chapters.

Many modern systems do not simply work or fail. Instead, they may experience degraded levels of performance before a complete failure is observed. Multistate system models allow both the system and its components to have more than two possible states. Chapter 12 provides coverage of multistate system reliability models. In this chapter, we first discuss consecutively connected systems and two-way communication systems wherein the system is binary while the components are multistate. Then we extend some of the concepts used in binary system reliability theory, such as relevancy, coherency, minimal path vector, minimal cut vector, and duality, into the multistate context. Some special multistate system reliability models are then introduced. Finally, methods for performance evaluation of general multistate systems are discussed.

The new topics and unique features of this book on optimal system reliability modeling include

1. complexity analysis, which provides background knowledge on efficiency comparison of system reliability evaluation algorithms;
2. Markov chain imbeddable structures, which are another effective tool for system reliability analysis;
3. majorization, which is a powerful tool for the development of invariant optimal designs for some system structures;
4. multistate system reliability theory, which is systematically introduced for the first time in a text on engineering system reliability analysis; and
5. applications of the k-out-of-n and the consecutive-k-out-of-n system models in remaining life estimation.

This book provides the reader with a complete picture of reliability evaluation and optimal system design for many well-studied system structures in both the binary and the multistate contexts. Based on the comparisons of computational complexities of the algorithms presented in this book, users can determine which evaluation methods can be most efficiently applied to their own problems. The book can be used as a handbook for practicing engineers. It includes the latest results and the most comprehensive algorithms for system reliability analysis available in the literature, as well as for the optimal design of the various system reliability models.

This book can also serve as an advanced textbook for graduate students wishing to study reliability for the purpose of engaging in research. We outline various mathematical tools and approaches that have been used successfully in research on system reliability evaluation and optimal design. In addition, a primer on complexity analysis is included. With the help of complexity metrics, we discuss how to analyze and determine the right algorithm for optimal system design. The background required for comprehending this textbook includes only calculus, basic probability theory, and some knowledge of computer programming. There are 263 cited references and an additional 244 entries in the bibliography that are related to the material presented in this book.

Way Kuo
Texas A&M University

Ming J. Zuo
The University of Alberta
ACKNOWLEDGMENTS
We acknowledge the National Science Foundation, Army Research Office, Office of Naval Research, Air Force Office for Scientific Research, National Research Council, Fulbright Foundation, Texas Advanced Technology Program, Bell Labs, Hewlett Packard, and IBM for their funding of W. Kuo's research activities over the past 25 years. We also acknowledge the Natural Sciences and Engineering Research Council of Canada (NSERC), University Grants Council of Hong Kong, and Syncrude Canada Ltd. for their support of M. J. Zuo's research activities over the past 15 years. This manuscript grew out of the authors' collaborative and individual research and development projects, supported in part by the above agencies.

The first draft of this book was examined by Chunghun Ha, Wen Luo, and Jung Yoon Hwang of Texas A&M University. We are very grateful for their valuable suggestions and criticisms regarding reorganization and presentation of the materials. We acknowledge input to this manuscript from Jinsheng Huang of the University of Alberta, Kyungmee O. Kim of Texas A&M University, and Chang Woo Kang of Corning. Mary Ann Dickson and the Wiley editorial staff edited the manuscript. Dini S. Sunardi made a significant effort in formatting the original LaTeX files. Shiang Lee, Linda Malie, Fan Jiang, Mobin Akhtar, Jing Lin, Xinhao Tian, and Martin Agelinchaab helped with checking the references. Lona Houston handled the correspondence.

In the book, we try hard to give due credit to those who have contributed to the topics addressed. We apologize if we have inadvertently overlooked specific topics and other contributors.

We have obtained permission to use material from the following IEEE Transactions on Reliability articles: M. J. Phillips, "k-out-of-n:G systems are preferable," IEEE Transactions on Reliability, R-29(2): 166–169, 1980, © 1980 IEEE; T. K. Boehme, A. Kossow, and W. Preuss, "A generalization of consecutive-k-out-of-n:F system," IEEE Transactions on Reliability, R-41(3): 451–457, 1992, © 1992 IEEE; M. Zuo, "Reliability of linear and circular consecutively-connected systems," IEEE Transactions on Reliability, R-42(3): 484–487, 1993, © 1993 IEEE; M. Zuo, "Reliability and design of 2-dimensional consecutive-k-out-of-n systems," IEEE Transactions on Reliability, R-42(3): 488–490, 1993, © 1993 IEEE; J. S. Wu and R. J. Chen, "Efficient algorithms for k-out-of-n & consecutive-weighted-k-out-of-n:F system," IEEE Transactions on Reliability, R-43(4): 650–655, 1994, © 1994 IEEE; M. Zuo, D. Lin, and Y. Wu, "Reliability evaluation of combined k-out-of-n:F, consecutive-kc-out-of-n:F and linear connected-(r,s)-out-of-(m,n):F system structures," IEEE Transactions on Reliability, R-49(1): 99–104, 2000, © 2000 IEEE; J. Huang, M. J. Zuo, and Y. H. Wu, "Generalized multi-state k-out-of-n:G systems," IEEE Transactions on Reliability, R-49(1): 105–111, 2000, © 2000 IEEE.

Permissions are also granted for use of materials from V. R. Prasad, K. P. K. Nair, and Y. P. Aneja, "Optimal assignment of components to parallel-series and series-parallel systems," Operations Research, 39(3): 407–414, 1991, © 1991, INFORMS; M. V. Koutras, "On a Markov chain approach for the study of reliability structures," Journal of Applied Probability, 33: 357–367, 1996, © 1996, The Applied Probability Trust; J. Malinowski and W. Preuss, "Reliability evaluation for tree-structured systems with multi-state components," Microelectronics and Reliability, 36(1): 9–17, 1996, © 1996, Elsevier Science; J. Malinowski and W. Preuss, "Reliability of reverse-tree-structured systems with multi-state components," Microelectronics and Reliability, 36(1): 1–7, 1996, © 1996, Elsevier Science; J. Shen and M. J. Zuo, "Optimal design of series consecutive-k-out-of-n:G systems," Reliability Engineering and System Safety, 45: 277–283, 1994, © 1994, Elsevier Science; M. J. Zuo and M. Liang, "Reliability of multistate consecutively-connected systems," Reliability Engineering and System Safety, 44: 173–176, 1994, © 1994, Elsevier Science; Y. L. Zhang, M. J. Zuo, and R. C. M. Yam, "Reliability analysis for a circular consecutive-2-out-of-n:F repairable system with priority in repair," Reliability Engineering and System Safety, 68: 113–120, 2000, © 2000, Elsevier Science; and M. J. Zuo, "Reliability and component importance of a consecutive-k-out-of-n system," Microelectronics and Reliability, 33(2): 243–258, 1993, © 1993, Elsevier Science.
1 INTRODUCTION
Reliability is the probability that a system will perform satisfactorily for at least a given period of time when used under stated conditions. The probability that a system successfully performs as designed is thus called "system reliability," or the "probability of survival"; unreliability refers to the probability of failure. System reliability is a measure of how well a system meets its design objective. A system can be characterized as a group of stages or subsystems integrated to perform one or more specified operational functions.

In describing the reliability of a given system, it is necessary to specify (1) the failure process, (2) the system configuration, which describes how the system is connected and the rules of operation, and (3) the state in which the system is defined to be failed. The failure process describes the probability law governing failures. The system configuration, on the other hand, defines the manner in which the system reliability function will behave. The third consideration in developing the reliability function for a nonmaintainable system is to define the conditions of system failure.

Other measures of performance include failure rate, percentile of system life, mean time to failure, mean time between failures, availability, mean time between repairs, and maintainability. Depending on the nature and complexity of the system, some measures are more suitable than others. For example, failure rate is widely used for single-component analysis, whereas reliability is better suited to large-system analysis. For a telecommunication system, mean time to failure is widely used, but for a medical treatment, survivability (reliability) is used. In reliability optimization, the maximization of percentile life of a system is another useful measure of interest to system designers, according to Prasad et al. [196]. For man–machine systems,
Abbas and Kuo [1] and Rupe and Kuo [207] report stochastic modeling measures that go beyond reliability as it is traditionally defined.
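As a concrete illustration of the measures named above (our own sketch, not an example from the text), consider a component whose lifetime is exponentially distributed with an assumed constant failure rate λ; its reliability function and MTTF then have simple closed forms, R(t) = e^{−λt} and MTTF = 1/λ:

```python
import math

def exponential_reliability(t, lam):
    """Reliability R(t) = P(T > t) for an exponential lifetime with rate lam."""
    return math.exp(-lam * t)

def exponential_mttf(lam):
    """Mean time to failure: the integral of R(t) over [0, inf), which is 1/lam."""
    return 1.0 / lam

# Hypothetical example: a component with failure rate 0.001 failures per hour.
lam = 0.001
print(exponential_reliability(1000.0, lam))  # R(1000) = e^{-1} ≈ 0.3679
print(exponential_mttf(lam))                 # MTTF = 1000.0 hours
```

The exponential case is the simplest because its failure rate is constant; for other lifetime distributions (Weibull, gamma, lognormal), the failure rate varies with time and the MTTF must be obtained by integrating the reliability function.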
1.1 NEEDS FOR RELIABILITY MODELING

Many of today's systems, hardware and software, are large and complex and often have special features and structures. To enhance the reliability of such systems, one needs to assess their reliability and other related measures. Furthermore, the system concept extends to service systems and supply chain systems, for which reliability and accuracy are important goals. There is a need to present state-of-the-art optimal modeling techniques for such assessments.

Recent progress in science and technology has made today's engineering systems more powerful than ever. The increasing level of sophistication in high-tech industrial processes implies that reliability problems not only will continue to exist but also are likely to require ever more complex solutions. Furthermore, reliability failures are having more significant effects on society as a whole than ever before. Consider, for example, the impact of the failure or mismanagement of a power distribution system in a major city, the malfunction of an air traffic control system at an international airport, failure of a nanosystem, miscommunication in today's Internet systems, or the breakdown of a nuclear power plant. The importance of reliability at all stages of modern engineering processes, including design, manufacture, distribution, and operation, can hardly be overstated.

Today's engineering systems are also complicated. For example, a space shuttle consists of hundreds of thousands of components. These components functioning together form a system. The reliable performance of the system depends on the reliable performance of its constituent components. In recent years, statistical and probabilistic models have been developed for evaluating system reliability based on component reliability, the system design, and the assembly of the components. At the same time, we should pay close attention to the usefulness of these models.
Some models and published books are too abstract to understand and others are too basic to address solutions for today’s systems. System reliability models are the focus of this book. We have attempted to include all of the system reliability models that have been reported in the literature with emphasis on the significant ones. The models extensively covered include parallel, series, standby, k-out-of-n, consecutive-k-out-of-n, multistate, and general system models, including some maintainable systems. For each model, we discuss the evaluation of exact system reliability, development of bounds for system reliability approximation, extensions to dual failure modes and/or multistates, and optimal system design in terms of arrangement of components. Both static and dynamic performance measures are discussed. Failure dependency among components within some systems is also addressed. In addition, we believe that this is the first time that multistate system reliability models have been systematically introduced and discussed in a book. The result is a state-of-the-art reference manuscript for students, system designers, researchers, and teachers of reliability engineering.
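For the simplest of the models just listed, system reliability follows directly from component reliabilities when components fail independently. The sketch below is our own illustration under that independence assumption (the component reliabilities are made-up numbers, not data from the book): a series system works only if every component works, a parallel system fails only if every component fails, and a k-out-of-n:G system with i.i.d. components works if at least k of n components work (a binomial tail sum).

```python
from math import comb

def series_reliability(ps):
    """Series system of independent components: product of component reliabilities."""
    r = 1.0
    for p in ps:
        r *= p
    return r

def parallel_reliability(ps):
    """Parallel system of independent components: 1 minus product of unreliabilities."""
    q = 1.0
    for p in ps:
        q *= (1.0 - p)
    return 1.0 - q

def k_out_of_n_g_reliability(k, n, p):
    """k-out-of-n:G system with i.i.d. components of reliability p:
    works when at least k of the n components work (binomial tail)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

ps = [0.9, 0.95, 0.99]                      # assumed component reliabilities
print(series_reliability(ps))               # 0.9 * 0.95 * 0.99 ≈ 0.84645
print(parallel_reliability(ps))             # 1 - 0.1 * 0.05 * 0.01 = 0.99995
print(k_out_of_n_g_reliability(2, 3, 0.9))  # 3(0.81)(0.1) + 0.729 = 0.972
```

Note that the k-out-of-n:G model contains the other two as special cases: k = n gives a series system and k = 1 gives a parallel system, a relationship exploited repeatedly in Chapters 7 and 8.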
1.2 OPTIMAL DESIGN Many modern systems do not simply work or fail. Instead, they may experience degraded levels of performance before a complete failure is observed. Multistate system models allow both the system and its components to have more than two possible states. In addition to special multistate system reliability models, methods for performance evaluation of general multistate systems are discussed. The new topics and unique features on optimal system reliability modeling in this book include
1. complexity analysis, which provides background knowledge for comparing the efficiency of system reliability evaluation algorithms;
2. Markov chain imbeddable structures, an effective tool for system reliability analysis;
3. majorization, a powerful tool for the development of invariant optimal designs for some systems;
4. multistate system reliability theory, which is systematically introduced for the first time in a text on engineering system reliability analysis; and
5. applications of the k-out-of-n and the consecutive-k-out-of-n system models in remaining life estimation.
Over the past half century, numerous well-written books on reliability have become available. Among the system-oriented reliability texts, refer to Barlow and Proschan [22] for a theoretical foundation and to Schneeweiss [220] and Kapur and Lamberson [114] for a practical engineering approach.

The primary goal of the reliability engineer has always been to find the best way to increase system reliability. According to Kuo et al. [132], accepted principles for doing this include

1. keeping the system as simple as is compatible with the performance requirements;
2. increasing the reliability of the components in the system;
3. using parallel redundancy for the less reliable components;
4. using standby redundancy, which is switched to active components when failure occurs;
5. using repair maintenance, where failed components are replaced but not automatically switched in, as in 4;
6. using preventive maintenance, such that components are replaced by new ones whenever they fail or at some fixed interval, whichever comes first;
7. using better arrangements for exchangeable components;
8. using large safety factors or a product improvement management program; and
9. using burn-in for components that have high infant mortality.

Implementation of the above steps to improve system reliability will normally consume resources. A balance between system reliability and resource consumption is essential. All nine of these methods to enhance system reliability are based on a solid understanding of the system and of system reliability modeling. This book provides the reader with a complete picture of reliability evaluation and optimal system design for many well-studied system structures under both the binary
and the multistate contexts. Based on the comparisons of computational complexities of the algorithms presented in this book, users can determine which evaluation methods can be more efficiently applied to their own problems. The book includes the latest results and the most comprehensive algorithms for system reliability analysis available in the literature as well as optimal designs of the various system reliability models.
2 RELIABILITY MATHEMATICS
In this chapter, we introduce the mathematical concepts and techniques that are relevant to reliability analysis. We first cover the basic concepts of probability, the characteristics of random variables, and commonly used discrete and continuous distributions. The definitions of reliability and of commonly used lifetime distributions are discussed. Stochastic processes are also introduced here. Finally, we explain how to assess the reliability of complex systems using fault tree analysis.

2.1 PROBABILITY AND DISTRIBUTIONS

2.1.1 Events and Boolean Algebra

A process of observation or measurement is often referred to as a statistical experiment in statistics terminology. Examples of an experiment include counting the number of visitors to a theme park during a day, measuring the temperature at a specific point of a piece of machinery, and flipping a coin. Each experiment has a set of all possible outcomes, which is called the sample space and denoted by S. The following lists three experiments and their sample spaces:

1. Counting the number of visitors to a theme park: S = {0, 1, 2, . . . }.
2. Measuring the temperature of machinery: S = {any real number}.
3. Flipping a coin: S = {head, tail}.

When conducting an experiment, we are often interested in knowing whether the outcome is in a subset of the sample space. This subset may represent desired outcomes or undesired outcomes. We will use the term event to represent a set of outcomes that is of interest. For example, we may be interested in the following events:
FIGURE 2.1 Venn diagram showing that events E1 and E2 are disjoint.
1. A = The number of visitors to a theme park is greater than 4000.
2. B = The machine temperature is between 40°C and 60°C.
3. C = The coin shows a head.

An event has occurred if the outcome of the experiment is included in the set of outcomes of the event. For a specific experiment, we may be interested in more than one event. For example, we may be interested in the event, denoted by E1, that the measured machine temperature is between 40°C and 60°C and the event, denoted by E2, that it is above 100°C. To illustrate the relationship among the sample space S and the events E1 and E2, we often use the so-called Venn diagram, as shown in Figure 2.1. We use a rectangle to represent the sample space and circles to represent events. All events must be subsets of the sample space. Based on our definitions of E1 and E2, these two events cannot occur simultaneously. In other words, if a measured temperature value is in E1, then it cannot be in E2, and vice versa. Two events are defined to be mutually exclusive, or disjoint, if they cannot occur simultaneously, that is, if they do not have any outcome in common. Figure 2.1 shows that events E1 and E2 are disjoint.

The union of two events A and B includes all outcomes that are either in A, or in B, or in both. We use A ∪ B to indicate the union of events A and B. If we write C = A ∪ B, then we say that event C occurs if and only if at least one of the two events A and B occurs. In Figure 2.2, the shaded area represents the union of events A and B. The intersection of two events A and B includes all outcomes that are in both A and B. We use A ∩ B, or AB for simplicity, to indicate the intersection of A and B. If we write C = A ∩ B or C = AB, then event C occurs if and only if both events A and B occur. The shaded area in Figure 2.3 represents the intersection of events A and B.
FIGURE 2.2 Venn diagram showing union of events A and B.
FIGURE 2.3 Venn diagram showing intersection of events A and B.
For a given event E, its complement, denoted by E̅, indicates that event E does not occur. Here, E̅ includes all outcomes that are in the sample space S but not in event E. For example, if E represents the event that the number of visitors to a theme park is greater than 4000, then E̅ represents the event that the number of visitors to the theme park is no more than 4000. It is clear that any event and its complement together comprise the whole sample space. We usually use ∅ to indicate an empty set. General operations on events, including unions, intersections, and complements, are governed by a set of rules called the laws of Boolean algebra, which are summarized below:

• Commutative law: A ∪ B = B ∪ A, A ∩ B = B ∩ A.
• Associative law: (A ∪ B) ∪ C = A ∪ (B ∪ C), (A ∩ B) ∩ C = A ∩ (B ∩ C).
• Distributive law: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C), A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).
• Identity law: A ∪ S = S, A ∩ S = A, A ∪ ∅ = A, A ∩ ∅ = ∅.
• Complementation law: A ∪ A̅ = S, A ∩ A̅ = ∅, A̿ = A.
• Idempotent law: A ∪ A = A, A ∩ A = A.
• De Morgan's law: (A ∪ B)̅ = A̅ ∩ B̅, (A ∩ B)̅ = A̅ ∪ B̅.
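These laws can be verified mechanically on any finite sample space. The following sketch checks De Morgan's law and the distributive law using Python's built-in set type; the die-roll sample space and the particular events are illustrative choices, not from the text.

```python
# Sample space of a single die roll (an illustrative choice).
S = set(range(1, 7))
A = {2, 4, 6}                 # event: the number shown is even
B = {4, 5, 6}                 # event: the number shown is greater than 3
C = {1, 2, 3}                 # event: the number shown is at most 3

def complement(E, S=S):
    """Complement of event E with respect to the sample space S."""
    return S - E

# De Morgan's laws
assert complement(A | B) == complement(A) & complement(B)
assert complement(A & B) == complement(A) | complement(B)

# Distributive law: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
assert A & (B | C) == (A & B) | (A & C)

# Complementation law: A ∪ A̅ = S and A ∩ A̅ = ∅
assert A | complement(A) == S and A & complement(A) == set()
```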
2.1.2 Probabilities of Events

There are three approaches to measuring the probability of an event. Each has its advantages and areas of application. In the following, we briefly describe each of them.

• Equally Likely Approach. This approach applies when the total number of possible outcomes of an experiment is finite and each possible outcome has an equal chance of being observed. If an event of interest includes n possible outcomes and the sample space has N possible outcomes, then the probability for this event to occur is given by the ratio n/N. This approach finds wide application in games of chance and in making selections based on the generation of random numbers. For example, names are selected randomly in a poll and numbers are selected randomly in a lottery. This approach cannot be used when the possible outcomes of an experiment are not equally likely or the number of possible outcomes is infinite. For example, is it going to rain tomorrow? What will be tomorrow's highest temperature reading? These questions cannot be answered with this approach. This approach has limited applications in engineering reliability analysis.

• Frequency Approach. According to this approach, the probability of an event is the proportion of occurrences of the event under similar conditions in the long run. This approach is the most widely used one. If a manufacturer claims that its product has a 0.90 probability of functioning properly for one year, this means that of the new units of this product that are sold for use under specified conditions, 90% will work properly for a full year, while the other 10% will experience some sort of problem within a year. If the weather office predicts that there is a 30% chance of rain tomorrow, this means that historically, under similar weather conditions, it has rained 30% of the time. This approach is very useful for obtaining reliability measures in engineering, as multiple units of the same product may be tested under the same working conditions. The proportion of surviving units is used as a measure of the probability of survival for each unit of this product.

• Subjective Approach. According to this approach, the probability of an event represents the strength of one's belief with regard to the uncertainties involved in the event. Such probabilities are simply one's "educated" guesses based on personal experience or expertise. This approach is used when there are no or few historical records of such events and setting up experiments to observe such events is too expensive or impossible. It is gaining in favor due to the high speed of technological advancement in today's world. For example, what is the probability of success in the development of a new medical procedure using DNA technology?
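The frequency approach can be illustrated by simulation: the observed proportion of occurrences of an event stabilizes near its probability as the number of trials grows. A minimal sketch, in which the fair-coin event, the seed, and the trial count are illustrative choices:

```python
import random

random.seed(42)  # fixed seed so the run is repeatable

def relative_frequency(event, trials=100_000):
    """Estimate Pr(event) as the proportion of trials in which it occurs."""
    hits = sum(event() for _ in range(trials))
    return hits / trials

# Event: a simulated fair coin lands heads.
estimate = relative_frequency(lambda: random.random() < 0.5)

# With 100,000 trials the estimate is very close to the true value 0.5.
assert abs(estimate - 0.5) < 0.01
```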
One or a combination of the above approaches may be used to assign the probabilities of some basic events of a statistical experiment. Probabilities are values of a set function. This set function assigns real numbers to various subsets of the sample space S of a statistical experiment. Once such probabilities are obtained, we can follow some mathematical axioms to derive the probability measures of events that can be expressed as functions of those basic events. The following axioms are often used to restrict the ways in which we assign probabilities to events:

1. The probability of any event is a nonnegative real number, that is, Pr(A) ≥ 0 for any subset A of S.
2. The probability of the sample space is 1, that is, Pr(S) = 1.
3. If A1, A2, A3, . . . are a finite or infinite sequence of disjoint events within S, then Pr(A1 ∪ A2 ∪ A3 ∪ · · ·) = Pr(A1) + Pr(A2) + Pr(A3) + · · ·.

Based on these axioms, we have the following equations for the probability evaluation of events:

Pr(∅) = 0,    (2.1)

Pr(A̅) = 1 − Pr(A),    (2.2)

Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B).    (2.3)

If A and B are two events and Pr(A) ≠ 0, then the conditional probability of B given A is defined as

Pr(B | A) = Pr(A ∩ B) / Pr(A).    (2.4)

From equation (2.4), the probability of the intersection of two events is the following:

Pr(A ∩ B) = Pr(A) Pr(B | A) = Pr(B) Pr(A | B).    (2.5)

Two events are defined to be independent if whether one event has occurred does not affect whether the other event will occur. If events A and B are independent, we have

Pr(B | A) = Pr(B),    (2.6)

Pr(A | B) = Pr(A),    (2.7)

Pr(A ∩ B) = Pr(A) Pr(B).    (2.8)

Note that if two events A and B are independent, then the events A̅ and B̅ are also independent. For a group of n events A1, A2, . . . , An to be independent, we require that the probability of the intersection of any 2, 3, . . . , n of these events equal the product of their respective probabilities. These events may be pairwise independent without being independent. Conversely, if we have three events A, B, and C, it is possible to have Pr(A ∩ B ∩ C) = Pr(A) Pr(B) Pr(C) while these three events are not pairwise independent.
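The gap between pairwise and mutual independence can be made concrete with a standard two-coin construction (an illustrative example, not from the text): with A = "first coin heads," B = "second coin heads," and C = "the two coins agree," every pair satisfies the product rule, yet the three-way product fails.

```python
from fractions import Fraction
from itertools import product

# Equally likely outcomes of two fair coin flips.
outcomes = list(product("HT", repeat=2))

def pr(event):
    """Probability of an event (a set of outcomes) under equal likelihood."""
    return Fraction(sum(1 for o in outcomes if o in event), len(outcomes))

A = {o for o in outcomes if o[0] == "H"}     # first coin heads
B = {o for o in outcomes if o[1] == "H"}     # second coin heads
C = {o for o in outcomes if o[0] == o[1]}    # the two coins agree

# Every pair satisfies Pr(X ∩ Y) = Pr(X) Pr(Y) ...
assert pr(A & B) == pr(A) * pr(B)
assert pr(A & C) == pr(A) * pr(C)
assert pr(B & C) == pr(B) * pr(C)
# ... yet the three events are not mutually independent:
assert pr(A & B & C) != pr(A) * pr(B) * pr(C)   # 1/4 versus 1/8
```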
Example 2.1 A manufacturer orders 30, 45, and 25% of the total demand for a certain part from suppliers A, B, and C, respectively. The defect rates of the units provided by suppliers A, B, and C are 2, 3, and 4%, respectively. Assume that the received units of this part are well mixed. What is the probability that a randomly selected unit is defective and supplied by supplier A? What is the probability that a randomly selected unit is defective? If a randomly selected unit is defective, what is the probability that it is provided by supplier A?

In this example, we assume that each unit has an equal chance of being selected. Define:

• A: a selected unit is from supplier A
• B: a selected unit is from supplier B
• C: a selected unit is from supplier C
• D: a selected unit is defective

Then, we have Pr(A) = 0.30, Pr(B) = 0.45, Pr(C) = 0.25, Pr(D | A) = 0.02, Pr(D | B) = 0.03, and Pr(D | C) = 0.04. We also know that events A, B, and C are mutually exclusive and A ∪ B ∪ C = S. The probability that a selected unit is defective and from supplier A can be calculated as

Pr(A ∩ D) = Pr(A) Pr(D | A) = 0.30 × 0.02 = 0.0060.

The probability that a selected unit is defective can be calculated as

Pr(D) = Pr(D ∩ (A ∪ B ∪ C)) = Pr(D ∩ A) + Pr(D ∩ B) + Pr(D ∩ C)
      = Pr(A) Pr(D | A) + Pr(B) Pr(D | B) + Pr(C) Pr(D | C) = 0.0295.

The probability that a defective unit is from supplier A can be calculated as

Pr(A | D) = Pr(A ∩ D) / Pr(D) = 0.0060 / 0.0295 ≈ 0.2034.

From these calculations, we can say that the defect rate of all received units is 2.95%. If a unit is found to be defective, there is a 20.34% chance that it is from supplier A. Similar calculations and conclusions can be made for the other suppliers.

In Example 2.1, to find the defect rate of all received units, we divided them into three mutually exclusive groups, namely, A, B, and C. These three groups represent the exclusive suppliers for the manufacturer, that is, S = A ∪ B ∪ C. The defect rate of a unit from each of these groups is known. Thus, we can use conditional probabilities to find the overall defect rate of the received units. This approach can be generalized to the case where there are k mutually exclusive groups, as stated in the following theorem.
Theorem 2.1 (Decomposition Theorem) If the events B1, B2, . . . , Bk constitute a partition of the sample space S, that is, Bi ∩ Bj = ∅ for all i ≠ j and B1 ∪ B2 ∪ · · · ∪ Bk = S, and Pr(Bi) ≠ 0 for i = 1, 2, . . . , k, then for any event A, we have

Pr(A) = Σ_{i=1}^{k} Pr(Bi) Pr(A | Bi).    (2.9)

Theorem 2.1 is called the decomposition theorem, the factoring theorem, or the rule of elimination. It is used when it is easier to obtain conditional probabilities. It has wide applications in system reliability evaluation.

2.1.3 Random Variables and Their Characteristics

Random Variables In many applications involving uncertain outcomes, we are often interested only in a certain aspect of the outcomes. For example, what will be the highest temperature tomorrow? What number will appear when a die is rolled? What is the total when a pair of dice is rolled? How many light tubes have failed when the light fixtures in an office are inspected? A random variable X is a function that assigns a real value to each outcome in the sample space S. We will use a capital letter (e.g., X, Y) to indicate a random variable and a lowercase letter (e.g., x, y) to indicate a specific value that a random variable may take. If X is used to represent the number shown on a rolled die, then X ≥ 3 represents the event that the number shown is at least 3, and X ∈ {2, 4, 6} represents the event that the number shown is even.

Random variables can be divided into two classes, discrete random variables and continuous random variables. A discrete random variable may take a finite or countably infinite number of values. For example, the number shown on a rolled die can take only six possible values, and the number of visitors to a theme park during a day can only be a nonnegative integer. A continuous random variable can take values on a continuous scale. For example, the highest daily temperature may be any value in the interval (−∞, ∞), and the life of a light bulb may be any nonnegative real value.

Probability Distribution Function and Cumulative Distribution Function Consider a discrete random variable X. The function given by f(x) = Pr(X = x) for all x is called the probability mass function (pmf) of X.
A function f(x) can serve as the pmf of a discrete random variable if and only if it satisfies the following requirements:

1. f(x) ≥ 0 for all x, and
2. Σ_x f(x) = 1.

The cumulative distribution function (CDF) of a discrete random variable X is defined to be

F(x) = Pr(X ≤ x) = Σ_{t ≤ x} f(t).    (2.10)
Let X take the possible values x1 < x2 < · · · < xn; then the following equation can be used to calculate f(xi) for i = 1, 2, . . . , n from F(x):

f(xi) = F(x1) if i = 1,
f(xi) = F(xi) − F(xi−1) if i = 2, 3, . . . , n.    (2.11)

Example 2.2 Consider the following discrete function:

f(k) = p(1 − p)^{k−1},    k = 1, 2, . . . ,

where 0 < p < 1. Verify that it qualifies to be the pmf of a discrete random variable X with sample space S = {1, 2, . . . }. Find the CDF of X. What is the probability that X ≥ 10?

First of all, we note that f(k) ≥ 0 for each possible k ∈ S because 0 < p < 1. We need to verify that Σ_{k=1}^{∞} f(k) = 1:

Σ_{k=1}^{∞} f(k) = Σ_{k=1}^{∞} p(1 − p)^{k−1} = p × (1/p) = 1.

Our conclusion is that f(k) does qualify to be a pmf. Noting that

F(k) = Σ_{i=1}^{k} f(i) = Σ_{i=1}^{k} p(1 − p)^{i−1} = 1 − (1 − p)^k,    k = 1, 2, . . . ,

to find the CDF defined over (−∞, ∞), we use the following function when x is not necessarily a positive integer:

F(x) = 0 if x < 1,
F(x) = F(k) if k ≤ x < k + 1, where k is a positive integer.

Then

Pr(X ≥ 10) = 1 − F(9) = 1 − [1 − (1 − p)^9] = (1 − p)^9.

The pmf used in this example actually describes the geometric distribution, which will be further discussed in Section 2.1.5.

For a continuous random variable X, we define a probability density function (pdf) associated with it. A function f(x) with −∞ < x < ∞ is called a pdf of the continuous random variable X if and only if

Pr(a < X ≤ b) = ∫_{a}^{b} f(x) dx    (2.12)

for any real constants a and b such that a ≤ b. In words, the probability for the continuous random variable to be in the interval (a, b] is measured by the area under the curve of f(x) within this interval. Based on this definition, the probability for a continuous random variable to take any fixed value is equal to zero. As a result, when one is calculating the probability for a continuous random variable to be in a certain interval, it does not make any difference whether the endpoints of the interval are included or not, as shown below:

Pr(a ≤ X ≤ b) = Pr(a < X < b) = Pr(a ≤ X < b) = Pr(a < X ≤ b).    (2.13)

A function f(x) can serve as a pdf of a continuous random variable if and only if it satisfies the following conditions:

1. f(x) ≥ 0 for −∞ < x < ∞, and
2. ∫_{−∞}^{∞} f(x) dx = 1.

The CDF of a continuous random variable is defined as follows:

F(x) = Pr(X ≤ x) = ∫_{−∞}^{x} f(t) dt,    −∞ < x < ∞.    (2.14)
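The geometric pmf and its closed-form CDF from Example 2.2 can be checked numerically for an illustrative value of p:

```python
# Numeric check of Example 2.2 (geometric distribution); p = 0.2 is an
# illustrative choice, and the closed form F(k) = 1 - (1-p)**k is from the text.
p = 0.2

def pmf(k):
    return p * (1 - p) ** (k - 1)

def cdf(k):
    return 1 - (1 - p) ** k

# The pmf sums (numerically) to 1 over a long horizon.
assert abs(sum(pmf(k) for k in range(1, 500)) - 1) < 1e-12

# The closed-form CDF agrees with the partial sums of the pmf.
assert abs(cdf(10) - sum(pmf(k) for k in range(1, 11))) < 1e-12

# Pr(X >= 10) = 1 - F(9) = (1-p)**9
assert abs((1 - cdf(9)) - (1 - p) ** 9) < 1e-12
```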
Given the CDF of a random variable, we can use the following equations to find the probability that the continuous random variable takes values in an interval [a, b] with a ≤ b, and to recover the pdf of the random variable:

Pr(a ≤ X ≤ b) = F(b) − F(a),    (2.15)

f(x) = dF(x)/dx,    (2.16)

where the derivative exists.

Example 2.3 Consider the function f(x):

f(x) = λe^{−λx},    x ≥ 0,

where λ > 0 is a constant. Verify that it qualifies to be a pdf of a continuous random variable X, which may take nonnegative values. Find the CDF of X. What is the probability that X takes values in the interval [10, 20]?

It is apparent that f(x) ≥ 0 for all x ≥ 0. Since ∫_{0}^{∞} f(x) dx = 1, as shown below, f(x) qualifies to be a probability density function (pdf):

∫_{0}^{∞} f(x) dx = ∫_{0}^{∞} λe^{−λx} dx = −e^{−λx} |_{0}^{∞} = 1.

The CDF of X is given by

F(x) = ∫_{0}^{x} f(t) dt = ∫_{0}^{x} λe^{−λt} dt = −e^{−λt} |_{0}^{x} = 1 − e^{−λx},    x ≥ 0.

When x < 0, F(x) = 0, which is often omitted. Then

Pr(10 ≤ X ≤ 20) = F(20) − F(10) = e^{−10λ} − e^{−20λ}.
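Example 2.3 can likewise be checked numerically for an illustrative rate λ, comparing the closed-form answer against a crude Riemann sum of the pdf:

```python
import math

# Numeric check of Example 2.3 (exponential distribution); the rate
# lam = 0.1 is an illustrative choice; F(x) = 1 - exp(-lam*x) is from the text.
lam = 0.1

def cdf(x):
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0

# Pr(10 <= X <= 20) = F(20) - F(10) = e^{-10*lam} - e^{-20*lam}
prob = cdf(20) - cdf(10)
assert abs(prob - (math.exp(-10 * lam) - math.exp(-20 * lam))) < 1e-12

# A left Riemann sum of the pdf over [10, 20] agrees closely.
dx = 1e-4
riemann = sum(lam * math.exp(-lam * (10 + i * dx)) * dx
              for i in range(int(10 / dx)))
assert abs(riemann - prob) < 1e-3
```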
The pdf used in this example actually describes the exponential distribution, which is the most commonly used distribution in reliability. Whether a random variable is discrete or continuous, its CDF satisfies the following conditions:

• F(−∞) = 0,
• F(∞) = 1, and
• F(a) ≤ F(b) for any real numbers a and b such that a ≤ b.
Consider two independent continuous random variables X with CDF G(x) and Y with CDF H(y). Let Z be the sum of these two random variables, that is, Z = X + Y. The CDF of Z, denoted by U(z), can be expressed as

U(z) = Pr(Z ≤ z) = Pr(X + Y ≤ z) = ∫_{−∞}^{∞} Pr(X + Y ≤ z | Y = y) dH(y)
     = ∫_{−∞}^{∞} Pr(X ≤ z − y) dH(y) = ∫_{−∞}^{∞} G(z − y) dH(y)
     = (G ∗ H)(z),

where

(G ∗ H)(z) ≡ ∫_{−∞}^{∞} G(z − y) dH(y)    (2.17)

is called the convolution of functions G and H. In words, the CDF of the sum of two independent random variables is equal to the convolution of the CDFs of these two individual random variables. This result can be extended to the sum of n ≥ 2 independent random variables; namely, the CDF of the sum of n independent random variables is equal to the convolution of the CDFs of these n individual random variables. If these individual random variables are independent and identically distributed (i.i.d.) with CDF F(x), we use Fn to indicate the n-fold convolution of F with itself. The CDF of the sum of n i.i.d. random variables with CDF F(x) is the n-fold convolution of F with itself. Generally, the following recursive formula can be used for the evaluation of convolutions of a function with itself:

Fn = F ∗ Fn−1,    n ≥ 2,    (2.18)

where F1 = F.

Median The median of a random variable X is defined to be the value of x such that F(x) = 0.5. The probability for X to take a value less than or equal to its median and the probability for X to take a value greater than or equal to its median are both equal to 50%. The 100pth percentile, denoted by x_p, of a random variable X is defined to be the value of x such that F(x_p) = p. For example, the 10th percentile of X is denoted by x_{0.1} and the 90th percentile of X is denoted by x_{0.9}. Thus, x_{0.5} represents the median of the random variable X.
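For discrete random variables the convolution integral in equation (2.17) becomes a sum, (G ∗ H)(z) = Σ_y G(z − y)h(y), where h is the pmf of Y. A sketch for the sum of two fair dice (an illustrative choice), cross-checked against direct enumeration of the 36 equally likely outcomes:

```python
from fractions import Fraction

# Discrete analogue of the convolution formula (2.17) for two fair dice.
faces = range(1, 7)
h = {y: Fraction(1, 6) for y in faces}   # pmf of one fair die

def G(x):
    """CDF of a single fair die (x need not be in 1..6)."""
    return Fraction(min(max(int(x), 0), 6), 6)

def U(z):
    """CDF of the sum of two independent dice via the convolution sum."""
    return sum(G(z - y) * h[y] for y in faces)

# Cross-check against direct enumeration of all 36 outcomes.
for z in range(2, 13):
    direct = Fraction(sum(1 for a in faces for b in faces if a + b <= z), 36)
    assert U(z) == direct
```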
Expected Value The expected value E(X), or µ, of a random variable X with pdf f(x) is

µ ≡ E(X) = Σ_x x f(x) if X is discrete,
µ ≡ E(X) = ∫_{−∞}^{∞} x f(x) dx if X is continuous.    (2.19)

The expected value E(X) is also referred to as the mean, the average value, or the first moment about the origin of the random variable X.

Moment The expected value of a deterministic function g(X) of a random variable X with sample space S and pdf f(x) is given by

E(g(X)) = Σ_x g(x) f(x) if X is discrete,
E(g(X)) = ∫_{−∞}^{∞} g(x) f(x) dx if X is continuous.    (2.20)

When g(X) in equation (2.20) takes the form X^r, where r is a nonnegative integer, E(X^r) is called the rth moment about the origin, or the ordinary moment, of the random variable X, often denoted by µ′_r. When r = 1, we have the first moment about the origin, E(X), which is exactly the expected value of X. Thus, we have µ′_1 ≡ µ. Note that µ′_0 = 1.

When g(X) in equation (2.20) takes the form (X − µ)^r, where r is a nonnegative integer and µ is the expected value of X, E((X − µ)^r) is called the rth moment about the mean, or the central moment, of the random variable X, often denoted by µ_r. Note that µ_0 = 1 and µ_1 = 0. The second moment about the mean, µ_2, is of special importance in statistics because it indicates the spread of the distribution of the random variable. As a result, it is called the variance of the random variable and is denoted by σ² or Var(X). The positive square root of the variance is called the standard deviation of the random variable and is denoted by σ. The following equation indicates the definition and the calculation of Var(X):

σ² ≡ Var(X) = µ_2 = µ′_2 − µ² = E(X²) − (E(X))²
   = Σ_x (x − µ)² f(x) if X is discrete,
   = ∫_{−∞}^{∞} (x − µ)² f(x) dx if X is continuous.    (2.21)

The following summarizes the equations for the evaluation of expectations:

1. E(a) = a, where a is a constant;
2. E(aX) = aE(X), where a is a constant;
3. E(aX + b) = aE(X) + b, where a and b are constants;
4. E(X + Y) = E(X) + E(Y);
5. E(g(X) + h(Y)) = E(g(X)) + E(h(Y)), where g and h are deterministic function forms; and
6. Var(aX + b) = a²Var(X), where a and b are constants.

The variance represents the spread of a distribution. The Chebyshev theorem given below illustrates this point and provides a means of estimating the probability for a random variable to take values within a neighborhood of its mean.

Theorem 2.2 (Chebyshev Theorem) For any given positive value k, the probability for a random variable to take on a value within k standard deviations of its mean is at least 1 − 1/k². In other words, if µ and σ are the mean and the standard deviation of the random variable X, the following inequality is satisfied:

Pr(|X − µ| < kσ) ≥ 1 − 1/k².

This theorem gives a lower bound on the probability that a random variable will take on a value within a certain number of standard deviations of its mean. This lower bound does not depend on the actual distribution of the random variable. By choosing k to be 2 and 3, we see that the probabilities are at least 3/4 and 8/9 that a random variable X will take on a value within two and three standard deviations of its mean, respectively. To find the exact value of such probabilities, we need to know the exact distribution of the random variable.

2.1.4 Multivariate Distributions

Bivariate Distribution In some situations, we are interested in the outcomes of two aspects of a statistical experiment. We use different random variables to indicate these different aspects. For example, we may use X and Y to represent the level of education and the annual salary, respectively, of a randomly selected individual who lives in a certain area. In this case, we are interested in the distribution of these two random variables simultaneously. We refer to their joint distribution as a bivariate distribution.

If X and Y are discrete random variables, f(x, y) = Pr(X = x, Y = y) for each pair (x, y) within the sample space of (X, Y) is called the joint probability distribution function, or simply the joint pdf, of X and Y. A bivariate function f(x, y) can serve as the joint pdf of discrete random variables X and Y if and only if it satisfies the following conditions:

1. f(x, y) ≥ 0 for each pair (x, y) within the range of the random variables, and
2. Σ_x Σ_y f(x, y) = 1, where the summations cover all possible values of x and y within the range of the random variables.
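These two conditions can be checked directly for any small table of joint probabilities. A sketch with illustrative table values (not from the text), using exact rational arithmetic:

```python
from fractions import Fraction

# A small joint pmf over {0,1} x {0,1}; the values are an illustrative
# assumption, chosen only so the two conditions can be verified.
joint = {(0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
         (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8)}

# Condition 1: f(x, y) >= 0 for every pair (x, y).
assert all(p >= 0 for p in joint.values())

# Condition 2: the probabilities sum to 1 over the whole range.
assert sum(joint.values()) == 1

# Any event's probability is then a direct sum over the table,
# e.g. Pr(X = Y):
p_equal = sum(p for (x, y), p in joint.items() if x == y)
assert p_equal == Fraction(5, 8)
```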
The joint CDF of discrete random variables X and Y, denoted by F(x, y), over all possible pairs of real values is defined as

F(x, y) = Pr(X ≤ x, Y ≤ y) = Σ_{s ≤ x} Σ_{t ≤ y} f(s, t),    −∞ < x < ∞, −∞ < y < ∞,    (2.22)

where f(s, t) is the value of the joint pdf of X and Y at point (s, t).

If X and Y are continuous random variables, f(x, y) defined over the two-dimensional real space is the joint pdf of random variables X and Y if and only if

Pr((X, Y) ∈ A) = ∫∫_A f(x, y) dx dy    (2.23)

for any region A in the two-dimensional real space. A bivariate function f(x, y) can serve as a joint pdf of two continuous random variables X and Y if and only if it satisfies

1. f(x, y) ≥ 0 for −∞ < x < ∞ and −∞ < y < ∞, and
2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1.

The joint CDF of continuous random variables X and Y, denoted by F(x, y), over all possible pairs of real values is defined as

F(x, y) = Pr(X ≤ x, Y ≤ y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f(s, t) dt ds,    −∞ < x < ∞, −∞ < y < ∞,    (2.24)

where f(s, t) is the value of the joint pdf of X and Y at point (s, t). For continuous random variables, we have

f(x, y) = ∂²F(x, y)/∂x∂y.    (2.25)

The bivariate CDFs of both discrete and continuous random variables satisfy the following conditions:

1. F(−∞, −∞) = 0,
2. F(∞, ∞) = 1, and
3. if a < b and c < d, then F(a, c) ≤ F(b, d).

Even when there is more than one random variable of interest in an experiment, we may want to know the distribution of one of the random variables irrespective of what values the other random variables may take. In this case, we are interested in the
marginal distribution of a single random variable. If X and Y are random variables with joint pdf f(x, y), then the marginal pdf of X is given by

g(x) = Σ_y f(x, y) if X and Y are discrete,
g(x) = ∫_{−∞}^{∞} f(x, y) dy if X and Y are continuous,    (2.26)

for each x in the range of X, and the marginal pdf of Y is given by

h(y) = Σ_x f(x, y) if X and Y are discrete,
h(y) = ∫_{−∞}^{∞} f(x, y) dx if X and Y are continuous,    (2.27)

for each y in the range of Y. Once the marginal pdf of a random variable is obtained, we can use it to find the CDF of the random variable, ignoring all other variables.

Since the random variables of an experiment may depend on each other, we are sometimes interested in the conditional distribution of one random variable given that the other random variables have taken certain values or certain ranges of values. If f(x, y) is the joint pdf of (discrete or continuous) random variables X and Y, g(x) is the marginal pdf of X, and h(y) is the marginal pdf of Y, the function given by

g(x | y) = f(x, y)/h(y),    h(y) ≠ 0,    (2.28)

for each x within the range of X is called the conditional pdf of X given Y = y. Correspondingly, the function given by

h(y | x) = f(x, y)/g(x),    g(x) ≠ 0,    (2.29)

for each y within the range of Y is called the conditional pdf of Y given X = x.

For two random variables X and Y with joint pdf f(x, y), the expected value of a function of these two random variables, g(X, Y), is given by

E(g(X, Y)) = Σ_x Σ_y g(x, y) f(x, y) if X and Y are discrete,
E(g(X, Y)) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f(x, y) dx dy if X and Y are continuous.    (2.30)

Let µ_X and µ_Y indicate the expected values of random variables X and Y, respectively. The covariance of X and Y, denoted by Cov(X, Y) or σ_XY, is given by

σ_XY ≡ Cov(X, Y) = E((X − µ_X)(Y − µ_Y)) = E(XY) − µ_X µ_Y.    (2.31)
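The marginal, conditional, and covariance formulas (2.26)-(2.28) and (2.31) can be exercised on a small joint pmf; the table values below are an illustrative assumption, not from the text.

```python
from fractions import Fraction

# Illustrative joint pmf of two {0,1}-valued random variables X and Y.
joint = {(0, 0): Fraction(1, 8), (0, 1): Fraction(1, 4),
         (1, 0): Fraction(1, 4), (1, 1): Fraction(3, 8)}

xs = {x for x, _ in joint}
ys = {y for _, y in joint}

g = {x: sum(joint[(x, y)] for y in ys) for x in xs}   # marginal of X, (2.26)
h = {y: sum(joint[(x, y)] for x in xs) for y in ys}   # marginal of Y, (2.27)

# Conditional pmf of X given Y = 1, equation (2.28)
g_given_y1 = {x: joint[(x, 1)] / h[1] for x in xs}
assert sum(g_given_y1.values()) == 1

# Covariance via equation (2.31): Cov(X, Y) = E(XY) - mu_X * mu_Y
mu_x = sum(x * p for x, p in g.items())
mu_y = sum(y * p for y, p in h.items())
exy = sum(x * y * p for (x, y), p in joint.items())
cov = exy - mu_x * mu_y
```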
The correlation coefficient of two random variables, denoted by ρ X Y , is given by Cov(X, Y ) ρXY = √ . Var(X )Var(Y )
(2.32)
The correlation coefficient takes values between −1 and 1. A positive value indicates that X and Y are positively correlated, and a negative value indicates that X and Y are negatively correlated. A positive correlation between two random variables indicates that there is a high probability that large values of one variable will go with large values of the other; a negative correlation indicates that there is a high probability that high values of one variable will go with low values of the other. Two random variables X and Y are said to be independent if and only if their joint pdf is equal to the product of the marginal pdf's of the two random variables. Equivalently, two random variables are independent if and only if the conditional pdf of each random variable is equal to its own marginal pdf irrespective of what value the other random variable takes. If X and Y are independent, we also have

$$ E(XY) = E(X)E(Y), \tag{2.33} $$
$$ \mathrm{Cov}(X, Y) = 0. \tag{2.34} $$
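Equations (2.31) through (2.34) can be checked numerically. The short Python sketch below is an illustrative addition (the function and variable names are ours, not from the text); it computes the covariance and correlation of two discrete random variables directly from a joint pmf stored as a dictionary:

```python
import math

def moments_from_joint_pmf(joint):
    """Means, covariance (2.31), and correlation (2.32) from a joint pmf.

    `joint` maps (x, y) pairs to Pr(X = x, Y = y); the names are illustrative.
    """
    mu_x = sum(x * p for (x, y), p in joint.items())
    mu_y = sum(y * p for (x, y), p in joint.items())
    e_xy = sum(x * y * p for (x, y), p in joint.items())
    cov = e_xy - mu_x * mu_y  # E(XY) - mu_X mu_Y, equation (2.31)
    var_x = sum((x - mu_x) ** 2 * p for (x, y), p in joint.items())
    var_y = sum((y - mu_y) ** 2 * p for (x, y), p in joint.items())
    return mu_x, mu_y, cov, cov / math.sqrt(var_x * var_y)

# When X and Y are independent, the joint pmf is the product of the
# marginals, so Cov(X, Y) should be 0, as in equation (2.34).
px = {0: 0.3, 1: 0.7}
py = {0: 0.6, 1: 0.4}
indep = {(x, y): px[x] * py[y] for x in px for y in py}
```

A joint pmf concentrated on (0, 0) and (1, 1) gives a correlation of exactly 1, the upper end of the range noted above.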
Multivariate Distribution  The definitions provided above for two random variables can be generalized to the multivariate case. The joint pdf and the joint CDF of n discrete random variables $X_1, X_2, \ldots, X_n$ defined over their sample spaces are given, respectively, by

$$ f(x_1, x_2, \ldots, x_n) = \Pr(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n), $$
$$ F(x_1, x_2, \ldots, x_n) = \sum_{s \le x_1} \sum_{t \le x_2} \cdots \sum_{u \le x_n} f(s, t, \ldots, u). $$

The joint CDF of n continuous random variables $X_1, X_2, \ldots, X_n$ defined over their sample spaces is given by

$$ F(x_1, x_2, \ldots, x_n) = \int_{-\infty}^{x_1} \int_{-\infty}^{x_2} \cdots \int_{-\infty}^{x_n} f(t_1, t_2, \ldots, t_n)\, dt_n\, dt_{n-1} \cdots dt_2\, dt_1. $$
When dealing with more than two random variables, we may also be interested in the joint marginal distribution of several of the random variables. For example, suppose that $f(x_1, x_2, \ldots, x_n)$ is the joint pdf of discrete random variables $X_1, X_2, \ldots, X_n$ (n > 3). The joint marginal pdf of the random variables $X_1, X_2, X_3$ is given by

$$ m(x_1, x_2, x_3) = \sum_{x_4} \sum_{x_5} \cdots \sum_{x_n} f(x_1, x_2, \ldots, x_n) $$
RELIABILITY MATHEMATICS
for all values of $x_1$, $x_2$, and $x_3$ within the ranges of $X_1$, $X_2$, and $X_3$, respectively. The joint marginal CDF of several random variables can be defined in a similar manner. The joint conditional distribution of several random variables can also be defined. For example, suppose that $f(x_1, x_2, x_3, x_4)$ is the joint pdf of discrete random variables $X_1, X_2, X_3, X_4$ and $m(x_1, x_2, x_3)$ is the joint marginal pdf of $X_1, X_2, X_3$. Then the joint conditional pdf of $X_4$, given that $X_1 = x_1$, $X_2 = x_2$, and $X_3 = x_3$, is given by

$$ q(x_4 \mid x_1, x_2, x_3) = \frac{f(x_1, x_2, x_3, x_4)}{m(x_1, x_2, x_3)}, \qquad m(x_1, x_2, x_3) \neq 0. $$
If $X_1, X_2, \ldots, X_n$ are independent, then

$$ E(X_1 X_2 \cdots X_n) = E(X_1) E(X_2) \cdots E(X_n), \tag{2.35} $$
$$ \mathrm{Var}\left( \sum_{i=1}^{n} a_i X_i \right) = \sum_{i=1}^{n} a_i^2\, \mathrm{Var}(X_i). \tag{2.36} $$
Note that while equations (2.35) and (2.36) are necessary conditions for the random variables to be independent, random variables satisfying these conditions are not necessarily independent.
2.1.5 Special Discrete Distributions

In this section, we review some commonly used discrete distributions. The statistical experiments under which such distributions are derived will be discussed, and the characteristics of these distributions will be derived or simply given. For a discrete random variable, it is often easier to use its pmf to characterize its distribution.

Discrete Uniform Distribution  Consider a random variable X that can take k distinct possible values. If each value has an equal chance of being taken by X, we say that X has a discrete uniform distribution. The pmf of X can be written as

$$ \Pr(X = x) = \frac{1}{k} \quad \text{for } x = x_1, x_2, \ldots, x_k, \tag{2.37} $$

where $x_i \neq x_j$ when $i \neq j$. The mean and variance of such a random variable can be expressed as

$$ \mu = \frac{1}{k} \sum_{i=1}^{k} x_i, \tag{2.38} $$
$$ \sigma^2 = \frac{1}{k} \sum_{i=1}^{k} (x_i - \mu)^2. \tag{2.39} $$
In the special case of $x_i = i$ for $i = 1, 2, \ldots, k$, we have

$$ \Pr(X = x) = \frac{1}{k} \quad \text{for } x = 1, 2, \ldots, k, \tag{2.40} $$
$$ \mu = \frac{k + 1}{2}, \tag{2.41} $$
$$ \sigma^2 = \frac{k^2 - 1}{12}. \tag{2.42} $$
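Equations (2.41) and (2.42) can be verified exactly for a small case. The sketch below (an illustrative addition; the helper name is ours) uses exact rational arithmetic for a fair six-sided die, the special case $x_i = i$ with k = 6:

```python
from fractions import Fraction

def discrete_uniform_moments(values):
    """Mean (2.38) and variance (2.39) of a discrete uniform pmf over `values`."""
    k = len(values)
    mu = sum(values, Fraction(0)) / k
    var = sum((v - mu) ** 2 for v in values) / k
    return mu, var

# Fair die: k = 6, so (2.41) gives 7/2 and (2.42) gives 35/12.
mu, var = discrete_uniform_moments([Fraction(i) for i in range(1, 7)])
```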
The uniform distribution truly reflects the equally likely interpretation of probability, and the assumption of equal likelihood is often adopted in statistical experiments.

Bernoulli Distribution  Consider a statistical experiment that has only two possible outcomes, which we will call "success" and "failure." The probabilities of observing success and failure in the experiment are denoted by p and 1 − p, respectively. The random variable X is used to count the number of successes in the experiment; clearly, X can take only one of two possible values, 0 or 1. The pmf of X is given by

$$ \Pr(X = x) = p^x (1 - p)^{1 - x}, \qquad x = 0, 1. \tag{2.43} $$

A random variable that has such a pmf is said to follow the Bernoulli distribution, and the corresponding statistical experiment just described is referred to as a Bernoulli trial. The mean and variance of a Bernoulli random variable X are

$$ \mu = p, \tag{2.44} $$
$$ \sigma^2 = p(1 - p). \tag{2.45} $$
Binomial Distribution  Suppose that we are to conduct n Bernoulli trials, where n is fixed. All these trials are independent, and the probability of observing a success in each trial is a constant denoted by p. We are interested in the total number of successes that will be observed in these n trials. Let $X_i$ be the Bernoulli random variable representing the number of successes observed in the ith trial for $i = 1, 2, \ldots, n$. Then the total number of successes in n trials, denoted by X, can be expressed as

$$ X = X_1 + X_2 + \cdots + X_n. $$

Based on our assumptions, the $X_i$'s are i.i.d. with the pmf given in equation (2.43). If we complete n trials, we get a specific sequence of n numbers consisting of 0's and 1's. The probability of getting a specific sequence with exactly x 1's is equal to $p^x (1 - p)^{n - x}$, and the total number of sequences of 0's and 1's with exactly x 1's is equal to $\binom{n}{x}$. Thus, the probability of observing x 1's, in whatever sequence, from n Bernoulli trials is equal to $\binom{n}{x} p^x (1 - p)^{n - x}$. As a result, we can
express the pmf of the random variable X as

$$ \Pr(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}, \qquad x = 0, 1, 2, \ldots, n. \tag{2.46} $$

A random variable with such a pmf is said to follow the binomial distribution with parameters n and p. The name "binomial distribution" comes from the fact that the values of the pmf of a binomial random variable for $x = 0, 1, 2, \ldots, n$ are the successive terms of the binomial expansion of $[(1 - p) + p]^n$, as shown below:

$$ [(1 - p) + p]^n = \binom{n}{0} p^0 (1 - p)^n + \binom{n}{1} p^1 (1 - p)^{n - 1} + \binom{n}{2} p^2 (1 - p)^{n - 2} + \cdots + \binom{n}{n} p^n (1 - p)^0. $$

Since the left-hand side of this equation is equal to 1, this verifies that the sum of the binomial pmf over all possible x values is equal to 1. The mean and variance of a binomial random variable are given below; their derivations are left as exercises:

$$ \mu = np, \tag{2.47} $$
$$ \sigma^2 = np(1 - p). \tag{2.48} $$
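As an illustrative numerical check (not part of the original text), the sketch below builds the binomial pmf (2.46) and confirms that it sums to 1 and that its mean and variance agree with (2.47) and (2.48):

```python
from math import comb

def binomial_pmf(n, p):
    """Return the binomial pmf (2.46) as a list indexed by x = 0..n."""
    return [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

n, p = 100, 0.2
pmf = binomial_pmf(n, p)
total = sum(pmf)  # the binomial expansion of [(1-p)+p]^n, so 1
mean = sum(x * q for x, q in enumerate(pmf))  # np by (2.47)
var = sum((x - mean) ** 2 * q for x, q in enumerate(pmf))  # np(1-p) by (2.48)
```

The parameter choice n = 100, p = 0.2 matches the inspection setting of Example 2.4 below, where $\mu = 20$ and $\sigma^2 = 16$.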
In deriving the binomial distribution, we defined X to be the number of successes in n Bernoulli trials. We could instead have used Y to indicate the number of failures in n Bernoulli trials; then X + Y = n. Since X has the binomial distribution with parameters n and p, Y has the binomial distribution with parameters n and 1 − p, and the probability that X takes the value k is equal to the probability that Y takes the value n − k for $k = 0, 1, 2, \ldots, n$. The binomial distribution has applications in sampling with replacement. If $X_1$ and $X_2$ are independent random variables following the binomial distributions with parameters $(n_1, p)$ and $(n_2, p)$, respectively, the sum of these two random variables follows the binomial distribution with parameters $n_1 + n_2$ and p.

Example 2.4  One hundred fluorescent light tubes are used for lighting in a building. They are inspected every 30 days, and failed tubes are replaced at inspection times; thus, after each inspection, all 100 tubes are working. The failures of the light tubes are statistically independent, and the probability that a working tube lasts 30 days is constant at 0.80. What is the probability that at least 10 tubes will have failed by the next inspection time? What is the average number of failed tubes at each inspection time? What is the interval such that the probability that the number of failed tubes at each inspection time is within this interval is at least 0.75?

In this example, we use X to indicate the number of failed tubes at the time of the next inspection given that all 100 tubes are working properly at the end of the
previous inspection. Then X follows the binomial distribution with n = 100 and p = 1 − 0.8 = 0.2:

$$ \Pr(X \ge 10) = 1 - \Pr(X \le 9) = 1 - \sum_{i=0}^{9} \Pr(X = i) = 1 - \sum_{i=0}^{9} \binom{100}{i} (0.2)^i (0.8)^{100 - i} \approx 1 - 0.0023 = 0.9977. $$

The average number of failed tubes at each inspection time is $\mu = np = 20$, and the standard deviation of X is $\sigma = \sqrt{np(1 - p)} = 4$. Using Chebyshev's theorem, we know that the number of failed tubes at each inspection has a probability of at least 0.75 of being within two standard deviations of its mean. Thus, the interval for X is

$$ |X - 20| < 8 \quad \text{or} \quad 12 < X < 28. $$
This means that there is at least a 75% chance that the number of failed tubes observed at each inspection will be somewhere between 12 and 28. This range can help the inspector bring enough light tubes to replace the failed ones.

Negative Binomial Distribution and Geometric Distribution  Consider a statistical experiment wherein repeated Bernoulli trials are performed. The probability of success in each trial is a constant p. We are interested in the number of trials needed to observe the kth success, denoted by X. For X to take a value x, $x = k, k + 1, k + 2, \ldots$, we must observe k − 1 successes in the first x − 1 trials, and a success must be realized in the xth trial. Thus, the pmf of X is

$$ \Pr(X = x) = \binom{x - 1}{k - 1} p^k (1 - p)^{x - k}, \qquad x = k, k + 1, k + 2, \ldots. \tag{2.49} $$

A random variable that has such a pmf is said to have a negative binomial distribution with parameters k and p. The name "negative binomial distribution" comes from the fact that the values of the pmf given in equation (2.49) are the successive terms of the binomial expansion of $[1/p - (1 - p)/p]^{-k}$. The negative binomial distribution is also referred to as the binomial waiting time distribution or the Pascal distribution. If the pmf of a negative binomial random variable X with parameters k and p is denoted by $\Pr(X = x \mid k, p)$ and that of a binomial random variable Y with parameters x and p is denoted by $\Pr(Y = k \mid x, p)$, the following equation describes the relationship between them:
$$ \Pr(X = x \mid k, p) = \frac{k}{x} \Pr(Y = k \mid x, p). \tag{2.50} $$
The mean and variance of the negative binomial distribution are as follows:

$$ \mu = \frac{k}{p}, \tag{2.51} $$
$$ \sigma^2 = \frac{k(1 - p)}{p^2}. \tag{2.52} $$
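The identity (2.50) and the mean (2.51) can be checked numerically. The following sketch is an illustrative addition (function names are ours); note how the ratio of the two pmf's reduces to $k/x$:

```python
from math import comb

def neg_binomial_pmf(x, k, p):
    """Pmf (2.49): probability that the kth success occurs on trial x."""
    return comb(x - 1, k - 1) * p**k * (1 - p)**(x - k)

def binomial_pmf_at(k, x, p):
    """Pr(Y = k) for Y ~ binomial(x, p), as in equation (2.46)."""
    return comb(x, k) * p**k * (1 - p)**(x - k)

# Identity (2.50): Pr(X = x | k, p) = (k/x) * Pr(Y = k | x, p).
k, p = 3, 0.4
gaps = [abs(neg_binomial_pmf(x, k, p) - (k / x) * binomial_pmf_at(k, x, p))
        for x in range(k, 30)]

# Mean (2.51): truncating the infinite sum at a large x; the neglected
# tail is negligible for p = 0.4.
mean = sum(x * neg_binomial_pmf(x, k, p) for x in range(k, 400))
```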
When k = 1, we are interested in the number of Bernoulli trials needed to observe the first success. In this case, the negative binomial distribution is given a special name, the geometric distribution. The pmf of a geometric distribution is

$$ \Pr(X = x) = p(1 - p)^{x - 1}, \qquad x = 1, 2, 3, \ldots. \tag{2.53} $$
In Example 2.2, we illustrated how to find the CDF of a geometric distribution. The mean and variance of the geometric distribution are

$$ \mu = \frac{1}{p}, \tag{2.54} $$
$$ \sigma^2 = \frac{1 - p}{p^2}. \tag{2.55} $$
If X is a geometric random variable, the following equation holds:

$$ \Pr(X = n + x \mid X > n) = \Pr(X = x). \tag{2.56} $$
This equation indicates that the number of additional Bernoulli trials needed to observe the first success does not depend on how many unsuccessful Bernoulli trials have already been conducted. Because of this, we say that a geometric random variable has the memoryless property. If $X_1$ and $X_2$ are i.i.d. geometric random variables with common parameter p, the sum of these two random variables follows the negative binomial distribution with parameters p and k = 2. This result extends to more than two random variables: if $X_1, X_2, \ldots, X_k$ are i.i.d. geometric random variables with common parameter p, the sum of these k random variables follows the negative binomial distribution with parameters k and p. This experiment can be described as follows. Repeated Bernoulli trials are conducted where the probability of observing a success in each trial is p. A counter is used to count the number of trials needed to observe a success, and whenever a success is observed, the counter is reset to zero. The number of trials needed for the counter to be reset again follows the geometric distribution with parameter p, and the total number of trials needed for the counter to be reset for the kth time follows the negative binomial distribution with parameters k and p.

Hypergeometric Distribution  Consider a finite population with a total of N items, of which k items (called success items) carry the label of success while the remaining N − k items (called failure items) carry the label of failure. Our experiment involves a random selection of n items from the population, and we are interested in knowing the number of successes among these selected items, denoted by X. There is a total of $\binom{N}{n}$ ways of selecting n items out of the finite population of N items. The number of ways of realizing exactly x success items out of a total of k
success items in the population (and exactly n − x failure items out of a total of N − k failure items in the population) in a sample of size n is $\binom{k}{x}\binom{N - k}{n - x}$. It is assumed that the items in the population are well mixed, so that each item has an equal chance of being selected; as a result, each way of getting x success items and n − x failure items has an equal chance of being realized. The pmf of X can be expressed as

$$ \Pr(X = x) = \frac{\binom{k}{x}\binom{N - k}{n - x}}{\binom{N}{n}}, \qquad x = 0, 1, 2, \ldots, \min\{k, n\}, \tag{2.57} $$

where n < N. A random variable with such a pmf is said to follow the hypergeometric distribution. The mean and variance of the hypergeometric distribution are

$$ \mu = \frac{nk}{N}, \tag{2.58} $$
$$ \sigma^2 = \frac{nk(N - k)(N - n)}{N^2 (N - 1)}. \tag{2.59} $$
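Equations (2.57) through (2.59) can be verified for a small population. The sketch below is an illustrative addition (names and parameter values are ours):

```python
from math import comb

def hypergeom_pmf(x, N, k, n):
    """Pmf (2.57): x success items in a sample of n drawn without replacement."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

# Illustrative population: N = 50 items, k = 20 successes, sample of n = 10.
N, k, n = 50, 20, 10
pmf = {x: hypergeom_pmf(x, N, k, n) for x in range(min(k, n) + 1)}
total = sum(pmf.values())  # should be 1
mean = sum(x * q for x, q in pmf.items())  # nk/N = 4 by (2.58)
var = sum((x - mean) ** 2 * q for x, q in pmf.items())  # (2.59)
```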
The hypergeometric distribution has applications in sampling without replacement. Items are often selected and checked one by one, and the checked items are not placed back into the population. The distribution is especially useful when N is small; it has to be used in this case because the probability of selecting a success item in a certain trial depends on the outcomes of the previous trials. If many success items have already been removed from the population, the probability of getting another success item in the next trial is smaller than in previous trials. If n is very small relative to N (say, n is no more than 5% of N), removing an item from the population in one trial does not affect very much the probability of getting a success item in the next trial. Then the probability of getting a success item in each trial can be approximated by k/N, and the binomial distribution with parameters n and p = k/N can be used to approximate the hypergeometric distribution.

Poisson Distribution  Suppose that we are interested in the number of events of a certain type, X, that occur in a time interval of length t. The average number of events that occur in a unit of time, say an hour or a minute, is a constant denoted by λ, so that the average number of events that occur in a time interval of length t is equal to λt. The probability that one event occurs in a small time interval Δt can be approximated by λΔt, and the probability that more than one event occurs in Δt is negligible; therefore, the probability that no event occurs in the small time interval Δt can be approximated by 1 − λΔt. Assume also that the occurrence of events in an interval does not depend on what happened prior to that interval. Under these assumptions, it can be mathematically proved that X has the following pmf with ρ = λt:
$$ \Pr(X = x) = \frac{\rho^x e^{-\rho}}{x!}, \qquad x = 0, 1, 2, \ldots. \tag{2.60} $$
A random variable that has such a pmf is said to follow the Poisson distribution with the parameter ρ. Under the conditions from which the Poisson distribution is derived, we are interested in the number of events that occur in a time interval. The events of interest may be the number of customers arriving at a service station in a certain time interval, the number of vehicles passing a certain point on a highway in a certain time interval, or the number of phone calls to a 911 service station in a certain time interval. The distribution may also be used when one is interested in the number of defects within a certain area, say A, of a sheet of metal, paper, or plastic. In these cases, λ can be interpreted as the average number of defects in a unit area, say a square inch or a square centimeter, and ρ = λA. The mean and variance of the Poisson distribution are, respectively,

$$ \mu = \rho, \tag{2.61} $$
$$ \sigma^2 = \rho. \tag{2.62} $$

If $X_1$ and $X_2$ are independent random variables following the Poisson distributions with parameters $\rho_1$ and $\rho_2$, respectively, the sum of these two random variables, $X_1 + X_2$, follows the Poisson distribution with parameter $\rho_1 + \rho_2$. The Poisson distribution can be used to approximate the binomial distribution under the following conditions: n → ∞, p → 0, and ρ ≡ np = const. When n ≥ 20 and p ≤ 0.05, the approximation is good, and when n ≥ 100 and np ≤ 10, the approximation is excellent [78].

Example 2.5  A fabric is produced in continuous rolls with a fixed width. The number of defects per meter is estimated to be 0.2, and it is believed that the number of defects within any fixed length follows the Poisson distribution. What is the probability that the number of defects on a fabric 10 meters long is less than or equal to 3? Based on the given data, we have ρ = 0.2 × 10 = 2. Let X represent the number of defects on a fabric 10 meters long. Then

$$ \Pr(X \le 3) = \Pr(X = 0) + \Pr(X = 1) + \Pr(X = 2) + \Pr(X = 3) \approx 0.8571. $$

Multinomial Distribution  If each of n i.i.d. trials has more than two, say k, possible outcomes, we obtain a generalization of the binomial distribution, which is called the multinomial distribution. Let $p_i$ denote the probability of observing the ith type of outcome in each trial, with $\sum_{i=1}^{k} p_i = 1$, and let $X_i$ indicate the number of occurrences of the ith type of outcome in n such trials. Then the random variables $X_1, X_2, \ldots, X_k$ follow the multinomial distribution with parameters n and $p_1, p_2, \ldots, p_k$. The pmf of the multinomial
distribution is given by

$$ \Pr(X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k) = \binom{n}{x_1, x_2, \ldots, x_k} p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}, \qquad x_i = 0, 1, 2, \ldots, n \text{ for each } i, \tag{2.63} $$

where

$$ \sum_{i=1}^{k} x_i = n, \qquad \sum_{i=1}^{k} p_i = 1, \qquad \binom{n}{x_1, x_2, \ldots, x_k} = \frac{n!}{x_1!\, x_2! \cdots x_k!}. \tag{2.64} $$
The name "multinomial" comes from the fact that the probability for each possible set of outcomes is equal to the corresponding term of the multinomial expansion of $(p_1 + p_2 + \cdots + p_k)^n$.

Multivariate Hypergeometric Distribution  Consider a finite population of size N with k types of items. Let $a_i$ indicate the number of items of type i, such that $\sum_{i=1}^{k} a_i = N$. Suppose that n items are to be selected through sampling without replacement. The number of items of type i in the sample is a random variable denoted by $X_i$. These random variables $X_1, X_2, \ldots, X_k$ follow the multivariate hypergeometric distribution with the following pmf:

$$ \Pr(X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k) = \frac{\binom{a_1}{x_1}\binom{a_2}{x_2} \cdots \binom{a_k}{x_k}}{\binom{N}{n}}, \qquad x_i = 0, 1, 2, \ldots, \min\{a_i, n\} \text{ for each } i, \tag{2.65} $$

where

$$ \sum_{i=1}^{k} x_i = n, \qquad \sum_{i=1}^{k} a_i = N. $$
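The multinomial pmf (2.63) with its coefficient (2.64) can be evaluated directly. The sketch below is an illustrative addition (the function name and the three-outcome example are ours); the sequential integer divisions are exact, since each partial quotient $n!/(x_1! \cdots x_j!)$ is an integer:

```python
from math import factorial

def multinomial_pmf(counts, probs):
    """Pmf (2.63) for cell counts x_1..x_k with cell probabilities p_1..p_k."""
    n = sum(counts)
    coef = factorial(n)
    for x in counts:
        coef //= factorial(x)  # multinomial coefficient (2.64), exact division
    pmf = float(coef)
    for x, p in zip(counts, probs):
        pmf *= p ** x
    return pmf

# Example: n = 6 trials over three outcomes with probabilities 0.5, 0.3, 0.2;
# the pmf of observing counts (3, 2, 1) is 60 * 0.5**3 * 0.3**2 * 0.2 = 0.135.
q = multinomial_pmf([3, 2, 1], [0.5, 0.3, 0.2])
```

Summing the pmf over all count vectors with $x_1 + x_2 + x_3 = 6$ reproduces the multinomial expansion of $(p_1 + p_2 + p_3)^6 = 1$.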
2.1.6 Special Continuous Distributions

In this section, we review a few simple continuous distributions. Characteristics of these distributions will be derived or simply given, and the relevant statistical experiments under which such distributions are derived will be discussed. We will not, however, discuss those continuous distributions that have been widely used in reliability analysis; these will be covered in a later section. In specifying a pdf or CDF, we will use the convention that the function takes the value zero outside the specified regions.
Uniform Distribution  A continuous random variable X follows the uniform distribution if and only if its pdf is given by

$$ f(x) = \frac{1}{b - a}, \qquad a < x < b, \tag{2.66} $$

where a and b are the parameters of the distribution. The CDF of a uniform random variable is

$$ F(x) = \begin{cases} \dfrac{x - a}{b - a} & \text{for } a < x < b, \\[1ex] 1 & \text{for } x \ge b. \end{cases} \tag{2.67} $$

When a = 0 and b = 1, the uniform distribution is called the standard uniform distribution. A uniform random variable may take values only in the interval (a, b), and the probability for the random variable to fall into a small interval of a fixed length is the same anywhere in (a, b). This distribution is often used for generating random numbers in the interval (0, 1), and it reflects the principle of equal likelihood that is essential in many discrete distributions. The mean and variance of a uniform random variable X are

$$ \mu = \frac{a + b}{2}, \tag{2.68} $$
$$ \sigma^2 = \frac{(b - a)^2}{12}. \tag{2.69} $$
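Equations (2.68) and (2.69) can be checked by integrating the pdf (2.66) numerically. The sketch below is an illustrative addition (the midpoint-rule helper and parameter values are ours):

```python
def uniform_moments(a, b, m=100000):
    """Midpoint-rule check of (2.68)-(2.69) for the uniform pdf f(x) = 1/(b - a)."""
    h = (b - a) / m
    f = 1.0 / (b - a)
    xs = [a + (i + 0.5) * h for i in range(m)]
    mean = sum(x * f * h for x in xs)
    var = sum((x - mean) ** 2 * f * h for x in xs)
    return mean, var

# For a = 2 and b = 5: mean (2+5)/2 = 3.5 and variance (5-2)^2/12 = 0.75.
mean, var = uniform_moments(2.0, 5.0)
```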
Beta Distribution  Here, we first introduce the gamma function. The gamma function of any positive value α is defined as

$$ \Gamma(\alpha) = \int_0^{\infty} y^{\alpha - 1} e^{-y}\, dy. $$

This function has the following properties:

• for α > 1, $\Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1)$;
• for any positive integer n, $\Gamma(n) = (n - 1)!$;
• $\Gamma(1) = 1$; and
• $\Gamma(\tfrac{1}{2}) = \sqrt{\pi}$.
A random variable X follows the beta distribution if and only if its pdf is given by

$$ f(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha - 1} (1 - x)^{\beta - 1}, \qquad 0 < x < 1, \tag{2.70} $$
FIGURE 2.4  Different shapes of beta pdf (shown for (α, β) = (0.5, 0.5), (1, 1), (1.5, 5), (5, 1.5), and (5, 5)).
where α > 0 and β > 0 are the parameters of the beta distribution. The CDF of a beta random variable does not have a closed form; it involves the incomplete beta function:

$$ F(x) = \begin{cases} \displaystyle \int_0^x \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, t^{\alpha - 1} (1 - t)^{\beta - 1}\, dt & \text{for } 0 < x < 1, \\[1ex] 1 & \text{for } x \ge 1. \end{cases} \tag{2.71} $$

The name beta distribution comes from the beta function, defined as

$$ B(a, b) = \int_0^1 x^{a - 1} (1 - x)^{b - 1}\, dx = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a + b)}. \tag{2.72} $$

Refer to the Appendix for additional discussion of the beta function. When α = β = 1, the beta distribution reduces to the standard uniform distribution. A beta random variable can take values only in the interval (0, 1). The beta distribution has found important applications in Bayesian inference, where it is used to represent the parameter of a binomial distribution. It has also been used in modeling the distribution of the state of a component or a system in continuous multistate reliability theory, to be discussed in Chapter 12. The beta pdf exhibits different shapes for different parameter values (see Figure 2.4), and thus it is said to be a "flexible" pdf. The mean and the variance of a beta random variable are

$$ \mu = \frac{\alpha}{\alpha + \beta}, \tag{2.73} $$
$$ \sigma^2 = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}. \tag{2.74} $$
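As an illustrative check (not part of the original text), the sketch below evaluates the beta pdf (2.70) with the standard-library gamma function and integrates it numerically, confirming that the pdf integrates to 1 and that the mean matches (2.73); the parameter pair (5, 1.5) is one of those shown in Figure 2.4:

```python
from math import gamma

def beta_pdf(x, a, b):
    """Beta pdf (2.70); the normalizing constant uses the gamma function."""
    return gamma(a + b) / (gamma(a) * gamma(b)) * x ** (a - 1) * (1 - x) ** (b - 1)

def integrate01(f, m=100000):
    """Midpoint rule on (0, 1); avoids evaluating the pdf at the endpoints."""
    h = 1.0 / m
    return sum(f((i + 0.5) * h) for i in range(m)) * h

a, b = 5.0, 1.5
total = integrate01(lambda x: beta_pdf(x, a, b))      # should be 1
mean = integrate01(lambda x: x * beta_pdf(x, a, b))   # alpha/(alpha+beta) by (2.73)
```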
Normal Distribution  A random variable X follows the normal distribution if and only if its pdf is given by

$$ f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left[ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right], \qquad -\infty < x < \infty, \tag{2.75} $$

where µ and σ > 0 are the parameters of the normal distribution. In fact, the parameter µ is the mean of the normal random variable and the parameter σ is its standard deviation. The normal distribution with µ = 0 and σ = 1 is called the standard normal distribution. The CDF of a normal random variable does not have a closed functional form; even the standard normal random variable does not have a closed CDF form. Because of the wide applications of the normal distribution, the CDF of the standard normal variable is often tabulated. The popular notation Φ(x) is used to indicate the CDF of a standard normal random variable:

$$ \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt, \qquad -\infty < x < \infty. \tag{2.76} $$

If X follows the normal distribution with mean µ and standard deviation σ, then Z = (X − µ)/σ follows the standard normal distribution. With this result, we can evaluate the probability for X to be in any interval using the CDF of the standard normal variable Z:

$$ \Pr(x_1 < X < x_2) = \Pr\!\left( \frac{x_1 - \mu}{\sigma} < Z < \frac{x_2 - \mu}{\sigma} \right) = \Phi\!\left( \frac{x_2 - \mu}{\sigma} \right) - \Phi\!\left( \frac{x_1 - \mu}{\sigma} \right). \tag{2.77} $$

The pdf of a standard normal random variable exhibits a bell-shaped curve centered at x = 0 and is a symmetric function. As a result, we have

$$ f(x) = f(-x), \tag{2.78} $$
$$ \Phi(x) = 1 - \Phi(-x). \tag{2.79} $$
If X and Y are i.i.d. random variables following the standard normal distribution, the sum of these two random variables follows the normal distribution with µ = 0 and σ² = 2. Let $X_i$ be a normal random variable with mean µ and standard deviation σ for $i = 1, 2, \ldots, n$. The sum of these random variables follows the normal distribution with mean nµ and variance nσ². When n is very large and p is close to 1/2, the binomial distribution with parameters n and p can be approximated by the normal distribution with µ = np and σ² = np(1 − p). In this case, the following equation can be used to evaluate
the probability that a binomial random variable X is less than or equal to a certain integer a:

$$ \Pr(X \le a) = \sum_{i=0}^{a} \binom{n}{i} p^i (1 - p)^{n - i} \approx \Pr(Y \le a + 0.5) = \Phi\!\left( \frac{a + 0.5 - np}{\sqrt{np(1 - p)}} \right). \tag{2.80} $$
In this equation, we have applied the continuity correction. Since the binomial random variable X can take only integer values while the normal random variable Y can take continuous values, the probability for Y to be in (a − 0.5, a + 0.5) is used to approximate the probability for X to be equal to a. This approximation provides fairly accurate results when np and n(1 − p) are both greater than 5 [78].

Bivariate Normal Distribution  A pair of random variables X and Y follow the bivariate normal distribution if and only if their joint pdf is given by

$$ f(x, y) = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1 - \rho^2}} \exp\!\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \left( \frac{x - \mu_X}{\sigma_X} \right)^2 - 2\rho \left( \frac{x - \mu_X}{\sigma_X} \right)\!\left( \frac{y - \mu_Y}{\sigma_Y} \right) + \left( \frac{y - \mu_Y}{\sigma_Y} \right)^2 \right] \right\} \tag{2.81} $$

for −∞ < x < ∞ and −∞ < y < ∞, where $\sigma_X > 0$, $\sigma_Y > 0$, and −1 < ρ < 1. The parameter ρ is the correlation coefficient of the two random variables. Based on the pdf of the bivariate normal distribution, it can be shown that the marginal distribution of X is the normal distribution with mean $\mu_X$ and standard deviation $\sigma_X$, and the marginal distribution of Y is the normal distribution with mean $\mu_Y$ and standard deviation $\sigma_Y$. The converse is not necessarily true: two random variables may each follow the normal distribution while their joint distribution is not a bivariate normal distribution. The conditional distribution of X given Y = y is a normal distribution with the mean

$$ \mu_{X \mid Y = y} = \mu_X + \rho \frac{\sigma_X}{\sigma_Y} (y - \mu_Y) \tag{2.82} $$
and the variance

$$ \sigma_{X \mid Y = y}^2 = \sigma_X^2 (1 - \rho^2), \tag{2.83} $$
and the conditional distribution of Y given X = x is a normal distribution with the mean

$$ \mu_{Y \mid X = x} = \mu_Y + \rho \frac{\sigma_Y}{\sigma_X} (x - \mu_X) \tag{2.84} $$
and the variance

$$ \sigma_{Y \mid X = x}^2 = \sigma_Y^2 (1 - \rho^2). \tag{2.85} $$
Theorem 2.3 If two random variables follow the bivariate normal distribution, they are independent if and only if ρ = 0.
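The normal approximation (2.80) with its continuity correction can be checked against the exact binomial sum. The sketch below is an illustrative addition (names and parameters are ours); it uses the identity $\Phi(x) = \tfrac{1}{2}[1 + \mathrm{erf}(x/\sqrt{2})]$ to evaluate the standard normal CDF:

```python
from math import comb, erf, sqrt

def phi(x):
    """Standard normal CDF (2.76) via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def binom_cdf_exact(a, n, p):
    """Exact lower-tail sum from equation (2.80)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(a + 1))

def binom_cdf_normal(a, n, p):
    """Normal approximation (2.80) with the continuity correction a + 0.5."""
    return phi((a + 0.5 - n * p) / sqrt(n * p * (1 - p)))

# np = 20 and n(1 - p) = 80, both greater than 5, so the approximation
# should be fairly accurate.
n, p, a = 100, 0.2, 25
exact = binom_cdf_exact(a, n, p)
approx = binom_cdf_normal(a, n, p)
```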
2.2 RELIABILITY CONCEPTS

Reliability is defined as the probability that a device will perform its intended functions satisfactorily for a specified period of time under specified operating conditions. Based on this definition, reliability is measured as a probability, and probability theory has been used to analyze the reliability of components as well as the reliability of systems consisting of these components. Reliability is defined in terms of a device, which may be a component in a system or a system consisting of many components. Since the performance of a system usually depends on the performance of its components, the reliability of a system is a function of the reliability of its components. The intended function of the device is assumed to be understood, and the degree of success of the device's performance of the intended function can be measured, so that we can easily conclude whether the performance is satisfactory or not.

Time is an important factor in the definition of reliability. If a newly purchased device can perform its intended functions satisfactorily, what is the probability that it will last (continue to perform satisfactorily) for a specified period of time, say three years? How long will it last? In other words, what will be the life of this device? The lifetime of the device can be treated as a random variable with a statistical distribution and related properties. Further, the operating conditions, such as stress, load, temperature, pressure, and/or other environmental factors, under which the device is expected to operate must be specified. Under most circumstances in our discussions throughout this book, the operating conditions are constant and implicit; when the operating conditions of a device change over its lifetime, this is explicitly stated.

Let T be the random variable representing the lifetime of a device.
The units of measurement for the lifetime may be a time unit, such as seconds, hours, days, or years, or a usage unit, such as miles driven or cycles of operation. The random variable T is continuous and can take only nonnegative values. Its statistical distribution can be described by its probability density function f(t), its cumulative distribution function F(t), and/or its characteristics such as mean and variance. Given that we understand the intended functions, the operating conditions, and the satisfactory performance of the device when it is new, we need only deal with the probability that the device can last beyond a specified period t. Thus, the reliability function of the device, denoted by R(t), is given by

$$ R(t) = \Pr(T > t) = 1 - F(t) = \int_t^{\infty} f(x)\, dx. \tag{2.86} $$
In words, R(t) is the probability that the device's lifetime is larger than t, the probability that the device will survive beyond time t, or the probability that the device will fail after time t. It is obvious that R(0) = 1 and R(∞) = 0, and the function R(t) is a nonincreasing function of t. The reliability function is also called the survivor function in some literature, and the CDF of T is also called the unreliability function. The expected value or the mean of the lifetime T is also called the mean time to failure (MTTF) or the expected life of the device. It can be evaluated with the following standard equation:

$$ \mathrm{MTTF} = E(T) = \int_0^{\infty} t f(t)\, dt. \tag{2.87} $$
A computationally more efficient formula for evaluation of the MTTF is

$$ \mathrm{MTTF} = E(T) = \int_0^{\infty} R(t)\, dt. \tag{2.88} $$
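As an illustrative check (not part of the original text), the sketch below evaluates both (2.87) and (2.88) numerically for an assumed exponential lifetime with rate λ = 0.5, for which $f(t) = \lambda e^{-\lambda t}$, $R(t) = e^{-\lambda t}$, and the MTTF is 1/λ = 2:

```python
from math import exp

lam = 0.5  # assumed failure rate for this illustration

def integrate(f, upper=60.0, m=200000):
    """Midpoint rule on (0, upper); the truncated tail beyond `upper` is negligible."""
    h = upper / m
    return sum(f((i + 0.5) * h) for i in range(m)) * h

mttf_density = integrate(lambda t: t * lam * exp(-lam * t))  # equation (2.87)
mttf_reliability = integrate(lambda t: exp(-lam * t))        # equation (2.88)
```

Equation (2.88) needs only the reliability function, which is why it is the computationally more convenient of the two.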
Some devices may go through several failures before they are scrapped; these devices are said to be repairable. For repairable devices, the MTTF represents the mean time to the first failure. After the device is repaired and put into operation again, the average time to the next failure is indicated by the mean time between failures (MTBF). The MTBF represents the average operating time from the point that a failed device is restored to operation to the point that it fails again; it does not include the amount of time needed to repair the failed device. If each repair restores the device to "as good as new" condition, we say that the repair is perfect, and under perfect repairs, MTBF is equal to MTTF. Since there is usually an aging effect in most devices, we very often see a decreasing MTBF as the device experiences more failures. The average amount of time needed to repair a failed device is called the mean time to repair (MTTR). For repairable devices, availability is often used as a measure of performance. The availability of a device is defined to be the probability that the device is available whenever needed. For a repairable device with perfect repair on any failure, its availability can be expressed as

$$ A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}. \tag{2.89} $$
A detailed description of the meaning of availability can be found in Kuo [128]. The conditional reliability is defined to be the probability for a device to work satisfactorily for an additional duration of τ given that it has worked properly for a duration of t, that is,

$$ R(\tau \mid t) = \Pr(T > t + \tau \mid T > t) = \frac{R(t + \tau)}{R(t)}. \tag{2.90} $$
The failure rate function, or the hazard function, denoted by h(t), is defined to be the conditional probability of failure per unit time at time t, given that the device has been working properly up to time t, that is,

$$ h(t) = \lim_{\Delta t \to 0} \frac{\Pr(t < T \le t + \Delta t \mid T > t)}{\Delta t} = \frac{f(t)}{R(t)}. \tag{2.91} $$
The cumulative failure rate function, or the cumulative hazard function, denoted by H(t), is defined to be

$$ H(t) = \int_0^t h(x)\, dx. \tag{2.92} $$
The average failure rate of a device over an interval $(t_1, t_2)$ is given by

$$ h(t_1, t_2) = \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} h(t)\, dt = \frac{H(t_2) - H(t_1)}{t_2 - t_1}. \tag{2.93} $$
The failure rate function is often used to indicate the health condition of a working device. A high failure rate indicates a bad health condition because the probability for the device to fail in the next instant of time is high. Any one of the functions f(t), F(t), R(t), h(t), and H(t) is adequate to specify the lifetime distribution of a device completely. Given any one of them, the others can be derived. For example, the relationships between h(t) or H(t) and R(t) are

h(t) = −(d/dt) ln R(t),  (2.94)
R(t) = e^{−H(t)}.  (2.95)
The failure rate functions of many devices exhibit the "bathtub" curve shown in Figure 2.5, which is divided into three sections. In the interval (0, t1), which is usually short, a decreasing-failure-rate (DFR) function is observed. This section is often referred to as the early-failure period. The failures that occur in this interval are called early failures, burn-in failures, or infant mortality failures. They are mainly due to manufacturing defects and can be screened out using burn-in techniques. In the interval (t1, t2), the failure rate function is fairly constant. This section is often referred to as the useful life of the device or the constant-failure-rate period. The failures that occur in this interval are called chance failures or random failures. They are usually caused by chance events such as accidents, overloading, and a combination of the underlying complex physical failure mechanisms. In the interval (t2, ∞), the failure rate function is increasing. This interval is often called the increasing-failure-rate (IFR) period or the wear-out failure period. The failures that occur in this period are due to wear-out, aging, or serious deterioration of the device. The life of the device is close to its end once it enters this period, unless preventive maintenance or major overhauls revitalize the device.

FIGURE 2.5 Bathtub curve of the failure rate function: h(t) versus t, with early failures in (0, t1), useful life in (t1, t2), and wear-out failures beyond t2.

It should be noted that the shapes of the bathtub curves of different devices may be dramatically different. For example, electronic devices have a very long useful-life period. Computer software generally has a decreasing failure rate. Mechanical devices have a long wear-out period, during which preventive maintenance measures are used to extend their lives. Stresses applied to a device often shift its bathtub curve upward. Figure 2.6 illustrates these different bathtub curves. For further discussions, readers are referred to Kuo et al. [129] and Kuo and Kim [130, 131].

FIGURE 2.6 Variations of the bathtub curve: h(t) versus t for a mechanical device, an electronic device with elevated stress, an electronic device, and computer software.
2.3 COMMONLY USED LIFETIME DISTRIBUTIONS

The lifetime of a device is the random variable of interest in reliability analysis. It is continuous and can take only nonnegative values. As a consequence, we deal mainly with continuous distributions in reliability analysis. In this section, we describe the continuous distributions that have been widely used in reliability analysis.
Exponential Distribution A random variable T has the exponential distribution if and only if its pdf is given by

f(t) = λe^{−λt}, t ≥ 0,

where λ > 0 is the parameter of the distribution. It can easily be verified that an exponential random variable has the following reliability function, CDF, failure rate function, mean, and variance, respectively:

R(t) = e^{−λt}, t ≥ 0,  (2.96)
F(t) = 1 − e^{−λt}, t ≥ 0,  (2.97)
h(t) = λ,  (2.98)
µ = 1/λ,  (2.99)
σ² = 1/λ².  (2.100)
When t = µ, F(t) = 1 − e^{−1} ≈ 63.2%. Thus, the MTTF represents the 63.2nd percentile of the random variable T. From equation (2.98), we observe the following:

1. The measuring unit of the parameter λ is probability per unit of time.
2. The failure rate function of an exponential random variable is constant. Based on the definition of the failure rate function, we conclude that the probability of failure in the next instant of time is constant no matter how old the device is. This means that a device that has an age of t is as good as a new item with an age of zero. This is the memoryless property of the exponential distribution.
3. The parameter λ is called the failure rate of the exponential distribution.

The memoryless property can also be demonstrated with conditional reliability:

R(x | t) = Pr(T > x + t | T > t) = Pr(T > x + t)/Pr(T > t) = e^{−λ(t+x)}/e^{−λt} = e^{−λx} = R(x), x ≥ 0.
Thus, the MTBF of such a device is equal to its MTTF. Based on equation (2.99), we can see that it is always equal to the reciprocal of the constant failure rate. The exponential distribution is closely related to the Poisson distribution. Consider a repairable device following the exponential lifetime distribution with the parameter λ. Whenever it fails, it is repaired instantaneously. The total number of failures that the device will experience during the interval (0, t) is a random variable, denoted by X. Then, it can be shown that X follows the Poisson distribution with the parameter λt, whose pmf is given by

Pr(X = n) = (λt)^n e^{−λt} / n!, t ≥ 0, n = 0, 1, 2, . . . .  (2.101)
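A short sketch (with an assumed rate λ = 0.25) makes both facts concrete: the conditional reliability R(x | t) does not depend on the age t, and the Poisson pmf of equation (2.101) sums to one:

```python
import math

lam = 0.25  # assumed example failure rate

def R(t):
    return math.exp(-lam * t)

# Memoryless property: R(x | t) = R(t + x)/R(t) = R(x) for any age t
for age in (0.0, 10.0, 100.0):
    cond = R(age + 5.0) / R(age)   # reliability over 5 more time units at this age
    print(round(cond, 6))          # 0.286505 every time: the age is irrelevant

# Poisson count formula (2.101): the pmf sums to 1 over n = 0, 1, 2, ...
t = 40.0
total = sum((lam * t) ** n * math.exp(-lam * t) / math.factorial(n)
            for n in range(100))
print(round(total, 6))             # 1.0
```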
The exponential distribution is the most widely used distribution in reliability analysis. One reason is its mathematical simplicity. As will be seen later in this book, a mathematical analysis of many system structures is possible only when the exponential lifetime distribution is assumed. The other reason is its constant failure rate, which reflects many real-life phenomena. For example, after burn-in techniques have been used to screen out early failures, many products exhibit a roughly constant failure rate during their useful life, as reflected in the bathtub curve. For many devices that are purchased and used, a significant portion of their lives can be modeled by the exponential distribution.

Weibull Distribution A random variable T has the Weibull distribution if and only if its pdf is given by

f(t) = (β t^{β−1} / η^β) e^{−(t/η)^β}, t ≥ 0,  (2.102)

where β > 0 is the shape parameter and η > 0 is the scale parameter of the distribution. The reliability function, CDF, failure rate function, mean, and variance of a Weibull random variable are given by

R(t) = e^{−(t/η)^β}, t ≥ 0,  (2.103)
F(t) = 1 − e^{−(t/η)^β}, t ≥ 0,  (2.104)
h(t) = (β/η)(t/η)^{β−1}, t ≥ 0,  (2.105)
µ = η Γ(1 + 1/β),  (2.106)
σ² = η² [Γ(1 + 2/β) − Γ(1 + 1/β)²].  (2.107)
When t = η, F(t) = 1 − e^{−1} ≈ 63.2%. Thus, the scale parameter η represents the 63.2nd percentile of the random variable T. The shape parameter β determines the shape of the distribution function. When 0 < β < 1, h(t) is a DFR. When β = 1, h(t) = 1/η is constant, the Weibull distribution reduces to the exponential distribution, and we say that the device has a CFR. When β > 1, h(t) is an IFR. By varying the value of β, the Weibull distribution can be used to model devices with DFR, CFR, and IFR. Thus, we say that the Weibull distribution is a very flexible distribution in reliability analysis: it can be used to model all three regions of the bathtub curve.
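The three hazard regimes can be illustrated directly from equation (2.105); the scale η = 100 and the evaluation points below are arbitrary example values:

```python
import math

eta = 100.0  # assumed scale parameter

def weibull_hazard(t, beta):
    # equation (2.105): h(t) = (beta/eta) * (t/eta)**(beta - 1)
    return (beta / eta) * (t / eta) ** (beta - 1)

for beta in (0.5, 1.0, 2.0):
    h_early = weibull_hazard(10.0, beta)
    h_late = weibull_hazard(50.0, beta)
    if h_late < h_early:
        trend = "DFR"
    elif h_late == h_early:
        trend = "CFR"
    else:
        trend = "IFR"
    print(beta, trend)             # 0.5 DFR, 1.0 CFR, 2.0 IFR

F_eta = 1 - math.exp(-1.0)         # F(eta) for every beta
print(round(F_eta, 4))             # 0.6321: eta is the 63.2nd percentile
```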
Gamma Distribution A random variable T has the gamma distribution if and only if its pdf is given by

f(t) = [λ^β / Γ(β)] t^{β−1} e^{−λt}, t ≥ 0,  (2.108)

where β > 0 is the shape parameter and λ > 0 is the scale parameter. The gamma function Γ(·) was defined when the beta distribution was introduced earlier in this book. Generally, there are no closed-form expressions for the reliability function, the CDF, or the failure rate function. However, they are listed below together with some characteristics of the gamma random variable:

R(t) = [λ^β / Γ(β)] ∫_t^∞ x^{β−1} e^{−λx} dx, t ≥ 0,  (2.109)
F(t) = [λ^β / Γ(β)] ∫_0^t x^{β−1} e^{−λx} dx, t ≥ 0,  (2.110)
h(t) = t^{β−1} e^{−λt} / ∫_t^∞ x^{β−1} e^{−λx} dx, t ≥ 0,  (2.111)
µ = β/λ,  (2.112)
σ² = β/λ².  (2.113)
When β = 1, the gamma distribution reduces to the exponential distribution and thus has a CFR. When 0 < β < 1, the gamma distribution has a DFR. When β > 1, the gamma distribution has an IFR. Thus, it is also a flexible lifetime distribution and can be used to model each of the three regions of the bathtub curve. When β is an integer, the gamma distribution is known as the Erlang distribution. In this case, it does have closed-form expressions for its pdf, reliability function, CDF, and failure rate function, as given below:

f(t) = λ^β t^{β−1} e^{−λt} / (β − 1)!, t ≥ 0,  (2.114)
R(t) = Σ_{k=0}^{β−1} (λt)^k e^{−λt} / k!, t ≥ 0,  (2.115)
F(t) = 1 − Σ_{k=0}^{β−1} (λt)^k e^{−λt} / k!, t ≥ 0,  (2.116)
h(t) = λ^β t^{β−1} / [(β − 1)! Σ_{k=0}^{β−1} (λt)^k / k!], t ≥ 0.  (2.117)
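The closed form (2.115) can be cross-checked against a direct numerical integration of the gamma pdf from t to infinity, i.e., the general reliability function (2.109). The parameter values λ = 1.5, β = 3, t = 2 below are assumed for illustration:

```python
import math

lam, beta, t = 1.5, 3, 2.0  # assumed example values, beta an integer

# Closed form (2.115): R(t) = sum_{k=0}^{beta-1} (lam*t)^k e^{-lam*t} / k!
R_closed = sum((lam * t) ** k * math.exp(-lam * t) / math.factorial(k)
               for k in range(beta))

def gamma_pdf(x):
    # Erlang pdf (2.114); for integer beta, Gamma(beta) = (beta - 1)!
    return lam ** beta * x ** (beta - 1) * math.exp(-lam * x) / math.factorial(beta - 1)

# Midpoint-rule integral of the pdf over (t, 50]; the tail beyond 50 is negligible
n, upper = 100_000, 50.0
dx = (upper - t) / n
R_numeric = sum(gamma_pdf(t + (i + 0.5) * dx) * dx for i in range(n))

print(round(R_closed, 4), round(R_numeric, 4))   # both ≈ 0.4232
```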
The Erlang distribution can be derived as follows. Consider a repairable device having an exponential lifetime distribution with the parameter λ. Whenever it fails, it is instantaneously repaired to "as good as new" status. The device is useful as long as the total number of failures it experiences does not exceed β, an integer. Then, the lifetime of this device is the sum of these β failure interarrival times, each following the same exponential distribution. This situation also arises when we are interested in the lifetime of a system with one active component and β − 1 identical standby components with a perfect sensing and switching mechanism. In each of these situations, if we are interested in the number of failures that the system may experience within a specified mission time t at the component or system level, the Poisson distribution with the parameter λt applies. This also means that the sum of k i.i.d. exponential random variables with the parameter λ follows the gamma distribution with the parameters λ and k.

Lognormal Distribution A random variable T has the lognormal distribution if and only if ln T has the normal distribution. The pdf of a lognormal random variable is given by

f(t) = [1 / (σ t √(2π))] exp[−(ln t − µ)² / (2σ²)], t > 0,  (2.118)

where σ > 0 and µ are the shape and scale parameters of the lognormal distribution, respectively. If the pdf of T is as shown in equation (2.118), the random variable X = ln T has the normal distribution with the parameters µ and σ, where µ is the mean of X and σ is the standard deviation of X. We can also say that X has the normal distribution if and only if T = e^X has the lognormal distribution. The lognormal distribution is applicable in lifetime analysis because it can take only nonnegative values. It can be evaluated through the normal distribution or the standard normal distribution because of the close relationship between them.
The reliability function, CDF, and failure rate function of a lognormal random variable in terms of the corresponding functions of the standard normal distribution are given below. Recall that Φ(x) is the CDF of a standard normal random variable:

R(t) = 1 − Φ[(ln t − µ)/σ], t > 0,  (2.119)
F(t) = Φ[(ln t − µ)/σ], t > 0,  (2.120)
h(t) = f(t) / {1 − Φ[(ln t − µ)/σ]}, t > 0,  (2.121)
E(T) = e^{µ + σ²/2},  (2.122)
Var(T) = e^{2µ + σ²} (e^{σ²} − 1).  (2.123)
The failure rate function of a lognormal distribution increases initially, reaches its peak, and then decreases as time goes to infinity.
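Equations (2.119)-(2.122) and the hazard shape can be sketched with Python's standard normal CDF (`statistics.NormalDist`); µ = 1.0 and σ = 0.5 are assumed example parameters of ln T:

```python
import math
from statistics import NormalDist

mu, sigma = 1.0, 0.5               # assumed parameters of ln T
Phi = NormalDist().cdf             # standard normal CDF

def pdf(t):
    # lognormal pdf, equation (2.118)
    return (math.exp(-((math.log(t) - mu) ** 2) / (2 * sigma ** 2))
            / (sigma * t * math.sqrt(2 * math.pi)))

def R(t):
    # reliability function, equation (2.119)
    return 1.0 - Phi((math.log(t) - mu) / sigma)

def hazard(t):
    # failure rate function, equation (2.121)
    return pdf(t) / R(t)

print(round(R(math.exp(mu)), 6))                # 0.5: exp(mu) is the median of T
print(round(math.exp(mu + sigma ** 2 / 2), 4))  # E(T) ≈ 3.0802, equation (2.122)

# The lognormal hazard rises to a peak and then declines:
print(hazard(1.0) < hazard(3.0), hazard(3.0) > hazard(20.0))   # True True
```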
2.4 STOCHASTIC PROCESSES

In this section, we review relevant concepts and techniques in stochastic processes that are useful for system reliability analysis. For advanced coverage of this topic, readers are referred to Parzen [186] and Ross [206].

2.4.1 General Definitions

Let N(t) be a random variable representing the state of a system at time t for t ≥ 0. We call {N(t), t ∈ T} a stochastic process, where T indicates the domain of time t. If T is a discrete set, for example, T = {0, 1, 2, . . . }, we have a discrete-time stochastic process. If T is a continuous set, for example, T = {t | t ≥ 0}, we have a continuous-time stochastic process. The set of all possible values of N(t) for t ∈ T is called the state space of the stochastic process.

Definition 2.1 A continuous-time stochastic process {N(t), t ≥ 0} is a counting process if N(t) satisfies

1. N(t) ≥ 0,
2. N(t) is integer valued,
3. N(t) is a nondecreasing function of t, and
4. for s < t, N(t) − N(s) represents the number of events that occur in the interval (s, t].
Let N(t) be a counting process, and let Si indicate the point in time at which the ith event occurs for i ≥ 1, with S0 ≡ 0. Then the time elapsed between successive events, or the so-called interarrival time, can be represented by Ti = Si − Si−1. The relationships among Ti, Si, and N(t) are shown in Figure 2.7. A counting process {N(t), t ≥ 0} is said to have independent increments if, for all 0 ≤ t0 < t1 < · · · < tn (n ≥ 2), the random variables N(t1) − N(t0), N(t2) − N(t1), . . . , N(tn) − N(tn−1) are independent of one another. For each n ≥ 1, N(tn) − N(tn−1) is called an increment in the state of the system, and it represents the number of events that occur in the time interval (tn−1, tn]. In other words, the number of events that occur in the interval (tn−1, tn] is independent of the number of events that occur in any other disjoint time interval. The number of events in any time interval may be described by a statistical distribution. For example, the number of events in the interval (0, 5] may follow the Poisson distribution with a parameter of 5 while the number of events in the interval (10, 15] may follow the Poisson distribution with a parameter of 10, and these two counts are independent. A counting process is said to have stationary increments if the increment N(s + t) − N(s) has the same distribution as N(t) for all t > 0 and s ≥ 0. In other words, the number of events in any interval depends only on the length of the interval and not on where the interval starts or ends. A stochastic process with stationary increments is called a stationary process or a homogeneous process. In such a process,
FIGURE 2.7 Relationships among N(t), Ti, and Si in a counting process: the step function N(t) increases by 1 at each arrival time Si, with interarrival times Ti = Si − Si−1.
the numbers of events in disjoint intervals of the same length always have the same distribution with the same parameter values.

2.4.2 Homogeneous Poisson Process

Definition 2.2 A counting process {N(t), t ≥ 0} is called a homogeneous Poisson process (HPP) with the parameter λ > 0 if the following conditions are satisfied:

1. N(0) = 0,
2. the process has stationary and independent increments,
3. the probability for an event to occur in the interval (t, t + Δt] may be written as λΔt + o(Δt), and
4. the probability for more than one event to occur in the interval (t, t + Δt] is negligible for small Δt; in other words, this probability can be written as o(Δt).

(The notation o(Δt) denotes a function of Δt such that lim_{Δt→0} [o(Δt)/Δt] = 0.)

In a homogeneous Poisson process, the number of events in any interval of length t follows the Poisson distribution with the parameter λt, that is, for all s ≥ 0 and t > 0,

Pr(N(s + t) − N(s) = n) = Pr(N(t) = n) = (λt)^n e^{−λt} / n!, n = 0, 1, 2, . . . .  (2.124)
Let m(t) denote the expected number of events that have occurred by time t. We have

m(t) = E(N(t)) = E(N(s + t) − N(s)) = λt,  (2.125)
Var(N(t)) = λt.  (2.126)

The parameter λ is called the rate of occurrence of events. It represents the average number of events that occur within a unit of time. A more general definition of the rate of occurrence of events at time t, denoted by w(t), is given by

w(t) = dm(t)/dt.  (2.127)
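A simulation sketch of equations (2.125)-(2.126): for an HPP with rate λ, both the mean and the variance of N(t) equal λt. The values λ = 1.5, t = 4 are assumed for illustration:

```python
import random

random.seed(5)
lam, t_end, runs = 1.5, 4.0, 20_000   # assumed example values

def count_events():
    """Generate exponential interarrival times until t_end is passed."""
    t, n = 0.0, 0
    while True:
        t += random.expovariate(lam)
        if t > t_end:
            return n
        n += 1

counts = [count_events() for _ in range(runs)]
mean = sum(counts) / runs
var = sum((c - mean) ** 2 for c in counts) / runs
print(round(mean, 2), round(var, 2))  # both close to lam * t_end = 6.0
```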
Apparently, in a homogeneous Poisson process, w(t) is a constant and equal to λ. The interarrival times Ti for i ≥ 1 between any two successive events are independent of one another and identically distributed following the exponential distribution with the parameter λ > 0. In other words, the pdf and CDF of Ti for i = 1, 2, . . . are

f(t) = λe^{−λt}, t ≥ 0,
F(t) = 1 − Pr(T1 > t) = 1 − Pr(N(t) = 0) = 1 − e^{−λt}, t ≥ 0.

The arrival time of the nth event is called the waiting time of the nth event, denoted by Sn:

Sn = Σ_{i=1}^n Ti, n ≥ 1.  (2.128)

In words, the waiting time of the nth event is equal to the sum of n i.i.d. exponential random variables. As stated when we introduced the gamma distribution, Sn has the gamma distribution with the scale parameter λ and the shape parameter n. The pdf of Sn is

f(t) = λ^n t^{n−1} e^{−λt} / (n − 1)!, t > 0.

This pdf can also be derived from the observation that the nth event occurs before or at time t (i.e., Sn ≤ t) if and only if the number of events that have occurred by time t is greater than or equal to n, and then using the Poisson survivor function. If the events that occur in a homogeneous Poisson process can be classified into two different types, say, type 1 and type 2, we can use N1(t) and N2(t) to indicate the numbers of type 1 and type 2 events that have occurred by time t, respectively. If an event occurs at time x, the probability for this event to be of type 1 is denoted by P(x), and thus the probability that it is of type 2 is equal to 1 − P(x). Under these conditions, N1(t) and N2(t) can be proved to be independent Poisson random variables with the parameters λpt and λ(1 − p)t, respectively, where
p = (1/t) ∫_0^t P(x) dx.  (2.129)
For a proof of this result, readers are referred to Ross [206]. The homogeneous Poisson process may be used to analyze equipment reliability when the following conditions are satisfied:

1. A new device is put into operation at time zero.
2. The lifetime distribution of the device is exponential with the parameter λ > 0.
3. Whenever there is a failure, the device is either instantaneously repaired to "as good as new" condition or instantaneously replaced by a new identical device.
4. The lifetimes of all devices are independent of one another.

2.4.3 Nonhomogeneous Poisson Process

The nonhomogeneous Poisson process (NHPP) generalizes the homogeneous Poisson process by allowing the rate of occurrence of events to be time dependent. It does not have stationary increments.

Definition 2.3 The counting process {N(t), t ≥ 0} is said to be a nonhomogeneous or nonstationary Poisson process with an intensity function λ(t) ≥ 0 if

1. N(0) = 0,
2. {N(t), t ≥ 0} has independent increments,
3. Pr(N(t + Δt) − N(t) = 1) = λ(t)Δt + o(Δt), and
4. Pr(N(t + Δt) − N(t) ≥ 2) = o(Δt).
Based on this definition, it can be shown that the number of events that occur in the time interval (s, s + t] follows the Poisson distribution with the parameter ∫_s^{s+t} λ(x) dx, that is,

Pr(N(s + t) − N(s) = n) = {[∫_s^{s+t} λ(x) dx]^n / n!} exp[−∫_s^{s+t} λ(x) dx], n = 0, 1, 2, . . . .  (2.130)
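An NHPP can be simulated by "thinning" a homogeneous process: events of a rate-λmax HPP are kept with probability λ(t)/λmax. The intensity λ(t) = 0.5 + 0.1t below is an assumed example; the mean count over (0, 10] should match the integral of λ, as in equation (2.130):

```python
import random

random.seed(7)
t_end = 10.0

def intensity(t):
    # assumed increasing intensity function
    return 0.5 + 0.1 * t

lam_max = intensity(t_end)           # bound for lambda(t) on (0, t_end]

def simulate_nhpp():
    t, n = 0.0, 0
    while True:
        t += random.expovariate(lam_max)
        if t > t_end:
            return n
        if random.random() < intensity(t) / lam_max:   # thinning step
            n += 1

runs = 20_000
mean_n = sum(simulate_nhpp() for _ in range(runs)) / runs
expected = 0.5 * t_end + 0.05 * t_end ** 2   # integral of lambda over (0, 10]
print(round(mean_n, 2), expected)            # both close to 10.0
```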
In terms of a reliability system, an NHPP may be used to describe the following situation:

1. A new system is put into operation at time 0.
2. Whenever the system fails, it is instantaneously repaired. The repair may be a minimal repair or an imperfect repair. A minimal repair simply restores the system to the working state. The failure rate of the system right after the repair is the same as that right before the failure. We say that minimal repair does not change the health condition of the system: it restores the system to "as bad as old" condition. An imperfect repair describes a situation that is neither the "as bad as old" situation when a minimal repair is conducted nor the "as good as new" situation when a replacement is made. The system's health condition is somewhere between "as good as new" and "as bad as old" right after the repair. Thus, the life distribution of a repaired system depends on the extent of the repair and the condition of the system before the failure. For example, suppose that a new system follows the exponential distribution with the parameter λ. Because of the extent of the repair at each additional failure, the life distribution of the system after the ith repair may follow the exponential distribution with the parameter (i + 1)λ for i ≥ 1. In other words, the failure rate of the system increases by λ with each additional failure and repair.

2.4.4 Renewal Process

In the homogeneous Poisson process, i.i.d. exponential random variables are used to model the interarrival times. The renewal process generalizes the homogeneous Poisson process by allowing the interarrival times to be i.i.d. random variables having an arbitrary distribution. Other conditions for a renewal process are the same as those for a homogeneous Poisson process. The elapsed times between two successive events are assumed to be independent and identically distributed with a CDF

F(t) = Pr(Ti ≤ t) for t ≥ 0, i = 1, 2, . . . .

The average interarrival time is

µ = E(Ti) = ∫_0^∞ x dF(x), i = 1, 2, . . . .
We will still use Sn to indicate the arrival time of the nth event (n ≥ 1), with S0 ≡ 0. The relationship between N(t) and Sn can be expressed as N(t) = sup{n : Sn ≤ t} = max{n : Sn ≤ t}. Since Sn is a sum of n i.i.d. random variables with a common CDF F(t), the distribution of Sn is

Pr(Sn ≤ t) = Fn(t),  (2.131)

where Fn(t) denotes the n-fold convolution of F with itself. To find the distribution of N(t), we note that N(t) is greater than or equal to n if and only if Sn is less than or equal to t. As a result, we have

Pr(N(t) = n) = Pr(N(t) ≥ n) − Pr(N(t) ≥ n + 1) = Pr(Sn ≤ t) − Pr(Sn+1 ≤ t) = Fn(t) − Fn+1(t).  (2.132)
In a renewal process, we call m(t) = E(N(t)) the renewal function. The relationship between m(t) and F(t) is given below:

m(t) = Σ_{n=0}^∞ n Pr(N(t) = n) = Σ_{n=1}^∞ Fn(t).  (2.133)
The number of events that have occurred by time t goes to infinity as t → ∞. However, the average number of events that occur in a unit of time goes to a constant as time goes to infinity:

N(t)/t → 1/µ as t → ∞,  (2.134)
S_{N(t)}/N(t) → µ as t → ∞.  (2.135)

The following theorem states that the expected number of events averaged over time also goes to a constant as time goes to infinity.

Theorem 2.4 (Elementary Renewal Theorem)

m(t)/t → 1/µ as t → ∞.  (2.136)
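A simulation sketch of the elementary renewal theorem with i.i.d. uniform(0, 2) interarrival times (an assumed distribution with mean µ = 1): the count N(t)/t should approach 1/µ for large t.

```python
import random

random.seed(1)
t_end = 50_000.0   # a long horizon so that N(t)/t has nearly converged

t, n = 0.0, 0
while True:
    t += random.uniform(0.0, 2.0)   # interarrival time, mean mu = 1
    if t > t_end:
        break
    n += 1

print(round(n / t_end, 3))          # close to 1/mu = 1.0
```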
The renewal process may be used to describe the following scenario in reliability analysis:

1. At time zero, a new piece of equipment is put in use.
2. Whenever the equipment fails, it is instantaneously replaced by a new piece of identical equipment.
3. The lifetimes of the pieces of equipment are i.i.d. with CDF F(t).

Under this description, an event indicates the failure of the equipment. Whenever an event occurs, the equipment is renewed by a new piece of equipment. Since we have assumed that the replacement is instantaneous, the terms renewals and events can be used interchangeably.

Alternating Renewal Process Consider a device whose state alternates between working and failed as time goes on. At time zero, it is in the working state. After a time duration of X1, it enters the failed state. It stays in the failed state for a time duration of Y1 and then reenters the working state. The device thus takes the working state for a duration of X1, the failed state for a duration of Y1, the working state for a duration of X2, the failed state for a duration of Y2, and so on. Suppose that X1, X2, . . . are i.i.d. random variables with CDF G(x) and Y1, Y2, . . . are i.i.d. random variables with CDF H(y). We further assume that whenever the device enters the working state, the process is renewed. In other words, each pair (Xn, Yn) is independent of
the pair (Xi, Yi) for all i < n. However, Yn may be dependent on Xn for any fixed n. Let N(t) represent the number of renewals by time t ≥ 0, with N(0) = 0. Then N(t) = 0 for t < X1 + Y1 and N(t) = 1 for X1 + Y1 ≤ t < X1 + X2 + Y1 + Y2. We say that {N(t), t ≥ 0} forms an alternating renewal process. In an alternating renewal process, the sums {Xn + Yn} for n = 1, 2, . . . are i.i.d. random variables. Let F(t) denote the CDF of Xn + Yn for n ≥ 1 and P(t) denote the probability that the device is in the working state at time t. The following theorem can be used to find the asymptotic value of P(t).

Theorem 2.5 If both Xn and Yn are continuous random variables for each n ≥ 1 and E(Xn + Yn) < ∞, then

lim_{t→∞} P(t) = E(Xn) / [E(Xn) + E(Yn)] for any n ≥ 1.  (2.137)
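A simulation sketch of Theorem 2.5 with assumed exponential up times (mean 10) and exponential repair times (mean 2): the long-run fraction of time the device works should approach 10/12.

```python
import random

random.seed(3)
mean_up, mean_down = 10.0, 2.0   # assumed mean working and repair durations

total_up = total = 0.0
for _ in range(50_000):
    x = random.expovariate(1.0 / mean_up)     # working duration X_n
    y = random.expovariate(1.0 / mean_down)   # repair duration Y_n
    total_up += x
    total += x + y

print(round(total_up / total, 3))   # close to 10/12 ≈ 0.833
```

Note that this limit is exactly the steady-state availability A = MTBF/(MTBF + MTTR) of equation (2.89).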
Renewal Reward Process Consider a renewal process {N(t), t ≥ 0} with i.i.d. interarrival times Tn with CDF F(t) for n ≥ 1. When the nth renewal occurs, a reward denoted by Cn (n ≥ 1) is earned. If this reward is negative, it represents a penalty. We assume that the Cn for n ≥ 1 are i.i.d. random variables. The reward Cn at the nth renewal may be dependent on Tn. However, each pair (Tn, Cn) is independent of the pair (Ti, Ci) for all i < n. Let C(t) represent the total reward earned by time t. Then, we have

C(t) = Σ_{i=1}^{N(t)} Ci.

Theorem 2.6 If E(Cn) < ∞ and E(Tn) < ∞, then, for any n ≥ 1,

E(C(t))/t → E(Cn)/E(Tn) as t → ∞.  (2.138)

This theorem indicates that the long-run average reward is the expected reward earned during a cycle divided by the expected length of a cycle.

2.4.5 Discrete-Time Markov Chains

A discrete-time stochastic process {N(t), t = 0, 1, . . . } with discrete states 0, 1, . . . is called a Markov chain if the equation

Pr(N(t + 1) = j | N(0) = i0, N(1) = i1, N(2) = i2, . . . , N(t) = i) = Pr(N(t + 1) = j | N(t) = i) = pij(t)  (2.139)

is satisfied for all possible states i0, i1, i2, . . . , it−1, i, j and all t ≥ 0. We call pij(t) in equation (2.139) the one-step transition probability for the process from state i at
time t to state j at time t + 1. The one step refers to the unit time increment from time t to time t + 1. Equation (2.139) specifies that the conditional probability for the stochastic process to make a transition from state i at time t to state j at time t + 1, given its past states at times 0, 1, . . . , t − 1, t, depends only on its state at time t. In other words, the state that the process will visit next at time t + 1 depends only on the state at the present time t. The property specified by equation (2.139) is called the Markovian property. Let P(t) represent the matrix of the one-step transition probabilities pij(t):

P(t) =
| p00(t) p01(t) p02(t) . . . |
| p10(t) p11(t) p12(t) . . . |
| . . . . . . . . . |
| pi0(t) pi1(t) pi2(t) . . . |
| . . . . . . . . . |  (2.140)

Since all the pij(t) are probabilities, we have

pij(t) ≥ 0, i, j ≥ 0, t ≥ 0,
Σ_{j=0}^∞ pij(t) = 1, i = 0, 1, 2, . . . , t ≥ 0.

A Markov chain is not necessarily a counting process. The number N(0) does not have to be zero, and N(t) is usually not a monotonic function of the discrete time t. The process jumps from state to state following the transition probabilities. In particular, when the one-step transition probability matrix P(t) is independent of time t, we say that the Markov chain has stationary or homogeneous transition probabilities, and the Markov chain is said to be stationary or homogeneous. Stationary or homogeneous Markov processes are defined in a similar manner. In this case, we use pij to represent the probability for the process to make a one-step transition from state i to state j. Correspondingly, P is used to indicate the stationary one-step transition probability matrix, that is,

P =
| p00 p01 p02 . . . |
| p10 p11 p12 . . . |
| . . . . . . . . . |
| pi0 pi1 pi2 . . . |
| . . . . . . . . . |  (2.141)

where

pij ≥ 0, i, j ≥ 0,
Σ_{j=0}^∞ pij = 1, i = 0, 1, 2, . . . .
The n-step transition probability pij^(n) is defined to be the probability that the process will be in state j after n additional transitions given that it is currently in state i, that is,

pij^(n) ≡ Pr(N(t + n) = j | N(t) = i), n, i, j, t ≥ 0,

where pii^(0) ≡ 1 and pij^(0) ≡ 0 for i ≠ j. When n = 1, the n-step transition probability is the one-step transition probability. The n-step transition probability pij^(n) can be calculated by summing the r-step transition probability from state i to state k multiplied by the (n − r)-step transition probability from state k to state j over all possible values of k, that is,

pij^(n) = Σ_{k=0}^∞ pik^(r) pkj^(n−r), 0 < r < n.  (2.142)

Equation (2.142) is called the Chapman–Kolmogorov equation. In matrix form, it can be written as

P^(n) = P^(r) P^(n−r), 0 < r < n,  (2.143)

where P^(n) ≡ (pij^(n)). Based on the Chapman–Kolmogorov equation, we have

P^(n) = P^n.  (2.144)

In words, the n-step transition matrix can be calculated as the nth power of the one-step transition matrix. In addition, we can rewrite equation (2.143) in the following form:

P^n = P^r P^{n−r}, 0 < r < n.  (2.145)
Let π(t) be a row vector representing the probabilities of the Markov chain being in different states at time t (t ≥ 0), that is,

π(t) = (π0(t), π1(t), π2(t), . . . ),

where πi(t) is the probability that the process is in state i at time t. It is obvious that we must have

πi(t) ≥ 0, i, t = 0, 1, 2, . . . ,
Σ_{i=0}^∞ πi(t) = 1, t = 0, 1, 2, . . . .

Given π(0) at time zero, we can evaluate π(t) at time t with the following equation:

π(t) = π(0)P^t, t = 1, 2, . . . .  (2.146)
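Equation (2.146) can be sketched for a two-state homogeneous chain with an assumed one-step matrix, iterating one step at a time:

```python
# Assumed example one-step transition matrix of a two-state chain
P = [[0.9, 0.1],
     [0.6, 0.4]]

def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

pi = [1.0, 0.0]                      # pi(0): start in state 0
for _ in range(50):
    pi = vec_mat(pi, P)              # pi(t + 1) = pi(t) P

print([round(x, 6) for x in pi])     # converges to (6/7, 1/7) ≈ (0.857143, 0.142857)
```

For this matrix the state distribution converges quickly to the stationary vector, which is independent of π(0).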
For a finite-state Markov chain with states 0, 1, . . . , n, we have the following one-step transition matrix, initial state distribution, and state distribution at time t:

P(t) =
| p00(t) p01(t) p02(t) . . . p0n(t) |
| p10(t) p11(t) p12(t) . . . p1n(t) |
| . . . . . . . . . . . . |
| pn0(t) pn1(t) pn2(t) . . . pnn(t) |

π(0) = (π0(0), π1(0), . . . , πn(0)),  (2.147)
π(t) = (π0(t), π1(t), . . . , πn(t)) = π(0)P^t.

In reliability analysis, state i for 0 ≤ i ≤ n is often used to indicate that there are i failed components in the system. Suppose that the system works if the number of failed components is less than or equal to m (1 ≤ m ≤ n − 1). States 0, . . . , m are the working states of the system, while states m + 1, . . . , n are the failed states of the system. In this case, the probability for the system to work at time t is equal to the probability that the system is in a state less than or equal to m, that is,

Rs(t) = Σ_{i=0}^m πi(t) = π(t)(1, . . . , 1, 0, . . . , 0)^T,  (2.148)

where the column vector contains m + 1 ones followed by n − m zeros. The probability that the system is failed at time t is

Fs(t) = 1 − Rs(t) = Σ_{i=m+1}^n πi(t) = π(t)(0, . . . , 0, 1, . . . , 1)^T,  (2.149)

where the column vector contains m + 1 zeros followed by n − m ones.
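Equation (2.148) can be sketched with an assumed three-state chain: state i is the number of failed components, the system works for i ≤ m = 1, and state 2 is absorbing (no repair). All transition values below are illustrative:

```python
# Assumed one-step transition matrix; rows are current state 0, 1, 2
P = [[0.95, 0.05, 0.00],
     [0.00, 0.90, 0.10],
     [0.00, 0.00, 1.00]]

def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

m = 1
pi = [1.0, 0.0, 0.0]                 # new system: zero failed components
for _ in range(10):                  # pi(10) = pi(0) P^10
    pi = vec_mat(pi, P)

R_10 = sum(pi[:m + 1])               # Rs(10) = pi_0(10) + pi_1(10), equation (2.148)
print(round(R_10, 4))                # 0.8488 for these assumed values
```

For this simple structure, Rs(t) = 2(0.95)^t − (0.9)^t in closed form, which the iteration reproduces.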
Imbedded Markov Chains A discrete-time Markov chain may be imbedded in a continuous-time stochastic process. Consider a single service station where customers arrive according to a Poisson process with the parameter λ. Suppose that the service times of individual customers are i.i.d. random variables having a general distribution and are independent of the arrival process. If we use X(t) to indicate the number of customers waiting in line for service at time t, then {X(t), t ≥ 0} is a continuous-time stochastic process that does not possess the Markovian property. However, if we use N(t) to denote the number of customers waiting in line for service at the moment that the tth customer departs the service station after being served, then {N(t), t = 1, 2, . . . } can be proved to be a discrete-time Markov chain, where t = i (i ≥ 1) corresponds to the moment that the ith customer departs the station. In other words, it can be shown that the conditional distribution of N(t + 1), given the values of N(1), N(2), . . . , N(t), depends only on the value of N(t). Let U(t) denote the number of customers that arrive during the time interval in which the tth customer is receiving service. If the departure of the tth customer leaves no customer in line, that is, N(t) = 0, the next arrival, the (t + 1)th customer, will be served right away. The number of customers waiting in line when this (t + 1)th customer departs is simply equal to the number of new arrivals while this customer is receiving service. If the departure of the tth customer leaves N(t) > 0 customers in line, one of these customers will be served right away, leaving N(t) − 1 customers in line. When the customer receiving service departs, the total number of customers in line is equal to N(t) − 1 plus the number of new arrivals while this customer was receiving service. In summary, we can express N(t + 1) as

N(t + 1) = N(t) + U(t + 1) − 1 if N(t) > 0,
N(t + 1) = U(t + 1) if N(t) = 0.
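This recursion can be simulated directly under assumed dynamics: Poisson arrivals at rate λ = 0.5 during each fixed-length service of s = 1 time unit, so the traffic intensity λs = 0.5 keeps the queue stable:

```python
import random

random.seed(11)
lam, s = 0.5, 1.0   # assumed arrival rate and (deterministic) service time

def arrivals_during_service():
    """Count exponential(lam) interarrivals falling within one service time."""
    t, n = 0.0, 0
    while True:
        t += random.expovariate(lam)
        if t > s:
            return n
        n += 1

N, total = 0, 0
steps = 100_000
for _ in range(steps):
    u = arrivals_during_service()    # U(t + 1)
    N = N + u - 1 if N > 0 else u    # the imbedded-chain recursion
    total += N

print(round(total / steps, 2))       # long-run average left behind by departures,
                                     # close to 0.75 for these assumed rates
```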
Based on this equation, we can see that N(t + 1) does not depend on the values of N(1), N(2), . . . , N(t − 1). The value of N(t), together with the information on the arrival process, which is independent of the service history, is enough to determine the value of N(t + 1). The Markov chain {N(t), t = 1, 2, . . . } is called the imbedded Markov chain since it is imbedded in a continuous-time stochastic process. The use of imbedded Markov chains to study the properties of continuous-time stochastic processes is a very useful technique in stochastic analysis.

2.4.6 Continuous-Time Markov Chains

Consider a continuous-time stochastic process {N(t), t ≥ 0} with discrete states 0, 1, . . . . It is called a continuous-time Markov chain if the following condition is satisfied for all h ≥ 0:

Pr(N(t + h) = j | N(t) = i, N(x) = i_x, 0 ≤ x < t) = Pr(N(t + h) = j | N(t) = i).
We call Pr(N(t + h) = j | N(t) = i) the transition probability of the process from state i at time t to state j at time t + h. In a continuous-time Markov chain, the probability of transition from state i at time t to state j at time t + h for h ≥ 0 is independent of what has happened before time t. It depends only on the current state i at time t and the time increment h. Note that it does not matter how long the process has been in state i by time t. The only information needed is that the process is in state i at time t. As a result, a continuous-time Markov chain satisfies the Markovian property. If Pr(N(t + h) = j | N(t) = i) is independent of the time t and depends only on the time increment h, or in mathematical terms,

Pr(N(t + h) = j | N(t) = i) = p_ij(h)   for t, h ≥ 0,   i, j = 0, 1, . . . ,
the transition probabilities are said to be stationary or homogeneous, and consequently, the continuous-time Markov chain is said to be stationary or homogeneous. It is obvious that the following conditions are satisfied by a stationary continuous-time Markov chain:

p_ij(h) ≥ 0   for h ≥ 0,

∑_{j=0}^∞ p_ij(h) = 1   for i ≥ 0, h ≥ 0.
Because of the Markovian property and the stationary property, the probability that the process will make a transition to state j at time t + h, given that it is in state i at time t, depends only on i, j, and h. It does not depend on the time the process has already spent in state i or on the current time point. In other words, such a process has the memoryless property. As a result, the time that the stationary Markov process stays in a given state must follow the exponential distribution. Whenever a continuous-time Markov chain enters a state, say i, it stays in this state for a duration that is exponentially distributed with a parameter, say v_i. When the process leaves state i, it enters state j with a probability, say q_ij, where ∑_{j≠i} q_ij = 1. We can obtain an embedded discrete-time Markov chain if we are interested only in the states of the process at the time points when transitions occur. The continuous-time Markov chain changes from state to state following a discrete-time Markov chain, and the amount of time it spends in each state is exponentially distributed. In addition, the state it will move into next is independent of the amount of time it has spent in the current state. We call v_i the rate of transition out of state i and q_ij the probability of entering state j, given that it leaves state i. Thus, the transition rate from state i to state j can be written as v_ij = v_i q_ij for j ≠ i. If v_i = 0, we say that state i is an absorbing state. If v_i → ∞, state i is called an instantaneous state, since the process entering such a state will instantaneously leave it. For a continuous-time Markov chain, we often use the transition rates v_i and v_ij for i, j = 0, 1, 2, . . . instead of the transition probabilities p_ij(h). The relationship between the transition probabilities and the transition rates can be explained as follows.
The rate of transition from state i to state j for j ≠ i is defined as the limit

v_ij = lim_{Δt→0} p_ij(Δt)/Δt = dp_ij(x)/dx |_{x=0},   i ≠ j,   (2.150)

provided that this limit exists. For this limit to exist, the transition probability from state i to state j for j ≠ i in a time interval of length Δt is assumed to be asymptotically proportional to Δt, that is, p_ij(Δt) = v_ij Δt + o(Δt). The rate of transition out of state i, no matter which state the process enters next, is defined as the limit

v_i = lim_{Δt→0} (1 − p_ii(Δt))/Δt = −dp_ii(x)/dx |_{x=0},   (2.151)

provided that this limit exists. Since p_ii(Δt) = 1 − ∑_{j≠i} p_ij(Δt), we have
v_i = ∑_{j≠i} v_ij.   (2.152)
The probability of entering state j, given that a transition out of state i has been made, is given by

q_ij = v_ij / v_i,   j ≠ i,   i = 0, 1, 2, . . . .   (2.153)
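Equations (2.152) and (2.153) translate directly into code. The following minimal sketch (the rate values in the examples are arbitrary illustrations) recovers the exit rates v_i and the embedded-chain probabilities q_ij from a table of pairwise transition rates v_ij:

```python
def embedded_chain_parameters(rates):
    """Given transition rates rates[i][j] = v_ij for j != i, return (v, q):
    total exit rates v_i = sum over j of v_ij  (equation 2.152) and
    embedded-chain probabilities q_ij = v_ij / v_i  (equation 2.153)."""
    v = {i: sum(out.values()) for i, out in rates.items()}
    q = {i: {j: vij / v[i] for j, vij in out.items()}
         for i, out in rates.items()}
    return v, q
```

For a two-state repairable unit with failure rate 0.2 and repair rate 1.5, `embedded_chain_parameters({0: {1: 0.2}, 1: {0: 1.5}})` yields v = {0: 0.2, 1: 1.5} and q_01 = q_10 = 1, as each state has only one possible destination.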
A continuous-time Markov chain is not necessarily a counting process. After staying at the current state for a random amount of time, it may move into any other state. It is a flexible stochastic process because it can be characterized by a pair of random variables, one representing the amount of time spent in the current state and the other representing the destination state after it leaves the current state.

Pure Birth Process   A continuous-time Markov chain {N(t), t ≥ 0} with discrete states 0, 1, . . . and homogeneous transition rate matrix (v_ij) is called a pure birth process if v_ij = 0 for all j ≠ i + 1. A pure birth process with N(0) = 0 is a counting process. When the state of the process increases by 1, we say that a birth has occurred. The state of the process then represents the size of some population. The interarrival time of births follows the exponential distribution but not necessarily with a constant parameter. Instead, the parameter depends only on the current state of the process. If we use λ_i to represent the parameter of the exponential sojourn time in state i, we can say that the pure birth process has parameters {λ_i, i = 0, 1, . . . }. The parameter λ_i is called the birth rate of the process in state i. A state transition diagram for a pure birth process is given in Figure 2.8. In the transition diagram, the arrows indicate possible transitions from state to state, and the parameter along each arrow indicates the rate of that transition. If a pure birth process makes a transition out of a state, say state i, it must enter state i + 1. Thus, we have q_{i,i+1} = 1 and q_{i,j} = 0 for j ≠ i + 1. The transition rate out of state i, denoted by v_i, is equal to the parameter of the exponential sojourn time in state i, that is, v_i = λ_i. The pure birth process is a generalization of the homogeneous Poisson process (HPP). In an HPP, the sojourn time at any state follows the exponential distribution with the same parameter λ.
A pure birth process is not a renewal process because the sojourn times at different states are not identically distributed. It can be considered as an NHPP. If we use Ti to represent the sojourn time of the process in state i (i = 0, 1, . . . ), then we know that Ti is exponentially distributed with the parameter λi . In addition,
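Because the sojourn times T_i are independent exponentials, sampling N(t) for a pure birth process needs nothing more than summing exponential variates. A small sketch (the finite list of birth rates, the rate values, and the time point are illustrative assumptions):

```python
import random

def pure_birth_state(birth_rates, t, rng):
    """Sample N(t) for a pure birth process started in state 0:
    accumulate exponential sojourn times T_i ~ Exp(birth_rates[i])
    until they first exceed t; the state reached at that point is N(t)."""
    elapsed, state = 0.0, 0
    while state < len(birth_rates):
        elapsed += rng.expovariate(birth_rates[state])
        if elapsed > t:
            break
        state += 1
    return state
```

With constant rates λ_i = λ the process reduces to a homogeneous Poisson process, so the sample mean of N(t) over many replications should be close to λt.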
FIGURE 2.8   State transition diagram for pure birth process: states 0, 1, 2, 3, 4, . . . , with a transition from state i to state i + 1 at rate λ_i.
these sojourn times are independent of one another. Define

P_i(t) = Pr(N(t) = i | N(0) = 0),   i = 0, 1, 2, . . . ,

to be the probability that the process is in state i at time t, given that it is in state 0 at time 0; the following differential equations can be obtained from the state transition diagram in Figure 2.8:
P_0′(t) = −λ_0 P_0(t),   (2.154)

P_i′(t) = −λ_i P_i(t) + λ_{i−1} P_{i−1}(t),   i = 1, 2, . . . ,   (2.155)

with boundary conditions P_0(0) = 1 and P_i(0) = 0 for i = 1, 2, . . . .
These equations are the Kolmogorov forward equations for the pure birth process. Using the Laplace transform technique, we can easily verify the following solution of equation (2.154):

P_0(t) = e^{−λ_0 t},   t ≥ 0.   (2.156)

The solution to equation (2.155), namely, P_i(t) for i = 1, 2, . . . , is given by

P_i(t) = λ_{i−1} e^{−λ_i t} ∫_0^t e^{λ_i x} P_{i−1}(x) dx,   i = 1, 2, . . . .   (2.157)
This solution can be verified by taking the first-order derivative of both sides of this equation with respect to t to recover the original equation (2.155). The solution P_i(t) for i = 0, 1, 2, . . . is supposed to be a probability distribution of the state of the process at time t. Thus, we require ∑_{i=0}^∞ P_i(t) = 1 for all t ≥ 0. This requirement is equivalent to the condition

∑_{i=0}^∞ 1/λ_i → ∞.   (2.158)
For a proof of this condition, readers are referred to Feller [75]. The pure birth process can be used to model the reliability of a nonrepairable system with redundant or standby components. Such a system will eventually reach a failed state as more components fail. This will be illustrated later in this book.

Birth-and-Death Process   A continuous-time Markov chain {N(t), t ≥ 0} with discrete states 0, 1, . . . and homogeneous transition rate matrix (v_ij) is called a birth-and-death process if v_ij = 0 for all i and j such that |i − j| > 1. In a birth-and-death process, the state of the system may go up by 1 (we say that a birth occurs) or down by 1 (we say that a death occurs) after a random amount of
time at the current state. Let λ_i and µ_i represent the rates of transition from state i to state i + 1 and from state i to state i − 1, respectively. They are called the birth rate and the death rate, respectively. The relationships among v_i, v_ij, q_ij, λ_i, and µ_i are

λ_i = v_i q_{i,i+1},   (2.159)
µ_i = v_i q_{i,i−1},   (2.160)
v_i = λ_i + µ_i,   (2.161)
q_{i,i+1} = λ_i / (λ_i + µ_i),   (2.162)
q_{i,i−1} = µ_i / (λ_i + µ_i),   (2.163)
v_{i,i+1} = λ_i,   (2.164)
v_{i,i−1} = µ_i,   (2.165)
q_{i,i+1} + q_{i,i−1} = 1.   (2.166)
The starting state of a birth-and-death process is usually greater than zero, as the first event may be either a birth or a death. The state transition diagram for a birth-and-death process is shown in Figure 2.9. It is obvious that µ_0 = 0 because there can be no additional death once there is no one left in the population. Define P_ij(t) = Pr(N(t) = j | N(0) = i) for i, j = 0, 1, . . . . We can obtain the following differential equations based on the state transition diagram:
P_{i0}′(t) = −λ_0 P_{i0}(t) + µ_1 P_{i1}(t),   i = 0, 1, 2, . . . ,   (2.167)

P_{ij}′(t) = λ_{j−1} P_{i,j−1}(t) − (λ_j + µ_j) P_{ij}(t) + µ_{j+1} P_{i,j+1}(t),   j = 1, 2, . . . ,   i = 0, 1, 2, . . . ,   (2.168)
with initial conditions P_ii(0) = 1 and P_ij(0) = 0 for i ≠ j. These equations are the Kolmogorov forward equations for the birth-and-death process. The sojourn time of the process in state i is exponentially distributed with the parameter v_i = λ_i + µ_i for i ≥ 0. It is generally difficult to solve the differential equations for explicit expressions of the state distributions P_ij(t) at time t.

FIGURE 2.9   State transition diagram for birth-and-death process: a transition from state i up to state i + 1 at rate λ_i and down to state i − 1 at rate µ_i.

However, we can discuss the asymptotic behavior of the birth-and-death process. When the birth-and-death process reaches the steady state, the population size has reached “equilibrium.” In other words, the population size has stabilized, or become “constant.” When this happens, we know P_ij′(t) = 0 for all i and j. The steady-state population distribution will also be independent of the initial population distribution. Let

P_j = lim_{t→∞} P_ij(t),   i, j = 0, 1, 2, . . . .
Using the differential equations (2.167) and (2.168), we have the following steady-state equations:

−λ_0 P_0 + µ_1 P_1 = 0,   (2.169)

λ_{j−1} P_{j−1} − (λ_j + µ_j) P_j + µ_{j+1} P_{j+1} = 0,   j = 1, 2, . . . .   (2.170)
These equations can be solved in terms of P_0 as

P_1 = (λ_0 / µ_1) P_0,
P_2 = (λ_1 / µ_2) P_1 = (λ_1 λ_0 / (µ_2 µ_1)) P_0,
P_3 = (λ_2 / µ_3) P_2 = (λ_2 λ_1 λ_0 / (µ_3 µ_2 µ_1)) P_0,
. . .
P_n = (λ_{n−1} / µ_n) P_{n−1} = (λ_{n−1} λ_{n−2} · · · λ_1 λ_0 / (µ_n µ_{n−1} · · · µ_2 µ_1)) P_0.
With the following additional requirement on the values of P_j for j ≥ 0,

∑_{j=0}^∞ P_j = 1,   (2.171)
we find the state distribution for the steady-state system as follows:

P_0 = [1 + ∑_{j=1}^∞ (λ_{j−1} λ_{j−2} · · · λ_1 λ_0) / (µ_j µ_{j−1} · · · µ_2 µ_1)]^{−1},   (2.172)

P_n = (λ_{n−1} λ_{n−2} · · · λ_1 λ_0) / (µ_n µ_{n−1} · · · µ_2 µ_1) × [1 + ∑_{j=1}^∞ (λ_{j−1} λ_{j−2} · · · λ_1 λ_0) / (µ_j µ_{j−1} · · · µ_2 µ_1)]^{−1},   n ≥ 1.   (2.173)
For the steady-state distribution of the system state to exist, we must have 0 < P_0 < 1, that is,

λ_i > 0,   µ_{i+1} > 0   for i = 0, 1, 2, . . . ,

∑_{j=1}^∞ (λ_{j−1} λ_{j−2} · · · λ_1 λ_0) / (µ_j µ_{j−1} · · · µ_2 µ_1) < ∞.
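Equations (2.172) and (2.173) are straightforward to evaluate numerically once the infinite sum is truncated. A sketch (the truncation level and the constant M/M/1-style rates used in the example are illustrative assumptions):

```python
def birth_death_steady_state(birth, death, n_max):
    """Evaluate equations (2.172)-(2.173) with the sum truncated at n_max.
    birth(i) returns lambda_i and death(i) returns mu_i."""
    # ratios[n] = lambda_{n-1} ... lambda_0 / (mu_n ... mu_1), with ratios[0] = 1
    ratios = [1.0]
    for j in range(1, n_max + 1):
        ratios.append(ratios[-1] * birth(j - 1) / death(j))
    total = sum(ratios)            # 1 + the truncated sum in (2.172)
    return [r / total for r in ratios]   # P_0, P_1, ..., P_{n_max}
```

With constant rates λ_i = 1 and µ_i = 2, the exact answer is the geometric distribution P_n = (1 − ρ)ρ^n with ρ = 1/2, which the truncated computation reproduces to high accuracy.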
If the birth-and-death process has a finite state space {0, 1, 2, . . . , N}, we can easily write the Kolmogorov forward equations for this process as

P_{i0}′(t) = −λ_0 P_{i0}(t) + µ_1 P_{i1}(t),   (2.174)

P_{ij}′(t) = λ_{j−1} P_{i,j−1}(t) − (λ_j + µ_j) P_{ij}(t) + µ_{j+1} P_{i,j+1}(t),   j = 1, 2, . . . , N − 1,   (2.175)

P_{iN}′(t) = λ_{N−1} P_{i,N−1}(t) − µ_N P_{iN}(t),   (2.176)
where i = 0, 1, 2, . . . , N, with the initial state distribution given by P_ii(0) = 1 and P_ij(0) = 0 for i ≠ j. The limiting distribution of the process as time goes to infinity is

P_n = lim_{t→∞} P_in(t) =
  [1 + ∑_{j=1}^N (λ_{j−1} λ_{j−2} · · · λ_1 λ_0) / (µ_j µ_{j−1} · · · µ_2 µ_1)]^{−1}   if n = 0,
  (λ_{n−1} λ_{n−2} · · · λ_1 λ_0) / (µ_n µ_{n−1} · · · µ_2 µ_1) P_0   if n = 1, 2, . . . , N,   (2.177)

which is independent of the initial process state i.

Example 2.6 (One-Component System)   Consider a piece of equipment as a single unit. It may experience two possible states, state 1 (working) or state 0 (failed). The working time of the system follows the exponential distribution with the parameter λ. After the equipment fails, it is put into repair right away. The repair time follows the exponential distribution with the parameter µ. Given that the equipment is in state 1 at time 0, find the probabilities as functions of time t that the system is in state 0 and state 1. Find the steady-state distribution of the state of the equipment.

The state of the equipment can be modeled by a simple birth-and-death process with only two possible states. However, first we have to distinguish the state of the equipment from that of the process. The state of the process represents the number of failures that exist in the equipment. Thus, state 0 of the process represents state 1 (the working state) of the equipment, while state 1 of the process represents state 0 (the failed state) of the equipment. The process will certainly (i.e., with a probability of 1) transfer to the other state after staying at the current state for a
FIGURE 2.10   Transition diagram of birth-and-death process for one-component system: state 0 → state 1 at rate λ, and state 1 → state 0 at rate µ.
random amount of time. The process transition diagram is shown in Figure 2.10. The parameters of the process are

q_01 = q_10 = 1,   v_01 = λ q_01 = λ,   v_10 = µ q_10 = µ.

Based on the state transition diagram in Figure 2.10, we can derive the following differential equations:

P_0′(t) = −λ P_0(t) + µ P_1(t),
P_1′(t) = λ P_0(t) − µ P_1(t).

Taking the Laplace transform of both sides of each equation and applying the boundary conditions, we have

s L[P_0(t)] − 1 = −λ L[P_0(t)] + µ L[P_1(t)],
s L[P_1(t)] = λ L[P_0(t)] − µ L[P_1(t)].

Solving these two equations, we obtain an expression for L[P_0(t)]:

L[P_0(t)] = (s + µ) / (s(s + λ + µ)) = (µ/(λ + µ))/s + (λ/(λ + µ))/(s + λ + µ).
An inverse Laplace transform results in

P_0(t) = µ/(λ + µ) + (λ/(λ + µ)) e^{−(λ+µ)t}.

Since P_1(t) + P_0(t) = 1, we have

P_1(t) = λ/(λ + µ) − (λ/(λ + µ)) e^{−(λ+µ)t}.
The steady-state distribution of the state of the process can be obtained by letting t → ∞ in the expressions of P_0(t) and P_1(t):

P_0 = µ/(λ + µ),   P_1 = λ/(λ + µ).
Of course, we must have λ/µ < ∞ for the above results to hold.
The probability P_0(t) (and P_0), which is equivalent to the probability that the equipment is working, represents the probability that the process is in state 0. The probability P_1(t) (and P_1), which is equivalent to the probability that the equipment is failed, represents the probability that the process is in state 1. The availability function and the steady-state availability of the equipment are obtained by equation (2.89):

A(t) = P_0(t) = µ/(λ + µ) + (λ/(λ + µ)) e^{−(λ+µ)t},

A = P_0 = µ/(λ + µ).
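The closed-form availability can be cross-checked by integrating the two differential equations of Example 2.6 numerically. A sketch (the λ and µ values in the check are arbitrary, and simple Euler stepping is used for brevity rather than a production ODE solver):

```python
import math

def availability_numeric(lam, mu, t, dt=1e-4):
    """Euler-integrate P0'(t) = -lam*P0 + mu*P1 and P1'(t) = lam*P0 - mu*P1
    from P0(0) = 1, P1(0) = 0, and return the availability A(t) = P0(t)."""
    p0, p1 = 1.0, 0.0
    for _ in range(int(round(t / dt))):
        d0 = -lam * p0 + mu * p1
        d1 = lam * p0 - mu * p1
        p0, p1 = p0 + dt * d0, p1 + dt * d1
    return p0

def availability_closed_form(lam, mu, t):
    """A(t) from the inverse Laplace transform derived above."""
    return mu / (lam + mu) + (lam / (lam + mu)) * math.exp(-(lam + mu) * t)
```

The two computations agree to within the Euler discretization error, and both converge to the steady-state availability µ/(λ + µ) as t grows.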
2.5 COMPLEX SYSTEM RELIABILITY ASSESSMENT USING FAULT TREE ANALYSIS

Reliability analysis for system design is required prior to optimization of system reliability. In cases where a complex multicomponent system has to be dealt with, obtaining system reliability as either an objective function or a constraint function is necessary. Fault tree analysis has been regarded as an effective means of assessing the reliability of a complex system. A fault tree technique analyzes various parallel or sequential combinations of subordinate component faults in a deductive process. This process develops a basis for determining the probability that a top- (or system-) level fault can occur. The Boolean logical relationships and symbols of the fault tree permit one to decompose the complexity of a system and then build a graphical structure of the top-level fault as a whole.

Example 2.7   Figure 2.11 illustrates a simplified spraying system consisting of a reservoir, a spray, pumps, valves, and piping. Assume that there are no human errors, no common-cause failures among components, and no failures in the pipe connections.

FIGURE 2.11   Schematic diagram for simplified spraying system: reservoir R feeds two parallel branches (valve V1, pump P1, valve V3; and valve V2, pump P2, valve V4) leading to spray Sp; the spray actuating system Sa and both pumps draw on the same AC power.

The function of the system is to spray enough water from the reservoir R when only one of the two pumps, P1 or P2, works. All the valves V1 through V4 are normally open. The actuating system senses the demand for the spraying system and automatically starts the spray. The two pumps and the spray actuating system utilize the same AC power. The fault tree shown in Figure 2.12 is for the top-level event that no water is sprayed when needed. Table 2.1 lists some event and gate symbols commonly used in the construction of fault trees. Figure 2.12 shows the logical relationships of the events involved. It is then possible to solve Boolean expressions of the fault tree shown in Figure 2.12 in terms of the basic events. Starting from the top of the tree, the step-by-step, top-down procedure is given below to obtain minimal cut sets for fault tree evaluation:

T = E1 + E2,
E1 = E3 · E4,
E2 = Sa + Sp + AC,
E3 = R + V1 + V3 + E5 ,
E4 = R + V2 + V4 + E6 ,
E5 = AC + P1 ,
E6 = AC + P2 .
Finally, the Boolean expression for T reduces to

T = Sp + AC + R + Sa + V2 V3 + V1 V2 + V3 V4 + V1 V4 + V3 P2 + V1 P2 + P1 P2 + V2 P1 + V4 P1.

Then, the top event of the fault tree is expressed in terms of the 13 minimal cut sets obtained that involve only basic events. The quantitative evaluation of the fault tree is directed at calculating the probability of occurrence of the top event T of the fault tree. That is, the probability that the top event T occurs is obtained by inclusion–exclusion with the probabilities of occurrence of the basic events:

Pr(T) = Pr(C1 ∪ C2 ∪ C3 ∪ · · · ∪ C13)
      = ∑_{n=1}^{13} Pr(Cn) − ∑_{n1=1}^{12} ∑_{n2=n1+1}^{13} Pr(Cn1 Cn2) + · · · + (−1)^{12} Pr(C1 C2 · · · C13),
where Pr(T) is the probability that the top event T occurs and C1 = Sp, C2 = AC, C3 = R, C4 = Sa, C5 = V2 V3, C6 = V1 V2, C7 = V3 V4, C8 = V1 V4, C9 = V3 P2, C10 = V1 P2, C11 = P1 P2, C12 = V2 P1, and C13 = V4 P1. For computation of the top event of large fault trees that depict complex systems, a number of computer-aided programs are available. Some of the representative programs and examples are classified by Kumamoto and Henley [127], Modarres [169], and the U.S. Nuclear Regulatory Commission [239].
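The inclusion–exclusion sum over the 13 minimal cut sets can be evaluated mechanically: enumerate every nonempty subset of cut sets, and, assuming statistically independent basic events, take the probability of the intersection of a subset as the product of the probabilities of the basic events in its union. A sketch (the uniform 1% failure probabilities used in the example are illustrative, not from the book):

```python
from itertools import combinations

CUT_SETS = [frozenset(s) for s in (
    {"Sp"}, {"AC"}, {"R"}, {"Sa"},
    {"V2", "V3"}, {"V1", "V2"}, {"V3", "V4"}, {"V1", "V4"},
    {"V3", "P2"}, {"V1", "P2"}, {"P1", "P2"}, {"V2", "P1"}, {"V4", "P1"},
)]

def top_event_probability(p):
    """Pr(T) by inclusion-exclusion over the minimal cut sets C1..C13,
    assuming independent basic events with probabilities p[name]."""
    total = 0.0
    for k in range(1, len(CUT_SETS) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for subset in combinations(CUT_SETS, k):
            union = frozenset().union(*subset)
            term = 1.0
            for event in union:
                term *= p[event]       # Pr(all events in the union occur)
            total += sign * term
    return total
```

For this tree the enumeration visits 2^13 − 1 = 8191 subsets, which is trivial; for large fault trees this cost grows exponentially, which is why the dedicated programs cited above use more sophisticated methods.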
FIGURE 2.12   Fault tree representation for spraying system shown in Figure 2.11. The top event T (“no water is sprayed when needed”) is the OR of E1 (“no water is delivered to a spray,” the AND of E3, “no water from P1 branch is delivered,” and E4, “no water from P2 branch is delivered”) and E2 (“a spray does not function,” the OR of Sa fails, Sp fails, and AC fails). E3 and E4 each collect the reservoir and valve failure events together with E5 (“P1 does not function”: AC fails or P1 fails to function) and E6 (“P2 does not function”: AC fails or P2 fails to function).
TABLE 2.1   Event and Gate Symbols

Event Symbol   Meaning of Symbol
1              Basic component failure event with sufficient data
2              Undeveloped event
3              State of system or component event

Gate Symbol    Gate Name   Meaning of Symbol
4              AND         Output event occurs if all input events occur simultaneously
5              OR          Output event occurs if any one of the input events occurs
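The two gate types in Table 2.1 are enough to encode the entire tree of Example 2.7 and evaluate the top event for any assignment of basic-event states. A recursive sketch (the tuple encoding of the tree is an illustrative choice, not the book's notation):

```python
def evaluate_gate(node, state):
    """Evaluate a fault tree node against a dict of basic-event states.
    A node is either a basic event name (str) or a (gate_name, children)
    pair, where gate_name is "AND" or "OR" (Table 2.1, symbols 4 and 5)."""
    if isinstance(node, str):
        return state[node]
    kind, children = node
    results = [evaluate_gate(child, state) for child in children]
    return all(results) if kind == "AND" else any(results)

# The fault tree of Example 2.7 (Figure 2.12), written with these two gates
E5 = ("OR", ["AC", "P1"])
E6 = ("OR", ["AC", "P2"])
E3 = ("OR", ["R", "V1", "V3", E5])
E4 = ("OR", ["R", "V2", "V4", E6])
E1 = ("AND", [E3, E4])
E2 = ("OR", ["Sa", "Sp", "AC"])
TOP = ("OR", [E1, E2])
```

A single failed valve (say V1) blocks only the P1 branch, so the top event does not occur; one valve in each branch (say V1 and V2) does trigger it, matching the cut set V1 V2 found above.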
3 COMPLEXITY ANALYSIS
System reliability evaluation is a core topic of this book. In recent years, researchers have been working to develop the most efficient algorithms for system reliability evaluation for all kinds of systems. In this chapter, we describe the terminology and methods used to measure and compare the efficiency of algorithms.

In the analysis of the efficiency of algorithms, the term time complexity is used to measure the amount of time needed for an algorithm to find the solution to a problem. The term space complexity is used to measure the memory requirement during the execution of the algorithm. Both time complexity and space complexity are generally functions of the size of the problem to be solved. The larger the problem, the longer it takes an algorithm to solve it and, naturally, the larger the storage requirement during execution of the algorithm. In system reliability analysis, the size of the system is often represented by the number of components in the system. Since components may be arranged differently to form different system structures, the time and space complexities are often different for different system structures even if the numbers of components in these structures are the same. For a specific system structure with size n, different algorithms have been proposed by researchers to evaluate system reliability. The complexities of these algorithms need to be compared. Though space complexity is also a concern, it is often not as important as time complexity. As a result, we concentrate mainly on time complexity in our coverage of complexity analysis in this book. The concepts and methods to be covered can also be applied to space complexity analysis. Given the computation power of today's computers, it does not take much time for any algorithm, fast or slow, to find system reliability for small systems. As the system size increases, both time and space complexities increase.
It is when the system size is large that the advantages of one algorithm over others can be revealed.
Thus, complexity analysis usually refers to analysis of the efficiencies of algorithms under the condition that n is very large. By taking the limit as n → ∞, we can analyze the asymptotic complexity of an algorithm. In other words, we can opt for algorithms whose time and/or space complexities do not increase too fast as the size of the problem increases without bound. With such algorithms, we are able to solve much larger problems within a reasonable amount of time. To examine the complexity of an algorithm, we may translate the algorithm into a computer program and let the program run on a specific computer. The time it takes the program to solve a problem of a specific size n is one measure of the time complexity of the algorithm; however, it is not the best measure. First, such a run may take a very long time for a relatively large n. Second, many other factors may distort the intrinsic complexity of the algorithm, for example, the programming style, the implementation strategy, the programming language, and the specific computer. An algorithm may be found to be faster or slower than another algorithm depending on the programmer, the language, or the machine. What we are most interested in is the intrinsic quality, the time complexity, of the algorithm, independent of such factors. We need to be able to measure the time complexity of an algorithm mathematically. To analyze the time complexity of an algorithm, we need to define the time unit to be used. The time needed for an algorithm to solve a problem may be expressed in terms of the number of arithmetic operations (additions, subtractions, multiplications, and divisions). If the number of arithmetic operations is used as the unit of time complexity, we are assuming that each arithmetic operation takes the same amount of time to execute. We know that this assumption introduces errors in measuring time complexity.
For example, it is reasonable to say that the addition and the subtraction of the same two numbers take about the same amount of time and the multiplication and the division of the same two numbers take about the same amount of time. However, the multiplication usually takes more time than the addition of two numbers. That is why some authors have expressed the time complexity of an algorithm in terms of both the number of additions (and subtractions) and the number of multiplications (and divisions). Other units used to measure the time complexity of algorithms include the number of function evaluations and the number of comparisons. In this book, we will express the time complexity of algorithms in terms of arithmetic operations, though other measures reported in the literature will also be discussed.
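Counting arithmetic operations is easy to make concrete: instrument an algorithm with counters. The sketch below is an illustrative example (not from the book) that counts the additions and multiplications used by Horner's rule for polynomial evaluation, which takes exactly n of each for a degree-n polynomial, so its time complexity is linear in n:

```python
def horner_with_counts(coeffs, x):
    """Evaluate a polynomial (coefficients listed highest order first)
    by Horner's rule, returning (value, #multiplications, #additions)."""
    mults = adds = 0
    acc = coeffs[0]
    for a in coeffs[1:]:
        acc = acc * x + a      # one multiplication and one addition per step
        mults += 1
        adds += 1
    return acc, mults, adds
```

For 2x^2 + 3x + 4 at x = 2, this returns the value 18 using 2 multiplications and 2 additions; a degree-100 polynomial costs exactly 100 of each.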
3.1 ORDERS OF MAGNITUDE AND GROWTH

For some algorithms, it is relatively easy to express the time complexity in terms of the exact number of arithmetic operations as a function of the size of the problem. However, sometimes we are satisfied with knowing an upper bound on the time complexity, as it may be too difficult to find the exact number of arithmetic operations. Under these circumstances, we use the worst-case scenario, which is the longest possible time needed to execute the algorithm. After we have obtained this upper bound,
we say that the algorithm will take at most this much time to solve the stated problem of size n. When we are interested in the actual time needed for an algorithm to solve a problem, we analyze the order of magnitude of complexity functions. A more important concept in complexity analysis is called the rate of growth or order of growth. Sometimes, we are not so much interested in knowing the exact number of arithmetic operations or the actual time. Instead, we are interested in the rate of growth of the complexity function as the size of the problem n increases. Suppose that it takes exactly 2n^2 arithmetic operations for an algorithm to solve a problem of size n. If we know that such a problem of size n = 10 can be solved with this algorithm in 1 minute, then when n = 50, it will take 2 × 50^2/(2 × 10^2) = 25 minutes to solve the problem. In other words, when the problem size increases five times (from n = 10 to n = 50), the time complexity of the algorithm increases 5^2 = 25 times. This shows that the rate of growth of the time complexity of this algorithm is quadratic. The multiplicative constant 2 in the exact number of arithmetic operations, 2n^2, is not relevant to the concept of rate of growth. The mathematical definitions given below will provide the language needed in complexity analysis of algorithms for solving a problem of size n. Let f(x) and g(x) be two nonnegative functions of a nonnegative variable x. In the following paragraphs, we provide the definitions that enable us to compare the rates of growth and the magnitudes of f(x) and g(x). Under the condition that x is large, the conclusions that we may reach with such comparisons include the following:

1. f(x) grows at most as fast as g(x),
2. f(x) grows at the same rate as g(x),
3. f(x) grows at least as fast as g(x), and
4. f(x) and g(x) have virtually the same magnitude.
If f(x) and g(x) represent the complexity functions of two different algorithms, such conclusions clearly state the relative efficiencies of these two algorithms.

Definition 3.1   We write that f(x) = O(g(x)) if there exist positive constants C and x_0 such that f(x) ≤ C g(x) for all x > x_0.

The mathematical symbol O used in Definition 3.1 is read “big oh of.” If f(x) = O(g(x)), we read that f(x) is big oh of g(x), and we mean that when x is larger than a finite value x_0, f(x) will always be less than g(x) multiplied by a positive constant C. In other words, as long as x is large enough, f(x) does not grow at a rate faster than g(x). It is possible that f(x) grows at the same rate as g(x) or at a slower rate than g(x). When we use the symbol O, we are stating that the growth rate of f(x) is not faster than that of g(x) in the worst case. We also say that g(x) is an asymptotic upper bound of the function f(x). Remember that the lower the rate of growth of an algorithm, the more efficient the algorithm. Consider the following functions:
f_1(x) = x^5 + x^2 + sin(x),
f_2(x) = 5x^15 + 2e^x,
f_3(x) = 20x^2 + 10,000x.

Applying Definition 3.1, we can write the following statements:

f_1(x) = O(x^6) or O(x^5),
f_2(x) = O(e^x),
f_3(x) = O(x^2).

Correspondingly, we can say that the growth rate of f_1(x) is at most x^6 or at most x^5, that of f_2(x) is at most e^x, and that of f_3(x) is at most x^2. The big-oh relationship does not necessarily tell us the exact growth rate of the function f(x). It only tells us that the growth rate of f(x) is at most that of g(x). It does not tell us the actual magnitudes of these two functions either. When we write the O relationship, we usually write g(x) in the simplest possible form and with the smallest possible rate of growth so that g(x) represents a close worst possible rate of growth of f(x). Take the function f_1(x) as an example: we would rather write it as O(x^5) instead of O(x^6). The big-oh relationship is the most popular notation in complexity analysis of reliability evaluation algorithms because it is relatively simple to derive an expression of g(x) compared with the other complexity notations to be introduced later in this section. The next definition is used when we are interested in knowing the exact rate of growth of certain functions.

Definition 3.2   We write that f(x) = Θ(g(x)) if there exist positive constants a_1, a_2, and x_0 such that a_1 g(x) < f(x) < a_2 g(x) for all x > x_0.

The statement f(x) = Θ(g(x)) indicates that f(x) and g(x) have the same rate of growth when x is large enough. Again, the Θ relationship does not tell us anything about the actual magnitudes of these two functions. It only tells us about the relative rates of growth of these two functions. To be more specific, it tells us that these two functions have exactly the same rate of growth. We also say that g(x) is an asymptotic tight bound for f(x).
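Definition 3.1 can be spot-checked numerically: pick candidate constants C and x_0 and test f(x) ≤ C g(x) on a grid of large x. The sketch below (the grid and the constants are arbitrary choices, and a finite grid check is evidence rather than a proof) confirms the claim f_3(x) = O(x^2) with C = 21 and x_0 = 10,000, since 20x^2 + 10,000x ≤ 21x^2 holds exactly when x ≥ 10,000:

```python
def check_big_oh(f, g, C, x0, xs):
    """Spot-check Definition 3.1: f(x) <= C * g(x) for every sampled x > x0.
    (A numerical check over a finite grid, not a proof.)"""
    return all(f(x) <= C * g(x) for x in xs if x > x0)

f3 = lambda x: 20 * x**2 + 10_000 * x
g = lambda x: x**2
```

For instance, `check_big_oh(f3, g, C=21, x0=10_000, xs=range(10_001, 200_000, 1_000))` succeeds, while dropping x_0 to 0 exposes small x where the bound fails.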
Using Definition 3.2, we can say that f_1(x) has a growth rate of x^5, f_2(x) has an exponential growth rate, and f_3(x) has a quadratic growth rate. We also say that f_1(x) and f_3(x) have polynomial growth rates of different orders. Generally speaking, the Θ relationship is a stronger statement than the O relationship. However, these two relationships may give exactly the same function on the right-hand side. Again, it is good practice to write g(x) in the simplest possible form. Any multiplicative constant in g(x) should be removed, as we are only talking about the rate of growth of f(x). To find an asymptotic tight bound (the Θ relationship) or the asymptotic upper bound (the O relationship) for a given function f(x), we simply drop the lower
order terms and omit the multiplicative constant of the highest order term. However, we need to check their definitions to verify the derived statements. As we know, the O notation provides an asymptotic upper bound on the rate of growth of f(x). The Ω notation provides an asymptotic lower bound on the rate of growth of f(x).

Definition 3.3 We write f(x) = Ω(g(x)) if there exist positive constants C and x0 such that f(x) ≥ Cg(x) for all x > x0.

This definition is useful when we are interested in knowing the best-case performance of an algorithm. We often say that the algorithm will take at least Ω(g(n)) time to solve a problem of size n. Here are a few examples of the Ω relationship:

f1(x) = x^5 + x^2 + sin(x) = Ω(x^5),
f2(x) = 5x^15 + 2e^x = Ω(e^x),
f3(x) = 20x^2 + 10,000x = Ω(x^2),
f4(x) = (10 + 7x)^{1/4} = Ω(x^{1/4}).

The three relationships, namely, O, Ω, and Θ, all describe the rate of growth of functions, instead of the actual magnitudes of these functions. Based on their definitions, we have the following theorem.

Theorem 3.1 For any two functions f(x) and g(x), we have f(x) = Θ(g(x)) if and only if f(x) = O(g(x)) and f(x) = Ω(g(x)).

This result is very useful in finding the asymptotic tight bounds of algorithm complexities. It is often too difficult to find the exact expression of the running time of an algorithm directly in the worst- and best-case scenarios. However, it may be easier to find the asymptotic upper bound and the asymptotic lower bound for the running time. If these two bounds have the same growth rate, then the tight bound of the algorithm also has this growth rate. The following definition is used to indicate the magnitudes of two functions instead of the growth rates.

Definition 3.4 We write f(x) ∼ g(x) if and only if

lim_{x→∞} f(x)/g(x) = 1.
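These asymptotic relationships can be illustrated numerically. The following sketch (ours, not the book's) evaluates the ratios f(x)/g(x) for two of the example functions: the ratio settles near 1 when f(x) ∼ g(x), and near a positive constant other than 1 when f(x) = Θ(g(x)) but the multiplicative constant differs.

```python
# Illustrative sketch (assumption: function names f1, f3 are ours).
import math

def f1(x):
    # f1(x) = x^5 + x^2 + sin(x)
    return x**5 + x**2 + math.sin(x)

def f3(x):
    # f3(x) = 20x^2 + 10,000x
    return 20 * x**2 + 10_000 * x

for x in (1e2, 1e4, 1e6):
    # f1(x)/x^5 approaches 1; f3(x)/x^2 approaches 20.
    print(x, f1(x) / x**5, f3(x) / x**2)
```

As x grows, f1(x)/x^5 tends to 1, so f1(x) ∼ x^5 as well as f1(x) = Θ(x^5); f3(x)/x^2 tends to 20, so f3(x) = Θ(x^2) but f3(x) ∼ 20x^2, not x^2.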
The statement f (x) ∼ g(x) indicates that f (x) and g(x) not only have the same rate of growth but also are virtually equal to each other in magnitude when x is large
ORDERS OF MAGNITUDE AND GROWTH
enough. As a result, in terms of time complexity, g(x) can be used to represent the time (e.g., the number of arithmetic operations) needed for the algorithm of complexity f(x) to solve a problem of size x when x is very large. The following are a few example functions f(x) and their equivalent functions g(x) when x is large:

f1(x) = x^5 + x^2 + sin(x) ∼ x^5,
f2(x) = 5x^15 + 2e^x ∼ 2e^x,
f3(x) = 20x^2 + 10,000x ∼ 20x^2,
f4(x) = (10 + 7x)^{1/4} ∼ (7x)^{1/4}.

For these examples, we have to keep the multiplicative constant in the g(x) expression. When x is large enough, g(x) can be used to represent the exact complexity function f(x). Sometimes O, Ω, Θ, and ∼ have the same g(x) form. However, generally speaking, ∼ provides a stronger statement than Θ, which in turn provides a stronger statement than O and Ω. In the following, we consider a few simple functions of x and examine their rates of growth as x increases. We list these functions from lower to higher growth rate:

1. The double-logarithm function log2 log2(x) grows much more slowly than x. For example, when x = 2^16 = 65,536, log2 log2(x) = 4, and when x = 2^32 = 4,294,967,296, log2 log2(x) = 5.

2. The logarithm function grows faster than the double-logarithm function but more slowly than x. For example, when x = 2^10 = 1024, log2 x = 10, and when x = 2^100 ≈ 1.2677 × 10^30, log2 x = 100.

3. The polynomial functions of x, for example, x^0.01, x, x^2, and x^100, are the next group of functions that do not grow very fast in x. All polynomial functions grow faster than logarithm functions. This can easily be verified by taking the limit of the ratio between the polynomial function and the logarithm function as x goes to infinity. A polynomial function may grow faster or more slowly than x depending on the order of the polynomial function. Polynomial time algorithms are called fast algorithms, as today's computers have no problem executing such algorithms within a reasonable amount of time.

4. Exponential functions grow faster than polynomial functions. When x is large, today's computers have difficulty handling algorithms with exponential complexity functions. Algorithms whose complexity functions grow faster than polynomial functions are considered slow. A problem that takes exponential time to solve is considered difficult.

If f(x) = Θ(x), we say that f(x) has a linear rate of growth and the corresponding algorithm has a linear complexity function. For f(x) = Θ(x^2) and
f(x) = Θ(x^3), we say that f(x) has a quadratic or cubic rate of growth, respectively. Problems that can be solved with polynomial time algorithms are called easy problems. These problems are sometimes collectively referred to as class P problems. There is another class of problems called NP (nondeterministic polynomial) problems. We will discuss intuitively what they are. For rigorous mathematical definitions and more detailed discussions, readers are referred to Wilf [243]. A problem is an NP problem if there exists a polynomial time algorithm that can verify whether a claimed solution is a solution of the problem; this is not the same as finding an algorithm to solve the problem. It is easy to find a solution to a problem in class P, while it is easy to check a solution to a problem in class NP. A solution to an NP problem may be very difficult to find, however. There are many problems that belong to the class NP, for example, the graph-coloring problem and the traveling-salesman problem. Nobody knows if a polynomial time algorithm exists for finding a solution to these problems. It is believed that there are no polynomial time algorithms for solving NP problems. Although no one has proved it, we feel that these problems are inherently intractable. Within the NP class of problems, there is a subclass called NP complete. A problem is an NP-complete problem if it belongs to NP and every problem in NP can be reduced to it in at most polynomial time. For example, a system of linear equations AX = b, where |A| ≠ 0, |A| is the determinant of the matrix A, and A is nonsymmetric, can be reduced to BX = c, where B = A^T A and c = A^T b, in polynomial time. The new system of linear equations has a symmetric coefficient matrix B. We can say that NP-complete problems represent the hardest problems in the class NP.
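The defining property of NP, that a claimed solution can be verified in polynomial time, can be made concrete with the graph-coloring problem mentioned above. The sketch below (our illustration; the function name is ours) checks a claimed coloring with a single pass over the edge list, which takes O(|E|) time, even though finding a valid coloring is believed to be intractable.

```python
# A polynomial-time verifier for graph coloring (illustrative sketch).
def is_valid_coloring(edges, coloring):
    """Return True if no edge joins two vertices of the same color.

    One pass over the edges: O(|E|) time. The existence of such a
    polynomial-time verifier is what places graph coloring in NP.
    """
    return all(coloring[u] != coloring[v] for u, v in edges)

# A 4-cycle is 2-colorable:
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(is_valid_coloring(cycle, {0: "red", 1: "blue", 2: "red", 3: "blue"}))  # True
print(is_valid_coloring(cycle, {0: "red", 1: "red", 2: "red", 3: "blue"}))   # False
```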
If we can find a polynomial time algorithm for solving an NP-complete problem, then we have found a polynomial time algorithm for solving all problems in NP. Conversely, if we can prove that there does not exist a polynomial time algorithm for an NP problem, then no polynomial time algorithms exist for any NP-complete problem. The traveling-salesman problem and the graph-coloring problem are both NP-complete problems. The term NP hard has been used to indicate problems that are as hard as NP-complete problems. Thus, when we say that a problem is NP hard, we mean that the best algorithm known for solving it has at least an exponential time complexity. In system reliability evaluation, we are interested in finding algorithms that have polynomial complexity functions. Algorithms with exponential complexity functions are not desirable. Even when polynomial time algorithms are available, researchers may still try to develop algorithms that have even lower order polynomial complexity functions. When there are no algorithms with polynomial complexity functions available, we have to use the best exponential time algorithms. However, we are always in the process of searching for more efficient algorithms. In the algorithm analysis in this book, the only notations that will be used are "O," "Ω," and "Θ." Another similar notation, but used for a different purpose, is the little "o." We will not use "o" for complexity analysis at all. Instead, the little "o" will be used to indicate a quantity of much smaller magnitude relative to another quantity that approaches zero. The little "o" definition will be provided when needed.
3.2 EVALUATION OF SUMMATIONS

In analysis of the time complexity of algorithms, we are interested in the execution time of the algorithm as a function of the size of the problem. As mentioned earlier, we use the number of arithmetic operations to be performed to indicate the execution time of the algorithm. Consider the following algorithm for finding the product of two n × n matrices:

ProductMatrix(A[n, n], B[n, n])
  For i = 1 To n By 1 Do
    For j = 1 To n By 1 Do
      C[i, j] = 0;
      For k = 1 To n By 1 Do
        C[i, j] = C[i, j] + A[i, k] ∗ B[k, j];
      EndFor
    EndFor
  EndFor
  Return C[n, n];
End

Type of Operations    Number of Operations
Assignment            n
Assignment            n^2
Assignment            n^2
Assignment            n^3
Arithmetic            2n^3

In this algorithm, there are assignment operations in which a value is assigned to a variable. There are also arithmetic operations, namely, additions and multiplications. There are three nested loops. If we assume that each operation, be it an assignment or an arithmetic operation, takes the same amount of time, we can simply add up the total number of operations for the algorithm to find the product of two square matrices:

f(n) = n + n^2 + n^2 + n^3 + 2n^3 = 3n^3 + 2n^2 + n.

As a result, we can write f(n) = O(n^3), f(n) = Ω(n^3), f(n) = Θ(n^3), and f(n) ∼ 3n^3. The algorithm has a cubic complexity function. If we assume that an assignment operation takes much less time than an arithmetic operation (this is very often the case) and count only the number of arithmetic operations, we get the following expression of the running time: f(n) = 2n^3. As a result, we can write f(n) = O(n^3), f(n) = Ω(n^3), f(n) = Θ(n^3), and f(n) ∼ 2n^3. We still conclude that the algorithm has a cubic complexity function. The omission of the assignment operations does not change the order of growth of the complexity of the algorithm. The order of magnitude is changed only by a constant factor. In the algorithm for evaluation of the product of two square matrices, there are three nested loops, with each loop executed n times. Each execution of the equation for calculating C[i, j] requires two arithmetic operations. Thus, the total number of arithmetic operations can be calculated as
Σ_{i=1}^n Σ_{j=1}^n Σ_{k=1}^n 2 = 2n^3.
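The operation count above can be confirmed by instrumenting the ProductMatrix pseudocode. The following is our own sketch (names and counter are ours), counting one addition and one multiplication per innermost iteration, for 2n^3 arithmetic operations in total.

```python
# Instrumented matrix product (illustrative sketch of ProductMatrix).
def product_matrix(A, B):
    n = len(A)
    arithmetic_ops = 0
    C = [[0] * n for _ in range(n)]  # the C[i, j] = 0 assignments
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # one addition + one multiplication per innermost pass
                C[i][j] = C[i][j] + A[i][k] * B[k][j]
                arithmetic_ops += 2
    return C, arithmetic_ops

n = 5
I = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
C, ops = product_matrix(I, I)
print(ops)      # 2 * 5**3 = 250
print(C == I)   # multiplying the identity by itself: True
```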
Most algorithms involve some sort of For, Do, or While loop. This means that we often need to add the numbers of operations of all these loops in order to find an expression for the running time of the algorithm. In this section, we describe some common summation series for which we can find an exact expression of the sum. The techniques used in finding these expressions will be outlined.

Consider an arithmetic series a1, a2, . . . , an, where a_{i+1} − a_i is constant for i = 1, 2, . . . , n − 1. The summation of such a series of numbers is given below:

Σ_{i=1}^n a_i = a1 + a2 + · · · + an = n(a1 + an)/2.   (3.1)

When a1 = 1 and a_{i+1} − a_i = 1 for all i, we have

Σ_{i=1}^n i = 1 + 2 + · · · + n = n(n + 1)/2 = Θ(n^2).   (3.2)

Consider a geometric series, 1, x, x^2, . . . , x^n, where x ≠ 1. The summation of this series is given below:

Σ_{i=0}^n x^i = 1 + x + x^2 + · · · + x^n = (x^{n+1} − 1)/(x − 1).   (3.3)

When |x| < 1 and n → ∞, we have

Σ_{i=0}^∞ x^i = 1/(1 − x),   |x| < 1.   (3.4)

When x is a constant greater than 1, we can write

Σ_{i=0}^n x^i = Θ(x^{n+1}).   (3.5)
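The closed forms (3.1)–(3.3) are easy to check by brute force. The following sketch (ours; function names are assumptions) compares each formula against a direct term-by-term sum.

```python
# Brute-force checks of equations (3.1)-(3.3) (illustrative sketch).
def arithmetic_sum(a1, d, n):
    terms = [a1 + i * d for i in range(n)]
    # equation (3.1): n(a1 + an)/2
    assert sum(terms) == n * (terms[0] + terms[-1]) // 2
    return sum(terms)

def geometric_sum(x, n):
    s = sum(x**i for i in range(n + 1))
    # equation (3.3): (x^(n+1) - 1)/(x - 1), valid for x != 1
    assert s == (x**(n + 1) - 1) // (x - 1)
    return s

print(arithmetic_sum(1, 1, 100))  # 1 + 2 + ... + 100 = 5050, cf. eq. (3.2)
print(geometric_sum(2, 10))       # 2^11 - 1 = 2047
```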
To find the exact expression of a quadratic series,

f(n) = 1 + 2^2 + 3^2 + · · · + n^2 = Σ_{i=1}^n i^2,   (3.6)

we can start with the summation of a higher order series, namely, a cubic series, and use the following mathematical manipulations:

Σ_{i=1}^n i^3 = Σ_{i=1}^n (i − 1 + 1)^3 = Σ_{j=0}^{n−1} (j + 1)^3 = Σ_{j=0}^{n−1} (j^3 + 3j^2 + 3j + 1)
            = Σ_{j=0}^{n−1} j^3 + 3 Σ_{j=0}^{n−1} j^2 + 3 Σ_{j=0}^{n−1} j + n
            = Σ_{j=1}^n j^3 − n^3 + 3 Σ_{j=1}^n j^2 − 3n^2 + 3n(n − 1)/2 + n
            = Σ_{i=1}^n i^3 + 3 Σ_{i=1}^n i^2 − n^3 − 3n^2 + 3n(n − 1)/2 + n.

Canceling Σ_{i=1}^n i^3 from both sides and rearranging, we have

3 Σ_{i=1}^n i^2 = n^3 + 3n^2 − 3n(n − 1)/2 − n = n(n + 1)(2n + 1)/2,   (3.7)

f(n) = Σ_{i=1}^n i^2 = n(n + 1)(2n + 1)/6.   (3.8)
In this example, we start with the summation of the cubic series and then cancel the third-order terms from both sides of the equation. Finally, we find the summation of the quadratic series. Another technique that can be used to find the expression of the summation of a series is to multiply the series by a constant to create the summation of a new series and then subtract one summation from the other, as illustrated with the following example:

f(n) = Σ_{i=1}^n i·3^i = 1 × 3^1 + 2 × 3^2 + 3 × 3^3 + · · · + n × 3^n.   (3.9)

Multiplying both sides of equation (3.9) by 3, we have

3f(n) = 1 × 3^2 + 2 × 3^3 + 3 × 3^4 + · · · + (n − 1) × 3^n + n × 3^{n+1}.   (3.10)

Subtracting equation (3.9) from equation (3.10), we have

2f(n) = −3^1 − 3^2 − 3^3 − · · · − 3^{n−1} − 3^n + n × 3^{n+1}
      = −(3^{n+1} − 3)/2 + n × 3^{n+1},   (3.11)

f(n) = ((2n − 1)/4) × 3^{n+1} + 3/4.   (3.12)
Another powerful method for evaluation of the sum of a series is to use a generating function. Suppose that we need to evaluate the sum of the series of numbers a0, a1, a2, . . . , an. We can construct the following generating function of this series of numbers:

g(x) = a0 + a1x + a2x^2 + a3x^3 + · · · + anx^n = Σ_{k=0}^n a_k x^k.   (3.13)

Many properties of the original series can be obtained through analysis of this generating function. Differentiation and integration techniques can be used to manipulate the generating function. We will illustrate this technique with a few examples. The summation series shown in equation (3.9) may be considered a special case of the following generating function with x = 3:

f(n) = Σ_{i=1}^n i x^i = x Σ_{i=1}^n i x^{i−1} = x Σ_{i=1}^n (x^i)',   (3.14)

f(n)/x = Σ_{i=1}^n i x^{i−1}.   (3.15)

The summation of the geometric series with x ≠ 1 is given in equation (3.3). Now, differentiating equation (3.3) with respect to x, we have

Σ_{i=1}^n i x^{i−1} = (n x^{n+1} − (n + 1)x^n + 1)/(x − 1)^2.   (3.16)

Comparing equations (3.15) and (3.16), we have

f(n) = x × (n x^{n+1} − (n + 1)x^n + 1)/(x − 1)^2 = (n x^{n+2} − (n + 1)x^{n+1} + x)/(x − 1)^2.   (3.17)

Assigning different values to x in equation (3.17), we obtain the expressions of different summations. For example, when x = 2, we have

f(n) = Σ_{i=1}^n i·2^i = 2 + 2 × 2^2 + 3 × 2^3 + · · · + n × 2^n = (n − 1)2^{n+1} + 2.   (3.18)

When x = 3, we have

f(n) = Σ_{i=1}^n i·3^i = 3 + 2 × 3^2 + 3 × 3^3 + · · · + n × 3^n = ((2n − 1)/4) × 3^{n+1} + 3/4.   (3.19)

The technique of integration can also be used to help us find expressions of some summations. Take the infinite geometric series shown in equation (3.4) as an example. By integrating both sides of equation (3.4) with respect to x, we find the following:

Σ_{i=1}^∞ x^i/i = x + x^2/2 + x^3/3 + · · · = ln(1/(1 − x)).   (3.20)
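The generating-function results (3.17) and (3.18) can be checked as exact integer identities by multiplying (3.17) through by (x − 1)^2; the sketch below is ours.

```python
# Checks of equations (3.17) and (3.18) (illustrative sketch).
def f(n, x):
    return sum(i * x**i for i in range(1, n + 1))

# Equation (3.17), multiplied through by (x - 1)^2 to avoid division:
for n in (1, 3, 8):
    for x in (2, 3, 5):
        assert f(n, x) * (x - 1)**2 == n * x**(n + 2) - (n + 1) * x**(n + 1) + x

# Equation (3.18), the x = 2 special case:
for n in (1, 4, 10):
    assert f(n, 2) == (n - 1) * 2**(n + 1) + 2
print("equations (3.17) and (3.18) verified for the sampled values")
```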
3.3 BOUNDING SUMMATIONS

Sometimes we do not need, or it is too difficult, to find the exact expression of a summation. In these situations, we are interested in bounding the summation. A few techniques that are useful in finding the bounds of summations are described in this section. Each term in the summation series may be replaced by a larger term in order to find an upper bound on the summation, as shown in the following example:

f(n) = Σ_{i=1}^n i^2 ≤ Σ_{i=1}^n n^2 = n^3 = O(n^3).   (3.21)

In this example, we replaced each term by the largest term in the series. Comparing with the exact expression for the same summation series given in equation (3.8), we see that we have obtained a tight upper bound on the summation series. This technique is very useful and simple to use in bounding summations. However, it does not guarantee tight upper bounds in all circumstances. Consider the harmonic series shown below:

f(n) = Σ_{i=1}^n 1/i = 1 + 1/2 + 1/3 + · · · + 1/n.   (3.22)

Applying the same bounding technique, we have

f(n) = Σ_{i=1}^n 1/i ≤ Σ_{i=1}^n 1 = n = O(n).

Based on this expression, we can say that f(n) increases at most as a linear function of n. However, it can be shown that f(n) actually increases much more slowly than that. In fact, it increases at the rate of a logarithmic function of n, which will be verified later in this section:

f(n) = Σ_{i=1}^n 1/i = Θ(ln n).

Instead of using the largest term to replace every term in a summation series, we may divide the series into many intervals. If we are interested in finding an upper bound, we then use the largest term within each interval to replace all the terms in the same interval. If we are interested in finding a lower bound, we use the smallest term within each interval to replace all the terms in the same interval. We expect to find tighter bounds by splitting a summation series into more intervals. We have found an upper bound on the summation of a quadratic series in equation (3.21) by bounding individual terms. In the following, we derive a lower bound for the same series by dividing the summation series into two intervals. For each interval, we replace each term by a value that is even smaller than the smallest term in the
same interval:

f(n) = Σ_{i=1}^n i^2 = Σ_{i=1}^{⌊n/2⌋} i^2 + Σ_{i=⌊n/2⌋+1}^{n} i^2
     ≥ Σ_{i=1}^{⌊n/2⌋} 0 + Σ_{i=⌊n/2⌋+1}^{n} (n/2)^2 ≥ (n/2)^3 = Ω(n^3),   (3.23)

where ⌊n/2⌋ indicates the largest integer less than or equal to n/2. Considering both equations (3.21) and (3.23) and using Theorem 3.1, we conclude the following:

f(n) = Σ_{i=1}^n i^2 = Θ(n^3).
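The pair of term-replacement bounds (3.21) and (3.23) sandwiches the sum between (n/2)^3 and n^3 for every n; the quick numerical check below is our own sketch.

```python
# Check of the bounds from (3.21) and (3.23): (n/2)^3 <= sum i^2 <= n^3.
for n in (1, 2, 7, 100):
    s = sum(i * i for i in range(1, n + 1))
    assert (n / 2)**3 <= s <= n**3
print("bounds hold; the exact leading constant 1/3 lies between 1/8 and 1")
```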
This shows that we can use bounding techniques to find the exact rate of growth of a summation series. Another method for finding upper and lower bounds of a summation is the use of integration. Consider the following summation:

f(n) = Σ_{i=1}^n a(i) = a(1) + a(2) + · · · + a(n),   (3.24)

where a(i) has to be a monotonically increasing function of i. This summation represents the sum of the function a(i) evaluated at the points i = 1, i = 2, . . . , i = n. Since these points are consecutive positive integers, we can view this summation as an (over)estimation of the area under the curve of the continuous function a(x) in the interval from x = 0 to x = n. Since a(x) is a continuous function, the area under the curve of a(x) from x = 0 to x = n provides a lower bound and the area under the curve of a(x) from x = 1 to x = n + 1 provides an upper bound to the summation given in equation (3.24), as illustrated in Figure 3.1.
[FIGURE 3.1 Use of integrals to bound a summation series. The curves a(x) and a(x − 1) are plotted over the interval from 0 to n + 1, with rectangles of unit width at i = 1, 2, . . . , n representing the terms of the sum.]
In Figure 3.1, the rectangles represent the terms to be added together; the areas under a(x − 1) and a(x) represent the two integrals, which provide the lower and the upper bounds to the summation:

∫_0^n a(x) dx = ∫_1^{n+1} a(x − 1) dx ≤ Σ_{i=1}^n a(i) ≤ ∫_1^{n+1} a(x) dx.   (3.25)
This technique requires that a(i) be a monotonically nondecreasing function of i. In algorithm analysis, we frequently do have a(i) satisfying this requirement. The technique can also be modified to handle monotonically decreasing functions. Consider the summation of a quadratic series given in equation (3.6):

Σ_{i=1}^n i^2 ≥ ∫_0^n x^2 dx = n^3/3 = Ω(n^3),

Σ_{i=1}^n i^2 ≤ ∫_1^{n+1} x^2 dx = (n + 1)^3/3 − 1/3 = (n^3 + 3n^2 + 3n)/3 = O(n^3).

This concludes that

Σ_{i=1}^n i^2 = Θ(n^3).

Take the harmonic series in equation (3.22) as another example. Since the harmonic series is a monotonically decreasing function of i, we need to modify our approach of using the integration method:

Σ_{i=1}^n 1/i ≥ ∫_1^{n+1} (1/x) dx = ln(n + 1) = Ω(ln n),

Σ_{i=1}^n 1/i ≤ 1 + ∫_1^n (1/x) dx = 1 + ln n = O(ln n).

Thus,

Σ_{i=1}^n 1/i = Θ(ln n).
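The integral bounds on the harmonic series are tight enough to check numerically; the sketch below (ours) confirms ln(n + 1) ≤ H_n ≤ 1 + ln n for several n.

```python
# Check of the integral bounds on the harmonic series.
import math

for n in (1, 10, 1000):
    h = sum(1 / i for i in range(1, n + 1))
    assert math.log(n + 1) <= h <= 1 + math.log(n)
print("harmonic-series integral bounds hold for the sampled n")
```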
3.4 RECURRENCE RELATIONS As will be seen later in the book, some algorithms make recursive calls to themselves. These algorithms are called recursive algorithms. The running time of a recursive algorithm can be described by a recurrence relation. A recursive relation, or simply
a recurrence, describes a function in terms of itself with smaller input values. An example of a recursive algorithm is one for finding the factorial of n, denoted F(n):

F(n) = nF(n − 1),   (3.26)

with a boundary condition F(0) ≡ 1. To find the factorial of n, we simply multiply the factorial of n − 1 by n. In each step of applying equation (3.26), we need to perform a single arithmetic operation, a multiplication. It takes n steps to reach the boundary condition F(0). If we use T(n) to represent the running time of calculating the factorial of n, then we have the following recursive relation for calculating T(n): T(n) = T(n − 1) + 1. We can easily solve this equation to find the explicit expression T(n) = n = Θ(n). In complexity analysis of recursive algorithms, we often see the following type of recursive relation describing the running time of the algorithm:

T(n) = aT(n/b) + f(n),   (3.27)
where T(n) represents the running time of the algorithm for solving a problem of size n, a ≥ 1 and b > 1 are constants, f(n) is a function of the problem size n, and T(n/b) represents the running time of the algorithm for solving a problem of reduced size n/b. Equation (3.27) is often used to represent the running time of a divide-and-conquer algorithm. In a divide-and-conquer algorithm, the problem of size n is divided into a subproblems of size n/b each. We need to solve each of the a subproblems separately and then combine the solutions of these a subproblems to obtain the solution to the original problem of size n. The time needed to combine the solutions of the a subproblems is represented by f(n), and T(n/b) represents the time needed to solve each subproblem of size n/b. The recurrence relation shown in equation (3.27) needs to be solved in order to find an explicit expression for the running time of the recursive algorithm for solving a problem of size n. In this section, we will introduce a few techniques that can be used to solve recurrences similar to the one shown in equation (3.27). Before introducing the techniques, let's comment on some assumptions that will be used in our analysis. First, the sizes of problems are usually integers. Thus, n and n/b in equation (3.27) are all supposed to be integers. One way to make sure that this condition is satisfied is to assume that n is a certain power of b such that we are always dealing with an integer no matter how many times the recursive algorithm is called until a boundary condition can be utilized. Another way is to make sure that the sizes of the subproblems are all integers when the divide part of the divide-and-conquer algorithm is used. For example, a problem of size n may be divided into two subproblems (a = 2) of sizes ⌈n/2⌉ and ⌊n/2⌋, respectively (b = 2), where ⌈n/2⌉ is the smallest integer larger than or equal to n/2 and ⌊n/2⌋ is the largest integer less than or equal to n/2. In this case, the running time of the
algorithm can be expressed as

T(n) = T(⌈n/2⌉) + T(⌊n/2⌋) + f(n).   (3.28)
When necessary, we will deal with the recurrence relation shown in equation (3.28) to handle the integer problem size requirement. However, in most cases we can ignore the floors and/or ceilings in our analysis of recurrence relations. Second, the boundary condition usually specifies the running time of the algorithm when n = 1, and this running time is often a constant, that is, T(1) = Θ(1). We will relax this boundary condition so that the running time of any algorithm is constant for sufficiently small problems, that is, T(n) = Θ(1) for sufficiently small n.
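The factorial recursion discussed above can be instrumented to confirm that T(n) = T(n − 1) + 1 solves to T(n) = n. The sketch below (ours; the counter argument is our addition) counts one multiplication per recursion level.

```python
# Recursive factorial with a multiplication counter (illustrative sketch).
def factorial(n, count=0):
    if n == 0:                 # boundary condition F(0) = 1
        return 1, count
    f, count = factorial(n - 1, count)
    return n * f, count + 1    # one multiplication per level: T(n) = n

value, mults = factorial(6)
print(value, mults)   # 720 6
```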
3.4.1 Expansion Method

The most natural method for solving a recurrence relation is to expand it until the boundary condition is reached. From such an expansion, we will get a summation series that depends on the size n of the original problem and the boundary condition. We can then use the techniques covered in the section on summations to find an asymptotic expression for the running time of the algorithm.

Example 3.1 Consider the following recurrence relation:

T(n) = 2T(n/2) + n^3.   (3.29)

Assuming that n is a high power of 2, we can expand the recurrence as follows:

T(n) = n^3 + 2T(n/2)
     = n^3 + 2[(n/2)^3 + 2T(n/4)]
     = n^3 + 2(n/2)^3 + 2^2[(n/2^2)^3 + 2T(n/2^3)]
     = n^3 + 2(n/2)^3 + 2^2(n/2^2)^3 + 2^3 T(n/2^3).

We can continue this expansion process until we reach T(1). If we denote the last term by 2^K T(n/2^K), then it reaches T(1) when n/2^K = 1, or equivalently, K = log2 n. As a result, there are K + 1 terms in the final summation series:

T(n) = n^3 + 2(n/2)^3 + 2^2(n/2^2)^3 + 2^3(n/2^3)^3 + · · · + 2^{K−1}(n/2^{K−1})^3 + 2^K T(1)
     = n^3 + (1/4)n^3 + (1/4)^2 n^3 + · · · + (1/4)^{log2 n − 1} n^3 + 2^{log2 n} Θ(1)
     = n^3 Σ_{i=0}^{log2 n − 1} (1/4)^i + n Θ(1)
     = (4n^3/3)(1 − 1/n^2) + n Θ(1)
     = Θ(n^3).

The summation series in this example has decreasing values, that is, (1/4)^i decreases with i. If we are interested only in an asymptotic upper bound of T(n), we can change the finite summation into an infinite summation, as shown below:

T(n) = n^3 Σ_{i=0}^{log2 n − 1} (1/4)^i + n Θ(1) ≤ n^3 Σ_{i=0}^∞ (1/4)^i + n Θ(1)
     = (4/3)n^3 + n Θ(1) = O(n^3).
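The expansion result of Example 3.1 can be checked exactly. Assuming the boundary condition T(1) = 1 (our choice, consistent with T(1) = Θ(1)), the closed form (4n^3/3)(1 − 1/n^2) + n simplifies to (4n^3 − n)/3 for n a power of 2; the sketch below is ours.

```python
# Check of Example 3.1 with the assumed boundary condition T(1) = 1:
# the expansion gives T(n) = (4n^3 - n)/3 for n a power of 2.
def T(n):
    return 1 if n == 1 else 2 * T(n // 2) + n**3

for k in range(12):
    n = 2**k
    assert T(n) == (4 * n**3 - n) // 3
print(T(8))   # (4*512 - 8)/3 = 680
```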
Example 3.2 Now consider another example where we do not require that n be a high power of 2:

T(n) = 3T(⌊n/2⌋) + n
     = n + 3⌊n/2⌋ + 3^2 T(⌊n/2^2⌋)
     = n + 3⌊n/2⌋ + 3^2 ⌊n/2^2⌋ + 3^3 T(⌊n/2^3⌋)
     = n + 3^1 ⌊n/2^1⌋ + 3^2 ⌊n/2^2⌋ + 3^3 ⌊n/2^3⌋ + 3^4 T(⌊n/2^4⌋).   (3.30)

The (i + 1)st term in the above summation series for i = 0, 1, 2, . . . , K is 3^i ⌊n/2^i⌋, and we reach T(1) when n/2^{K+1} = 1, or equivalently, K = log2 n − 1. Noting T(1) = Θ(1), we can write the above summation as

T(n) = Σ_{i=0}^{K} 3^i ⌊n/2^i⌋ + 3^{K+1} T(1)
     = Σ_{i=0}^{log2 n − 1} 3^i ⌊n/2^i⌋ + 3^{log2 n} Θ(1)
     ≤ n Σ_{i=0}^{log2 n − 1} (3/2)^i + 3^{log2 n} Θ(1)
     = 2n^{log2 3} − 2n + n^{log2 3} Θ(1)
= O(n^{log2 3}) = O(n^2).

From these two examples, we summarize the following observations:

1. T(1) requires Θ(1) time.

2. The expansion method may involve a lot of algebraic manipulation. We need to use the summation evaluation and bounding techniques covered earlier to find or bound the expression of T(n).

3. In expanding the recurrence relations, two questions need to be answered. How far do we need to expand to reach T(1)? This tells us how many terms we have in the summation series. The other question is how many T(1) terms appear in the summation; we need to know the factor in front of T(1). A recursion tree may be used to help us visualize the expansion process and answer
[FIGURE 3.2 Development of the recursion tree for T(n) = 2T(n/2) + n^3, with n = 2^k and k = log2 n. The root contributes n^3; level i contains 2^{i−1} nodes, each contributing (n/2^{i−1})^3; the bottom level contains n leaves of T(1). Summing level by level gives T(n) = n^3[1 + 1/4 + (1/4)^2 + · · · + (1/4)^{log2 n − 1}] + nT(1) = (4n^3/3)(1 − 1/n^2) + n Θ(1) = Θ(n^3).]
these two questions. Figure 3.2 shows a recursion tree for the recurrence relation given in equation (3.29). The number of levels in the completely expanded tree is equal to log2 n + 1, and in level i for 1 ≤ i ≤ log2 n + 1, the number of terms to be added is equal to 2^{i−1}. At the bottom of the tree, there are 2^{log2 n} = n "leaves," or terms of T(1). The summation of the series is also shown in Figure 3.2.

4. If we have a decreasing geometric series, as shown in Example 3.1, we may change the finite summation into an infinite summation to find an upper bound on T(n).

5. Through the process of expanding and summing the terms, we may be able to guess the function form in the final O notation. Next we will show another method that can be used to prove whether a guessed form is correct.
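The expansion in Example 3.2 can also be checked exactly. Assuming T(1) = 1 (our choice), the floors disappear for n = 2^k and the expansion sums to exactly 2n^{log2 3} − 2n + n^{log2 3} = 3·3^k − 2·2^k; the sketch below is ours.

```python
# Check of Example 3.2 with the assumed boundary condition T(1) = 1:
# for n = 2^k, T(n) = 3T(n/2) + n sums exactly to 3*3^k - 2*2^k.
def T(n):
    return 1 if n == 1 else 3 * T(n // 2) + n

for k in range(12):
    n = 2**k
    assert T(n) == 3 * 3**k - 2 * n
print(T(8))   # 3*27 - 16 = 65
```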
3.4.2 Guess-and-Prove Method

The guess-and-prove method can be applied when we can guess the solution of a recurrence relation. After a guess is obtained, we then use mathematical induction to prove that the guess is correct. This method can be used to obtain asymptotic upper bounds, asymptotic lower bounds, and asymptotic tight bounds. Because of the nature of this method, it is applicable only when we have a guess of the solution of the recurrence relationship under consideration. Consider the recurrence relation

T(n) = 2T(n/2) + n.   (3.31)

Suppose that we guess that the solution is T(n) = O(n log2 n). We need to verify that T(n) ≤ cn log2 n for a fixed constant c and all large n values. The standard procedure for using mathematical induction to prove a statement is to show (a) it is correct for n = i + 1 assuming that it is correct for n = i and (b) it is correct for n = 1. This procedure may be modified based on the statements to be proved. Sometimes, we may assume that it is correct for n = i and prove that it is correct for n = 2i. In addition, based on the definition of the O relationship, we only need to show that T(n) ≤ cn log2 n for n > n0, where n0 is a positive constant. Thus, we do not have to prove that the relationship is correct for the initial condition of n = 1 as long as we can prove that it is correct for a small enough n value, say n0. In this proving process, we need to identify the c value such that it is a fixed constant. To prove that T(n) = O(n log2 n) for the recurrence in equation (3.31), we need to prove T(n) ≤ cn log2 n for all n ≥ n0, where c is a constant. Assuming that T(i/2) ≤ c(i/2) log2(i/2), we need to prove that T(i) ≤ ci log2 i:

T(i) = 2T(i/2) + i ≤ 2c(i/2) log2(i/2) + i = ci log2 i − ci + i ≤ ci log2 i   for c ≥ 1.
This proves that T(i) ≤ ci log2 i for c ≥ 1, given that T(i/2) ≤ c(i/2) log2(i/2). Now we need to verify whether an initial condition is satisfied. First, let us see if T(1) is less than or equal to c × 1 × log2 1 = 0. Apparently, this cannot be satisfied, since the computation time of a problem of size 1 cannot be zero or less. However, we will try T(2): T(2) ≤ c × 2 log2 2 = 2c. We can certainly select a large enough c value such that T(2) ≤ 2c. This finishes our proof that T(n) = O(n log2 n).

Next we will examine the recurrence relation given in equation (3.29). As we have verified using the expansion method, the solution to this recurrence is T(n) = Θ(n^3). In the following, we will use the guess-and-prove method to show the following two relationships:

T(n) = O(n^3),   T(n) = Ω(n^3).

Once these two relations are proved, we know that T(n) = Θ(n^3) according to Theorem 3.1. First, we will prove T(n) ≤ cn^3 for all n > n0 for a certain constant c. Given T(i/2) ≤ c(i/2)^3, we need to prove that T(i) ≤ ci^3:

T(i) = 2T(i/2) + i^3 ≤ 2c(i/2)^3 + i^3 = (1 + c/4)i^3 ≤ ci^3,

as long as 1 + c/4 ≤ c, that is, c ≥ 4/3. Now check the initial condition to see if we have T(1) ≤ c × 1^3 = c. It is obvious that we can choose c large enough to make sure that T(1) ≤ c. This proves that T(n) = O(n^3). Next we try to prove T(n) ≥ cn^3 for all n ≥ n0 for a certain constant c. Given T(i/2) ≥ c(i/2)^3, we need to prove that T(i) ≥ ci^3:

T(i) = 2T(i/2) + i^3 ≥ 2c(i/2)^3 + i^3 = (1 + c/4)i^3 ≥ ci^3,

as long as 1 + c/4 ≥ c, that is, c ≤ 4/3. Now check the initial condition to see if we have T(1) ≥ c. It is obvious that we can choose c small enough such that T(1) ≥ c. This proves that T(n) = Ω(n^3). As a result, we have proved that T(n) = Θ(n^3). There are two points that we should keep in mind when using the guess-and-prove method:

1. To prove an upper bound on T(n), we may use a more general form of the hypothesis of mathematical induction. The more general form may include all the lower order terms. For example, if we need to prove that T(n) = O(n^3), the general form of the mathematical hypothesis to be proved may be written as T(n) ≤ cn^3 + dn^2 + en + f, where c > 0 and d, e, and f are constants. In our examples above, we have chosen the constants of the lower order terms to be zero. For example, to prove that T(n) = O(n) for the recurrence relation T(n) = 2T(n/2) + 1, we may have to use the hypothesis T(n) ≤ cn − b. It is left to the reader to verify this.

2. For a given recurrence relation, we may use variable substitution to simplify the derivation of the bounds on execution time. For example, for T(n) =
2T(√n) + 2, we can substitute n = m^2 to get a new recurrence relation, namely, T(m^2) = 2T(m) + 2. This will generally simplify the process of deriving and proving the bounds equations.

3.4.3 Master Method

As stated earlier, a "divide-and-conquer" recursive algorithm often has a recurrence relation similar to the one shown in equation (3.27), which represents the running time of the algorithm. The following master theorem can be used to solve this type of recurrence.

Theorem 3.2 Consider the recurrence relationship of the form

T(n) = aT(n/b) + f(n),

where a ≥ 1 and b > 1 are constants, f(n) is a function of n, T(n) is defined on positive integers, and n/b is interpreted as either ⌊n/b⌋ or ⌈n/b⌉. Then, the asymptotic bounds of T(n) are available under the following conditions:

1. If f(n) = O(n^{logb a − ε}) for some constant ε > 0, then T(n) = Θ(n^{logb a}).

2. If f(n) = Θ(n^{logb a}), then T(n) = Θ(n^{logb a} log2 n).

3. If f(n) = Ω(n^{logb a + ε}) for some constant ε > 0 and af(n/b) ≤ cf(n) for some constant c < 1 and all sufficiently large n, then T(n) = Θ(f(n)).

The proof of this theorem is omitted here. Interested readers may refer to Cormen et al. [58]. However, we will expand the recurrence relation to help us better understand this theorem:

T(n) = aT(n/b) + f(n) = a[aT(n/b^2) + f(n/b)] + f(n)
     = f(n) + af(n/b) + a^2 f(n/b^2) + a^3 f(n/b^3) + · · · + a^{logb n − 1} f(n/b^{logb n − 1}) + a^{logb n} T(1)
     = n^{logb a} Θ(1) + Σ_{i=0}^{logb n − 1} a^i f(n/b^i).   (3.32)

There are two terms in the final expression of T(n) shown in equation (3.32). The order of magnitude of T(n) then depends on the relative rates of growth of these two terms. The first term is a polynomial function n^{logb a}, while the second term is a summation of the products of a^i and f(n/b^i) for 0 ≤ i ≤ logb n − 1. From examination of the three cases covered in Theorem 3.2, we see that the rate of growth of f(n) is compared to the function n^{logb a}. In case 1 of the theorem, f(n)
grows much more slowly than n^{log_b a}, by a polynomial factor n^ε for some ε > 0; we say that f(n) grows polynomially more slowly than n^{log_b a}. The rate of growth of T(n) is determined by the first term in T(n), that is, T(n) = Θ(n^{log_b a}). In case 2 of the theorem, f(n) has the same rate of growth as n^{log_b a} and T(n) = Θ(n^{log_b a} log₂ n). In case 3 of the theorem, f(n) grows polynomially faster than n^{log_b a} and a f(n/b) ≤ c f(n) for c < 1 and n large enough. In this case, T(n) has the same growth rate as f(n). This theorem covers only three cases. Apparently, there are other cases that it does not cover. If none of the three conditions in the theorem is satisfied, we have to use other methods to find the bounds for T(n). The cases that are not covered by the theorem include those where f(n) grows faster than n^{log_b a} but not polynomially faster and those where f(n) grows more slowly than n^{log_b a} but not by a polynomial factor n^ε. In the following, we present examples to illustrate how to use Theorem 3.2 to find bounds for some recurrence relations.

Consider the following recurrence relation, which has already been solved using the expansion method:

T(n) = 2T(n/2) + n³.

In this example, a = 2, b = 2, and f(n) = n³. We have n^{log_b a} = n^{log₂ 2} = n. It is obvious that f(n) = n³ = Ω(n^{1+ε}) for 0 < ε ≤ 2 and a f(n/b) = 2(n/2)³ = n³/4 ≤ cn³ for c = 1/4 < 1 and n > 0. Thus, case 3 of Theorem 3.2 applies and T(n) = Θ(f(n)) = Θ(n³).

Consider the following recurrence relation, which has also been studied with the expansion method:

T(n) = 3T(⌊n/2⌋) + n.

In this recurrence relation, we have the floor operator. However, as we have mentioned in Theorem 3.2, we can also use the theorem to solve cases with floors or ceilings. In applications of the theorem, the floor or ceiling operators can simply be ignored. In this example, we have a = 3, b = 2, f(n) = n, and n^{log_b a} = n^{log₂ 3}. We can write f(n) as f(n) = n = O(n^{log₂ 3 − log₂ 1.2}). Here we have selected ε = log₂ 1.2 > 0. As a result, case 1 of Theorem 3.2 applies and T(n) = Θ(n^{log₂ 3}).
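To see how the three cases dispatch on a concrete recurrence, the classification can be sketched in Python. This helper is ours, not from the text; it assumes f(n) is a pure power n^k, in which case the regularity condition of case 3 holds automatically:

```python
import math

def master_theorem(a, b, k):
    """Classify T(n) = a*T(n/b) + f(n) with f(n) = n**k via Theorem 3.2.
    For a pure power f, the case-3 regularity condition
    a*f(n/b) = (a/b**k)*f(n) <= c*f(n) holds with c = a/b**k < 1."""
    crit = math.log(a, b)              # the critical exponent log_b(a)
    if k < crit:
        return "case 1: T(n) = Theta(n^%.3f)" % crit
    if k == crit:
        return "case 2: T(n) = Theta(n^%.3f log n)" % crit
    return "case 3: T(n) = Theta(n^%g)" % k

# T(n) = 2T(n/2) + n^3: case 3, T(n) = Theta(n^3)
print(master_theorem(2, 2, 3))
# T(n) = 3T(n/2) + n: case 1, T(n) = Theta(n^{log2 3})
print(master_theorem(3, 2, 1))
```

The helper compares the exponent of f(n) against the critical exponent log_b a, exactly as the three cases of the theorem do.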
3.5 SUMMARY

In this chapter, we have provided the language needed for complexity analysis of algorithms; namely, we have provided definitions for measuring the running time of algorithms. These definitions include the asymptotic upper bounds (O), asymptotic lower bounds (Ω), asymptotic tight bounds (Θ), and equivalence (∼). The rates of
growth of some common functions are compared. We have also covered techniques for finding expressions of summations and bounding summations. Three methods for finding the running time of recursive algorithms are also discussed. For advanced discussion of algorithm analysis, readers are referred to Knuth [117], Cormen et al. [58], and Manber [163]. The materials covered in this chapter will be useful in later analysis of reliability evaluation algorithms.
4 FUNDAMENTAL SYSTEM RELIABILITY MODELS
The term system is used to indicate a collection of components performing a specific function. For example, an overhead projector can be called a system with the function of projecting images on a transparency on to a screen. Its major components include one or more light bulbs, an electric fan, a power switch, and a knob for adjusting focus. A computer system performs a range of functions such as computing, data processing, data input and output, playing music and movies, and others. It consists of the following major components: a computer unit, a monitor, a keyboard, a mouse, a printer, and a pair of speakers. The computer unit as a component of the computer system can also be treated as a system by itself. It is the heart of the computer system. It consists of one or more central processing units (CPUs), a motherboard, a display card, disk controller cards, hard and floppy disk drives, CD-ROM drives, a sound card, and possibly other components. How a system is decomposed into components depends on several factors. From a practical point of view, a component usually refers to a part that is readily available on the market and can be replaced economically upon its failure. For example, when a computer system is down due to the failure of the computer unit, we rarely replace the whole computer unit. Instead, effort is made to identify the part inside the computer unit that has caused the computer unit to fail. For example, a hard disk may be replaced to make the computer unit work again. For a complicated system, we often use hierarchical decompositions in system reliability evaluation. Thus, in our discussions in this book, a component may be further decomposed into a collection of other components. The performance of a system depends on the performances of its components, and the importance of the components may differ. In system reliability analysis, we are interested in finding the relationship between component reliabilities and system reliability. 85
In this chapter, we introduce the concepts useful in system reliability analysis. Reliability block diagrams are used to represent system structures. Fundamental system structures are introduced and their performance measures are analyzed. Components subject to common-cause failures are not addressed here; readers may refer to Kuo et al. [132].
4.1 RELIABILITY BLOCK DIAGRAM

A reliability block diagram is often used to depict the relationship between the functioning of a system and the functioning of its components. Take the overhead projector as an example. Suppose that there are two light bulbs in the system. A reliability block diagram of the overhead projector is given in Figure 4.1. In a reliability block diagram, a rectangle or a circle is often used to represent a component. The name of the component may be given in the block. More often, only the label of each component is given in the block. The reliability block diagram in Figure 4.1 shows the five major components in the system. It also indicates that the system works if and only if the switch works, the fan works, the knob works, and at least one of the two bulbs works. A reliability block diagram does not necessarily represent how the components are physically connected in the system. It only indicates how the functioning of the components will ensure the functioning of the system. That is why a reliability block diagram represents the logic relationship between the functioning of the system and the functioning of its components. A reliability block diagram, as shown in Figure 4.1, is best interpreted as the signal flowing through the components from left to right. A working component allows the signal to flow through it while a failed one does not. Reliability block diagrams have been used to represent series structures, parallel structures, series–parallel structures, parallel–series structures, bridge structures, and general network structures. The diagrams of these structures will be given when they are introduced. However, not all systems can be represented by reliability block diagrams. For example, the k-out-of-n systems, to be discussed in Chapter 7, cannot be represented by a reliability block diagram without duplicating components.
In discussions of system structures, we often use n to indicate the number of components in the system and each component is given a unique label from the set {1, 2, . . . , n}. The set of components in a system is denoted by C.
FIGURE 4.1 Reliability block diagram for overhead projector.
4.2 STRUCTURE FUNCTIONS

Throughout most of this book, we assume that the system and each component may only be in one of two possible states, working or failed. Thus, the state of each component or the system is a discrete random variable that can take only two possible values indicating the working state and the failure state, respectively. Let xi indicate the state of component i for 1 ≤ i ≤ n and

xi = 1 if component i works,
     0 if component i is failed.    (4.1)

Then, vector x = (x1, x2, . . . , xn) represents the states of all components and is called the component state vector. The state of the system is also a binary random variable and is completely determined by the states of the components. Let φ represent the state of the system and

φ = 1 if the system works,
    0 if the system is failed.    (4.2)

If the states of all components are known, the system state is also known. The state of the system is a deterministic function of the states of the components. Thus, we often write

φ = φ(x) = φ(x1, x2, . . . , xn)    (4.3)

and call φ(x) the structure function of the system. Each unique system structure corresponds to a unique structure function φ(x). We usually use (C, φ) to represent a system with the set of components C and the structure function φ.

Example 4.1 Take the overhead projector whose reliability block diagram is given in Figure 4.1 as an example. Component 1 is the switch; component 2 is the fan; component 3 is the knob; and components 4 and 5 are bulbs 1 and 2, respectively. Among these components, the light bulbs have the shortest life based on our experience. If bulb 1 is failed, we write x4 = 0. The structure function of this system is given by

φ(x1, x2, x3, x4, x5) = x1 x2 x3 [1 − (1 − x4)(1 − x5)] = min{x1, x2, x3, max{x4, x5}}.    (4.4)
One can easily verify that equation (4.4) represents the state of the system as a function of the states of the components. The system is in state 1 if and only if components 1, 2, and 3 all are in state 1 and at least one of the light bulbs is in state 1. The system is in state 0 if and only if at least one of components 1, 2, and 3 is in state 0 or both components 4 and 5 are in state 0.
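Equation (4.4) can also be checked exhaustively by brute force. The sketch below (an illustration, not part of the text) enumerates all 2^5 component state vectors and confirms that the product form and the min/max form agree:

```python
from itertools import product

def phi(x1, x2, x3, x4, x5):
    """Structure function (4.4) of the projector: switch, fan, and knob
    in series with the parallel pair of bulbs."""
    return x1 * x2 * x3 * (1 - (1 - x4) * (1 - x5))

# The product form and the min/max form agree on every state vector.
for x in product((0, 1), repeat=5):
    assert phi(*x) == min(x[0], x[1], x[2], max(x[3], x[4]))

print(phi(1, 1, 1, 0, 1))  # one failed bulb: the system still works -> 1
```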
FIGURE 4.2 Reliability block diagram of series system.
Example 4.2 A series system works if and only if every component works. Such a system is failed whenever any component is failed. The reliability block diagram of a series system is given in Figure 4.2. The structure function of a series system is given by

φ(x) = ∏_{i=1}^{n} xi = min{x1, x2, . . . , xn}.    (4.5)
A series system reflects the essential functions to be performed satisfactorily by individual components for the system to perform its functions satisfactorily. For example, for an automobile to work properly, the engine, the transmission, the steering, and the braking subsystems must all work properly. For reliability analysis, these essential subsystems are considered to form a series system.

Example 4.3 A parallel system is failed if and only if all components are failed. It works as long as at least one component works. The reliability block diagram of a parallel system is given in Figure 4.3. The structure function of a parallel system is given by

φ(x) = 1 − ∏_{i=1}^{n} (1 − xi) = max{x1, x2, . . . , xn}.    (4.6)

FIGURE 4.3 Reliability block diagram of parallel system.

Because of the importance of the parallel system structure, Barlow and Proschan [22] introduced the coproduct notation ∐ to simplify its structure function. The structure function of a parallel system can be written as

φ(x) = ∐_{i=1}^{n} xi,  where  ∐_{i=1}^{n} xi ≡ 1 − ∏_{i=1}^{n} (1 − xi).    (4.7)
In a parallel system, not all components are necessary for the system to work properly. Actually, only one component needs to work properly to make the system work properly. Then, why are n components included in the system when only one is essential? This is called redundancy. The other n − 1 components in the parallel system are called redundant components. They are included to increase the probability that there is at least one working component. Redundancy is a technique widely used in engineering to enhance system reliability.

The pivotal decomposition technique may be used in derivation of the structure function of a system. This technique relies on the enumeration of the states of a selected component. For any i such that 1 ≤ i ≤ n, the following equation may be used for any system structure:

φ(x) = xi φ(1i, x) + (1 − xi) φ(0i, x),    (4.8)
where (ai , x) ≡ (x1 , x2 , . . . , xi−1 , a, xi+1 , . . . , xn ) for a ∈ {0, 1}. Based on this equation, the state of the system is either equal to the state of the system when component i is working (i.e., xi = 1) or equal to the state of the system when component i is failed (i.e., xi = 0). When xi = 1, the second term in equation (4.8) disappears. When xi = 0, the first term in equation (4.8) disappears. This equation is useful for network systems and complex system structures. In equation (4.8), the structure function of a system with n components is expressed in terms of the structure functions of two different subsystems each with n − 1 components. In the first subsystem, the state of component i is equal to 1 while the states of the other n − 1 components are random variables. In the second subsystem, the state of component i is equal to 0 while the states of the other n − 1 components are random variables. Through repeated applications of equation (4.8), we can eventually reach a subsystem whose structure function is known. The selection of the component to be decomposed first in this process is critical in order to quickly find the structure function of the system. Example 4.4 Consider the overhead projector example again. We will illustrate the use of the pivotal decomposition technique to verify its structure function given in equation (4.4): φ(x1 , x2 , x3 , x4 , x5 ) = x1 φ(1, x2 , x3 , x4 , x5 ) + (1 − x 1 ) × 0 = x 1 [x2 φ(1, 1, x3 , x4 , x5 ) + (1 − x2 ) × 0] = x 1 x2 [x 3 φ(1, 1, 1, x4 , x 5 ) + (1 − x 3 ) × 0] = x1 x2 x3 [x4 × 1 + (1 − x4 )φ(1, 1, 1, 0, x5 )] = x 1 x2 x3 {x 4 + (1 − x4 )[x5 × 1 + (1 − x5 ) × 0]} = x 1 x2 x3 (x4 + x5 − x 4 x5 ) = x 1 x2 x3 [1 − (1 − x 4 )(1 − x5 )].
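A minimal sketch of equation (4.8), assuming the structure function is given as a Python function on full state vectors; the series structure is used only as a test case:

```python
from itertools import product

def series(x):
    """Series structure function, used here only as a test case."""
    v = 1
    for xi in x:
        v *= xi
    return v

def pivot(phi, x, i):
    """Right-hand side of equation (4.8), decomposing on component i:
    x_i * phi(1_i, x) + (1 - x_i) * phi(0_i, x)."""
    x_up = x[:i] + (1,) + x[i + 1:]    # (1_i, x)
    x_dn = x[:i] + (0,) + x[i + 1:]    # (0_i, x)
    return x[i] * phi(x_up) + (1 - x[i]) * phi(x_dn)

# Equation (4.8) holds for every component and every state vector.
for x in product((0, 1), repeat=4):
    for i in range(4):
        assert pivot(series, x, i) == series(x)
```

Repeated application of `pivot`, as in Example 4.4, reduces any structure to subsystems whose structure functions are known.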
Definition 4.1 (Barlow and Proschan [22]) Given a structure φ, its dual φD is given by

φD(x) = 1 − φ(1 − x),    (4.9)

where 1 = (1, 1, . . . , 1) is a vector with n elements. Thus, 1 − x = (1 − x1, 1 − x2, . . . , 1 − xn).

Example 4.5 Consider the structure function φ(x) of a series system given in equation (4.5). Its dual structure function can be derived with equation (4.9) as follows:

φD(x) = 1 − φ(1 − x) = 1 − ∏_{i=1}^{n} (1 − xi).
The final expression in this equation is exactly the same as equation (4.6), which is the structure function of a parallel system. Thus, we conclude that the dual structure of a series system is a parallel system. By finding the dual structure of a parallel system, one can confirm that the dual structure of a parallel system is a series system. In other words, the series and the parallel systems are the duals of each other.
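The duality argument of Example 4.5 can be verified by enumeration. In this illustrative sketch (names are ours, not the book's), equation (4.9) is applied to the series and parallel structure functions:

```python
from itertools import product

def series(x):
    v = 1
    for xi in x:
        v *= xi
    return v

def parallel(x):
    v = 1
    for xi in x:
        v *= 1 - xi
    return 1 - v

def dual(phi):
    """Equation (4.9): the dual structure phi_D(x) = 1 - phi(1 - x)."""
    return lambda x: 1 - phi(tuple(1 - xi for xi in x))

# The dual of a series system is a parallel system, and vice versa.
for x in product((0, 1), repeat=4):
    assert dual(series)(x) == parallel(x)
    assert dual(parallel)(x) == series(x)
```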
4.3 COHERENT SYSTEMS Definition 4.2 (Barlow and Proschan [22]) A component is irrelevant to the performance of a system if the state of the system is not affected by the state of this component. In mathematical terms, component i(1 ≤ i ≤ n) is irrelevant to the structure function φ if and only if φ(1i , x) = φ(0i , x) for any component state vector x. Otherwise, the component is said to be relevant. If a component is relevant, it means that there exists at least one component state vector x such that the state of component i dictates the state of the system. In other words, when other components are in certain states, specified by (x 1 , x2 , . . . , xi−1 , xi+1 , . . . , xn ), we have φ(x1 , x 2 , . . . , xn ) = xi . Under these circumstances, when component i works, the system works; when component i fails, the system fails. An irrelevant component is useless to the system. In engineering practice, useless components usually do not exist in systems. As a result, for mathematical analysis of reliability systems, we often assume that all components are relevant. Another observation of engineering systems is that improving the performance of a component usually does not deteriorate the performance of the system. For example, replacing a failed component in a working system usually does not make the system fail. Of course, replacing a failed component in a failed system does not necessarily restore the functioning of the system because there may be other failed components in the system. To reflect this reality, we often assume the structure function of ev-
ery engineering system to be a nondecreasing function of the state of every component. Combining these two requirements, we provide the definition of a coherent system. Definition 4.3 (Barlow and Proschan [22]) A system with structure function φ(x) is coherent if and only if φ(x) is nondecreasing in each argument xi for 1 ≤ i ≤ n and every component is relevant. Based on this definition, a coherent system satisfies the following conditions: 1. φ(0) = 0. In words, the system is failed when all components are failed. 2. φ(1) = 1. In words, the system works when all components work. 3. If x < y, then φ(x) ≤ φ(y). In words, improvement of any component does not degrade the performance of the system. 4. For every component i, there exists a component state vector such that the state of component i dictates the state of the system. For two vectors each with n elements, x and y, we write x < y if xi ≤ yi for each i and xi < yi for at least one i (1 ≤ i ≤ n). In words, we say that vector x is smaller than vector y. Definition 4.4 (Barlow and Proschan [22]) Two components are said to be symmetric if interchange of their positions in a system does not affect the state of the system. In other words, components i and j are said to be symmetric if the following condition is satisfied for all possible component state vectors: φ(x 1 , . . . , xi−1 , xi , xi+1 , . . . , x j−1 , x j , x j+1 , . . . , x n ) = φ(x1 , . . . , xi−1 , x j , xi+1 , . . . , x j−1 , xi , x j+1 , . . . , xn ). The structure function of a coherent system is said to be symmetric if all components are symmetric to one another. As we will see later, symmetric systems include series, parallel, and k-out-of-n structures. Definition 4.5 (Barlow and Proschan [22]) Consider two coherent structure functions φ1 and φ2 with the same definition domain S = {0, 1}n . We say that φ1 is stronger than φ2 if φ1 (x) ≥ φ2 (x) for all x ∈ S and φ1 (x) > φ2 (x) for at least one x ∈ S. 
Based on this definition, a stronger structure results in a system state that is higher than or at least equal to the state that results from a weaker structure for the same component state vector. A stronger structure requires fewer components to work for the system to work. On the other hand, a weaker structure requires that more components work for the system to work.
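Definition 4.3 can be checked mechanically for small n by enumerating {0, 1}^n. The following sketch (a hypothetical helper, not from the text) tests both conditions, monotonicity and relevance:

```python
from itertools import product

def is_coherent(phi, n):
    """Check Definition 4.3 over {0,1}^n: phi must be nondecreasing in
    every argument and every component must be relevant."""
    states = list(product((0, 1), repeat=n))
    for i in range(n):
        relevant = False
        for x in states:
            hi = x[:i] + (1,) + x[i + 1:]
            lo = x[:i] + (0,) + x[i + 1:]
            if phi(hi) < phi(lo):
                return False       # phi decreases in component i
            if phi(hi) != phi(lo):
                relevant = True    # component i can dictate the system state
        if not relevant:
            return False           # component i is irrelevant
    return True

assert is_coherent(lambda x: min(x), 3)            # series: coherent
assert not is_coherent(lambda x: min(x[:2]), 3)    # component 3 irrelevant
```

The cost grows as n·2^n, so this is only a didactic check, not a practical test for large systems.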
A series structure requires all components to work for the system to work while a parallel structure requires only one component to work for the system to work. As a result, we say that a parallel structure is stronger than a series structure [94].

Theorem 4.1 (Barlow and Proschan [22]) The series and the parallel structures are the weakest and the strongest structures among all coherent structures, respectively; that is,

∏_{i=1}^{n} xi ≤ φ(x) ≤ ∐_{i=1}^{n} xi,    (4.10)

where φ denotes the structure function of any coherent system.

Proof Based on Definition 4.3, the structure function of a coherent system satisfies the following condition: 0 ≤ φ(x) ≤ 1 for all x. Again use φp and φs to denote the structure functions of a parallel and a series structure, respectively. Then, we have

• φs(0) = φ(0) = φp(0) = 0;
• φs(1) = φ(1) = φp(1) = 1; and
• for all 0 < x < 1, φs(x) = 0, φp(x) = 1, and hence φs(x) ≤ φ(x) ≤ φp(x).

As a result, we conclude that for all 0 ≤ x ≤ 1 we have φs(x) ≤ φ(x) ≤ φp(x).

A parallel structure is strong because redundancy is used in the system. Whenever there are alternative ways of achieving the system function, we say that redundancy is utilized. A series system does not have any redundancy because there is only one way for the system to work properly; that is, all components have to work properly. In a parallel system, there are 2^n − 1 different ways for the system to work properly. Each nonempty subset of working components constitutes a different way. Redundancy can be applied at the component level or at a subsystem level to increase the strength of the structure of the system. Figure 4.4 shows two different ways of applying redundancy to a series system with components 1, 2, and 3. As stated in the following theorem, redundancy at the component level produces a stronger system structure than redundancy at the subsystem level. Proving that Figure 4.4a is a stronger structure than Figure 4.4b is left to readers as an exercise.

Theorem 4.2 (Barlow and Proschan [22]) Let φ be a coherent structure. Then

φ(x ∐ y) ≥ φ(x) ∐ φ(y),    (4.11)
FIGURE 4.4 Applying redundancy at (a) component and (b) subsystem levels.
where x ∐ y ≡ (x1 ∐ y1, x2 ∐ y2, . . . , xn ∐ yn).
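The exercise on Figure 4.4 can be settled numerically. In the sketch below (our construction, assuming components 4, 5, and 6 duplicate components 1, 2, and 3), both redundancy schemes are compared on all 2^6 state vectors:

```python
from itertools import product

def series(*x):
    """Structure function of a series arrangement: all must work."""
    v = 1
    for xi in x:
        v *= xi
    return v

def par(a, b):
    """The coproduct a ∐ b = 1 - (1 - a)(1 - b) of two states."""
    return 1 - (1 - a) * (1 - b)

def component_level(x):
    # Figure 4.4a: (1 ∐ 4) - (2 ∐ 5) - (3 ∐ 6) in series
    return series(par(x[0], x[3]), par(x[1], x[4]), par(x[2], x[5]))

def subsystem_level(x):
    # Figure 4.4b: (1 - 2 - 3) ∐ (4 - 5 - 6)
    return par(series(x[0], x[1], x[2]), series(x[3], x[4], x[5]))

# Component-level redundancy is never worse and is sometimes strictly better.
states = list(product((0, 1), repeat=6))
assert all(component_level(x) >= subsystem_level(x) for x in states)
assert any(component_level(x) > subsystem_level(x) for x in states)
```

For instance, the state vector (1, 0, 1, 0, 1, 0) works under component-level redundancy but fails under subsystem-level redundancy.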
4.4 MINIMAL PATHS AND MINIMAL CUTS For the set of components C = {1, 2, . . . , n} in a system and a component state vector x, let C 1 (x) represent the set of components whose states are 1 in x and C0 (x) the set of components whose states are 0 in x; that is, C 1 (x) ≡ {i | xi = 1, i ∈ C},
C0 (x) ≡ {i | xi = 0, i ∈ C}.
It is apparent that C1 (x) ∩ C0 (x) = ∅ and C1 (x) ∪ C0 (x) = C. Definition 4.6 (Barlow and Proschan [22]) Consider a coherent system (C, φ). A component state vector x is called a path vector if φ(x) = 1. The corresponding path or path set is C 1 (x). A path vector x is called a minimal path vector if φ(y) = 0 for any y < x. The corresponding minimal path or minimal path set is C 1 (x). A component state vector x is called a cut vector if φ(x) = 0. The corresponding cut or cut set is C0 (x). A cut vector x is called a minimal cut vector if φ(y) = 1 for any y > x. The corresponding minimal cut or minimal cut set is C0 (x). A path is a set of components whose simultaneous functioning ensures the functioning of the system, while a cut is a set of components whose simultaneous failure ensures the failure of the system. A minimal path is a minimal set of components whose simultaneous functioning ensures the functioning of the system. A minimal cut is a minimal set of components whose simultaneous failure ensures the failure of the system. For a series system, each component by itself forms a minimal cut and
FIGURE 4.5 Reliability block diagrams of coherent system in terms of (a) minimal paths or (b) minimal cuts.
the set of all components is a minimal path. Thus, a series system has n distinct minimal cuts and one minimal path. For a parallel system, each component is a minimal path and the set of all components is a minimal cut. Thus, a parallel system has n minimal paths and one minimal cut. Based on the definition of dual structures given in equation (4.9), we can see that if x is a minimal path in the primal structure φ, then 1 − x is a minimal cut in the dual structure φ D . For a system to work, at least one minimal path has to work. The system may be considered as a parallel structure with each minimal path as a component. For a system to fail, at least one minimal cut has to fail. As a result, the system may be considered as a series structure with each minimal cut as a component. Let l and m represent the numbers of minimal paths and minimal cuts in a system, respectively. Let MPi for 1 ≤ i ≤ l and MC j for 1 ≤ j ≤ m indicate the ith minimal path and the jth minimal cut, respectively. Then, the reliability block diagram of any coherent system may be depicted as in Figure 4.5. Based on the definitions of minimal paths and minimal cuts, the components in each minimal path have a series structure and the components in each minimal cut have a parallel structure. Thus, MPi and MC j for each i and j in Figure 4.5 can be further decomposed into a series structure and a parallel structure, respectively. As a result, minimal paths and minimal cuts may be used in derivation of the structure function of any coherent system. However, we note that different minimal paths may have components in common and different minimal cuts may have components in common. In the following section, we use an example to illustrate the use of minimal paths and minimal cuts in development of the structure function of the so-called bridge structure. Example 4.6 Consider the bridge structure given in Figure 4.6. There are four minimal paths and four minimal cuts. 
The minimal paths and the minimal cuts are
MINIMAL PATHS AND MINIMAL CUTS
1
95
2 3
4
FIGURE 4.6
5
Bridge structure.
MP1 = {1, 2},
MP2 = {4, 5},
MP3 = {1, 3, 5},
MP4 = {2, 3, 4},
MC1 = {1, 4},
MC2 = {2, 5},
MC3 = {1, 3, 5},
MC4 = {2, 3, 4}.
Let ρi for 1 ≤ i ≤ 4 indicate the structure function of the ith minimal path. Since MPi has a series structure for 1 ≤ i ≤ 4, we have

ρ1 = x1 x2,  ρ2 = x4 x5,  ρ3 = x1 x3 x5,  ρ4 = x2 x3 x4.
Let γj for 1 ≤ j ≤ 4 indicate the structure function of the jth minimal cut. Since MCj has a parallel structure for 1 ≤ j ≤ 4, we have

γ1 = 1 − (1 − x1)(1 − x4),
γ2 = 1 − (1 − x2)(1 − x5),
γ3 = 1 − (1 − x1)(1 − x3)(1 − x5),
γ4 = 1 − (1 − x2)(1 − x3)(1 − x4).

Using Figure 4.5, we can derive the structure function of the bridge structure with either the minimal paths or the minimal cuts:

φ(x) = ∐_{i=1}^{4} ρi = 1 − (1 − ρ1)(1 − ρ2)(1 − ρ3)(1 − ρ4)
     = 1 − (1 − x1 x2)(1 − x4 x5)(1 − x1 x3 x5)(1 − x2 x3 x4),

φ(x) = γ1 γ2 γ3 γ4 = [1 − (1 − x1)(1 − x4)][1 − (1 − x2)(1 − x5)] × [1 − (1 − x1)(1 − x3)(1 − x5)][1 − (1 − x2)(1 − x3)(1 − x4)].

It can be verified that these two equations give the same structure function of the bridge system. However, it is clear that further simplification of these two equations is going to be very tedious. The structure functions of more complex systems will be even more time consuming to derive and/or simplify.
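Rather than simplifying the structure function algebraically, the minimal paths of the bridge can be recovered by enumeration. This sketch (an illustration, not part of the text) relies on monotonicity: a path vector is minimal when lowering any single working component fails the system:

```python
from itertools import product

def bridge(x):
    """Bridge structure function built from its four minimal paths."""
    paths = (x[0] * x[1], x[3] * x[4], x[0] * x[2] * x[4], x[1] * x[2] * x[3])
    v = 1
    for p in paths:
        v *= 1 - p
    return 1 - v

def minimal_paths(phi, n):
    """All minimal path sets of a coherent (monotone) structure phi.
    By monotonicity, x is a minimal path vector when phi(x) = 1 and
    lowering any single working component gives phi = 0."""
    result = []
    for x in product((0, 1), repeat=n):
        if phi(x) == 1 and all(
            phi(x[:i] + (0,) + x[i + 1:]) == 0
            for i in range(n) if x[i] == 1
        ):
            result.append({i + 1 for i in range(n) if x[i] == 1})
    return result

# Recovers MP1-MP4 of Example 4.6: {1,2}, {4,5}, {1,3,5}, {2,3,4}.
print(sorted(minimal_paths(bridge, 5), key=sorted))
```

Applying the same routine to the dual structure of `bridge` would yield the minimal cuts, by the duality noted above.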
4.5 LOGIC FUNCTIONS

The structure function defined earlier is a discrete mathematical function. It takes binary inputs and produces a binary output, namely 0 or 1. It is used to indicate the relationship between the state of the system and the states of the components. From Example 4.6, we see that it is not easy to simplify a binary mathematical function. In this section, we define the logic function of a coherent system. For a logic function, Boolean algebra may be used to simplify its expressions.

Notation

• xi: the event that component i works, 1 ≤ i ≤ n
• x̄i: the complement of xi, indicating that component i is failed, 1 ≤ i ≤ n
• (1, 2, . . . , i − 1, Ti, i + 1, . . . , n): a system with components {1, 2, . . . , n} given that component i is working, where Ti indicates that the event that component i works is true
• (1, 2, . . . , i − 1, Fi, i + 1, . . . , n): a system with components {1, 2, . . . , n} given that component i is failed, where Fi indicates that the event that component i works is false
• S(1, 2, . . . , n): the event that the system with components {1, 2, . . . , n} works
• S̄(1, 2, . . . , n): the complement of S(1, 2, . . . , n), indicating that the system with components {1, 2, . . . , n} is failed
With these notations, the logic function of a series system with n components may be written as

S_series = x1 x2 · · · xn.    (4.12)

The logic function of a parallel system may be written as

S̄_parallel = x̄1 x̄2 · · · x̄n.    (4.13)

For the logic functions, the pivotal decomposition technique given in equation (4.8) can be written as

S(1, 2, . . . , n) = xi S(1, 2, . . . , i − 1, Ti, i + 1, . . . , n) + x̄i S(1, 2, . . . , i − 1, Fi, i + 1, . . . , n).    (4.14)
Example 4.7 Consider the bridge structure given in Figure 4.6. The minimal paths and minimal cuts are given in Example 4.6. We will use the same notation to indicate the logic functions of the subsystems. Namely, ρi for 1 ≤ i ≤ 4 indicates the logic function of the ith minimal path. Since MPi has a series structure for 1 ≤ i ≤ 4, we have

ρ1 = x1 x2,  ρ2 = x4 x5,  ρ3 = x1 x3 x5,  ρ4 = x2 x3 x4.

Let γj for 1 ≤ j ≤ 4 indicate the logic function of the jth minimal cut. Since MCj has a parallel structure for 1 ≤ j ≤ 4, we have

γ1 = x1 ∪ x4,  γ2 = x2 ∪ x5,  γ3 = x1 ∪ x3 ∪ x5,  γ4 = x2 ∪ x3 ∪ x4.

Using Figure 4.5, we can derive the logic function of the bridge structure with either the minimal paths or the minimal cuts. Note that the formulas of Boolean algebra given in Chapter 2 can be used to simplify the logic function expressions easily. With the minimal paths, we have

S = ρ1 ∪ ρ2 ∪ ρ3 ∪ ρ4 = x1 x2 ∪ x4 x5 ∪ x1 x3 x5 ∪ x2 x3 x4.

With the minimal cuts, we have

S = γ1 γ2 γ3 γ4 = (x1 ∪ x4)(x2 ∪ x5)(x1 ∪ x3 ∪ x5)(x2 ∪ x3 ∪ x4).
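The equivalence of the path-based and cut-based logic functions can be confirmed by exhaustive Boolean evaluation. A brief sketch (ours, not the book's) for the bridge:

```python
from itertools import product

def via_paths(x):
    """The bridge works iff at least one minimal path works."""
    return (x[0] and x[1]) or (x[3] and x[4]) \
        or (x[0] and x[2] and x[4]) or (x[1] and x[2] and x[3])

def via_cuts(x):
    """The bridge works iff every minimal cut contains a working component."""
    return (x[0] or x[3]) and (x[1] or x[4]) \
        and (x[0] or x[2] or x[4]) and (x[1] or x[2] or x[3])

# Both logic-function forms describe the same event.
assert all(via_paths(x) == via_cuts(x)
           for x in product((False, True), repeat=5))
```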
4.6 MODULES WITHIN A COHERENT SYSTEM

In engineering terminology, a module is simply a cluster of components that is treated as a single entity in a piece of equipment. For example, an engine is often treated as a module in the airline industry. If an engine is failed or has signs of serious degradation, it is taken off the airplane as a whole and sent to the engine manufacturer for repair and rebuild. Because of the use of integrated circuits and other advanced manufacturing technologies, many sophisticated devices are treated as modules. For example, end users rarely repair failed CD-ROMs, floppy drives, or hard disks. Even though these devices are repairable, they are not economically repairable in the end users' hands. In system reliability theory, a module indicates a group of components that has a single input from and a single output to the rest of the system. The state of the module can be represented by a binary random variable. The contribution of all components in a module to the performance of the whole system can be represented by the state of the module. Once the state of the module is known, one does not need to know the states of the components within the module in determining the state of the system. A formal definition of a module is given below.

Definition 4.7 (Barlow and Proschan [22]) The coherent system (A, χ) is a module of the coherent system (C, φ) if φ(x) = ψ(χ(x^A), x^(A^c)), where ψ is a coherent structure function, A is a subset of C, A^c is the subset of C that is complementary to A, x^A is a vector representing the states of the components in set A, and x^(A^c) is a vector representing the states of the components in set A^c. The set A ⊆ C is called a modular set of (C, φ).

Based on this definition, each component by itself is a module of the system. A module has a structure function. The system can be considered as consisting of a
module and the components that are not in the module. The structure function that relates the system state to the state of the module and the states of the other components is denoted by ψ. If A is a module, A^c must be a module too. More than two modules may exist in a coherent system. Modular decomposition is a technique that can be used to decompose a coherent system into several disjoint modules. Such a decomposition is useful if the structure function of each module can be easily derived and the structure function relating the system state to the states of these disjoint modules can be easily derived. A formal definition of modular decomposition is given below.

Definition 4.8 (Barlow and Proschan [22]) A modular decomposition of a coherent system (C, φ) is a set of disjoint modules, {(A1, χ1), (A2, χ2), . . . , (Ar, χr)}, together with an organizing structure ψ such that

C = ∪_{i=1}^{r} Ai  and  Ai ∩ Aj = ∅ for i ≠ j,    (4.15)

φ(x) = ψ(χ1(x^A1), χ2(x^A2), . . . , χr(x^Ar)).    (4.16)
To apply the modular decomposition technique in the derivation of system structure functions, we need to identify disjoint modules within the system structure. The structure function of each disjoint module should be easy to derive. Each module can then be treated as a "supercomponent," or subsystem, and we need to find the relationship between the state of the system and the states of these subsystems. Hopefully, the number of such subsystems is much smaller than the number of components in the system. Treating these subsystems as if they were components, we can further identify disjoint modules within the system whose structure functions can be easily derived. Repeating this process, we eventually find the structure function of the whole system. Any series subsystem that exists in a coherent system forms a module, and any parallel subsystem in a coherent system forms a module; the structure functions of series and parallel structures are well known. For some coherent systems, we may have to use both the pivotal decomposition technique and the modular decomposition technique to derive the system's structure function efficiently. Replacing the structure function by a logic function in Definitions 4.7 and 4.8, we can express the system logic function in terms of the logic functions of disjoint modules in the system.

Example 4.8 We will examine again the overhead projector whose reliability block diagram is given in Figure 4.1. Its structure function has been given in Example 4.1. Assume that A = {1, 2, 3} represents a module that has a series structure and B = {4, 5} a module that has a parallel structure. These two modules are disjoint. The structure functions of these two modules are

χ(x_A) = x1 x2 x3,

χ(x_B) = 1 − (1 − x4)(1 − x5).

Modules A and B are connected in series to form the system. Thus, the organizing structure of these two modules is series. The structure function of the system can be
written as

φ(x) = ψ(χ(x_A), χ(x_B)) = χ(x_A) · χ(x_B) = x1 x2 x3 [1 − (1 − x4)(1 − x5)].

The logic functions of modules A and B can be written as

S(A) = x1 x2 x3,

S(B) = x4 ∨ x5.
The logic function of the system is then

S(C) = S(A)S(B) = x1 x2 x3 (x4 ∨ x5).

Example 4.9 Consider the bridge structure given in Figure 4.6. There are five components in the system. No module with more than one and fewer than five components exists in the system. Thus, modular decomposition cannot be used directly. However, we can use the pivotal decomposition technique first. As mentioned before, it is important to pick the right sequence of components on which to perform pivotal decompositions. Since this is a simple structure, a direct examination of Figure 4.6 reveals that decomposing on component 3 results in series and parallel subsystems. The system structure function can be expressed as

φ(x) = x3 φ(x1, x2, 1, x4, x5) + (1 − x3) φ(x1, x2, 0, x4, x5). (4.17)
Instead of using the pivotal decomposition technique repetitively, we can switch to the modular decomposition technique from here on. In equation (4.17), φ(x1, x2, 1, x4, x5) represents the structure function of the bridge structure given that component 3 is in state 1. Under this condition, the system reliability block diagram is shown in Figure 4.7. As shown in Figure 4.7, A = {1, 4} and B = {2, 5} are two disjoint modules of the system, each with a parallel structure. These two disjoint modules are connected in series to form the system with structure function φ(x1, x2, 1, x4, x5). Thus, we have

χ(x1, x4) = 1 − (1 − x1)(1 − x4),
χ(x2, x5) = 1 − (1 − x2)(1 − x5),
φ(x1, x2, 1, x4, x5) = χ(x1, x4) · χ(x2, x5) = [1 − (1 − x1)(1 − x4)][1 − (1 − x2)(1 − x5)]. (4.18)

FIGURE 4.7 Reliability block diagram of bridge structure under condition that x3 = 1.
FIGURE 4.8 Reliability block diagram of bridge structure under condition that x3 = 0.
In the second term of equation (4.17), φ(x1, x2, 0, x4, x5) represents the structure function of the bridge structure given that component 3 is in state 0. Under this condition, the system reliability block diagram is shown in Figure 4.8. As shown in Figure 4.8, D = {1, 2} and E = {4, 5} are two disjoint modules of the system, each with a series structure. These two disjoint modules are connected in parallel to form the system with structure function φ(x1, x2, 0, x4, x5). Thus, we have

χ(x1, x2) = x1 x2,
χ(x4, x5) = x4 x5,
φ(x1, x2, 0, x4, x5) = 1 − [1 − χ(x1, x2)][1 − χ(x4, x5)] = 1 − (1 − x1 x2)(1 − x4 x5). (4.19)

Substituting equations (4.18) and (4.19) into equation (4.17), we obtain the structure function of the original bridge system structure as follows:

φ(x) = x3 [1 − (1 − x1)(1 − x4)][1 − (1 − x2)(1 − x5)] + (1 − x3)[1 − (1 − x1 x2)(1 − x4 x5)].

Similarly, we can obtain the logic function of the system as follows:

S(A) = x1 ∨ x4,
S(B) = x2 ∨ x5,
S(D) = x1 x2,
S(E) = x4 x5,
S(x1, x2, T3, x4, x5) = S(A)S(B) = (x1 ∨ x4)(x2 ∨ x5),
S(x1, x2, F3, x4, x5) = S(D) ∨ S(E) = x1 x2 ∨ x4 x5,
S(C) = x3 (x1 ∨ x4)(x2 ∨ x5) ∨ x̄3 (x1 x2 ∨ x4 x5).
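The pivotal decomposition of Example 4.9 is easy to check numerically. The sketch below is ours, not the book's (the function name is illustrative); it implements equation (4.17) with the conditional structure functions (4.18) and (4.19).

```python
# Structure function of the five-component bridge of Example 4.9,
# assembled by pivoting on component 3 as in equation (4.17).
def phi_bridge(x1, x2, x3, x4, x5):
    # x3 = 1: parallel modules {1, 4} and {2, 5} in series, eq. (4.18)
    up = (1 - (1 - x1) * (1 - x4)) * (1 - (1 - x2) * (1 - x5))
    # x3 = 0: series modules {1, 2} and {4, 5} in parallel, eq. (4.19)
    down = 1 - (1 - x1 * x2) * (1 - x4 * x5)
    return x3 * up + (1 - x3) * down

print(phi_bridge(1, 0, 1, 0, 1))  # 1: components 1, 3, 5 form a path
print(phi_bridge(1, 0, 0, 0, 1))  # 0: with 3 failed, neither {1, 2} nor {4, 5} is complete
```

Because the two terms of equation (4.17) are mutually exclusive, the same function also evaluates correctly when the arguments are component reliabilities rather than 0-1 states.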
4.7 MEASURES OF PERFORMANCE Any device of interest, be it a component or a system, must work satisfactorily under specified conditions when needed. With the notation defined earlier, φ(x) may be used to measure the performance of the system. However, the problem is that φ(x) is a random variable. Although we know that it can only take one of two possible values, 0 and 1, we never know for sure whether a system that is in state 1 will continue to be in state 1 in the future or for how long it will continue to be in state 1.
The length of time that a new working system will continue to work properly is defined as its lifetime, or simply life, often denoted by Ts. The lifetime of the system is also a random variable. Similar arguments can be made about the state xi and lifetime Ti of component i for 1 ≤ i ≤ n. Thus, the variables of interest include the state of the device and the lifetime of the device. Since it is difficult or impossible to measure them directly, we often use some statistical characteristics of these random variables to measure the performance of the device.

Reliability
Many devices of interest have a mission time. It is defined to be the period of time during which a device is required to work satisfactorily. For example, a missile is required to work properly from the time point it is loaded into a launcher to the point of hitting the target. A rocket for launching satellites is required to work properly from the launching time point to the time point that the satellite reaches its orbit. An airplane making an overseas flight is required to work properly for the entire duration of the flight, including takeoff and landing. For a device with a specified mission time, we often use reliability to measure performance. The reliability of such a device is defined to be the probability that it will work properly during the mission time under specified working conditions. The working conditions of each device are usually implicit and understood. The mission time for each device is also understood and treated implicitly. Note that the mission time is not a random variable. Under these conditions, the following notation is used.

Notation
• pi: reliability of component i (1 ≤ i ≤ n)
• qi: unreliability of component i (1 ≤ i ≤ n)
• Rs: reliability of the system
• Qs: unreliability of the system
Since xi for 1 ≤ i ≤ n and φ(x) are all 0–1 random variables, we have

pi = Pr(xi = 1) = E(xi), 1 ≤ i ≤ n, (4.20)
qi = 1 − pi, 1 ≤ i ≤ n, (4.21)
Rs = Pr(φ(x) = 1) = E(φ(x)), (4.22)
Qs = 1 − Rs. (4.23)
We know that the state of a system is a function of the states of its components. The states of the components in a system may be independent or dependent on one another. When the components work independently from one another, we say that the reliability of a system is a function of the reliabilities of the components and write

Rs = h(p) = h(p1, p2, . . . , pn), (4.24)
where p = ( p1 , p2 , . . . , pn ) is called the component reliability vector. We sometimes call h(p) the reliability function of the structure φ. As will be illustrated further
later, h(p) reflects the unique relationship between system reliability and component reliabilities of each distinct system structure φ. However, we must differentiate this reliability function of component reliabilities from the reliability function of time t to be defined later.

Example 4.10 Take the series system structure as an example. For the system to work, all components must work. Assuming that components work and fail independently and using the structure function given in equation (4.5), we have

Rs = Pr(x1 x2 · · · xn = 1) = Pr(x1 = 1, x2 = 1, . . . , xn = 1) = Pr(x1 = 1) Pr(x2 = 1) · · · Pr(xn = 1) = p1 p2 · · · pn. (4.25)

If the logic function of the system is used, with equation (4.12), we have

Rs = Pr(x1 x2 · · · xn) = Pr(x1) Pr(x2) · · · Pr(xn) = p1 p2 · · · pn. (4.26)

Example 4.11 Consider the parallel system structure. For the system to fail, all components must fail. Again assume that all components are independent. With the logic function given in equation (4.13), we have

Qs = Pr(x̄1 x̄2 · · · x̄n) = Pr(x̄1) Pr(x̄2) · · · Pr(x̄n) = q1 q2 · · · qn, (4.27)

Rs = 1 − q1 q2 · · · qn. (4.28)
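The static measures of Examples 4.10 and 4.11 are one-liners for independent components. The sketch below is ours, with illustrative numbers; it evaluates equations (4.25) and (4.28).

```python
from math import prod

def series_reliability(p):
    """Rs = p1 p2 ... pn, equation (4.25)."""
    return prod(p)

def parallel_reliability(p):
    """Rs = 1 - q1 q2 ... qn, equation (4.28)."""
    return 1 - prod(1 - pi for pi in p)

p = [0.9, 0.8, 0.95]
print(series_reliability(p))    # ~0.684, below the worst component
print(parallel_reliability(p))  # ~0.999, above the best component
```

Note how the series system is weaker than its weakest component while the parallel system is stronger than its strongest one, which is the theme of Sections 4.9 and 4.10.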
The reliability measures defined and illustrated above are referred to as static measures because they do not deal with the time factor directly. In other words, the required service time is implicit and fixed. Thus, it does not appear in the equations of system reliability evaluation. Though the mission times may be different, such system reliability evaluation equations can be used for any specified duration of time. In addition, other system performance measures to be defined later can be easily obtained from the reliability measures defined above.

Reliability Function
In many practical applications, a certain kind of service will be needed for a long time or indefinitely. However, we know that a specific device cannot work indefinitely. To continue providing the required service, new devices will have to be purchased to replace old devices. In these situations, we would like to have a specific device work as long as economically possible. No finite mission time is specified here. The lifetime of the device is the random variable of interest. We are interested in the lifetime distribution of the system as a function of the lifetime distributions of the components, which are assumed to be known. The following notation will be used when the time factor has to be dealt with explicitly.

Notation
• Ti: lifetime of component i, 1 ≤ i ≤ n
• Ts: lifetime of the system
• Ri(t): reliability function of component i, Ri(t) = Pr(Ti > t), 1 ≤ i ≤ n
• Fi(t): unreliability function of component i, or the CDF of Ti, Fi(t) = 1 − Ri(t), 1 ≤ i ≤ n
• fi(t): pdf of the lifetime of component i, 1 ≤ i ≤ n
• hi(t): failure rate function of component i, 1 ≤ i ≤ n
• Rs(t): reliability function of the system, Rs(t) = Pr(Ts > t)
• Fs(t): unreliability function of the system, or the CDF of Ts, Fs(t) = 1 − Rs(t)
• fs(t): pdf of the lifetime of the system
• hs(t): failure rate function of the system
The reliability function Ri(t) or Rs(t) is defined to be the probability that a device will survive beyond a time duration of length t, where t > 0. It is the reliability of a device for an arbitrary mission time t > 0. A new device is often assumed to be in the working state, that is, R(0) = 1. As the device gets used, the probability of working satisfactorily up to time t decreases and approaches zero as t increases. The reliability function is said to be a dynamic measure of the performance of the device since it is a function of time. The reliability function of a system as a function of time t can be easily obtained from the static reliability measure of the system defined earlier when the components are independent. For example, the following equation can be directly obtained from equations (4.22) and (4.24) when the time factor t is introduced for a nonrepairable system. The only substitutions made are Rs(t) in the position of Rs, x(t) in the position of x, and Ri(t) in the position of pi for 1 ≤ i ≤ n:

Rs(t) = Pr(φ(x(t)) = 1) = E(φ(x(t))) = h(R1(t), R2(t), . . . , Rn(t)). (4.29)
The reliability function of a series system can be extended from equation (4.25) as

Rs(t) = R1(t)R2(t) · · · Rn(t). (4.30)
This can be easily verified as follows, assuming that the components are independent:

Rs(t) = Pr(Ts > t) = Pr(min{T1, T2, . . . , Tn} > t) = Pr(T1 > t, T2 > t, . . . , Tn > t) = Pr(T1 > t) Pr(T2 > t) · · · Pr(Tn > t) = R1(t)R2(t) · · · Rn(t).

The unreliability function of a parallel system can be extended from equation (4.27) as

Fs(t) = F1(t)F2(t) · · · Fn(t). (4.31)

Its reliability function is then

Rs(t) = 1 − F1(t)F2(t) · · · Fn(t). (4.32)
This can be easily verified as follows:

Rs(t) = Pr(Ts > t) = 1 − Pr(Ts ≤ t) = 1 − Pr(max{T1, T2, . . . , Tn} ≤ t) = 1 − Pr(T1 ≤ t, T2 ≤ t, . . . , Tn ≤ t) = 1 − Pr(T1 ≤ t) Pr(T2 ≤ t) · · · Pr(Tn ≤ t) = 1 − F1(t)F2(t) · · · Fn(t).

In the process of extending the static reliability measures to the dynamic reliability measures above, Ri(t) and Fi(t) substitute for pi and qi, respectively, for 1 ≤ i ≤ n, and Rs(t) and Fs(t) substitute for Rs and Qs, respectively. Other characteristic functions of the system's lifetime include its pdf fs(t) and its failure rate function hs(t). Both can be derived from the reliability function of the system, Rs(t):

fs(t) = dFs(t)/dt = −dRs(t)/dt, (4.33)
hs(t) = fs(t)/Rs(t). (4.34)
Other equations relating the pdf, CDF, and h(t) are given in Chapter 2. They are valid for the system and its components.

Mean Time to Failure (MTTF)
The MTTF is a single value that indicates how long a device can last on average when no repairs are allowed. The MTTF of a system can be derived from its reliability function as follows:

MTTFs = E(Ts) = ∫₀^∞ Rs(t) dt. (4.35)
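Equation (4.35) also gives a practical numerical recipe: integrate Rs(t) over a long horizon. The sketch below is ours; it uses a simple trapezoidal rule and checks the exponential case, where MTTF = 1/λ is known exactly.

```python
import math

def mttf(R, t_max, n=100_000):
    """Trapezoidal approximation of the integral in equation (4.35)."""
    h = t_max / n
    total = 0.5 * (R(0.0) + R(t_max))
    for k in range(1, n):
        total += R(k * h)
    return total * h

lam = 0.01
approx = mttf(lambda t: math.exp(-lam * t), t_max=2000.0)
print(approx)  # close to 1/lam = 100; the tail beyond t_max is truncated
```

The truncation point t_max must be chosen large enough that Rs(t_max) is negligible; here the neglected tail is about 100·e^(−20).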
Availability
A system often has redundancy built into it. In other words, some components of the system may be failed while the system still functions satisfactorily. When too many components are failed or some critical components are failed, the system becomes failed. As components fail, they may get repaired so that the number of failed components is maintained below a certain level. Thus, repairs of failed components may extend the life of the system. A system may work properly for a period of time and eventually become failed. Once it is failed, repairs and overhauls may be performed on the system to bring it back to the working state again. Thus, the state of the system may change between 1 and 0 several times before it is scrapped. The probability that the system works at time t is called its availability function and is given by

A(t) = Pr(φ(x(t)) = 1). (4.36)
For a repairable system, {φ(x(t)) = 1} indicates that the system is in the working state at time t. It does not say anything about the states that the system has experienced before time t. It may have gone through several up and down cycles before staying at the up state at time t. If we need the system to provide service at a specific time point t, A(t) is a good indicator of its availability at time t. The availability function is usually difficult to derive. Instead, we are often interested in the long-run average availability of the system, called the steady-state availability of the system. To evaluate the steady-state availability, we need the mean time between failures (MTBF) and the mean time to repair (MTTR) of the system in steady state. The mean time between failures of a system is defined to be the average continuous working duration of the system. The mean time to repair is defined to be the average amount of time needed to repair a failed system. The steady-state availability of the system, As, can then be expressed as

As = MTBFs / (MTBFs + MTTRs). (4.37)
4.8 ONE-COMPONENT SYSTEM
Components are the building blocks of a system. The performance of the system depends on the performance of every component. In this section, we provide an analysis of the performance of a simple system that contains a single component. Under this assumption, we do not need to worry about the component failure dependency that may exist in multiple-component systems. In a single-component system, each performance measure of the system is equal to the performance measure of the component. These performance measures include mission reliability, reliability as a function of time t, MTTF, MTTR, MTBF, availability function, and steady-state availability. Since the focus of this book is on system performance evaluation, we assume that the lifetime distribution of each component is known. In other words, component mission reliability p, component reliability function R(t), component CDF F(t), component failure rate function h(t), component pdf f(t), and MTTF or the expected life are all assumed to be given. The lifetime distribution of the component reflected in these functions and constants represents the inherent reliability of the component under specified operating conditions without external intervention such as repair or maintenance. When repairs and maintenance are utilized to extend the life of the component, we need to have a model for the repair and maintenance activities. With these repair and maintenance models, we are then able to evaluate the MTBF, availability function, and steady-state availability of the component. When the component is failed, it is usually assumed that a repairman starts performing repair on the component immediately. The time duration required to repair the component is a random variable. It may be described by a repair time distribution and/or its average MTTR.
The state of the one-component system with repair provisions can be described by an alternating renewal process (see Chapter 2 for discussions on stochastic processes) if every repair restores the component to “as good as new” condition (this
kind of repair is a perfect repair), the consecutive working times are i.i.d., and the consecutive repair times are i.i.d. When each repair restores the condition of the component to "as bad as old" (this kind of repair is a minimal repair), the number of repairs that the component goes through as a function of time can be described by a nonhomogeneous Poisson process. If the distributions of the sojourn times in state 1 and state 0 both follow the exponential distribution, the state of the component (1 or 0) can be described by a birth-and-death process, regardless of whether perfect repair or minimal repair is assumed. In this case, the repair policy does not make any difference because of the memoryless property of the exponential distribution.

Working and Repair Time Durations with Independent Exponential Distributions
Suppose that the lifetime distribution of the component is exponential with the parameter λ > 0 and the repair time distribution is exponential with the parameter µ > 0. Under these assumptions, the following are known:

MTTF = 1/λ, MTTR = 1/µ.

The availability function of the component is equal to the probability that the component is in the working state. As derived in Example 2.6, we have the following availability measures [128]:

As(t) = µ/(λ + µ) + [λ/(λ + µ)] e−(λ+µ)t, (4.38)
As = µ/(λ + µ). (4.39)
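Equations (4.38) and (4.39) are straightforward to evaluate; the sketch below is ours, with illustrative rates.

```python
import math

def availability(t, lam, mu):
    """A(t) from equation (4.38) for exponential working/repair times."""
    s = lam + mu
    return mu / s + (lam / s) * math.exp(-s * t)

lam, mu = 0.01, 0.1   # MTTF = 100, MTTR = 10 (illustrative)
print(availability(0.0, lam, mu))  # equals 1: a new component is working
print(mu / (lam + mu))             # steady-state limit, equation (4.39), ~0.909
```

As t grows, the exponential term vanishes and A(t) settles at µ/(λ + µ), which is exactly MTTF/(MTTF + MTTR) in the form of equation (4.37).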
Working and Repair Time Durations with General Distributions
When the component becomes failed, it is replaced with an identical new component. This replacement is equivalent to a perfect repair. Thus, the state of the system oscillates between working and failed. Let the working time durations be i.i.d. random variables denoted by X1, X2, . . . and the failure time durations (or repair time durations) be i.i.d. random variables denoted by Y1, Y2, . . . . Then, Xi + Yi is called a renewal cycle (i = 1, 2, . . . ). The renewal time points are represented by T1, T2, . . . . Clearly, we have

Ti = Ti−1 + Xi + Yi, i = 1, 2, . . . ,

where T0 ≡ 0. The number of renewals by time point t, denoted by N(t), forms an alternating renewal process. According to Theorem 2.5, we know that the steady-state availability of the system is

As = E(Xi) / [E(Xi) + E(Yi)] for any i ≥ 1.
When no repairs are allowed, the reliability function of the system, Rs (t), is simply the reliability function of X 1 . When repairs are allowed, general expressions of the system availability function are difficult or impossible to obtain.
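Even when closed forms are unavailable, the steady-state availability As = E(Xi)/[E(Xi) + E(Yi)] can be checked by simulating the alternating renewal process. The Monte Carlo sketch below is ours, with illustrative gamma-distributed working and repair times.

```python
import random

random.seed(42)
shape_x, scale_x = 2.0, 50.0   # working times: E(X) = 100
shape_y, scale_y = 2.0, 5.0    # repair times:  E(Y) = 10

up = down = 0.0
for _ in range(20_000):        # 20,000 renewal cycles
    up += random.gammavariate(shape_x, scale_x)
    down += random.gammavariate(shape_y, scale_y)

as_hat = up / (up + down)
print(as_hat)  # close to E(X)/(E(X) + E(Y)) = 100/110, about 0.909
```

The fraction of total time spent in the up state converges to the renewal-reward ratio, regardless of the shapes of the two distributions.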
4.9 SERIES SYSTEM MODEL
The series structure and the parallel structure are the most fundamental and widely used system structures in classical reliability theory. We have used them to illustrate concepts like the reliability block diagram, system reliability, and the system reliability function in this chapter. Because of their importance in reliability theory, we provide more detailed analyses of these two system structures in this and the following section. The reliability block diagram of a series system is given in Figure 4.2. When the components are independent, its system reliability equation Rs is given in equation (4.25) and its system reliability function Rs(t) is given in equation (4.30). Independence of component failures is assumed unless stated otherwise.

4.9.1 System Reliability Function and MTTF
Assume that the system is not repairable. Using equations (2.94) and (4.30), we find that the failure rate function of the series system is

hs(t) = −(d/dt) ln[Rs(t)] = −Σ_{i=1}^n (d/dt) ln[Ri(t)] = Σ_{i=1}^n fi(t)/Ri(t) = Σ_{i=1}^n hi(t). (4.40)
From this equation, we see that the failure rate of a series system is equal to the sum of the failure rates of the components. As a result, the failure rate of the system is usually much higher than the failure rate of each component when the system size is large. This can also be seen from the following equation for a series system:

Ts = min{T1, T2, . . . , Tn}. (4.41)
The lifetime of a series system is equal to the smallest lifetime among the lifetimes of all components. The series system can only last as long as the weakest component in the system. The reliability of a series system is no more than the reliability of the worst component. The series system model is sometimes referred to as the competing failure model based on the following interpretations. All n components are competing to be the first one to cause the system to fail. The lifetime of the component that fails first determines the lifetime of the series system. For optimal system design, one should try to reduce the number of components that are connected in series. As mentioned earlier, the series configuration usually indicates the number of distinct functions that have to be performed for the system to perform the intended function. This usually means that the number of series components or subsystems is fixed. In this case, other means such as enhancement of component reliability and/or introduction of redundancy are often utilized to increase system reliability. Under special circumstances, the lifetime of a series system has a simple analytical statistical distribution. For example, if all components have exponential lifetime distributions, the system will also have an exponential lifetime distribution. Let λi be the failure rate of component i. Then, λi is the parameter of the exponential lifetime
distribution of component i. The failure rate of the series system, denoted by λs, is given by

λs = Σ_{i=1}^n λi. (4.42)

Because λs is also a constant, the system's lifetime follows the exponential distribution with the parameter λs. The reliability function of the system is given by

Rs(t) = e−λs t = exp(−Σ_{i=1}^n λi t), t ≥ 0. (4.43)

The MTTF of the system is given by

MTTFs = 1/λs = 1 / Σ_{i=1}^n λi. (4.44)
When the components are i.i.d. with a constant failure rate of λ, we have

λs = nλ, (4.45)
Rs(t) = e−nλt, t ≥ 0, (4.46)
MTTFs = 1/(nλ). (4.47)
Example 4.12 A system to be designed has a series structure with 10 subsystems. The required system reliability for a mission time of 200 hours is specified to be not less than 0.95. It is believed that the lifetimes of the subsystems are i.i.d. with the exponential lifetime distribution. Find the maximum failure rate allowed for each subsystem. What are the MTTF and the reliability of the system for a mission with a duration of 300 hours with the maximum failure rate allowed for each subsystem?

From Rs(200) ≥ 0.95, we have Rs(200) = e−200λs ≥ 0.95, which means λs ≤ 2.5647 × 10−4. Let λ be the failure rate of each subsystem. From λs = 10λ ≤ 2.5647 × 10−4, we have λ ≤ 2.5647 × 10−5. Using equations (4.43) and (4.44), we have

MTTFs = 1/λs = 1/(2.5647 × 10−4) ≈ 3899.09 hours,
Rs(300) = e−300×2.5647×10−4 ≈ 0.9259.
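The numbers in Example 4.12 can be reproduced directly; the short sketch below is ours.

```python
import math

n, t_mission, r_required = 10, 200.0, 0.95
lam_s = -math.log(r_required) / t_mission   # maximum system failure rate
lam = lam_s / n                             # per-subsystem rate, eq. (4.45)
mttf_s = 1.0 / lam_s                        # eq. (4.44)
rs_300 = math.exp(-lam_s * 300.0)

print(lam_s)    # about 2.5647e-4 per hour
print(mttf_s)   # about 3899.1 hours (the text's 3899.09 uses the rounded rate)
print(rs_300)   # about 0.9259
```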
When the failure rate function of each component is a (decreasing or increasing) linear function of time t, that is,

hi(t) = ai + bi t, t ≥ 0,

we can find the following performance measures of the series system:

hs(t) = Σ_{i=1}^n hi(t) = as + bs t, (4.48)
Hs(t) = ∫₀^t hs(x) dx = as t + (1/2) bs t², (4.49)
Rs(t) = e−Hs(t) = exp[−(as t + (1/2) bs t²)], (4.50)
MTTFs = ∫₀^∞ Rs(t) dt = [√(2π) e^{as²/(2bs)} / √bs][1 − Φ(as/√bs)], (4.51)

where Φ(x) = ∫_{−∞}^x (1/√(2π)) e^{−t²/2} dt is the standard normal CDF, as = Σ_{i=1}^n ai, and bs = Σ_{i=1}^n bi.
When the lifetime distribution of component i follows the Weibull distribution with the shape parameter βi and the scale parameter ηi, that is,

hi(t) = (βi/ηi)(t/ηi)^{βi−1} = ai t^{bi}, t > 0,

where βi > 0, ηi > 0, ai = βi/ηi^{βi}, and bi = βi − 1, we have the following performance measures of the series system:

hs(t) = Σ_{i=1}^n hi(t) = Σ_{i=1}^n ai t^{bi}, t > 0, (4.52)
Hs(t) = ∫₀^t hs(x) dx = Σ_{i=1}^n [ai/(bi + 1)] t^{bi+1}, t > 0, (4.53)
Rs(t) = e−Hs(t) = exp(−Σ_{i=1}^n [ai/(bi + 1)] t^{bi+1}), t > 0. (4.54)
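For Weibull components, equation (4.54) simplifies because [ai/(bi + 1)] t^{bi+1} = (t/ηi)^{βi}, so the series system reliability is the product of the component Weibull reliabilities. The sketch below is ours, with illustrative parameters.

```python
import math

def rs_weibull_series(t, betas, etas):
    """Rs(t) = exp(-sum_i (t/eta_i)**beta_i), equation (4.54) rewritten."""
    return math.exp(-sum((t / eta) ** beta for beta, eta in zip(betas, etas)))

# The series reliability equals the product of component reliabilities:
t, betas, etas = 100.0, [1.5, 2.0], [400.0, 300.0]
direct = math.prod(math.exp(-((t / eta) ** beta)) for beta, eta in zip(betas, etas))
print(abs(rs_weibull_series(t, betas, etas) - direct) < 1e-12)  # True
```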
Exercise 1. Derive the expression of MTTFs for a series system when its reliability function is as given in equation (4.54).
4.9.2 System Availability
Now we will consider repairable series systems. First consider the case when all components are i.i.d. with a constant failure rate denoted by λ. This means that all components follow the exponential lifetime distribution. At time 0, the system is in the working state. As soon as a failure of a component occurs, the system goes into the failed state. We assume that no more component failures will occur once the system is failed and that the probability of two components failing at exactly the same time is negligible. Under these conditions, the failed state of the system has only one failed component. Let µ represent the repair rate of a failed component. It is also the repair rate of the system because the system is restored to the working condition as soon as the only failed component is repaired. Because of the memoryless property of the exponential distribution, a repaired component is as good as new. The number of failed components in the series system forms a birth-and-death process with a state space of {0, 1}. The state 0 of the process represents the working state of the system while the state 1 of the process represents the failed state of the system. The transition diagram of the birth-and-death process is given in Figure 4.9. The only difference between Figure 4.9 for a series system and Figure 2.10 for a one-component system discussed in Chapter 2 is the transition rate from state 0 to state 1 of the birth-and-death process. The transition rate is nλ in Figure 4.9 while it is λ in Figure 2.10. Replacing λ by nλ in the equations in Example 2.6, we obtain the following performance measures of the repairable series system:

As(t) = µ/(nλ + µ) + [nλ/(nλ + µ)] e−(nλ+µ)t, (4.55)
As = µ/(nλ + µ), (4.56)
MTBFs = MTTFs = 1/(nλ), (4.57)
MTTRs = 1/µ. (4.58)
From these equations, we see that the larger the system size n, the smaller the availability and the MTBF of the series system. When the components are not identical but still follow independent exponential distributions, we can also analyze the availability of a series system. Let λi and µi be the failure rate and repair rate of component i, respectively. The state of the system
FIGURE 4.9 Transition diagram of birth-and-death process for series system.
FIGURE 4.10 Transition diagram of series system with independent components.
is indicated by 0, 1, 2, . . . , n, where state 0 represents that the system is working and state i represents that the system is failed due to the failure of component i for 1 ≤ i ≤ n. The state transition diagram of the system is given in Figure 4.10. Based on Example 2.6, the steady-state availability of each component, under the assumption that a repair facility is always available for each component, is

Ai = µi/(λi + µi), i = 1, 2, . . . , n.

If there is always a repair facility available such that there are no queues of failed components, we can substitute availability measures in the place of reliability measures to express the steady-state availability of the series system as a function of the steady-state availabilities of the components, as given below:

As = ∏_{i=1}^n Ai = ∏_{i=1}^n µi/(λi + µi).
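The product formula above evaluates in one line; the sketch below is ours, with illustrative failure and repair rates.

```python
from math import prod

lams = [0.01, 0.02, 0.005]   # component failure rates (illustrative)
mus = [0.5, 0.4, 0.25]       # component repair rates (illustrative)

A_components = [mu / (lam + mu) for lam, mu in zip(lams, mus)]
A_series = prod(A_components)
print(A_components)  # each availability mu_i/(lam_i + mu_i)
print(A_series)      # series availability, the product of the above
```

As with reliability, the series availability is dragged down by every component, so it is lower than the availability of the least available component.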
For repairable series systems whose components do not have the exponential distribution, it is difficult or impossible to derive closed-form expressions for the system availability function. Dependence of component failures does not have any impact on the reliability of a series system, though it may have impact on the cost of operating the system. The satisfactory operations of other components do not improve the reliability of an individual component. The failure of a component may cause other components to fail simultaneously, thus increasing the cost of repairing the system since there are more failed components. Because the first component failure has already caused the system to fail, the additional failures have no impact on system reliability. Thus, even though component failures are dependent, the reliability of a series system is still as expressed in equation (4.30).
FIGURE 4.11 Reliability of parallel system as function of system size n (p = 0.5).
4.10 PARALLEL SYSTEM MODEL A parallel system works if and only if at least one component works. It fails if all components are failed. The reliability block diagram of a parallel system is given in Figure 4.3. When the components are independent, its system reliability equation Rs is given in equation (4.28). Since only one component needs to work for the system to work, a parallel system is a redundant system structure. Only one of the components is essential while others are said to be redundant. The purpose of utilizing a parallel structure is to increase system reliability through redundancy. Figure 4.11 shows the relationship between the reliability of a parallel system and the number of i.i.d. components in the system. From Figure 4.11, we have the following observations: 1. The reliability of a parallel system approaches 1 as n goes to infinity. This means that no matter how low the reliability of a component is, we can always achieve very high system reliability through redundancy. 2. From this figure, the amount of improvement in system reliability by each additional component becomes smaller as the system size becomes larger. In other words, there is a diminishing return from the addition of one more component as the system size increases. As a result, there is an issue of determination of the optimal system size of a parallel system in order to maximize system profit or minimize system cost. Kuo et al. [132] provide extensive coverage on this issue. 4.10.1 System Reliability Function and MTTF Consider a nonrepairable parallel system with n components. The system reliability as a function of time t, Rs (t), is given in equation (4.32). The lifetime of a parallel
system can be expressed as

Ts = max{T1, T2, . . . , Tn}. (4.59)
This means that the lifetime of a parallel system is equal to the longest lifetime among the lifetimes of all components. The parallel system can last as long as the best component in the system. There are no simple expressions for the system reliability function even when the components all have exponential lifetime distributions. In the following, we consider this simplest case.

First, we consider a two-component parallel system. Component i has a constant failure rate denoted by λi for 1 ≤ i ≤ 2. The system performance measures are

Rs(t) = 1 − F1(t)F2(t) = 1 − (1 − e−λ1t)(1 − e−λ2t) = e−λ1t + e−λ2t − e−(λ1+λ2)t,
MTTFs = ∫₀^∞ Rs(t) dt = 1/λ1 + 1/λ2 − 1/(λ1 + λ2).

Similarly, we can find the MTTF of a three-component parallel system:

MTTFs = 1/λ1 + 1/λ2 + 1/λ3 − 1/(λ1 + λ2) − 1/(λ1 + λ3) − 1/(λ2 + λ3) + 1/(λ1 + λ2 + λ3). (4.60)
=
n 1 1 . λ i=1 i
(4.61) ∞2 0
3
1 − (1 − e−λt )n dt (4.62)
We have the following interpretation of equation (4.62). In a parallel system, only one component is essential. When the n components are i.i.d., the MTTF of the first component is 1/λ. Adding a second component increases the system MTTF by one-half of 1/λ, adding a third increases it by one-third of 1/λ, and, in general, adding the nth component increases it by 1/(nλ). This again shows the diminishing return from increasing the size of a parallel system.
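As a quick numerical illustration (not from the text; λ = 1 and n = 4 are arbitrary example values), the harmonic-sum MTTF of equation (4.62) can be checked against direct numerical integration of Rs(t):

```python
import math

def parallel_mttf_formula(lam, n):
    """MTTF of a parallel system of n i.i.d. exponential components,
    equation (4.62): (1/lam) * sum_{i=1}^{n} 1/i."""
    return sum(1.0 / i for i in range(1, n + 1)) / lam

def parallel_mttf_numeric(lam, n, t_max=50.0, dt=1e-3):
    """Trapezoidal integration of R_s(t) = 1 - (1 - e^{-lam t})^n."""
    steps = int(t_max / dt)
    total = 0.0
    for k in range(steps + 1):
        t = k * dt
        r = 1.0 - (1.0 - math.exp(-lam * t)) ** n
        total += (0.5 if k in (0, steps) else 1.0) * r
    return total * dt

lam, n = 1.0, 4
print(parallel_mttf_formula(lam, n))   # 1 + 1/2 + 1/3 + 1/4 ≈ 2.0833
print(parallel_mttf_numeric(lam, n))   # should agree closely
```

Each additional component adds only 1/(nλ) to the MTTF, the diminishing return noted above.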
FUNDAMENTAL SYSTEM RELIABILITY MODELS
FIGURE 4.12 State transition diagram of parallel system with two i.i.d. components (states 0, 1, 2; failure rates λ0, λ1; repair rates µ1, µ2).
4.10.2 System Availability of Parallel System with Two i.i.d. Components

Before analyzing the availability of a general repairable parallel system, we first consider a two-component parallel system. The two components are i.i.d. with the exponential lifetime distribution; let λ be the failure rate of each component. When there are i failed components in the system, the failure rate of the system is denoted by λi for i = 0, 1 and the repair rate by µi for i = 1, 2. We use the number of failed components to indicate the state of the system, with state space {0, 1, 2}. The state transition diagram of the system is given in Figure 4.12. Obviously,

λi = (2 − i)λ,   i = 0, 1.
The value of µi depends on how many repair facilities are available to the system. The number of failed components in the system forms a birth-and-death process with a finite population. Let Pi(t) denote the probability that the system is in state i at time t. From Figure 4.12, we can derive the following differential equations, which can also be obtained from equations (2.174)–(2.176):

P0′(t) = −λ0 P0(t) + µ1 P1(t),
P1′(t) = λ0 P0(t) − (λ1 + µ1)P1(t) + µ2 P2(t),
P2′(t) = λ1 P1(t) − µ2 P2(t),

with boundary conditions P0(0) = 1 and P1(0) = P2(0) = 0. This set of differential equations is exactly the same as in Example A.4, except that the notation xi(t) is used there instead of Pi(t) for i = 0, 1, 2. With the Laplace transform technique, explained in the Appendix, the transforms L(Pi(t)) for i = 0, 1, 2 are found in Example A.4:

L(P0(t)) = [(s + λ1 + µ1)(s + µ2) − λ1 µ2] / [(s + λ0)(s + λ1 + µ1)(s + µ2) − λ0 µ1 (s + µ2) − λ1 µ2 (s + λ0)],   (4.63)

L(P1(t)) = λ0 (s + µ2) / [(s + λ0)(s + λ1 + µ1)(s + µ2) − λ0 µ1 (s + µ2) − λ1 µ2 (s + λ0)],   (4.64)

L(P2(t)) = λ0 λ1 / [(s + λ0)(s + λ1 + µ1)(s + µ2) − λ0 µ1 (s + µ2) − λ1 µ2 (s + λ0)].   (4.65)
Further simplification of these Laplace transforms is difficult or impossible without specifying the relationships among the µi for i = 1, 2 and among the λi for i = 0, 1. However, as illustrated in Example A.4, the steady-state probability of each state can be derived without further knowledge of the λi and µi values:

P0 ≡ P0(∞) = µ1 µ2 / (µ1 µ2 + λ0 µ2 + λ0 λ1),   (4.66)

P1 ≡ P1(∞) = λ0 µ2 / (µ1 µ2 + λ0 µ2 + λ0 λ1),   (4.67)

P2 ≡ P2(∞) = λ0 λ1 / (µ1 µ2 + λ0 µ2 + λ0 λ1).   (4.68)

Thus, the steady-state availability of the two-component parallel system with i.i.d. components is

As = P0 + P1 = (µ1 µ2 + λ0 µ2) / (µ1 µ2 + λ0 µ2 + λ0 λ1).   (4.69)
To find closed-form expressions for the availability function As(t) of the two-component parallel system with i.i.d. components, we consider the following scenarios for repair facilities.

Case I: A Single Repair Facility. In this case, we have µi = µ for i = 1, 2. Substituting λ0 = 2λ, λ1 = λ, and µ1 = µ2 = µ into equations (4.63)–(4.65), we have

L(P0(t)) = [s² + (λ + 2µ)s + µ²] / |A|,   (4.70)

L(P1(t)) = 2λ(s + µ) / |A|,   (4.71)

L(P2(t)) = 2λ² / |A|,   (4.72)

where

|A| = s³ + (3λ + 2µ)s² + (2λ² + 2λµ + µ²)s = s(s − s1)(s − s2),
s1 = [−(3λ + 2µ) + √(λ² + 4λµ)] / 2,
s2 = [−(3λ + 2µ) − √(λ² + 4λµ)] / 2.
Then, equations (4.70)–(4.72) can be written as

L(P0(t)) = µ²/(s1 s2 s) + [s1² + (λ + 2µ)s1 + µ²] / [s1(s1 − s2)(s − s1)] + [s2² + (λ + 2µ)s2 + µ²] / [s2(s2 − s1)(s − s2)],

L(P1(t)) = 2λµ/(s1 s2 s) + 2λ(s1 + µ) / [s1(s1 − s2)(s − s1)] + 2λ(s2 + µ) / [s2(s2 − s1)(s − s2)],

L(P2(t)) = 2λ²/(s1 s2 s) + 2λ² / [s1(s1 − s2)(s − s1)] + 2λ² / [s2(s2 − s1)(s − s2)].

Inverse Laplace transforms of these equations yield

P0(t) = µ²/(s1 s2) + [s1² + (λ + 2µ)s1 + µ²] / [s1(s1 − s2)] e^{s1 t} + [s2² + (λ + 2µ)s2 + µ²] / [s2(s2 − s1)] e^{s2 t},   (4.73)

P1(t) = 2λµ/(s1 s2) + 2λ(s1 + µ) / [s1(s1 − s2)] e^{s1 t} + 2λ(s2 + µ) / [s2(s2 − s1)] e^{s2 t},   (4.74)

P2(t) = 2λ²/(s1 s2) + 2λ² / [s1(s1 − s2)] e^{s1 t} + 2λ² / [s2(s2 − s1)] e^{s2 t}.   (4.75)

The availability function of the system can be written as

As(t) = P0(t) + P1(t) = 1 − P2(t) = (2λµ + µ²)/(2λ² + 2λµ + µ²) − 2λ² / [s1(s1 − s2)] e^{s1 t} − 2λ² / [s2(s2 − s1)] e^{s2 t},   (4.76)

where s1 s2 = 2λ² + 2λµ + µ². The steady-state availability can be obtained from this equation, since s1, s2 < 0:

As = lim_{t→∞} As(t) = (2λµ + µ²)/(2λ² + 2λµ + µ²).   (4.77)
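As a sketch (the rates λ = 0.01, µ = 0.1 are arbitrary illustrative values, not from the text), equations (4.76)–(4.77) can be evaluated numerically; As(0) must equal 1, and As(t) must approach the steady-state value (4.77):

```python
import math

def case1_availability(lam, mu, t):
    """Availability A_s(t) of a two-component parallel system with one
    repair facility, equation (4.76)."""
    root = math.sqrt(lam * lam + 4.0 * lam * mu)
    s1 = (-(3.0 * lam + 2.0 * mu) + root) / 2.0
    s2 = (-(3.0 * lam + 2.0 * mu) - root) / 2.0
    steady = (2 * lam * mu + mu * mu) / (2 * lam**2 + 2 * lam * mu + mu * mu)
    term1 = 2 * lam**2 / (s1 * (s1 - s2)) * math.exp(s1 * t)
    term2 = 2 * lam**2 / (s2 * (s2 - s1)) * math.exp(s2 * t)
    return steady - term1 - term2

lam, mu = 0.01, 0.1
print(case1_availability(lam, mu, 0.0))      # 1.0: the system starts in state 0
print(case1_availability(lam, mu, 1000.0))   # approaches (2λµ+µ²)/(2λ²+2λµ+µ²) ≈ 0.9836
```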
Equation (4.77) can also be confirmed with equation (4.69).

Case II: Two Identical Repair Facilities. When only one component is failed, only one repair facility can work on it; we cannot simultaneously use two repair facilities to repair a single component. When both components are failed, each is repaired at a separate repair facility. Readers may refer to Misra [168] for discussion of repair facilities capable of jointly servicing a single component. In this case, we have µ1 = µ and µ2 = 2µ. Substituting λ0 = 2λ, λ1 = λ, µ1 = µ, and µ2 = 2µ into equations (4.63)–(4.65), we have

L(P0(t)) = [s² + (λ + 3µ)s + 2µ²] / |A|,   (4.78)

L(P1(t)) = 2λ(s + 2µ) / |A|,   (4.79)

L(P2(t)) = 2λ² / |A|,   (4.80)

where |A| = s³ + 3(λ + µ)s² + 2(λ + µ)²s = s(s + λ + µ)(s + 2λ + 2µ). Then, equations (4.78)–(4.80) can be written as

L(P0(t)) = [µ²/(λ + µ)²]/s + [2λµ/(λ + µ)²]/(s + λ + µ) + [λ²/(λ + µ)²]/[s + 2(λ + µ)],

L(P1(t)) = [2λµ/(λ + µ)²]/s + [2λ(λ − µ)/(λ + µ)²]/(s + λ + µ) − [2λ²/(λ + µ)²]/[s + 2(λ + µ)],

L(P2(t)) = [λ²/(λ + µ)²]/s − [2λ²/(λ + µ)²]/(s + λ + µ) + [λ²/(λ + µ)²]/[s + 2(λ + µ)].

Inverse Laplace transforms of these equations yield

P0(t) = µ²/(λ + µ)² + [2λµ/(λ + µ)²] e^{−(λ+µ)t} + [λ²/(λ + µ)²] e^{−2(λ+µ)t},   (4.81)

P1(t) = 2λµ/(λ + µ)² + [2λ(λ − µ)/(λ + µ)²] e^{−(λ+µ)t} − [2λ²/(λ + µ)²] e^{−2(λ+µ)t},   (4.82)

P2(t) = λ²/(λ + µ)² − [2λ²/(λ + µ)²] e^{−(λ+µ)t} + [λ²/(λ + µ)²] e^{−2(λ+µ)t}.   (4.83)

The availability function of the system is

As(t) = P0(t) + P1(t) = 1 − P2(t) = (2λµ + µ²)/(λ + µ)² + [2λ²/(λ + µ)²] e^{−(λ+µ)t} − [λ²/(λ + µ)²] e^{−2(λ+µ)t}.   (4.84)

The steady-state availability can be obtained from this equation or from equation (4.69) as

As = lim_{t→∞} As(t) = (µ² + 2λµ)/(λ + µ)².   (4.85)
The steady-state availability of the parallel system can also be obtained as follows. Since there are two repair facilities for the two-component parallel system, failed components receive repair right away; there are no queues of failed components waiting for service. Under this circumstance, the relationship between steady-state system availability and steady-state component availabilities is the same as the relationship between system reliability and component reliabilities. With equation (4.28), we have

As = 1 − Π_{i=1}^{2} (1 − Ai) = 1 − [1 − µ/(λ + µ)]² = (µ² + 2λµ)/(λ + µ)².
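A small sketch (example rates λ = 0.01, µ = 0.1, chosen for illustration) comparing the two repair policies: the two-facility steady-state availability (4.85) matches the independent-component product form, and it exceeds the single-facility value (4.77):

```python
def steady_one_facility(lam, mu):
    # Equation (4.77): one shared repair facility
    return (2 * lam * mu + mu * mu) / (2 * lam * lam + 2 * lam * mu + mu * mu)

def steady_two_facilities(lam, mu):
    # Equation (4.85): one repair facility per component
    return (mu * mu + 2 * lam * mu) / (lam + mu) ** 2

lam, mu = 0.01, 0.1
a1 = steady_one_facility(lam, mu)
a2 = steady_two_facilities(lam, mu)
product = 1.0 - (lam / (lam + mu)) ** 2   # independent-component product form
print(a1, a2, product)   # a2 equals product, and a2 > a1 (no queueing for repair)
```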
4.10.3 System Availability of Parallel System with Two Different Components

Consider a parallel system with two different components, both having exponential lifetime distributions. Component 1 has a constant failure rate denoted by γ1 and component 2 has a constant failure rate denoted by γ2. The repair rates of components 1 and 2 are denoted by ν1 and ν2, respectively. Because the two components are different, we have to consider the number of repair facilities separately too.

Case I: Two Repair Facilities. In this case, we need to distinguish the following four system states:

• State 0: Both components are working.
• State 1: Component 2 is working and component 1 is failed.
• State 2: Component 1 is working and component 2 is failed.
• State 3: Both components are failed.
The system state transition diagram is given in Figure 4.13, from which we derive the following differential equations:

P0′(t) = −(γ1 + γ2)P0(t) + ν1 P1(t) + ν2 P2(t),   (4.86)

P1′(t) = γ1 P0(t) − (γ2 + ν1)P1(t) + ν2 P3(t),   (4.87)
FIGURE 4.13 State transition diagram of two-component parallel system with different components and two repair facilities.
P2′(t) = γ2 P0(t) − (γ1 + ν2)P2(t) + ν1 P3(t),   (4.88)

P3′(t) = γ2 P1(t) + γ1 P2(t) − (ν1 + ν2)P3(t).   (4.89)

When the system reaches its steady state, Pi(t) becomes a constant and Pi′(t) = 0 for i = 0, 1, 2, 3. Thus, to find the steady-state availability of the system, we can use equations (4.86)–(4.89) directly, setting lim_{t→∞} Pi′(t) = 0 and lim_{t→∞} Pi(t) ≡ Pi for i = 0, 1, 2, 3, to obtain the following equations:

0 = −(γ1 + γ2)P0 + ν1 P1 + ν2 P2,   (4.90)

0 = γ1 P0 − (γ2 + ν1)P1 + ν2 P3,   (4.91)

0 = γ2 P0 − (γ1 + ν2)P2 + ν1 P3,   (4.92)

0 = γ2 P1 + γ1 P2 − (ν1 + ν2)P3.   (4.93)

This system of linear equations is linearly dependent, so one of the equations can simply be removed. However, we then need to add another equation in order to find a unique solution, namely the normalization condition

P0 + P1 + P2 + P3 = 1.   (4.94)
Dropping equation (4.93) and adding equation (4.94), we have the following linear equations:

−(γ1 + γ2)P0 + ν1 P1 + ν2 P2 = 0,   (4.95)

γ1 P0 − (γ2 + ν1)P1 + ν2 P3 = 0,   (4.96)

γ2 P0 − (γ1 + ν2)P2 + ν1 P3 = 0,   (4.97)

P0 + P1 + P2 + P3 = 1.   (4.98)
From these equations, we find the following steady-state probabilities:

P0 = ν1 ν2 / [(γ1 + ν1)(γ2 + ν2)],   (4.99)

P1 = γ1 ν2 / [(γ1 + ν1)(γ2 + ν2)],   (4.100)

P2 = γ2 ν1 / [(γ1 + ν1)(γ2 + ν2)],   (4.101)

P3 = γ1 γ2 / [(γ1 + ν1)(γ2 + ν2)].   (4.102)

The steady-state availability of the system is then

As = P0 + P1 + P2 = 1 − P3 = (ν1 ν2 + γ1 ν2 + γ2 ν1) / [(γ1 + ν1)(γ2 + ν2)].   (4.103)
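The linear system (4.95)–(4.98) is easy to solve numerically. A sketch below does so with a small Gaussian-elimination routine (the rates γ1 = 0.02, γ2 = 0.05, ν1 = 0.5, ν2 = 0.4 are arbitrary example values) and checks the result against the closed forms (4.99)–(4.103):

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

g1, g2, v1, v2 = 0.02, 0.05, 0.5, 0.4   # example failure/repair rates
# Rows are equations (4.95)-(4.97) plus the normalization (4.98).
A = [
    [-(g1 + g2), v1, v2, 0.0],
    [g1, -(g2 + v1), 0.0, v2],
    [g2, 0.0, -(g1 + v2), v1],
    [1.0, 1.0, 1.0, 1.0],
]
b = [0.0, 0.0, 0.0, 1.0]
p0, p1, p2, p3 = solve_linear(A, b)
den = (g1 + v1) * (g2 + v2)
print(p3, g1 * g2 / den)        # should agree: closed form (4.102)
print(p0 + p1 + p2, 1.0 - p3)   # steady-state availability, equation (4.103)
```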
To find the availability function of the system, we need to solve differential equations (4.86)–(4.89). With the Laplace transform technique, we find the following system state distribution functions:

P0(t) = [ν1 ν2 + γ1 ν2 e^{−(γ1+ν1)t} + γ2 ν1 e^{−(γ2+ν2)t} + γ1 γ2 e^{−(γ1+γ2+ν1+ν2)t}] / [(γ1 + ν1)(γ2 + ν2)],

P1(t) = [γ1 ν2 − γ1 ν2 e^{−(γ1+ν1)t} + γ1 γ2 e^{−(γ2+ν2)t} − γ1 γ2 e^{−(γ1+γ2+ν1+ν2)t}] / [(γ1 + ν1)(γ2 + ν2)],

P2(t) = [γ2 ν1 + γ1 γ2 e^{−(γ1+ν1)t} − γ2 ν1 e^{−(γ2+ν2)t} − γ1 γ2 e^{−(γ1+γ2+ν1+ν2)t}] / [(γ1 + ν1)(γ2 + ν2)],

P3(t) = [γ1 γ2 − γ1 γ2 e^{−(γ1+ν1)t} − γ1 γ2 e^{−(γ2+ν2)t} + γ1 γ2 e^{−(γ1+γ2+ν1+ν2)t}] / [(γ1 + ν1)(γ2 + ν2)].

The availability function of the system is

As(t) = P0(t) + P1(t) + P2(t) = 1 − P3(t) = [ν1 ν2 + γ1 ν2 + γ2 ν1 + γ1 γ2 e^{−(γ1+ν1)t} + γ1 γ2 e^{−(γ2+ν2)t} − γ1 γ2 e^{−(γ1+γ2+ν1+ν2)t}] / [(γ1 + ν1)(γ2 + ν2)].   (4.104)
Case II: A Single Repair Facility. In this case, we need to distinguish the following five system states:

• State 0: Both components are working.
• State 1: Component 2 is working and component 1 is failed.
• State 2: Component 1 is working and component 2 is failed.
• State 3: Both components are failed and component 1 is under repair.
• State 4: Both components are failed and component 2 is under repair.
The system state transition diagram is given in Figure 4.14, from which we derive the following differential equations:

P0′(t) = −(γ1 + γ2)P0(t) + ν1 P1(t) + ν2 P2(t),   (4.105)

P1′(t) = γ1 P0(t) − (γ2 + ν1)P1(t) + ν2 P4(t),   (4.106)
FIGURE 4.14 State transition diagram of parallel system with two different components and one repair facility.
P2′(t) = γ2 P0(t) − (γ1 + ν2)P2(t) + ν1 P3(t),   (4.107)

P3′(t) = γ2 P1(t) − ν1 P3(t),   (4.108)

P4′(t) = γ1 P2(t) − ν2 P4(t).   (4.109)
Using methods similar to those used when there are two repair facilities, we find the following steady-state probabilities, written with the common denominator

D = ν1 ν2 (γ1 ν1 + γ2 ν2 + ν1 ν2) + γ1 ν2 (γ1 + γ2 + ν2)(γ2 + ν1) + γ2 ν1 (γ1 + γ2 + ν1)(γ1 + ν2):

P0 = ν1 ν2 (γ1 ν1 + γ2 ν2 + ν1 ν2) / D,

P1 = γ1 ν1 ν2 (γ1 + γ2 + ν2) / D,

P2 = γ2 ν1 ν2 (γ1 + γ2 + ν1) / D,

P3 = γ1 γ2 ν2 (γ1 + γ2 + ν2) / D,

P4 = γ1 γ2 ν1 (γ1 + γ2 + ν1) / D.
The steady-state availability of the system is

As = P0 + P1 + P2 = 1 − (P3 + P4) = [ν1 ν2 (γ1 ν1 + γ2 ν2 + ν1 ν2) + γ1 ν1 ν2 (γ1 + γ2 + ν2) + γ2 ν1 ν2 (γ1 + γ2 + ν1)] / [ν1 ν2 (γ1 ν1 + γ2 ν2 + ν1 ν2) + γ1 ν2 (γ1 + γ2 + ν2)(γ2 + ν1) + γ2 ν1 (γ1 + γ2 + ν1)(γ1 + ν2)].   (4.110)

To find the availability function of the system, we need to solve differential equations (4.105)–(4.109). The Laplace transform technique may be used, but the process is tedious and the final expressions are quite long. It is much easier to solve the differential equations once numerical values of the constants γ1, γ2, ν1, and ν2 are known.
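A sketch checking the closed-form steady state against the balance equations obtained from (4.105)–(4.109) by setting the derivatives to zero (the γ and ν values are arbitrary examples):

```python
g1, g2, v1, v2 = 0.03, 0.07, 0.6, 0.8   # hypothetical example rates

# Common denominator of the closed-form steady-state probabilities.
den = (v1 * v2 * (g1 * v1 + g2 * v2 + v1 * v2)
       + g1 * v2 * (g1 + g2 + v2) * (g2 + v1)
       + g2 * v1 * (g1 + g2 + v1) * (g1 + v2))
p0 = v1 * v2 * (g1 * v1 + g2 * v2 + v1 * v2) / den
p1 = g1 * v1 * v2 * (g1 + g2 + v2) / den
p2 = g2 * v1 * v2 * (g1 + g2 + v1) / den
p3 = g1 * g2 * v2 * (g1 + g2 + v2) / den
p4 = g1 * g2 * v1 * (g1 + g2 + v1) / den

# Balance equations (derivatives of (4.105)-(4.109) set to 0) and normalization.
residuals = [
    -(g1 + g2) * p0 + v1 * p1 + v2 * p2,
    g1 * p0 - (g2 + v1) * p1 + v2 * p4,
    g2 * p0 - (g1 + v2) * p2 + v1 * p3,
    g2 * p1 - v1 * p3,
    g1 * p2 - v2 * p4,
    p0 + p1 + p2 + p3 + p4 - 1.0,
]
print(max(abs(r) for r in residuals))   # ~0: the closed forms satisfy all equations
print(p0 + p1 + p2)                     # steady-state availability, equation (4.110)
```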
Exercise

1. Assume a set of values of γ1, γ2, ν1, and ν2 and derive the availability function of the parallel system using equations (4.105)–(4.109).

4.10.4 Parallel Systems with n i.i.d. Components

From the analyses of parallel systems with two components, we see that it is tedious and time consuming to derive system availability as a function of time. However, it is relatively easy to derive the steady-state availability under the assumption that component lifetimes are exponentially distributed. In this section, we limit ourselves to parallel systems of n i.i.d. components with the exponential lifetime distribution. Let λ be the failure rate of each component. Since all components are i.i.d., we may use the number of failed components to indicate the state of the system. When the number of failed components reaches n, the system is failed; otherwise, it is working. Suppose that there are r identical repair facilities, where 1 ≤ r ≤ n, and let µ be the repair rate of each failed component at each repair facility. Let λi and µi represent, respectively, the failure rate and the repair rate of the system when there are i failed components in the system. Then we have

λi = (n − i)λ,   i = 0, 1, 2, . . . , n − 1,
µi = iµ,   i = 1, 2, . . . , r;   µi = rµ,   i = r + 1, r + 2, . . . , n.

FIGURE 4.15 State transition diagram of parallel system with n i.i.d. components (a birth-and-death chain on states 0, 1, . . . , n).

The system state transition diagram of such a parallel system is given in Figure 4.15. From this figure, we derive the following differential equations:

P0′(t) = −λ0 P0(t) + µ1 P1(t),   (4.111)

Pi′(t) = λi−1 Pi−1(t) − (λi + µi)Pi(t) + µi+1 Pi+1(t),   i = 1, 2, . . . , n − 1,   (4.112)

Pn′(t) = λn−1 Pn−1(t) − µn Pn(t),   (4.113)

with boundary conditions P0(0) = 1 and Pi(0) = 0 for i = 1, 2, . . . , n.

The transition diagram in Figure 4.15 and differential equations (4.111)–(4.113) describe a birth-and-death process with a finite population. The differential
equations for a birth-and-death process with a finite population and an initial population of i are given in equations (2.174)–(2.176). The limiting distribution of the birth-and-death process is given in equation (2.177). As a result, the steady-state distribution of the state of the parallel system is given by

P0 = [1 + Σ_{j=1}^{n} (λ_{j−1} λ_{j−2} · · · λ1 λ0)/(µ_j µ_{j−1} · · · µ2 µ1)]^{−1},   (4.114)

Pi = (λ_{i−1}/µ_i) P_{i−1},   i = 1, 2, . . . , n.   (4.115)
The steady-state availability of the system is then

As = P0 + P1 + · · · + P_{n−1} = 1 − Pn   (4.116)
   = 1 − (λ_{n−1} λ_{n−2} · · · λ1 λ0)/(µ_n µ_{n−1} · · · µ2 µ1) [1 + Σ_{j=1}^{n} (λ_{j−1} λ_{j−2} · · · λ1 λ0)/(µ_j µ_{j−1} · · · µ2 µ1)]^{−1}.   (4.117)

Example 4.13 Consider a parallel system with 10 i.i.d. components and 4 i.i.d. repair facilities. The failure rate of each component is λ = 0.001 per hour and the repair rate of each failed component is µ = 0.001 per hour. What is the steady-state availability of the system? Based on the given data, we have

λi = (10 − i)λ = 0.001(10 − i),   i = 0, 1, 2, . . . , 9,
µi = iµ = 0.001i,   i = 1, 2, 3, 4,
µi = 4µ = 0.004,   i = 5, 6, 7, 8, 9, 10.

Using equation (4.114), we find

P0 = [1 + λ0/µ1 + (λ0 λ1)/(µ1 µ2) + · · · + (λ0 λ1 · · · λ9)/(µ1 µ2 · · · µ10)]^{−1} ≈ 0.0005,

P1 = (λ0/µ1) P0 ≈ 0.0051,   P2 = (λ1/µ2) P1 ≈ 0.0229,
P3 = (λ2/µ3) P2 ≈ 0.0610,   P4 = (λ3/µ4) P3 ≈ 0.1067,
P5 = (λ4/µ5) P4 ≈ 0.1600,   P6 = (λ5/µ6) P5 ≈ 0.2000,
P7 = (λ6/µ7) P6 ≈ 0.2000,   P8 = (λ7/µ8) P7 ≈ 0.1500,
P9 = (λ8/µ9) P8 ≈ 0.0750,   P10 = (λ9/µ10) P9 ≈ 0.0188.
The steady-state availability of the system is As = 1 − P10 ≈ 0.9812.

In the analysis above, we assumed that there are r repair facilities; when the number of failed components exceeds r, some failed components have to wait in line for repair. When failed components never wait for repair (i.e., every failed component receives repair service right away no matter how many components are failed), there is a simpler expression for the availability of a parallel system with independent components. Let Ai be the availability of component i, that is,

Ai = µi/(λi + µi),   i = 1, 2, . . . , n,

where λi is the failure rate and µi the repair rate of component i. We can simply substitute availability measures in place of the reliability measures to obtain the system availability:

As = 1 − Π_{i=1}^{n} (1 − Ai) = 1 − Π_{i=1}^{n} [1 − µi/(λi + µi)] = 1 − Π_{i=1}^{n} λi/(λi + µi).
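The birth-and-death recursion (4.114)–(4.115) and Example 4.13 are easy to reproduce programmatically; a sketch:

```python
def parallel_steady_state(n, r, lam, mu):
    """Steady-state probabilities of a repairable parallel system with n
    i.i.d. components and r repair facilities, equations (4.114)-(4.115)."""
    lam_i = lambda i: (n - i) * lam     # system failure rate with i failed
    mu_i = lambda i: min(i, r) * mu     # system repair rate with i failed
    # Unnormalized products c_i = prod_{j=1..i} lam_{j-1}/mu_j, with c_0 = 1.
    c = [1.0]
    for i in range(1, n + 1):
        c.append(c[-1] * lam_i(i - 1) / mu_i(i))
    total = sum(c)
    return [ci / total for ci in c]

# Example 4.13: n = 10, r = 4, lam = mu = 0.001 per hour.
p = parallel_steady_state(10, 4, 0.001, 0.001)
print(round(p[0], 4), round(p[10], 4))   # ≈ 0.0005 and ≈ 0.0188
print(round(1.0 - p[10], 4))             # steady-state availability ≈ 0.9812
```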
Parallel redundancy has been widely used to enhance system reliability. However, the benefits may not be as large as illustrated above, where we assumed that component failures are independent. Failures may be dependent when the components share a constant load: as more components fail, the load carried by each surviving component increases. This case will be analyzed later. Another factor to consider is so-called common-cause failures: there may exist common causes whose occurrence fails all components simultaneously. A third factor is that a parallel system may need to be aware of how many of its components are working. If a component fails, the system needs to reconfigure itself for operation with n − 1 working components, and the function of sensing component failures and reconfiguring the system may itself be imperfect. This situation is discussed under imperfect fault coverage in Chapter 7.
4.11 PARALLEL–SERIES SYSTEM MODEL

A parallel–series system consists of m disjoint modules connected in parallel, where module i, 1 ≤ i ≤ m, consists of ni components connected in series.

FIGURE 4.16 Parallel–series system structure.
The reliability block diagram of a parallel–series system is given in Figure 4.16. In such a parallel–series system there are m minimal paths, and they have no components in common. Using the technique of modular decomposition, we can first express the reliability of each module as a function of its component reliabilities and then the reliability of the system as a function of the module reliabilities.

Notation

• pij: reliability of component j in module i, 1 ≤ i ≤ m, 1 ≤ j ≤ ni
• qij: 1 − pij, 1 ≤ i ≤ m, 1 ≤ j ≤ ni

The reliability of module i is

Pi = Π_{j=1}^{ni} pij,   i = 1, 2, . . . , m.

The reliability of a parallel–series system is

Rs = 1 − Π_{i=1}^{m} (1 − Pi) = 1 − Π_{i=1}^{m} [1 − Π_{j=1}^{ni} pij].   (4.118)

When the components are i.i.d. with reliability pij = p and each module contains the same number of components, ni = n, the system reliability reduces to Rs = 1 − (1 − p^n)^m. To find the time-dependent performance measures of a parallel–series system, assuming that it is nonrepairable, we simply treat the reliability of each component as a function of time t:

Rs(t) = 1 − Π_{i=1}^{m} [1 − Π_{j=1}^{ni} Rij(t)]   if components are independent,
Rs(t) = 1 − [1 − (R(t))^n]^m   if components are i.i.d. and module sizes are the same.   (4.119)

When all components have exponential lifetime distributions with parameters λij for 1 ≤ i ≤ m and 1 ≤ j ≤ ni, we have

Rs(t) = 1 − Π_{i=1}^{m} [1 − Π_{j=1}^{ni} e^{−λij t}]   if components have constant failure rates,
Rs(t) = 1 − (1 − e^{−nλt})^m   if components are i.i.d. and module sizes are the same.   (4.120)

When all components have an identical failure rate λ and the module sizes are equal, the MTTF of a parallel–series system has a simple expression:

MTTFs = ∫_0^∞ Rs(t) dt = ∫_0^∞ [1 − (1 − e^{−nλt})^m] dt
      = (1/(nλ)) ∫_0^1 (1 − x^m)/(1 − x) dx   (substituting 1 − x = e^{−nλt})
      = (1/(nλ)) ∫_0^1 (1 + x + x² + · · · + x^{m−1}) dx
      = (1/(nλ)) (1 + 1/2 + 1/3 + · · · + 1/m) = (1/(nλ)) Σ_{i=1}^{m} (1/i).

This expression is similar to the MTTF of a parallel system. The parallel–series system can be viewed as a parallel system of m subsystems, each a series structure of n components. Each series subsystem has an MTTF of 1/(nλ); the second series subsystem contributes one-half of 1/(nλ) to the system MTTF, the third contributes one-third of 1/(nλ), and additional subsystems yield diminishing benefits. It is difficult to derive the availability function of a parallel–series system, or even its steady-state availability with limited repair facilities, even when all components follow exponential lifetime distributions. However, when there are unlimited repair facilities, the steady-state availability can be derived in the same way as the reliability: equation (4.118) applies with availability measures replacing the reliability measures.
FIGURE 4.17 Series–parallel system structure.
4.12 SERIES–PARALLEL SYSTEM MODEL

A series–parallel system consists of m disjoint modules connected in series, where module i, 1 ≤ i ≤ m, consists of ni components connected in parallel. The reliability block diagram of a series–parallel system is given in Figure 4.17. In such a series–parallel system there are m minimal cuts, and they have no components in common. Using the technique of modular decomposition, we can first express the reliability of each module as a function of its component reliabilities and then the reliability of the system as a function of the module reliabilities. With the same notation used for the parallel–series system, the reliability of module i is

Pi = 1 − Π_{j=1}^{ni} (1 − pij),   i = 1, 2, . . . , m.

The reliability of a series–parallel system is

Rs = Π_{i=1}^{m} Pi = Π_{i=1}^{m} [1 − Π_{j=1}^{ni} (1 − pij)].   (4.121)

When the components are i.i.d. with unreliability qij = q and each module contains the same number of components, ni = n, the system reliability reduces to Rs = (1 − q^n)^m. To find the time-dependent performance measures of a series–parallel system, assuming that it is nonrepairable, we simply treat the reliability of each component as a function of time t:
Rs(t) = Π_{i=1}^{m} [1 − Π_{j=1}^{ni} Fij(t)]   if components are independent,
Rs(t) = [1 − (F(t))^n]^m   if components are i.i.d. and module sizes are the same.   (4.122)

When all components have exponential distributions with parameters λij for 1 ≤ i ≤ m and 1 ≤ j ≤ ni, we have

Rs(t) = Π_{i=1}^{m} [1 − Π_{j=1}^{ni} (1 − e^{−λij t})]   if components have constant failure rates,
Rs(t) = [1 − (1 − e^{−λt})^n]^m   if components are i.i.d. and module sizes are the same.   (4.123)

When all components have an identical failure rate λ and module sizes are equal to n, the MTTF of a series–parallel system is derived below:

MTTFs = ∫_0^∞ Rs(t) dt = ∫_0^∞ [1 − (1 − e^{−λt})^n]^m dt
      = (1/λ) ∫_0^1 (1 − x^n)^m/(1 − x) dx   (substituting x = 1 − e^{−λt})
      = (1/λ) ∫_0^1 (1 + x + x² + · · · + x^{n−1})(1 − x^n)^{m−1} dx
      = (1/λ) Σ_{i=0}^{n−1} ∫_0^1 x^i (1 − x^n)^{m−1} dx
      = (1/(nλ)) Σ_{i=0}^{n−1} Γ((i + 1)/n) Γ(m) / Γ((i + 1)/n + m).   (4.124)

In the above derivation, the beta function defined in equation (2.72) is used.

Example 4.14 Consider a series–parallel system with three modules (m = 3), each consisting of four components (n = 4). All components are i.i.d. with a common failure rate λ = 0.001 per hour. Find the MTTF of this system. We can directly apply equation (4.124):

MTTFs = [1/(4 × 0.001)] Σ_{i=0}^{3} Γ((i + 1)/4) Γ(3) / Γ((i + 1)/4 + 3) ≈ 1200 hours.
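Equation (4.124) and Example 4.14 can be reproduced with the gamma function from the standard library; a sketch:

```python
import math

def sp_mttf(lam, n, m):
    """Series-parallel MTTF, equation (4.124), for i.i.d. exponential
    components: m series modules, each of n parallel components."""
    total = 0.0
    for i in range(n):
        a = (i + 1) / n
        # Gamma(a) Gamma(m) / Gamma(a + m) is the beta function B(a, m).
        total += math.gamma(a) * math.gamma(m) / math.gamma(a + m)
    return total / (n * lam)

# Example 4.14: m = 3 modules, n = 4 components each, lam = 0.001 per hour.
print(sp_mttf(0.001, 4, 3))   # ≈ 1200 hours
```

As a sanity check, n = 1 reduces the structure to a series system of m components, for which the MTTF is 1/(mλ).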
For a definition of the gamma function, readers are referred to Chapter 2. Values of the gamma function are tabulated in several books; for example, see Beyer [28]. As for the parallel–series system, it is difficult to derive the availability function of a series–parallel system, or its steady-state availability with limited repair facilities, even when all components follow exponential lifetime distributions. However, if there are unlimited repair facilities, the steady-state availability can be derived in the same way as the reliability: equation (4.121) applies with availability measures replacing the reliability measures.
4.13 STANDBY SYSTEM MODEL

In a parallel system with n components, only one component has to work properly for the system to work properly; however, all n components are used simultaneously. Another type of redundancy used in engineering design is standby redundancy. In this case, only one component is active; one or more additional components may be placed in the system in the standby condition. A sensing and switching mechanism monitors the operation of the active component, and whenever the active component fails, a standby component is immediately switched into active operation. Figure 4.18 depicts a standby system with two standby components.

There are three types of standby: hot standby, warm standby, and cold standby. Hot standby components are also called active redundant components; a hot standby component has the same failure rate as the active component. A cold standby component has a zero failure rate: it does not fail while in standby. Warm standby implies that inactive components have a failure rate between 0 and the failure rate of active components; warm standby thus includes cold standby and hot standby as extreme cases. A warm standby component is not an active component, but it may fail while in the standby condition; such failures are referred to as dormant failures. An example of a cold standby component is a spare light bulb for an overhead projector. An example of warm standby is a power plant, which often keeps at least one extra generating unit spinning so that it can be switched into full operation quickly when needed.
FIGURE 4.18 Standby system with two standby components (components 1–3 with sensing and switching unit SS).
4.13.1 Cold Standby Systems

In a cold standby system, standby components do not fail; thus, we need only consider the active components and the sensing and switching mechanism. In this section, we analyze the reliability and lifetime distribution of cold standby systems when no repairs of failed components are allowed.

Notation

• Ri: reliability function of component i when it is active; R when components are i.i.d.
• fi: pdf of component i when it is active; f when components are i.i.d.
• Ti: operating lifetime of component i; T when components are i.i.d.
• MTTFi: MTTF of component i; MTTF when components are i.i.d.
• Subscript SS indicates the corresponding measures of the sensing and switching mechanism: fSS, RSS, TSS, and MTTFSS are its pdf, reliability function, lifetime, and MTTF.
• fs, Rs, Ts, MTTFs: pdf, reliability function, lifetime, and MTTF of the system
Perfect Sensing and Switching Mechanism. When the sensing and switching mechanism is perfect, that is, instantaneous and failure free, a standby component is switched into operation as soon as the active component fails; the system fails when the last component fails in active operation. First we consider the case of only two components. Suppose that at time 0 the system is put into operation with component 1 active and component 2 in standby. When the life of component 1 is over, component 2 is instantaneously switched into operation, and the system's life is over when component 2 fails. The system's lifetime is thus the sum of the operating lifetimes of components 1 and 2:

Ts = T1 + T2.   (4.125)

The event that the system's lifetime exceeds a certain time t can be realized in two possible ways: (1) component 1 survives beyond time t, or (2) component 1 fails at time x (0 ≤ x < t) and component 2 works properly for a period longer than t − x. The reliability function of the system can therefore be expressed as

Rs(t) = Pr(Ts > t) = R1(t) + ∫_0^t f1(x) R2(t − x) dx.   (4.126)

If we know the lifetime distributions of the components, we can use equation (4.126) to find the system's reliability function. To find the system's MTTF, we may use Rs(t) in equation (4.126); however, a simpler way is to use equation (4.125) to find E(Ts) directly:

MTTFs = E(Ts) = E(T1) + E(T2).   (4.127)
Example 4.15 Consider a two-component cold standby system with i.i.d. components, each following the exponential lifetime distribution with parameter λ. Using equations (4.126) and (4.127), we have the following:

Rs(t) = R(t) + ∫_0^t f(x) R(t − x) dx = e^{−λt}(1 + λt),   (4.128)

MTTFs = E(T1) + E(T2) = 2/λ,   (4.129)

fs(t) = −dRs(t)/dt = λ² t e^{−λt},   (4.130)

hs(t) = fs(t)/Rs(t) = λ² t/(1 + λt).   (4.131)
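A sketch verifying the closed form (4.128) against the convolution integral (4.126) by numerical quadrature (λ = 0.5 and t = 2 are arbitrary example values):

```python
import math

def standby_reliability_closed(lam, t):
    """Equation (4.128): two i.i.d. exponential components, cold standby."""
    return math.exp(-lam * t) * (1.0 + lam * t)

def standby_reliability_convolution(lam, t, steps=20000):
    """Equation (4.126): R(t) + integral_0^t f(x) R(t-x) dx, trapezoidal rule."""
    dx = t / steps
    total = 0.0
    for k in range(steps + 1):
        x = k * dx
        g = lam * math.exp(-lam * x) * math.exp(-lam * (t - x))
        total += (0.5 if k in (0, steps) else 1.0) * g
    return math.exp(-lam * t) + total * dx

lam, t = 0.5, 2.0
print(standby_reliability_closed(lam, t))        # 2 e^{-1} ≈ 0.7358
print(standby_reliability_convolution(lam, t))   # should agree closely
```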
Now, consider an n-component standby system. At time 0, component 1 is put into operation; when component 1 fails, component 2 is switched into operation instantaneously, and so on. Because the sensing and switching mechanism is perfect, the lifetime of the system with n components, extending equation (4.125), is

Ts = T1 + T2 + · · · + Tn.   (4.132)

The event that the system's lifetime exceeds a certain time t can be realized in n mutually exclusive ways: (1) component 1 survives beyond time t; (2) component 1 fails after a working duration x1 (0 ≤ x1 < t) and component 2 operates for a period longer than t − x1; (3) component 1 fails after a working duration x1 (0 ≤ x1 < t), component 2 fails after a working duration x2 (0 ≤ x2 < t − x1), and component 3 operates for a period longer than t − x1 − x2; . . . ; and (n) components 1 through n − 1 fail after successive working durations x1, x2, . . . , x_{n−1} (each 0 ≤ xi < t − x1 − · · · − x_{i−1}) and component n operates for a period longer than t − x1 − x2 − · · · − x_{n−1}. The reliability function of the system is the sum of the probabilities of these n mutually exclusive events:

Rs(t) = R1(t) + ∫_0^t f1(x1) R2(t − x1) dx1
      + ∫_0^t f1(x1) ∫_0^{t−x1} f2(x2) R3(t − x1 − x2) dx2 dx1 + · · ·
      + ∫_0^t f1(x1) ∫_0^{t−x1} f2(x2) · · · ∫_0^{t−x1−···−x_{n−2}} f_{n−1}(x_{n−1}) Rn(t − x1 − · · · − x_{n−1}) dx_{n−1} · · · dx2 dx1.   (4.133)
FUNDAMENTAL SYSTEM RELIABILITY MODELS
It is obvious that equation (4.133) is tedious to use for deriving the system reliability function when n is large. However, this is the only method available when the components are independent but not identical. Under the special assumption that all components follow exponential lifetime distributions, the Markov chain technique may also be used for system reliability function derivation.

Example 4.16 Consider a cold standby system with n different components. The lifetime of component i is exponentially distributed with parameter λi (i = 1, 2, . . . , n), where λi ≠ λj for i ≠ j. The MTTF of such a system can be easily found to be

MTTFs = Σ_{j=1}^{n} MTTFj = Σ_{j=1}^{n} 1/λj.
In the following, we illustrate the use of equation (4.133) to find the expression of the system reliability function for different n values:

fi(t) = λi e^(−λi t),  i = 1, 2, . . . , n,
Ri(t) = e^(−λi t),  i = 1, 2, . . . , n.
When n = 2, we have

Rs(t) = R1(t) + ∫₀ᵗ f1(x1)R2(t − x1) dx1 = e^(−λ1 t) + λ1 e^(−λ2 t) ∫₀ᵗ e^(−(λ1−λ2)x1) dx1
      = [λ2/(λ2 − λ1)] e^(−λ1 t) + [λ1/(λ1 − λ2)] e^(−λ2 t).

When n = 3, we can apply equation (4.133) directly or use the expression of the system reliability function for n = 2 derived already. The reliability of the three-component system is equal to the reliability function of the two-component system plus the probability that the third component is the one making the system last beyond time t:

Rs(t) = [λ2/(λ2 − λ1)] e^(−λ1 t) + [λ1/(λ1 − λ2)] e^(−λ2 t)
        + ∫₀ᵗ f1(x1) ∫₀^(t−x1) f2(x2)R3(t − x1 − x2) dx2 dx1
      = [λ2/(λ2 − λ1)] e^(−λ1 t) + [λ1/(λ1 − λ2)] e^(−λ2 t)
        − λ1λ2 [(λ2 − λ3)e^(−λ1 t) + (λ3 − λ1)e^(−λ2 t) + (λ1 − λ2)e^(−λ3 t)] / [(λ1 − λ2)(λ2 − λ3)(λ3 − λ1)]
      = Σ_{j=1}^{3} e^(−λj t) Π_{i≠j} λi/(λi − λj).
STANDBY SYSTEM MODEL

FIGURE 4.19 State transition diagram of a three-component cold standby system with perfect switching (states 1 → 2 → 3 → 4 with transition rates λ1, λ2, λ3; state 4 is the failed state).
For a cold standby system with n components following exponential lifetime distributions, each with a distinct failure rate, the following equation is obtained from a generalization of the system reliability function when n = 3:

Rs(t) = Σ_{j=1}^{n} e^(−λj t) Π_{i≠j} λi/(λi − λj).
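This generalized expression is easy to implement and to cross-check against simulation. The sketch below (ours; the three rates and t are arbitrary test values) compares the formula with a Monte Carlo estimate based on the fact that the system lifetime is the sum of the component lifetimes:

```python
import math, random

def standby_reliability(t, rates):
    """Rs(t) = sum_j e^(-λj t) prod_{i≠j} λi/(λi - λj), for distinct rates."""
    total = 0.0
    for j, lj in enumerate(rates):
        prod = 1.0
        for i, li in enumerate(rates):
            if i != j:
                prod *= li / (li - lj)
        total += math.exp(-lj * t) * prod
    return total

rates = [1.0, 2.0, 3.0]          # λ1, λ2, λ3 (arbitrary distinct test values)
t = 1.5

# Monte Carlo check: the system lifetime is Ts = T1 + T2 + T3
random.seed(1)
N = 200_000
hits = sum(1 for _ in range(N)
           if sum(random.expovariate(l) for l in rates) > t)
print(standby_reliability(t, rates), hits / N)
```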
Mathematical induction may be used to prove this equation.

Example 4.17 Derive the reliability function of a three-component cold standby system using the Markov chain technique. Each component has a different constant failure rate. Let λi indicate the failure rate of component i for i = 1, 2, 3. The system may be in four different states. State i indicates that component i is active for i = 1, 2, 3, while state 4 indicates that the system is failed. When the system is in state i for i = 1, 2, 3, 4, components 1, 2, . . . , i − 1 are failed. The state transition diagram of the system is given in Figure 4.19. Based on Figure 4.19, we have the following differential equations:

P1′(t) = −λ1 P1(t),
P2′(t) = −λ2 P2(t) + λ1 P1(t),
P3′(t) = −λ3 P3(t) + λ2 P2(t),
P4′(t) = λ3 P3(t),

with boundary conditions P1(0) = 1 and Pi(0) = 0 for i = 2, 3, 4. The last equation can be ignored for now. With the Laplace transform technique, we obtain the following equations:

sL(P1(t)) − 1 = −λ1 L(P1(t)),
sL(P2(t)) = −λ2 L(P2(t)) + λ1 L(P1(t)),
sL(P3(t)) = −λ3 L(P3(t)) + λ2 L(P2(t)).

Solving these equations, we have

L(P1(t)) = 1/(s + λ1),
L(P2(t)) = λ1/[(s + λ1)(s + λ2)],
L(P3(t)) = λ1λ2/[(s + λ1)(s + λ2)(s + λ3)].
Inverse Laplace transforms yield

P1(t) = e^(−λ1 t),

P2(t) = [λ1/(λ2 − λ1)] e^(−λ1 t) + [λ1/(λ1 − λ2)] e^(−λ2 t),

P3(t) = λ1λ2/[(λ2 − λ1)(λ3 − λ1)] e^(−λ1 t) + λ1λ2/[(λ1 − λ2)(λ3 − λ2)] e^(−λ2 t)
        + λ1λ2/[(λ1 − λ3)(λ2 − λ3)] e^(−λ3 t).

The reliability function of the system is then

Rs(t) = P1(t) + P2(t) + P3(t)
      = λ2λ3/[(λ2 − λ1)(λ3 − λ1)] e^(−λ1 t) + λ1λ3/[(λ1 − λ2)(λ3 − λ2)] e^(−λ2 t)
        + λ1λ2/[(λ1 − λ3)(λ2 − λ3)] e^(−λ3 t)
      = Σ_{j=1}^{3} e^(−λj t) Π_{i≠j} λi/(λi − λj).
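The Laplace-transform solution can also be cross-checked by integrating the differential equations directly. A small sketch (ours; the three rates are arbitrary distinct test values) using forward-Euler steps:

```python
import math

lam = [0.5, 1.0, 2.0]            # λ1, λ2, λ3 (arbitrary distinct test values)
dt, T = 1e-5, 2.0

# Forward-Euler integration of the Markov differential equations above
P1, P2, P3 = 1.0, 0.0, 0.0
for _ in range(int(T / dt)):
    dP1 = -lam[0] * P1
    dP2 = -lam[1] * P2 + lam[0] * P1
    dP3 = -lam[2] * P3 + lam[1] * P2
    P1 += dP1 * dt
    P2 += dP2 * dt
    P3 += dP3 * dt
Rs_ode = P1 + P2 + P3            # Rs(T) = P1(T) + P2(T) + P3(T)

def Rs_exact(t):
    """Closed form obtained from the inverse Laplace transforms."""
    l1, l2, l3 = lam
    return (l2 * l3 / ((l2 - l1) * (l3 - l1)) * math.exp(-l1 * t)
            + l1 * l3 / ((l1 - l2) * (l3 - l2)) * math.exp(-l2 * t)
            + l1 * l2 / ((l1 - l3) * (l2 - l3)) * math.exp(-l3 * t))

print(Rs_ode, Rs_exact(T))
```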
When the n components in the system are i.i.d., Misra [168] provides a simpler approach for evaluation of the system reliability function no matter what the common lifetime distribution is. First, assume that the operating lifetime of each component follows the exponential distribution with parameter λ. Because of this assumption, the lifetime of the system, Ts, as expressed in equation (4.132), follows the gamma distribution with parameters n and λ with the following pdf:

fs(t) = λe^(−λt) (λt)^(n−1)/(n − 1)!,  t ≥ 0.  (4.134)

The reliability function of the system can be derived from the pdf fs(t) as

Rs(t) = ∫ₜ^∞ fs(x) dx = e^(−λt) Σ_{j=0}^{n−1} (λt)^j/j!.  (4.135)
Since R(t) = e^(−λt), we have λt = −ln R(t). Substituting these into equation (4.135), we have an alternative expression of the system reliability function:

Rs(t) = R(t) Σ_{j=0}^{n−1} [−ln R(t)]^j/j!.  (4.136)
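Equation (4.136) is convenient to implement because it needs only the common component reliability R(t). A sketch (ours; λ, t, and n are test values) that also confirms agreement with equation (4.135) in the exponential case:

```python
import math

def standby_reliability(R_t, n):
    """Eq. (4.136): Rs = R * sum_{j=0}^{n-1} (-ln R)^j / j! for n i.i.d.
    cold standby components with common reliability R = R(t)."""
    u = -math.log(R_t)
    return R_t * sum(u**j / math.factorial(j) for j in range(n))

# Sanity check with exponential components, where eq. (4.135) applies directly
lam, t, n = 0.8, 2.0, 3
Rt = math.exp(-lam * t)
direct = math.exp(-lam * t) * sum((lam * t)**j / math.factorial(j)
                                  for j in range(n))
print(standby_reliability(Rt, n), direct)
```

For a non-exponential distribution, one simply passes the appropriate R(t) value, as the text indicates.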
Equation (4.136) is derived with the assumption that component lifetimes follow the same exponential distribution. Misra [168] states that even when the i.i.d. components do not follow the exponential lifetime distribution, equation (4.136) can still be used as long as the correct component reliability function R(t) is used. Using equation (4.136), we can derive the following results for different lifetime distributions:

1. When the i.i.d. components have a linearly increasing failure rate function, namely, h(t) = a + bt, we have

   R(t) = exp[−∫₀ᵗ (a + bx) dx] = exp[−(at + ½bt²)],
   MTTF = e^(a²/(2b)) √(2π/b) [1 − Φ(a/√b)],  where Φ(x) = (1/√(2π)) ∫_{−∞}^x e^(−t²/2) dt,
   Rs(t) = exp[−(at + ½bt²)] Σ_{j=0}^{n−1} (at + ½bt²)^j/j!,
   MTTFs = nMTTF.

2. When the i.i.d. components have a Weibull lifetime distribution with h(t) = at^b, we have

   R(t) = exp[−at^(b+1)/(b + 1)],
   MTTF = (b + 1)^(−b/(b+1)) a^(−1/(b+1)) Γ(1/(b + 1)),  where Γ(z) = ∫₀^∞ t^(z−1) e^(−t) dt,
   Rs(t) = exp[−at^(b+1)/(b + 1)] Σ_{j=0}^{n−1} [at^(b+1)/(b + 1)]^j/j!,
   MTTFs = nMTTF.

3. When the i.i.d. components have an extreme-value distribution with h(t) = ae^(bt), we have

   R(t) = exp[−(a/b)(e^(bt) − 1)],
   MTTF = (1/b) e^(a/b) ∫_{a/b}^∞ (e^(−t)/t) dt,
   Rs(t) = exp[−(a/b)(e^(bt) − 1)] Σ_{j=0}^{n−1} [(a/b)(e^(bt) − 1)]^j/j!,
   MTTFs = nMTTF.

Exercises

1. Use equation (4.126) to find the reliability function of a cold standby system with two i.i.d. components when the components follow the exponential distribution.
2. Use equation (4.126) to find the reliability function of a cold standby system with two i.i.d. components when the components follow the Weibull distribution.
3. Use equation (4.126) to find the reliability function of a cold standby system with three i.i.d. components when the components have the failure rate function h(t) = a + bt, where a and b are constants.
Imperfect Sensing and Switching Mechanism First consider a standby system with only two components. When the sensing and switching unit is imperfect, the event that the system will survive beyond time t may be realized in the following mutually exclusive ways:

1. Component 1 survives beyond time t.
2. Component 1 fails at time x (0 < x < t), the sensing and switching unit survives beyond time x, and component 2 survives longer than a duration of t − x.

Based on this decomposition of the event that the system survives beyond time t, we can write the system reliability function as

Rs(t) = R1(t) + ∫₀ᵗ f1(x)RSS(x)R2(t − x) dx.  (4.137)
The life of the system is no longer equal to the sum of the individual lives of the two components due to imperfect sensing and switching. We have to use Rs (t) to calculate MTTFs .
Example 4.18 Consider the special case when the components and the sensing and switching unit follow the exponential distributions with parameters λ1, λ2, and λss, respectively. Using equation (4.137), we obtain the following:

Rs(t) = e^(−λt) [1 + (λ/λss)(1 − e^(−λss t))]  if λ1 = λ2 = λ,

Rs(t) = e^(−λ1 t) + λ1 e^(−λ2 t)/[(λ1 − λ2) + λss] [1 − e^(−[(λ1−λ2)+λss]t)]  if λ1 ≠ λ2 and λ1 − λ2 + λss ≠ 0,

MTTFs = 1/λ1 + λ1/[λ2(λ1 + λss)].
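The closed form of Example 4.18 can be checked against a direct numerical evaluation of equation (4.137). A sketch (ours; the rates are arbitrary test values with λ1 ≠ λ2):

```python
import math

l1, l2, lss = 1.0, 0.5, 0.2      # λ1, λ2, λSS (arbitrary test values, λ1 ≠ λ2)
k = (l1 - l2) + lss              # nonzero here, so the second branch applies

def Rs(t):                       # closed form from Example 4.18
    return math.exp(-l1 * t) + l1 * math.exp(-l2 * t) / k * (1 - math.exp(-k * t))

def Rs_num(t, steps=20_000):     # midpoint-rule evaluation of eq. (4.137)
    h = t / steps
    return math.exp(-l1 * t) + h * sum(
        l1 * math.exp(-l1 * x)       # f1(x)
        * math.exp(-lss * x)         # RSS(x)
        * math.exp(-l2 * (t - x))    # R2(t - x)
        for x in ((i + 0.5) * h for i in range(steps)))

t = 2.0
mttfs = 1 / l1 + l1 / (l2 * (l1 + lss))
print(Rs(t), Rs_num(t))
print(mttfs)
```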
We can follow a similar approach to derive the system reliability function and MTTFs when there are three or more components in the system. For example, the event that the system survives beyond time t for a three-component system can be decomposed into the following mutually exclusive events:

1. Component 1 survives beyond time t.
2. Component 1 fails after a working duration of x1, component SS survives a duration of x1, and component 2 survives a working duration of t − x1, where 0 ≤ x1 < t.
3. Component 1 fails after a working duration of x1, component SS survives beyond time point x1, component 2 fails after a working duration of x2, component SS survives beyond time point x1 + x2, and component 3 survives a working duration of length t − x1 − x2.

Exercise

1. Find the reliability function and the MTTF of a three-component standby system with imperfect switching. All components and the sensing and switching mechanism follow the exponential lifetime distributions. Consider the cases when components are identical and when components are not identical.

4.13.2 Warm Standby Systems

In a warm standby system, both the active component and each standby component may fail. However, a standby or dormant component usually has a lower failure rate than the active component. A factor that complicates the analysis of such systems is that the failure rate functions of a component in the standby and active states are often dependent on each other. In the following, we will assume that they are independent of each other (e.g., in the case of exponential lifetime distributions). A two-component system will illustrate the techniques. The subscript d is used to indicate the dormant component or the component in the warm standby state.

Perfect Sensing and Switching Mechanism Again we analyze the event that the system survives beyond time t. For this to occur, one of the following mutually exclusive events has to occur:
1. Component 1 survives beyond time t.
2. Component 1 fails at time x, component 2 in the standby state survives beyond time point x, and component 2 starting in operation at time x survives beyond time point t, where 0 ≤ x < t.

The probability of the union of these events is the reliability function of the system, Rs(t). Assuming that R2(t − x) is independent of R2d(x), we can express Rs(t) as

Rs(t) = R1(t) + ∫₀ᵗ f1(x)R2d(x)R2(t − x) dx.  (4.138)
Once the expression of Rs(t) is obtained, we can use equation (2.88) to find MTTFs.

Example 4.19 Suppose that the components of a warm standby system follow the exponential lifetime distribution with parameters λ1, λ2d, and λ2. With equation (4.138), we obtain the following expressions of the system reliability function and the system MTTF:

Rs(t) = e^(−λ1 t) + λ1 t e^(−λ2 t)  if λ1 + λ2d − λ2 = 0,

Rs(t) = e^(−λ1 t) + λ1 e^(−λ2 t)/(λ1 + λ2d − λ2) [1 − e^(−(λ1+λ2d−λ2)t)]  if λ1 + λ2d − λ2 ≠ 0,

MTTFs = 1/λ1 + λ1/[λ2(λ1 + λ2d)].
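The MTTFs expression of Example 4.19 can be verified by numerically integrating the closed-form reliability function. A sketch (ours; the rates are arbitrary test values with λ1 + λ2d − λ2 ≠ 0):

```python
import math

l1, l2d, l2 = 1.0, 0.3, 0.6      # λ1, λ2d, λ2 (arbitrary test values)
k = l1 + l2d - l2                # nonzero here, so the second branch applies

def Rs(t):                       # closed form from Example 4.19
    return math.exp(-l1 * t) + l1 * math.exp(-l2 * t) / k * (1 - math.exp(-k * t))

# MTTFs = integral of Rs(t) from 0 to infinity, midpoint rule on [0, 100]
h, steps = 0.001, 100_000
mttfs_num = h * sum(Rs((i + 0.5) * h) for i in range(steps))
mttfs = 1 / l1 + l1 / (l2 * (l1 + l2d))
print(mttfs_num, mttfs)
```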
Imperfect Sensing and Switching Mechanism Using the same approach, we decompose the event that the system survives beyond time t into the following mutually exclusive events:

1. Component 1 survives beyond time t.
2. Component 1 fails at time x, component 2 in the standby state survives beyond time point x, component SS survives beyond time x, and component 2 starting in operation at time point x survives beyond time point t, where 0 ≤ x < t.

Assuming that R2(t − x) is independent of R2d(x), we can express Rs(t) as

Rs(t) = R1(t) + ∫₀ᵗ f1(x)R2d(x)Rss(x)R2(t − x) dx.  (4.139)
Example 4.20 Consider a two-component warm standby system with imperfect switching. Every component has the exponential lifetime distribution. The parameters of the exponential distributions are λ1 for component 1 in the active mode, λ2d for component 2 in the standby mode, λss for the sensing and switching mechanism, and λ2 for component 2 in the active mode. With equation (4.139), we obtain the
following expressions of the system reliability function and the system MTTF of a warm standby system with two components:

Rs(t) = e^(−λ1 t) + λ1 t e^(−λ2 t)  if λ1 + λss + λ2d − λ2 = 0,

Rs(t) = e^(−λ1 t) + λ1 e^(−λ2 t)/(λ1 + λss + λ2d − λ2) [1 − e^(−(λ1+λss+λ2d−λ2)t)]  if λ1 + λss + λ2d − λ2 ≠ 0,

MTTFs = 1/λ1 + λ1/[λ2(λ1 + λss + λ2d)].
5 GENERAL METHODS FOR SYSTEM RELIABILITY EVALUATION
Reliability is an important measure of the performance of a device. It is defined to be the probability that the device will perform its intended functions satisfactorily for a specified period of time under specified operating conditions. For simple and/or inexpensive devices, laboratory testing is often conducted to obtain their reliability measures. These devices are often the components of more complex and more expensive systems. It is usually not economical to test complex systems to failure. Instead, knowledge of the system structure as a function of its components is often used to calculate system reliability given component reliabilities. In this chapter, we discuss algorithms and methods for evaluation of system reliability given component reliabilities. Many factors affect the reliability of a system as a function of component reliabilities. In this chapter, we adopt the following assumptions unless specified otherwise:

1. The systems to be discussed are limited to coherent systems. In other words, all components are relevant and improvement of component performance does not degrade the performance of the system.
2. The system and its components may be in only two possible states, working or failed.
3. The states of the components are independent random variables.
4. The mission times of the system and its components are implicitly specified.

Based on these assumptions, we deal primarily with system and component reliabilities rather than their reliability functions of time t. The time that it takes for the algorithms and methods to be discussed to find system reliability depends on the number of components in the system and the structure
of the system. Different algorithms have been developed for system reliability evaluation of the same system structure. The efficiencies of these algorithms may be different. Many systems can be represented by reliability block diagrams. The satisfactory performance of a system that can be represented by a reliability block diagram is often interpreted as a successful signal flow from the left end to the right end of the reliability block diagram. There are systems that can be represented by network diagrams, for example, computer networks, telecommunications networks, and utility networks. A network consists of nodes and links. Usually both links and nodes are failure prone. In our discussions of networks in this book, we mostly assume that either links or nodes but not both are failure prone. With this assumption, a network diagram can be treated in a way similar to a reliability block diagram under the assumption that there is a single source and a single sink in the network. The successful operation of a network may be defined differently for different purposes. A network may be defined as working if a single source node is able to communicate with a single sink node. In this case, we say that we are dealing with a two-terminal network reliability problem. If the network is defined to be working properly only when k nodes are able to communicate with one another, then we are dealing with a k-terminal network reliability problem. The algorithms to be covered in this book are applicable for evaluation of equipment reliability and two-terminal network reliability. The reliability evaluation of k-terminal networks is outside the scope of this book.
In this chapter, the following methods for system reliability evaluation will be covered: parallel and series reductions, the decomposition method, the inclusion–exclusion method, the sum-of-disjoint-products method, the delta–star transformation method, the star–delta transformation method, and the Markov chain imbedded-structures method. Many of these methods require that the minimal paths or minimal cuts be known. As a result, we also discuss methods for generating minimal paths and minimal cuts.
5.1 PARALLEL AND SERIES REDUCTIONS Parallel and series structures are the most fundamental system reliability structures. They are widely used in reliability systems. The system reliability of such system structures can be easily evaluated. Before applying any other methods for system reliability evaluation, one should always identify parallel and series subsystems that exist in a complex system structure and apply parallel and series reductions. In a series reduction, a series subsystem with n components is replaced with a supercomponent whose reliability is equal to the product of the reliabilities of the components in the subsystem. In a parallel reduction, a parallel subsystem with n components is replaced with a supercomponent whose unreliability is equal to the product of the unreliabilities of the components in the subsystem. We use two examples to illustrate parallel and series reductions.
Example 5.1 Consider a system whose reliability block diagram is given in Figure 5.1a. The system has seven components. The reliability and unreliability of component i are given and denoted by pi and qi, respectively (i = 1, 2, . . . , 7). Use series and parallel reductions for system reliability evaluation. From the system structure given in Figure 5.1a, we see that components 3 and 4 form a series subsystem, which can be represented by a "supercomponent" denoted by 3–4. The reliability of supercomponent 3–4 is equal to the product of the reliabilities of components 3 and 4: p3–4 = p3 p4. Components 5, 6, and 7 also form a series subsystem, which can be represented by a supercomponent denoted by 5–6–7. The reliability of supercomponent 5–6–7 is equal to the product of the reliabilities of components 5, 6, and 7: p5–6–7 = p5 p6 p7. After these two series reductions, the reliability block diagram in Figure 5.1a is transformed into that in Figure 5.1b. Examination of Figure 5.1b reveals that component 2 and component 3–4 form a parallel subsystem denoted by supercomponent 2–3–4, whose unreliability is the product of the unreliabilities of component 2 and supercomponent 3–4: q2–3–4 = q2 q3–4 = q2(1 − p3–4) = q2(1 − p3 p4). After this parallel reduction, the reliability block diagram in Figure 5.1b is transformed into that in Figure 5.1c. Figure 5.1c shows that component 1 and supercomponent 2–3–4 form a series subsystem and a series reduction produces a supercomponent denoted by 1–2–3–4, whose reliability is p1–2–3–4 = p1 p2–3–4 = p1(1 − q2–3–4) = p1[1 − q2(1 − p3 p4)]. This series reduction transforms Figure 5.1c to Figure 5.1d. Figure 5.1d can be further simplified with a parallel reduction to generate Figure 5.1e, which has only one supercomponent, denoted by 1–2–3–4–5–6–7. The unreliability of this supercomponent is q1–2–3–4–5–6–7 = q1–2–3–4 q5–6–7 = {1 − p1[1 − q2(1 − p3 p4)]}(1 − p5 p6 p7).
FIGURE 5.1 Using series and parallel reductions in Example 5.1 (parts a through e show the reliability block diagram after each successive reduction).

The system reliability is equal to the reliability of the final supercomponent in Figure 5.1e:

Rs = 1 − q1–2–3–4–5–6–7 = 1 − {1 − p1[1 − q2(1 − p3 p4)]}(1 − p5 p6 p7).

In Example 5.1, we are able to find the system reliability using only series and parallel reductions. This is because the system structure in Figure 5.1a consists only of series and parallel connections. Using only parallel and series reductions, we are usually unable to simplify a general network into a single supercomponent. Other techniques to be discussed later will be applied after parallel and series reductions to find the exact system reliability. The following example shows that parallel and series reductions can simplify a system structure but not to the point of finding the exact system reliability.
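The reduction sequence of Example 5.1 is easy to mechanize. The following Python sketch (ours; the component reliabilities are illustrative values, not from the book) performs the reductions and cross-checks the result by enumerating all 2^7 component states:

```python
from itertools import product

# Hypothetical component reliabilities p1..p7 (illustrative values)
p = {i: 0.9 - 0.05 * i for i in range(1, 8)}
q = {i: 1 - p[i] for i in p}

def series(*rel):            # series reduction: multiply reliabilities
    r = 1.0
    for x in rel:
        r *= x
    return r

def parallel(*rel):          # parallel reduction: multiply unreliabilities
    u = 1.0
    for x in rel:
        u *= 1 - x
    return 1 - u

# Reduction sequence of Example 5.1
p34 = series(p[3], p[4])                  # supercomponent 3-4
p567 = series(p[5], p[6], p[7])           # supercomponent 5-6-7
p234 = parallel(p[2], p34)                # supercomponent 2-3-4
p1234 = series(p[1], p234)                # supercomponent 1-2-3-4
Rs = parallel(p1234, p567)                # final supercomponent

# Brute-force check: the system works if (1 and (2 or (3 and 4)))
# or (5 and 6 and 7), per the block diagram of Figure 5.1a
def works(s):
    return (s[1] and (s[2] or (s[3] and s[4]))) or (s[5] and s[6] and s[7])

Rs_enum = 0.0
for states in product([True, False], repeat=7):
    s = dict(zip(range(1, 8), states))
    if works(s):
        prob = 1.0
        for i in range(1, 8):
            prob *= p[i] if s[i] else q[i]
        Rs_enum += prob

print(Rs, Rs_enum)
```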
FIGURE 5.2 Using series and parallel reductions in Example 5.2 (parts a through c show the network diagram after successive reductions).
Example 5.2 Consider the network depicted in Figure 5.2a. The nine failure-prone components are numbered from 1 to 9. We are interested in finding the reliability for the source to be able to communicate to the sink. Examining Figure 5.2a, we can see that series reductions can be applied to components 1 and 2 resulting in a supercomponent called 1–2 and to components 3 and 4 resulting in a supercomponent called 3–4. The reliabilities of these two supercomponents are p1−2 = p1 p2 ,
p3−4 = p3 p4 .
A parallel reduction can be applied to components 7 and 8 resulting in a supercomponent 7–8 whose unreliability is given by q7–8 = q7 q8. After these series and parallel reductions, the network diagram in Figure 5.2a is transformed into the network diagram in Figure 5.2b.
Applying a series reduction to component 6 and supercomponent 7–8 transforms Figure 5.2b to Figure 5.2c with p6–7–8 = p6 (1 − q7−8 ) = p6 (1 − q7 q8 ). No more series or parallel reductions can be used to further simplify the network diagram in Figure 5.2c. A comparison of Figure 5.2c with Figure 4.6 indicates that we have simplified the original network diagram into a bridge structure. The structure function and the logic function of the bridge structure are derived in Examples 4.6 and 4.7, respectively. Other methods to be discussed later are needed to find the reliability of a bridge structure.
5.2 PIVOTAL DECOMPOSITION In Chapter 4, we illustrated the use of the decomposition theorem in derivation of the system structure function and system logic function. The same idea can be applied for system reliability evaluation directly. The pivotal decomposition method is based on the concept of conditional probability. The following equation illustrates the idea behind this method: Pr(system works) = Pr(component i works) Pr(system works | component i works) + Pr(component i fails) Pr(system works | component i fails).
(5.1)
The efficiency of this method depends on the ease of evaluating the conditional probabilities. This means that the selection of the component to be decomposed may play an important role in the efficiency of this method. If the decomposition of a selected component results in two system structures for which parallel and/or series reductions can be applied again, the efficiency of the system reliability evaluation will be enhanced. We will use an example to illustrate this method. Example 5.3 Consider the bridge structure shown in Figure 4.6. As illustrated in Example 5.2, no parallel or series reductions may be applied to the bridge structure directly. There are five components in Figure 4.6. A decomposition on any component will result in system structures to which parallel and series reductions can be applied. In Example 4.9, we selected component 3 to be decomposed in derivation of the structure and logic functions of the bridge structure. In this example, again, we will choose component 3 to be decomposed first. Using equation (5.1), we have Rs = p3 Pr(system works | component 3 works) + q3 Pr(system works | component 3 fails).
(5.2)
The reliability block diagram of the bridge structure, under the condition that component 3 works, is given in Figure 4.7. Applying parallel and series reductions to the system in Figure 4.7 results in Pr(system works | component 3 works) = (1 − q1 q4 )(1 − q2 q5 ). The reliability block diagram of the bridge structure, under the condition that component 3 fails, is given in Figure 4.8. Applying parallel and series reductions to the system in Figure 4.8 results in Pr(system works | component 3 fails) = 1 − (1 − p1 p2 )(1 − p4 p5 ). Substituting these two conditional probabilities into equation (5.2) yields the reliability of the bridge structure: Rs = p3 (1 − q1 q4 )(1 − q2 q5 ) + q3 [1 − (1 − p1 p2 )(1 − p4 p5 )].
(5.3)
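Equation (5.3) can be verified by enumerating the 2^5 = 32 component states of the standard bridge. In the sketch below (ours; the five reliabilities are arbitrary test values), the structure function is written from the minimal paths of the bridge, consistent with the conditional structures used in Example 5.3:

```python
from itertools import product

def bridge_reliability(p):
    """Eq. (5.3), obtained by pivotal decomposition on component 3."""
    p1, p2, p3, p4, p5 = p
    q1, q2, q3, q4, q5 = (1 - x for x in p)
    return (p3 * (1 - q1 * q4) * (1 - q2 * q5)
            + q3 * (1 - (1 - p1 * p2) * (1 - p4 * p5)))

def bridge_works(x):
    """Structure function of the standard bridge: minimal paths
    {1,2}, {4,5}, {1,3,5}, {2,3,4}."""
    x1, x2, x3, x4, x5 = x
    return (x1 and x2) or (x4 and x5) or (x1 and x3 and x5) or (x2 and x3 and x4)

p = [0.9, 0.8, 0.7, 0.85, 0.75]          # arbitrary test values
Rs = 0.0
for x in product([1, 0], repeat=5):
    if bridge_works(x):
        prob = 1.0
        for xi, pi in zip(x, p):
            prob *= pi if xi else 1 - pi
        Rs += prob
print(bridge_reliability(p), Rs)
```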
The bridge structure is a commonly seen system reliability structure. It either exists by itself as a subsystem in a system or appears when other network transformation techniques have been applied. In Example 5.2, parallel and series reductions transform the original system structure into a bridge structure. Since the reliability of a bridge structure is derived in Example 5.3, we can standardize its expression so that it can be used whenever a bridge structure is spotted. The standard bridge structure used in this book is defined as follows:

• The reliability block diagram of a bridge structure is given in Figure 4.6. Figure 4.6 may also represent a network diagram with failure-prone links. The diagram has the shape of a diamond. Communications, or signal flows, are from left to right.
• Component 3 is the center of the bridge structure.
• The two components on the top two edges are numbered as 1 and 2 from left to right. The two components on the bottom two edges are numbered as 4 and 5 from left to right.
With these specifications for a standard bridge structure, its system reliability expression is given in equation (5.3). Example 5.4 Consider the network diagram given in Figure 5.3a. Links are failure prone while the nodes are perfect. We are interested in the reliability of communications between the source node and the sink node. There are seven links in this network. We will pick component 2 (link 2) as the one to be decomposed. When component 2 is perfect, the original network diagram in Figure 5.3a is simplified to that in Figure 5.3b. Parallel and series reductions can be applied to find the reliability of the two-terminal network diagram in Figure 5.3b: Pr(system works | component 2 works) = (1 − q1 q5 ){1 − q4 [1 − p7 (1 − q3 q6 )]}.
FIGURE 5.3 Use of decomposition in two-terminal network reliability evaluation (part a shows the original source-to-sink network; parts b and c show the network given that component 2 works and fails, respectively).
When component 2 is failed, the original network diagram in Figure 5.3a is simplified to that in Figure 5.3c. A series reduction on components 5 and 6 produces a supercomponent called 5–6 in the place of link 5 and link 6. Mapping the components in Figure 5.3c to the standard bridge structure given in Figure 4.6 and applying equation (5.3), we find the two-terminal reliability of the network in Figure 5.3c as follows: Pr(system works | component 2 fails) = p3 (1 − q1 q5−6 )(1 − q4 q7 ) + q3 [1 − (1 − p1 p4 )(1 − p5−6 p7 )] = p3 [1 − q1 (1 − p5 p6 )](1 − q4 q7 ) + q3 [1 − (1 − p1 p4 )(1 − p5 p6 p7 )]. Substituting these two conditional probabilities into equation (5.1), we obtain the reliability of the network depicted in Figure 5.3a:
Rs = p2 Pr(system works | component 2 works) + q2 Pr(system works | component 2 fails)
   = p2(1 − q1 q5){1 − q4[1 − p7(1 − q3 q6)]}
     + q2{p3[1 − q1(1 − p5 p6)](1 − q4 q7) + q3[1 − (1 − p1 p4)(1 − p5 p6 p7)]}.

The decomposition method is based on the enumeration of the states of a selected component. Since each component has two states, an application of the decomposition method to a system with n failure-prone components generates two different system structures each with n − 1 failure-prone components. If the reliabilities of these two smaller system structures are known, it takes two multiplications and one addition to find the reliability of the n-component system. If one relies on the decomposition method only without utilizing the results of parallel and series reductions and other available results, this method is not very efficient. Applying the decomposition method to the two different (n − 1)-component system structures will generate four different system structures each with n − 2 failure-prone components. This process is repeated as the decomposition method is applied to these four system structures until we reach system structures each with a single component. The number of system structures with one component each generated by exhaustive application of the decomposition method is equal to 2^(n−1). The number of arithmetic operations needed to find the reliability of an n-component system using only the decomposition method is then equal to 3 × (2^(n−1) − 1), which is an exponential function. However, as illustrated in the examples above, combining the decomposition method with parallel and series reductions and utilizing reliability expressions of standard system structures produces the result much faster.

5.3 GENERATION OF MINIMAL PATHS AND MINIMAL CUTS

Minimal paths and minimal cuts play an important role in system reliability evaluation.
Many methods to be covered later in this book require that the minimal paths or minimal cuts of a system be known. For a reliability block diagram or a two-terminal network problem, the minimal paths to be generated are between the source and the sink. Most methods for minimal path generation require a connection matrix to be developed first. We first outline how to construct a connection matrix and then explain methods for generation of minimal paths and minimal cuts. Readers may refer to Billinton and Allan [29] for additional coverage of this topic. 5.3.1 Connection Matrix Each component in a reliability block diagram is considered to be a failure-prone link. These components connect the perfect nodes, including the source and the sink. In a two-terminal network diagram, the links are failure prone and the nodes are perfect. A connection matrix describes the direct connections between each pair of
FIGURE 5.4 Network diagram with labeled nodes (the network of Figure 5.3a with nodes labeled 1 through 5 and links numbered 1 through 7).
nodes for signal flows between the source and the sink. Since we are considering undirected networks, such as a reliability block diagram and a two-terminal network, the connection matrix is symmetric. If there is no direct connection between node i and node j, the entries at positions (i, j) and (j, i) of the connection matrix are zero. Because a node is always connected to itself, the main diagonal elements of the matrix are 1. If component k connects node i and node j, the entries at positions (i, j) and (j, i) of the connection matrix are xk. Here xk represents the event that link k or component k works. Take the network diagram in Figure 5.3a as an example. The links have been numbered from 1 through 7. Figure 5.4 shows the same network diagram with the nodes labeled as 1, 2, 3, 4, and 5. We need to find all minimal paths between node 1 (the source) and node 5 (the sink). Based on our descriptions above, the connection matrix of the network diagram in Figure 5.4 is as follows:

                    To node
                 1    2    3    4    5
            1 [  1   x1   x5    0    0 ]
  From      2 [ x1    1   x2   x3   x4 ]
  node  C = 3 [ x5   x2    1   x6    0 ]        (5.4)
            4 [  0   x3   x6    1   x7 ]
            5 [  0   x4    0   x7    1 ]
If one is dealing with a directed network, the connection matrix may not be symmetric. Since there may not be bidirectional connections between each pair of nodes, the connection matrix will have more entries of zero. With the derived connection matrix of a network, we are ready to introduce a method for enumeration of minimal paths of the network.

5.3.2 Node Removal Method for Generation of Minimal Paths

Aggarwal et al. [5] report a method for the generation of minimal paths through removal of perfect nodes in a network connection matrix. Let C be the connection matrix of the order m × m, and let the direct connection from node i to node j be indicated by cij. This method removes nodes that are neither the source nor the sink from the connection matrix one by one until the only nodes left in the matrix are the source node and the sink node. When a node is removed, the entries of the connection matrix with the remaining nodes are modified using the following equations:

c′ij = cij + cil clj,  if node l is removed, i ≠ j, i ≠ l, j ≠ l, 1 ≤ i < m, 1 < j ≤ m,  (5.5)

c′ii = 1,  i = 1, 2, . . . , m.  (5.6)
A good practice for applying this method is to label the source node as the first node and the sink node as the last node. Each intermediate node is removed one by one until a 2 × 2 matrix is left. Removal of node i is equivalent to the removal of row i and column i of the original connection matrix where 2 ≤ i ≤ m − 1 for a network with m perfect nodes. Based on equation (5.5), the entries of the first column and the last row do not affect final results at all. These entries do not need to be updated as nodes are removed. For the network diagram in Figure 5.4, the connection matrix C is given in equation (5.4), which is a 5 × 5 matrix. First we remove node 2. The modified entries of the original matrix become the following according to equation (5.5): = x5 + x1 x2 , c13
c14 = 0 + x 1 x 3 = x1 x 3 ,
c34 = x6 + x2 x3 ,
c35 = 0 + x 2 x 4 = x2 x 4 ,
= x6 + x2 x3 , c43
c45 = x7 + x3 x4 .
c15 = 0 + x1 x4 = x 1 x 4 ,
The modified connection matrix becomes the following 4 × 4 matrix: x1 x3 x1 x4 1 x5 + x1 x2 x5 1 x 6 + x2 x3 x2 x4 . C = 0 x6 + x2 x3 1 x7 + x3 x4 0 0 x7 1
(5.7)
In subsequent node removals, we do not have to remember the original node labels. We can apply equation (5.5) to the matrix in equation (5.7), which is a 4 × 4 matrix. Now consider removing the second row and the second column of this matrix with equation (5.5). The modified entries of this matrix become

    c′13 = x1x3 + (x5 + x1x2)(x6 + x2x3) = x1x3 + x5x6 + x2x3x5 + x1x2x6,
    c′14 = x1x4 + (x5 + x1x2)(x2x4) = x1x4 + x2x4x5,
    c′34 = x7 + x3x4 + (x2x3 + x6)(x2x4) = x7 + x3x4 + x2x4x6.
In the above equations, we have applied Boolean algebra to simplify the expressions of the entries of the modified matrix. For example, x1x3 + x1x2x3 = x1x3.
GENERATION OF MINIMAL PATHS AND MINIMAL CUTS
151
The modified connection matrix becomes the following 3 × 3 matrix:

         | 1   x1x3 + x5x6 + x2x3x5 + x1x2x6   x1x4 + x2x4x5      |
    C″ = | 0   1                               x7 + x3x4 + x2x4x6 |   (5.8)
         | 0   x7                              1                  |

Removal of the second row and the second column of the matrix C″ in equation (5.8) results in the following modified entry for the remaining rows and columns:

    c′13 = x1x4 + x2x4x5 + (x1x3 + x5x6 + x2x3x5 + x1x2x6)(x7 + x3x4 + x2x4x6)
         = x1x4 + x2x4x5 + x1x3x7 + x5x6x7 + x2x3x5x7 + x1x2x6x7 + x3x4x5x6.

In the derivation of c′13, we applied Boolean algebra equations such as x1x4 + x1x3x4 = x1x4. For other Boolean algebra equations, readers are referred to Chapter 2. The resulting 2 × 2 matrix includes all minimal paths in the entry at the top right corner of the matrix, which is c′13. The minimal paths of the network diagram in Figure 5.4 for communications between the source node and the sink node are therefore x1x4, x2x4x5, x1x3x7, x5x6x7, x2x3x5x7, x1x2x6x7, and x3x4x5x6. Since we have labeled the source node as node 1 and the sink node as node m (the last node), we are only interested in the entry at the top right corner of the connection matrix as we remove one intermediate node at a time. We present some interpretations of this entry as it is updated during the node removal process. In the original connection matrix C, the entry at the top right corner represents the minimal paths between the source and the sink through only one failure-prone link. In our example, this entry is 0, indicating that the source and the sink are not directly connected through a single link. After node 2 is removed, the entry at the top right corner of the modified connection matrix C′ represents the minimal paths from the source to the sink with up to two different links going through node 2. From C′, we see that there is only one such minimal path, which is x1x4. After node 3 is removed, the entry at the top right corner of the modified connection matrix C″ represents the minimal paths from the source to the sink with up to three different links going through node 2 and/or node 3. From C″, we see that there are two such minimal paths, which are x1x4 and x2x4x5.
After node 4 is removed, the entry at the top right corner of the modified connection matrix C‴ represents the minimal paths from the source to the sink with up to four different links going through node 2, node 3, and/or node 4. From C‴, we see that there are seven such minimal paths, which are x1x4, x2x4x5, x1x3x7, x5x6x7, x1x2x6x7, x3x4x5x6, and x2x3x5x7. Since Boolean algebra is used to simplify the expressions of the entries, it is guaranteed that no loops or backtracking will exist in the paths. In other words, minimal paths are guaranteed. We simply list all one-link connections from any node to any other node in the connection matrix. We do not need to worry about the directions of signal flows. The node removal method for enumeration of minimal paths involves symbolic operations with Boolean algebra. An m × m connection matrix, where m is the number of perfect nodes in the network, has to be available. The number of entries in the matrix that have to be updated is equal to

    ∑_{i=1}^{m−2} i² = (1/6)(m − 2)(m − 1)(2m − 3).
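As a concrete illustration, the node removal procedure can be sketched in Python (an illustrative implementation of ours, not from the book). A sum of products is stored as a set of frozensets of link labels, Boolean absorption performs all the simplification, and the connection matrix below is the one in equation (5.4):

```python
def absorb(sop):
    """Boolean absorption: drop any product that strictly contains another."""
    return {t for t in sop if not any(o < t for o in sop)}

def bool_add(a, b):
    return absorb(a | b)

def bool_mul(a, b):
    # x * x = x holds automatically because products are sets of labels
    return absorb({t1 | t2 for t1 in a for t2 in b})

ONE, ZERO = {frozenset()}, set()           # Boolean constants 1 and 0
def x(k): return {frozenset([k])}          # the single-literal product x_k

# Connection matrix of Figure 5.4, equation (5.4); node i is row/column i - 1
C = [
    [ONE,  x(1), x(5), ZERO, ZERO],
    [x(1), ONE,  x(2), x(3), x(4)],
    [x(5), x(2), ONE,  x(6), ZERO],
    [ZERO, x(3), x(6), ONE,  x(7)],
    [ZERO, x(4), ZERO, x(7), ONE ],
]

def minimal_paths(C):
    """Remove intermediate nodes one at a time, applying equation (5.5):
    c_ij := c_ij + c_il * c_lj for the removed node l."""
    C = [row[:] for row in C]
    while len(C) > 2:
        l = 1                              # always remove the second node left
        keep = [i for i in range(len(C)) if i != l]
        for i in keep:
            for j in keep:
                if i != j:
                    C[i][j] = bool_add(C[i][j], bool_mul(C[i][l], C[l][j]))
        C = [[C[i][j] for j in keep] for i in keep]
    return C[0][1]                         # top right entry: the minimal paths

paths = minimal_paths(C)
# yields the seven minimal paths x1x4, x2x4x5, x1x3x7, x5x6x7,
# x2x3x5x7, x1x2x6x7, and x3x4x5x6
```

Note that this sketch also updates the first column and last row, which equation (5.5) shows to be unnecessary; the top right entry is unaffected either way.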
During each update, equation (5.5) has to be used, which involves one Boolean multiplication and one Boolean addition.

5.3.3 Generation of Minimal Cuts from Minimal Paths

Locks [149] describes a method for generation of all minimal cuts from the set of all minimal paths using Boolean algebra. Suppose that there are l minimal paths and they are denoted by MP1, MP2, . . . , MPl. Then the set of minimal cuts can be derived with the following Boolean operations, where the overbar denotes complementation:

    M̄P1 M̄P2 · · · M̄Pl.   (5.9)
Take the network diagram given in Figure 5.4 as an example. The minimal paths between the source and the sink of the network have been found with the node removal method as

    MP1 = x1x4,      MP2 = x2x4x5,    MP3 = x1x3x7,    MP4 = x5x6x7,
    MP5 = x1x2x6x7,  MP6 = x3x4x5x6,  MP7 = x2x3x5x7.
We can find the minimal cuts by inverting or complementing each minimal path and then evaluating the intersection of these complements. Let Bi represent the intersection of the complements of the minimal paths MP1, MP2, . . . , MPi. Applying equation (5.9) by considering one more minimal path at a time and using Bi to represent the result when i minimal paths have been considered, we have

    B1 = M̄P1 = x̄1 + x̄4,
    B2 = M̄P2 B1 = x̄1x̄2 + x̄4 + x̄1x̄5,
    B3 = M̄P3 B2 = x̄1x̄2 + x̄1x̄4 + x̄1x̄5 + x̄3x̄4 + x̄4x̄7,
    B4 = M̄P4 B3 = x̄1x̄5 + x̄3x̄4x̄5 + x̄1x̄2x̄6 + x̄1x̄4x̄6 + x̄3x̄4x̄6 + x̄1x̄2x̄7 + x̄4x̄7,
    B5 = M̄P5 B4 = x̄1x̄5 + x̄1x̄2x̄6 + x̄1x̄2x̄7 + x̄2x̄3x̄4x̄5 + x̄1x̄4x̄6 + x̄3x̄4x̄6 + x̄4x̄7,
    B6 = M̄P6 B5 = x̄1x̄5 + x̄4x̄7 + x̄1x̄2x̄6 + x̄1x̄4x̄6 + x̄3x̄4x̄6 + x̄1x̄2x̄3x̄7 + x̄2x̄3x̄4x̄5,
    B7 = M̄P7 B6 = x̄1x̄5 + x̄4x̄7 + x̄1x̄2x̄6 + x̄3x̄4x̄6 + x̄1x̄2x̄3x̄7 + x̄2x̄3x̄4x̄5.

Here, B7 includes all the minimal cuts of the system.
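The iteration from B1 to B7 is mechanical enough to automate. A short Python sketch (ours, reusing the set-of-frozensets representation, where each frozenset now lists the complemented components of one product):

```python
def absorb(sop):
    """Boolean absorption on a sum of products (each product a frozenset)."""
    return {t for t in sop if not any(o < t for o in sop)}

def bool_mul(a, b):
    return absorb({t1 | t2 for t1 in a for t2 in b})

def minimal_cuts(paths):
    """Locks' method, equation (5.9): intersect the complements of all
    minimal paths. By De Morgan, the complement of a path x_a x_b ... is
    the sum of the single complemented literals; every literal below is a
    complement, so a product is stored simply as the frozenset of failed
    components."""
    b = {frozenset([c]) for c in paths[0]}            # B1
    for p in paths[1:]:
        b = bool_mul(b, {frozenset([c]) for c in p})  # B2, B3, ...
    return b

paths = [{1, 4}, {2, 4, 5}, {1, 3, 7}, {5, 6, 7},
         {1, 2, 6, 7}, {3, 4, 5, 6}, {2, 3, 5, 7}]
cuts = minimal_cuts(paths)
# yields the six minimal cuts of B7: {1,5}, {4,7}, {1,2,6},
# {3,4,6}, {1,2,3,7}, and {2,3,4,5}
```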
5.4 INCLUSION–EXCLUSION METHOD

Inclusion–exclusion (IE) is a classical method for producing the reliability expression of a general system using its minimal paths or minimal cuts. The IE method, also known as Poincaré and Sylvester's theorem, provides successive upper and lower bounds, the Bonferroni inequalities, on system reliability that converge to the exact system reliability. Let Ej be the event that all components in the minimal path Tj work. We also say that Ej represents the event that minimal path Tj works. The probability that the minimal path Tj works can be expressed as

    Pr(Ej) = ∏_{i∈Tj} pi.   (5.10)

A system with l minimal paths works if and only if at least one of the minimal paths works. In other words, system success corresponds to the event ∪_{j=1}^{l} Ej. The reliability of the system is equal to the probability of the union of these l events, namely,

    Rs = Pr( ∪_{j=1}^{l} Ej ).   (5.11)
Let

    Sk = ∑_{1 ≤ i1 < ··· < ik ≤ l} Pr(Ei1 ∩ Ei2 ∩ ··· ∩ Eik).   (5.12)

Then Sk represents the sum of the probabilities that any k minimal paths are simultaneously working. By the IE principle (see Feller [75]), the reliability of the system, which is equal to the probability of the union of the l minimal paths, can be expressed as

    Rs = ∑_{k=1}^{l} (−1)^(k−1) Sk.   (5.13)
In application of equation (5.13), S1 is included, S2 is excluded, S3 is included, S4 is excluded, and so on. This is where the name of the IE method comes from. In this process of including and excluding additional terms, upper and lower bounds on Rs become available, as given below:
    Rs ≤ S1,                       (5.14)
    Rs ≥ S1 − S2,                  (5.15)
    Rs ≤ S1 − S2 + S3,             (5.16)
    Rs ≥ S1 − S2 + S3 − S4,        (5.17)
    Rs ≤ S1 − S2 + S3 − S4 + S5,   (5.18)
    . . .

These inequalities are the so-called Bonferroni inequalities. Tighter bounds on Rs are provided by these successive inequalities and, eventually, the exact value of Rs is obtained when (−1)^(l−1) Sl is included. In practice, it may be necessary to calculate only the first few Sk values in order to obtain an Rs value that is regarded as accurate. In a parallel system with n components, there are n minimal paths. Because these n minimal paths do not have components in common, equation (5.13) has the maximum number of possible terms, 2^n − 1. This is because S1 has C(n, 1) terms, S2 has C(n, 2) terms, . . . , Sn has C(n, n) terms, where C(n, k) is the binomial coefficient. Apparently, the IE method is not an efficient method for reliability evaluation of a parallel system; the reliability of a parallel system can be easily evaluated with the simple formula in equation (4.28). For a system with l minimal paths, the maximum number of possible terms generated by the IE method is 2^l − 1. This maximum would occur only when no two minimal paths have a component in common. Usually, some minimal paths share components. For example, for the network diagram given in Figure 5.4 covered in the previous section, component 1 appears in three different minimal paths, namely MP1, MP3, and MP5. Whenever there are common components in some minimal paths, some of the 2^l − 1 possible terms of the IE method cancel each other because of the alternating signs in front of Sk for 1 ≤ k ≤ l. As a result, the actual number of final terms generated by the IE method is usually much smaller than 2^l − 1. However, the IE method has to evaluate all these 2^l − 1 terms and then let some of them cancel each other to produce the final result. In other words, this method is not very efficient for systems with a large number of minimal paths.

Example 5.5 Use the IE method to find the reliability of the bridge network given in Figure 4.6. The minimal paths given in Example 4.6 are

    T1 = {1, 2},   T2 = {4, 5},   T3 = {1, 3, 5},   T4 = {2, 3, 4}.
The logic functions of the minimal paths denote the events that each minimal path works and are given in Example 4.7. We will use MPi to indicate the logic function of the ith minimal path. Then we have

    MP1 = x1x2,   MP2 = x4x5,   MP3 = x1x3x5,   MP4 = x2x3x4.

Applying the IE method, we have
    S1 = Pr(MP1) + Pr(MP2) + Pr(MP3) + Pr(MP4)
       = p1p2 + p4p5 + p1p3p5 + p2p3p4,
    S2 = Pr(MP1MP2) + Pr(MP1MP3) + Pr(MP1MP4) + Pr(MP2MP3) + Pr(MP2MP4) + Pr(MP3MP4)
       = p1p2p4p5 + p1p2p3p5 + p1p2p3p4 + p1p3p4p5 + p2p3p4p5 + p1p2p3p4p5,
    S3 = Pr(MP1MP2MP3) + Pr(MP1MP2MP4) + Pr(MP1MP3MP4) + Pr(MP2MP3MP4)
       = 4p1p2p3p4p5,
    S4 = Pr(MP1MP2MP3MP4) = p1p2p3p4p5,
    Rs = S1 − S2 + S3 − S4
       = p1p2 + p4p5 + p1p3p5 + p2p3p4
         − (p1p2p4p5 + p1p2p3p5 + p1p2p3p4 + p1p3p4p5 + p2p3p4p5) + 2p1p2p3p4p5.
If all components are i.i.d. with reliability p, the reliability of the bridge structure becomes

    Rs = 2p^2 + 2p^3 − 5p^4 + 2p^5.   (5.19)
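Equation (5.13) translates directly into code. A minimal Python sketch (ours), checked against the closed form (5.19) for the bridge network; the nested loops enumerate all 2^l − 1 nonempty subsets of paths, which is exactly why the method is inefficient for systems with many minimal paths:

```python
from itertools import combinations

def ie_reliability(paths, p):
    """Inclusion-exclusion over minimal paths, equation (5.13).
    For independent components, Pr(E_i1 and ... and E_ik) is the product
    of p_c over the union of the k paths' component sets."""
    r = 0.0
    for k in range(1, len(paths) + 1):
        sign = (-1) ** (k - 1)
        for subset in combinations(paths, k):
            comps = set().union(*subset)   # components in all k paths together
            prob = 1.0
            for c in comps:
                prob *= p[c]
            r += sign * prob
    return r

# Bridge network of Figure 4.6 with the minimal paths of Example 5.5
paths = [{1, 2}, {4, 5}, {1, 3, 5}, {2, 3, 4}]
p = {i: 0.9 for i in range(1, 6)}          # i.i.d. components with p = 0.9
r = ie_reliability(paths, p)
# agrees with equation (5.19): 2p^2 + 2p^3 - 5p^4 + 2p^5 = 0.97848 at p = 0.9
```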
In this example, there are four minimal paths, that is, l = 4. In evaluating Sk for 1 ≤ k ≤ 4, 2^4 − 1 = 15 terms are evaluated. In the final expression of Rs, 11 terms are added or subtracted. This means that four terms are canceled. For reliability evaluation of the bridge structure, the IE method is not as efficient as the decomposition method in combination with parallel and series reductions illustrated through Example 5.3. The IE method is based on the IE principle for evaluation of the union of several events. We have illustrated its use in system reliability evaluation using minimal paths. It can also be used for system unreliability evaluation using minimal cuts. Let Ej be the event that all components in minimal cut set Cj fail. We also say that Ej represents the event that minimal cut Cj fails. The probability that the minimal cut Cj fails can be expressed as

    Pr(Ej) = ∏_{i∈Cj} qi.   (5.20)

A system with m minimal cuts fails if and only if at least one of the minimal cuts fails. In other words, system failure corresponds to the event ∪_{j=1}^{m} Ej. The unreliability of the system is equal to the probability of the union of the m events, namely,

    Qs = Pr( ∪_{j=1}^{m} Ej ).   (5.21)
Let

    Vk = ∑_{1 ≤ i1 < ··· < ik ≤ m} Pr(Ei1 ∩ Ei2 ∩ ··· ∩ Eik).   (5.22)

Then Vk represents the sum of the probabilities that any k of the minimal cuts are simultaneously failed. The system unreliability, which is equal to the probability of the union of the failures of the m minimal cuts, can be expressed as

    Qs = ∑_{k=1}^{m} (−1)^(k−1) Vk.   (5.23)
In application of equation (5.23), V1 is included, V2 is excluded, V3 is included, V4 is excluded, and so on. In this process of including and excluding additional terms, upper and lower bounds on Qs become available, as given below:

    Qs ≤ V1,                       (5.24)
    Qs ≥ V1 − V2,                  (5.25)
    Qs ≤ V1 − V2 + V3,             (5.26)
    Qs ≥ V1 − V2 + V3 − V4,        (5.27)
    Qs ≤ V1 − V2 + V3 − V4 + V5,   (5.28)
    . . .
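The partial sums of equation (5.23) are easy to compute in the same way. A short Python sketch (ours) that returns the successive Bonferroni bounds for a set of minimal cuts, the last value being the exact Qs:

```python
from itertools import combinations

def bonferroni_bounds(cuts, q):
    """Partial sums V1, V1 - V2, V1 - V2 + V3, ... of equation (5.23);
    they alternate between upper and lower bounds on Q_s, as in
    inequalities (5.24)-(5.28), and the final partial sum is exact."""
    bounds, total = [], 0.0
    for k in range(1, len(cuts) + 1):
        vk = 0.0
        for subset in combinations(cuts, k):
            comps = set().union(*subset)   # components that fail together
            prob = 1.0
            for c in comps:
                prob *= q[c]
            vk += prob
        total += (-1) ** (k - 1) * vk
        bounds.append(total)
    return bounds

# Bridge network, minimal cuts of Example 5.6, i.i.d. with q = 0.1
cuts = [{1, 4}, {2, 5}, {1, 3, 5}, {2, 3, 4}]
q = {i: 0.1 for i in range(1, 6)}
b = bonferroni_bounds(cuts, q)
# b[0] >= Q_s >= b[1], and so on; b[-1] is the exact Q_s (0.02152 here)
```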
Example 5.6 Consider the bridge structure given in Figure 4.6. The minimal cuts given in Example 4.6 are

    C1 = {1, 4},   C2 = {2, 5},   C3 = {1, 3, 5},   C4 = {2, 3, 4}.

The logic functions of the minimal cuts denote the events that each minimal cut fails and are given in Example 4.7. We use MCi to indicate the logic function of the ith minimal cut. Then we have

    MC1 = x̄1x̄4,   MC2 = x̄2x̄5,   MC3 = x̄1x̄3x̄5,   MC4 = x̄2x̄3x̄4.

Applying the IE method, we have

    V1 = Pr(MC1) + Pr(MC2) + Pr(MC3) + Pr(MC4)
       = q1q4 + q2q5 + q1q3q5 + q2q3q4,
    V2 = Pr(MC1MC2) + Pr(MC1MC3) + Pr(MC1MC4) + Pr(MC2MC3) + Pr(MC2MC4) + Pr(MC3MC4)
       = q1q2q4q5 + q1q3q4q5 + q1q2q3q4 + q1q2q3q5 + q2q3q4q5 + q1q2q3q4q5,
    V3 = Pr(MC1MC2MC3) + Pr(MC1MC2MC4) + Pr(MC1MC3MC4) + Pr(MC2MC3MC4)
       = 4q1q2q3q4q5,
    V4 = Pr(MC1MC2MC3MC4) = q1q2q3q4q5,
    Rs = 1 − (V1 − V2 + V3 − V4)
       = 1 − q1q4 − q2q5 − q1q3q5 − q2q3q4 + q1q2q4q5 + q1q3q4q5 + q1q2q3q4
         + q1q2q3q5 + q2q3q4q5 − 2q1q2q3q4q5.

When all components are i.i.d. with unreliability q, the unreliability of the bridge structure becomes

    Qs = 2q^5 − 5q^4 + 2q^3 + 2q^2.   (5.29)
5.5 SUM-OF-DISJOINT-PRODUCTS METHOD

Fratta and Montanari [77] first reported the sum-of-disjoint-products (SDP) method in 1973. Abraham [2] published an improved version of SDP, and many other authors have provided further improvements. Locks is perhaps the author who made the most extensive use of SDP in system reliability evaluation [146]–[149]. The SDP method uses minimal paths or minimal cuts to evaluate the probability of the union of several events. The union of the minimal paths can be expressed by the logic function of the system, and this logic function can be expressed as a union of several terms. If these terms of the logic function are disjoint, then there is a one-to-one correspondence between this expression of the logic function of the system and the reliability expression of the system. Thus, the focus of the method is on expressing the logic function of the system as a union of disjoint terms. Each of these disjoint terms is a product of the events that individual components work or fail. The SDP method differs from the IE method in the signs (plus or minus) of the terms in the system reliability formula. With the IE method, the signs of the terms alternate between plus and minus, with the plus signs denoting the sets whose probability is added to the reliability formula and the minus signs denoting the sets whose probability is subtracted from the reliability formula because of double counting in the previous inclusion operations. With the SDP method, however, all sets are included, none is double counted, and all terms have plus signs. An SDP formula for any but very small systems is smaller than the IE formula [149].

Disjoint Terms: Addition Law  The addition law of probabilities is the underlying justification for the SDP method. If two or more events have no elements in common, the probability that at least one of the events will occur is the sum of the probabilities
of the individual events. If two events A and B have elements in common, the union of these two events, A ∪ B, may be expressed as the union of event A with event ĀB, where Ā denotes the complement of A. Then we have the following equation for evaluation of the probability of A ∪ B:

    Pr(A ∪ B) = Pr(A) + Pr(ĀB).   (5.30)

Similarly, with three events A, B, and C, we have

    Pr(A ∪ B ∪ C) = Pr(A) + Pr(ĀB) + Pr(ĀB̄C).   (5.31)

With n events A1, A2, . . . , An, we have

    Pr(A1 ∪ · · · ∪ An) = Pr(A1) + Pr(Ā1A2) + Pr(Ā1Ā2A3) + · · · + Pr(Ā1 · · · Ān−1An).   (5.32)

Equation (5.32) expresses the probability of the union of n events as a sum of n probability terms. Each additional term represents the contribution to the probability of the union by the additional event. For example, the first term, Pr(A1), is the contribution of the first event, A1, to the probability of the union. The second term, Pr(Ā1A2), represents the additional contribution of A2 that has not been accounted for by A1 toward the probability of the union. Expressing the probability of the union of n events as a summation of n probability terms is straightforward. The challenging task, however, is how to evaluate the additional contribution of an additional event that has not been accounted for by any of the previous events. Abraham [2], Locks [149], and Wilson [244] divide the implementation of the addition law into two loops, the outer loop and the inner loop. Special rules are included in the inner loop for efficient evaluation of the additional contribution toward the union by an additional event that has not been accounted for by any of the previous events. We describe these two loops below.

Outer Loop  The outer loop of the SDP method is based on equation (5.32). Suppose that the l minimal paths of the system are MP1, MP2, . . . , MPl. Let Sk denote the contribution of the kth minimal path to the union of all minimal paths for 1 ≤ k ≤ l. Then we have the following iterative equations for the evaluation of Sk:

    S1 = MP1,   (5.33)
    Sk = M̄P1 M̄P2 · · · M̄Pk−1 MPk,   1 < k ≤ l.   (5.34)
Let Uk = Pr(Sk), 1 ≤ k ≤ l.
Then Uk is the probability that the kth minimal path is the first minimal path that makes the system work. In other words, Uk is the probability that the first k − 1 minimal paths are failed and the kth minimal path is working. The reliability of the system can be expressed as

    Rs = U1 + U2 + · · · + Ul.   (5.35)
In the successive iterations of the outer loop, we find tighter and tighter lower bounds on system reliability. When all minimal paths have been enumerated, the exact system reliability is found, as shown below:

    Lower1 = U1,
    Lower2 = Lower1 + U2,
    Lower3 = Lower2 + U3,
    . . .
    Rs = Lowerl−1 + Ul.

The order in which the minimal paths or minimal cuts are considered plays an important role in the efficiency of the algorithm. The following guidelines, in priority order, are suggested by Locks [149] and Wilson [244]:

1. Minimal paths (or minimal cuts) with fewer components are considered first. For example, for the bridge structure in Figure 4.6, we have MP1 = x1x2, MP2 = x4x5, MP3 = x1x3x5, and MP4 = x2x3x4.

2. Among minimal paths or minimal cuts of the same size (i.e., with the same number of components), the one that has the largest number of components in common with the previous minimal path is considered next. For example, among the minimal paths bc, cde, adf, and adg, the correct ordering following the first two guidelines is MP1 = bc, MP2 = cde, and the other two may be ordered as number 3 and number 4 arbitrarily.

3. Minimal paths (or minimal cuts) with the same number of components are ordered in ascending order (if the components are numbered) or alphabetical order (if the components are labeled with letters of the alphabet). For example, x1x2 is MP1 and x4x5 is MP2 for the bridge structure in Figure 4.6. For the remaining two minimal paths considered under rule 2, MP3 = adf and MP4 = adg based on this rule.

4. The components within a minimal path are ordered in ascending or alphabetical order. For example, if components 1 and 2 form the first minimal path, we write MP1 = x1x2. If components a and c form the first minimal path, we write MP1 = ac.

Inner Loop  The outer loop takes one more minimal path into consideration in each iteration.
At the kth step of the outer loop when the kth minimal path is considered, the inner loop is responsible for finding the expression of the event that includes the
kth minimal path but not any of the first k − 1 minimal paths. The following steps illustrate the evaluation of the kth term, Sk, where k > 1, using equation (5.34):

1. Remove the components that are present in MPk from each of the previous minimal paths, that is, MP1, MP2, . . . , MPk−1. Find the logic expression of the union of these previous k − 1 modified minimal paths. Use Boolean algebra to simplify this expression.

2. Invert the simplified logic expression, that is, find the complement of the simplified logic expression. This expression is a sum or union of products.

3. Express the logic expression obtained in the previous step as a sum of disjoint products.

4. Multiply each term of the logic expression obtained in the previous step by the logic expression of the kth minimal path. This gives the expression of Sk, which represents the additional contribution of MPk and is disjoint with each of the previous minimal paths.

Once Sk is obtained, Uk can be obtained directly from Sk. When Uk becomes available for 1 ≤ k ≤ l, we can easily find Rs with equation (5.35). We use three examples to illustrate the SDP method.

Example 5.7 Consider the bridge structure in Figure 4.6. The minimal paths have been ordered and expressed as

    MP1 = x1x2,   MP2 = x4x5,   MP3 = x1x3x5,   MP4 = x2x3x4.
At outer loop 1, we consider the first minimal path, x1x2. In the inner loop, we simply find S1, from which we obtain U1:

    S1 = x1x2,   U1 = Pr(S1) = p1p2.

At outer loop 2, we consider the second minimal path, x4x5. In the inner loop, we first need to remove x4 and x5, which are members of the second minimal path, from the previous minimal path. Since they are not present in MP1, the resulting logic expression is x1x2. No simplification of this expression is possible. Inverting this expression gives its complement, x̄1 + x̄2. Expressing this logic function as a sum of disjoint products, we have x̄1 + x1x̄2. Multiplying MP2 into this expression, we have the expression of S2:

    S2 = (x̄1 + x1x̄2)x4x5 = x̄1x4x5 + x1x̄2x4x5.

In this equation, S2 is a sum of two disjoint products, each of which is disjoint from each of the previous minimal paths. From the expression of S2, we find U2 as follows:

    U2 = q1p4p5 + p1q2p4p5.
At outer loop 3, we consider the third minimal path, x1x3x5. In the inner loop, we deal with the following logic expressions:

    x1x2 + x4x5     (remove x1, x3, and x5, if present, from this expression),
    x2 + x4         (no need to simplify; invert the expression),
    x̄2x̄4            (multiply this expression by x1x3x5),
    S3 = x1x̄2x3x̄4x5,
    U3 = p1q2p3q4p5.

At outer loop 4, we consider the fourth minimal path, x2x3x4. In the inner loop, we deal with the following logic expressions:

    x1x2 + x4x5 + x1x3x5   (remove x2, x3, and x4 from this expression),
    x1 + x5 + x1x5         (simplify such that x1x5 is absorbed),
    x1 + x5                (invert this expression),
    x̄1x̄5                   (multiply this expression by x2x3x4),
    S4 = x̄1x2x3x4x̄5,
    U4 = q1p2p3p4q5.

All four minimal paths have been considered. The reliability of the system is

    Rs = U1 + U2 + U3 + U4
       = p1p2 + q1p4p5 + p1q2p4p5 + p1q2p3q4p5 + q1p2p3p4q5.

When the components are i.i.d. with reliability p and unreliability q, we have

    Rs = p^2 + qp^2 + qp^3 + 2q^2p^3 = 2p^2 + 2p^3 − 5p^4 + 2p^5.

Example 5.8 Evaluate the reliability of the bridge structure in Figure 4.6 using minimal cuts. From Example 5.6,

    MC1 = x̄1x̄4,   MC2 = x̄2x̄5,   MC3 = x̄1x̄3x̄5,   MC4 = x̄2x̄3x̄4.

Let

    Tk = M̄C1 M̄C2 · · · M̄Ck−1 MCk,   Vk = Pr(Tk).

For minimal cut 1,
    T1 = x̄1x̄4,   V1 = Pr(T1) = q1q4.

For minimal cut 2,

    x̄1x̄4                  (no need to remove x̄2 and x̄5; invert this expression),
    x1 + x4 = x1 + x̄1x4   (multiply this expression by x̄2x̄5),
    T2 = x1x̄2x̄5 + x̄1x̄2x4x̄5,
    V2 = p1q2q5 + q1q2p4q5.

For minimal cut 3,

    x̄1x̄4 + x̄2x̄5   (remove x̄1, x̄3, and x̄5, if present, from this expression),
    x̄4 + x̄2        (invert this expression),
    x2x4           (multiply this expression by x̄1x̄3x̄5),
    T3 = x̄1x2x̄3x4x̄5,
    V3 = q1p2q3p4q5.

For minimal cut 4,

    x̄1x̄4 + x̄2x̄5 + x̄1x̄3x̄5   (remove x̄2, x̄3, and x̄4 from this expression),
    x̄1 + x̄5 + x̄1x̄5          (simplify this expression),
    x̄1 + x̄5                 (invert this expression),
    x1x5                    (multiply this expression by x̄2x̄3x̄4),
    T4 = x1x̄2x̄3x̄4x5,
    V4 = p1q2q3q4p5.

The unreliability and reliability of the system are

    Qs = V1 + V2 + V3 + V4,
    Rs = 1 − Qs = 1 − q1q4 − p1q2q5 − q1q2p4q5 − q1p2q3p4q5 − p1q2q3q4p5.

When the components are i.i.d. with reliability p and unreliability q, we have

    Rs = 1 − q^2 − pq^2 − pq^3 − 2p^2q^3.
Example 5.9 Consider the network diagram in Figure 5.4. The seven minimal paths of this network for communications between node 1 and node 5 are derived with the node removal method in Section 5.3.2. These minimal paths are reordered as x1x4, x1x3x7, x5x6x7, x2x4x5, x2x3x5x7, x1x2x6x7, and x3x4x5x6. For minimal path 1, x1x4, we have

    S1 = x1x4   and   U1 = p1p4.

For minimal path 2, x1x3x7, we deal with the following logic expressions:

    x4     (invert this expression),
    x̄4     (multiply this expression by x1x3x7),
    S2 = x1x3x̄4x7,
    U2 = p1p3q4p7.

For minimal path 3, x5x6x7, we have

    x1x4 + x1x3    (simplify this expression),
    x1(x3 + x4)    (invert this expression),
    x̄1 + x̄3x̄4      (express it as a sum of disjoint products),
    x̄1 + x1x̄3x̄4    (multiply this expression by x5x6x7),
    S3 = x̄1x5x6x7 + x1x̄3x̄4x5x6x7,
    U3 = q1p5p6p7 + p1q3q4p5p6p7.

For minimal path 4, x2x4x5, we have

    x1 + x1x3x7 + x6x7   (simplify this expression),
    x1 + x6x7            (invert this expression),
    x̄1x̄6 + x̄1x̄7          (express it as a sum of disjoint products),
    x̄1x̄6 + x̄1x6x̄7        (multiply this expression by x2x4x5),
    S4 = x̄1x2x4x5x̄6 + x̄1x2x4x5x6x̄7,
    U4 = q1p2p4p5q6 + q1p2p4p5p6q7.

For minimal path 5, x2x3x5x7, we have
    x1x4 + x1 + x6 + x4   (simplify this expression),
    x1 + x4 + x6          (invert this expression),
    x̄1x̄4x̄6                (multiply this expression by x2x3x5x7),
    S5 = x̄1x2x3x̄4x5x̄6x7,
    U5 = q1p2p3q4p5q6p7.

For minimal path 6, x1x2x6x7, we have

    x4 + x3 + x5 + x4x5 + x3x5   (simplify this expression),
    x3 + x4 + x5                 (invert this expression),
    x̄3x̄4x̄5                       (multiply this expression by x1x2x6x7),
    S6 = x1x2x̄3x̄4x̄5x6x7,
    U6 = p1p2q3q4q5p6p7.

For minimal path 7, x3x4x5x6, we have

    x1 + x1x7 + x7 + x2 + x2x7 + x1x2x7   (simplify this expression),
    x1 + x2 + x7                          (invert this expression),
    x̄1x̄2x̄7                                (multiply this expression by x3x4x5x6),
    S7 = x̄1x̄2x3x4x5x6x̄7,
    U7 = q1q2p3p4p5p6q7.

The reliability of the system is

    Rs = U1 + U2 + · · · + U6 + U7
       = p1p4 + p1p3q4p7 + q1p5p6p7 + p1q3q4p5p6p7 + q1p2p4p5q6 + q1p2p4p5p6q7
         + q1p2p3q4p5q6p7 + p1p2q3q4q5p6p7 + q1q2p3p4p5p6q7.
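For completeness, here is a compact Python sketch (ours) of the SDP idea: products are frozensets of (component, working?) literals, and each new minimal path is made disjoint from all earlier products by the standard literal-by-literal expansion. The disjoint decomposition it produces may differ in form from the hand-derived ones above, since the book's inner loop applies extra simplification rules, but the resulting probability is the same:

```python
def disjoint_products(products):
    """Given an ordered list of products (frozensets of (comp, sign)
    literals, sign True = working), return an equivalent list of pairwise
    disjoint products covering the same union of events."""
    result = []
    for t in products:
        pieces = [t]
        for prev in result:
            new_pieces = []
            for piece in pieces:
                # already disjoint if piece contradicts some literal of prev
                if any((c, not s) in piece for (c, s) in prev):
                    new_pieces.append(piece)
                    continue
                extra = [(c, s) for (c, s) in prev if (c, s) not in piece]
                acc = set(piece)
                for (c, s) in extra:        # piece AND NOT(prev), expanded
                    new_pieces.append(frozenset(acc | {(c, not s)}))
                    acc.add((c, s))
            pieces = new_pieces
        result.extend(pieces)
    return result

def sdp_reliability(paths, p):
    products = [frozenset((c, True) for c in path) for path in paths]
    r = 0.0
    for t in disjoint_products(products):
        prob = 1.0
        for c, s in t:
            prob *= p[c] if s else 1.0 - p[c]
        r += prob
    return r

# Bridge network, ordered minimal paths of Example 5.7
paths = [{1, 2}, {4, 5}, {1, 3, 5}, {2, 3, 4}]
p = {i: 0.9 for i in range(1, 6)}
r = sdp_reliability(paths, p)
# matches 2p^2 + 2p^3 - 5p^4 + 2p^5 = 0.97848 at p = 0.9
```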
5.6 MARKOV CHAIN IMBEDDABLE STRUCTURES

Chao and Lin [51] report the first use of a Markov chain in the analysis of a system reliability structure. The system reliability structure analyzed by Chao and Lin [51] is the consecutive-k-out-of-n:F structure, which will be covered in detail in Chapter 9. A k-dimensional binary vector is used to represent the state of the Markov chain in a consecutive-k-out-of-n:F system. The Markov chain has 2^k possible states and, as a result, it is useful only for small k values. Fu and Hu [80] use a scalar to represent
the state of the Markov chain, which makes the Markov chain technique an efficient tool for reliability evaluation of the consecutive-k-out-of-n:F systems. Subsequently, Chao and Fu [49, 50] standardize this approach of using the Markov chain in analysis of various system structures and provide a general framework and general results for it. System structures that can be represented by a Markov chain are named linearly connected systems by Fu and Lou [82]. Koutras [120] provides a systematic summary of this technique and calls such systems Markov chain imbeddable systems (MISs). A general discussion of this technique is covered in this section. We follow the notation used by Koutras [120].

5.6.1 MIS Technique in Terms of System Failures

Consider a general system with n components and assume that these components are labeled as 1, 2, . . . , n. In such a system, the number of failed components and their positions determine the state of the system. The principle of operation or breakdown must apply equally to the system and to any subsystem consisting of components 1, 2, . . . , l (1 ≤ l ≤ n). Examples will be given to illustrate when this principle can be applied to a system structure. The system state may be represented by an integer that takes values 0, 1, . . . , m, wherein state m indicates that the system is failed while the other states represent different levels of system deterioration. Take a series system with n components as an example. Whenever at least one component is failed, the system is failed. Now consider a series subsystem with components 1, 2, . . . , l (1 ≤ l ≤ n). The same failure principle applies to the subsystem; that is, whenever at least one component is failed, the subsystem is failed. The system may be in state 0 or 1 (i.e., m = 1), where 0 means no component failure in the system and 1 means at least one component failure in the system.
Definition 5.1 (Koutras [120]) A reliability structure with n components is said to be a Markov chain imbeddable system, or MIS, if

1. there exists a finite state space S = {s0, s1, . . . , sN} that can be partitioned as S = S0 ∪ S1 ∪ · · · ∪ Sm, where Si ∩ Sj = ∅ for all i ≠ j, and
2. there exists a Markov chain {Yl, 1 ≤ l ≤ n} defined on S such that
   (a) Yl ∈ Si if and only if the subsystem consisting of components 1, 2, . . . , l has reached the ith level of deterioration (i = 0, 1, . . . , m − 1) and
   (b) Yl ∈ Sm if and only if the subsystem consisting of components 1, 2, . . . , l is failed.

In Definition 5.1, the number of system states of interest is equal to N + 1 and only the last system state, namely sN, indicates system failure. These N + 1 system states can be partitioned into m + 1 system deterioration levels or stages, if necessary. Without loss of generality, the state space S can be arranged such that Si = {s_{j_i}, s_{j_i + 1}, . . . , s_{j_{i+1} − 1}}, i = 0, 1, . . . , m − 1 (j_0 = 0, j_m = N), and Sm =
GENERAL METHODS FOR SYSTEM RELIABILITY EVALUATION
{sN}. Note that the last level of deterioration, represented by Sm, indicates system breakdown. Let

p_{ij}(l) = Pr(Yl = sj | Yl−1 = si)   for 1 ≤ l ≤ n

be the transition probabilities of the Markov chain and Λl = [p_{ij}(l)] be the corresponding (N + 1) × (N + 1) transition probability matrix. Since Sm = {sN} represents the system's breakdown, it corresponds to an absorbing state; thus,

p_{N,i}(l) = { 0   if i ≠ N,
             { 1   if i = N,          (5.36)

that is, the last row of the matrix Λl is (0, 0, . . . , 0, 1). Define the following (N + 1)-dimensional column vectors:

e_j = (0, . . . , 0, 1, 0, . . . , 0)^T,   where the 1 is at the jth position,
1 = (1, 1, . . . , 1)^T,
u = (1, 1, . . . , 1, 0)^T,
Π0 = (Pr(Y0 = s0), Pr(Y0 = s1), . . . , Pr(Y0 = sN))^T,

where e_j is called the jth unit column vector in the (N + 1)-dimensional space and Π0 is the vector representing the probabilities of the Markov chain {Yl} when l = 0. Obviously, we must have ∑_{j=0}^{N} Pr(Y0 = sj) = 1. For a brand new system, we would have Pr(Y0 = s0) = 1; in other words, the system is at its best state at time 0. Setting Pr(Y0 = sj) = 1 for some 0 < j < N may be used to represent an imperfect system at time 0. We will use Rl and Ql to represent the reliability and unreliability of the subsystem with components 1, 2, . . . , l (1 ≤ l ≤ n). The following theorem can be used for system reliability evaluation [49, 50, 120].

Theorem 5.1 The reliability Rn and unreliability Qn of an MIS are given by
Rn = Π0^T ( ∏_{l=1}^{n} Λl ) u,                    (5.37)

Qn = 1 − Rn = Π0^T ( ∏_{l=1}^{n} Λl ) e_{N+1}.     (5.38)
Equations (5.37) and (5.38) can be used for the computation of the reliability and unreliability of an MIS as long as the transition probability matrix has been defined.
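As a sketch of how equations (5.37) and (5.38) can be evaluated numerically, the following Python code (our illustration; the function and variable names are not from the book) carries the initial distribution Π0 through the product of the component transition matrices and then applies the vector u. It is checked on the two-state series-system chain derived later in this section.

```python
# Illustrative sketch of Theorem 5.1; names are ours, not the book's.

def mis_reliability(pi0, matrices):
    """Return R_n = Pi_0^T (Lambda_1 ... Lambda_n) u for an MIS.

    pi0      -- initial distribution over the N + 1 chain states
    matrices -- one (N+1) x (N+1) transition matrix per component;
                the last state (index N) is the absorbing failure state
    """
    a = list(pi0)                                   # row vector Pi_0^T
    for lam in matrices:                            # multiply through each Lambda_l
        a = [sum(a[i] * lam[i][j] for i in range(len(a)))
             for j in range(len(a))]
    return sum(a[:-1])                              # u = (1, ..., 1, 0)^T drops state N

# Check on a series system, where Lambda_l = [[p_l, q_l], [0, 1]]:
ps = [0.9, 0.8, 0.95]
mats = [[[p, 1.0 - p], [0.0, 1.0]] for p in ps]
r = mis_reliability([1.0, 0.0], mats)               # equals p1 * p2 * p3
```

The same function handles any MIS once its transition matrices are written down; only the size of the state space changes.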
MARKOV CHAIN IMBEDDABLE STRUCTURES
The following theorem provides a recursive procedure for numerical computation of Rn and Qn of an MIS [49, 120].

Theorem 5.2 Let a(l) = [a0(l), a1(l), . . . , aN(l)]^T, l = 1, 2, . . . , n, be the vector sequence generated by the recurrence relations

aj(l) = ∑_{i=0}^{N} ai(l − 1) p_{ij}(l),   j = 0, 1, . . . , N,     (5.39)

with initial condition a(0) = Π0, that is,

aj(0) = Π0^T e_{j+1},   j = 0, 1, . . . , N.                        (5.40)

Then,

Rn = ∑_{j=0}^{N−1} aj(n),   Qn = aN(n).                             (5.41)
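The recursion in Theorem 5.2 avoids forming the matrix product explicitly. As an illustration (our code, with assumed names), the sketch below applies the recursion to a linear consecutive-2-out-of-5:F system, the structure that originally motivated the technique; state j < 2 records the number of trailing consecutive component failures, and state 2 is the absorbing failure state. A brute-force enumeration over all component states serves as a check.

```python
from itertools import product as iproduct

def consec2_reliability(ps):
    """Theorem 5.2 recursion for a consecutive-2-out-of-n:F system."""
    a = [1.0, 0.0, 0.0]                    # a(0) = Pi_0 = e_1
    for p in ps:
        q = 1.0 - p
        a = [(a[0] + a[1]) * p,            # component works: 0 trailing failures
             a[0] * q,                     # one trailing failure
             a[1] * q + a[2]]              # two in a row: absorbed (failed)
    return a[0] + a[1]                     # R_n = sum over non-failure states

ps = [0.9] * 5
r_mis = consec2_reliability(ps)

# Brute-force check: the system fails iff two adjacent components fail.
r_enum = 0.0
for x in iproduct([0, 1], repeat=len(ps)):  # 1 = working, 0 = failed
    if "00" not in "".join(map(str, x)):
        pr = 1.0
        for xi, p in zip(x, ps):
            pr *= p if xi else 1.0 - p
        r_enum += pr
```

The recursion costs O(nN) arithmetic operations, whereas the enumeration grows as 2^n; this is the efficiency the theorem refers to.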
This theorem can be used to derive efficient reliability and unreliability evaluation formulas for various system structures, as will be illustrated throughout this book. The next theorem provides a simple matrix-based formula for the evaluation of the reliability (unreliability) generating function of an MIS whose finite Markov chain is stationary. It should be noted that systems with i.i.d. component lifetimes can be represented by stationary Markov chains.

Theorem 5.3 (Koutras [120]) For an MIS with Λl = Λ, l = 1, 2, . . . , the generating functions

r(z) = ∑_{n=0}^{∞} Rn z^n,   f(z) = ∑_{n=0}^{∞} Qn z^n          (5.42)

of the reliabilities and unreliabilities are given by

r(z) = Π0^T (I − zΛ)^{−1} u,   f(z) = Π0^T (I − zΛ)^{−1} e_{N+1},   (5.43)

respectively. If Π0 = e1, the Markov chain is at state s0 at time 0. In this case, the evaluation of r(z) and f(z) reduces to computing the last entry, α_{1,N+1}, of the first row of the matrix (I − zΛ)^{−1}. Since Rn + Qn = 1 for every n, we then have

r(z) = 1/(1 − z) − α_{1,N+1},   f(z) = α_{1,N+1}.
Remarks

1. The MIS structure and Theorem 5.1 have been used by Chao and Fu [49], Fu and Lou [82], and Chao and Fu [50] in analysis of the asymptotic (n → ∞) behavior of the system. In this case, the problem becomes the study of the ergodicity of infinite products of stochastic matrices. For work along this direction, readers are also referred to Iosifescu [110].
2. As will be further illustrated later in this book, Theorem 5.2 provides a convenient way to generate efficient system reliability evaluation algorithms for many system structures.

To illustrate the use of MIS, consider both series and parallel systems. First, consider a series system with n components with reliabilities pi = 1 − qi (i = 1, 2, . . . , n). A subsystem with components 1, 2, . . . , l (1 ≤ l ≤ n) is failed if at least one component is failed. We can use S = {0, 1} = {s0, s1} as our state space. State 0 indicates that all the components are working, that is, the series system or subsystem is working; state 1 indicates that at least one of the components is failed, that is, the series system or subsystem is failed. In a series system we are only interested in knowing whether any component is failed or not. The partition S0 = {0} and S1 = {1} satisfies condition (1) in Definition 5.1, where N = m = 1. If we define

1. Yl = 0 if the subsystem with components 1, 2, . . . , l is working (1 ≤ l ≤ n) and
2. Yl = 1 if the subsystem with components 1, 2, . . . , l is failed (1 ≤ l ≤ n),

then we have a Markov chain {Yl, 1 ≤ l ≤ n} defined on S. The transition probability matrix of the Markov chain is

Λl = | p00(l)  p01(l) |  =  | pl  ql |        (5.44)
     | p10(l)  p11(l) |     | 0   1  |

In equation (5.44), p00(l) represents the probability that the l-component subsystem works (i.e., Yl = 0) given that the (l − 1)-component subsystem works (i.e., Yl−1 = 0). It is simply the probability that the additional component l works, that is, p00(l) = pl.
Similar interpretations and justifications can be obtained for the other p_{ij}(l)'s (i = 0, 1, j = 0, 1). We can then immediately obtain the following:

∏_{l=1}^{n} Λl = | ∏_{l=1}^{n} pl   1 − ∏_{l=1}^{n} pl |
                 | 0                1                   |
With Π0 = (1, 0)^T and u = (1, 0)^T, Theorem 5.1 yields the formulas for evaluation of series system reliability and unreliability:

Rn = ∏_{l=1}^{n} pl,   Qn = 1 − ∏_{l=1}^{n} pl.
Since a series system is very simple, there is no need to use Theorem 5.2 to derive recursive equations. We will illustrate the use of Theorems 5.2 and 5.3 in later chapters.

Now consider a parallel system with n components whose reliabilities are given by pi for i = 1, 2, . . . , n, and define qi = 1 − pi for i = 1, 2, . . . , n. We are interested in knowing whether all n components are failed or not. The subsystem with components 1, 2, . . . , l (1 ≤ l ≤ n) is defined to be failed if all n components are failed. This principle should be applicable to both the system and each subsystem. It is apparent that a subsystem with fewer than n components will never fail. We have to use S = {0, 1, . . . , n} = {s0, s1, . . . , sn} as our state space. State i indicates that i components have failed (i = 0, 1, . . . , n). We can use the partition Si = {si} for i = 0, 1, . . . , n. Referring to Definition 5.1, we have N = m = n. If we define Yl = i if there are i failures in the subsystem with components 1, 2, . . . , l, then we have a Markov chain {Yl, 1 ≤ l ≤ n} defined on S.

We will take a three-component parallel system as an example. The transition probability matrix of the Markov chain is

     | pl  ql  0   0  |
Λl = | 0   pl  ql  0  |        (5.45)
     | 0   0   pl  ql |
     | 0   0   0   1  |

and

∏_{l=1}^{3} Λl =
| p1p2p3   p1p2q3 + p3p1q2 + p3q1p2   q3p1q2 + q3q1p2 + q1q2p3   q1q2q3                 |
| 0        p1p2p3                     p1p2q3 + p3p1q2 + p3q1p2   q3p1q2 + q3q1p2 + q1q2 |   (5.46)
| 0        0                          p1p2p3                     p1p2q3 + p1q2 + q1     |
| 0        0                          0                          1                      |
With this matrix, we can find the system reliability R3 and unreliability Q3:

R3 = (1, 0, 0, 0) ( ∏_{l=1}^{3} Λl ) u                                        (5.47)
   = p1p2p3 + p1p2q3 + p3p1q2 + p3q1p2 + q3p1q2 + q3q1p2 + q1q2p3,            (5.48)
Q3 = (1, 0, 0, 0) ( ∏_{l=1}^{3} Λl ) e4                                       (5.49)
   = q1q2q3.                                                                  (5.50)
From the analysis of the parallel system with the MIS technique, we find that it is not a very efficient technique for analysis of the parallel system. This is because the MIS technique concentrates on system failure. State N of the Markov chain is the only state indicating the failure of the system; states 0, 1, . . . , N − 1 are all success states. In a series system, as soon as one component fails, the system is failed, N = 1, and N is much smaller than n. In a parallel system, the system fails only if all components are failed, and thus N = n. If the MIS technique can be defined such that it concentrates on system success, it will be more useful for analysis of parallel systems and other systems that are more like a parallel system than a series system. In the following section, we define the MIS technique in terms of system success.

5.6.2 MIS Technique in Terms of System Success

The MIS technique described so far defines the states of a system in terms of system failure. A system state represents the number of failed components in the system. The system state space is {0, 1, 2, . . . , N}, where N is the only failure state of the system while the others are success states. The system has N + 1 levels of deterioration. When there is no component in the system, the system is at state 0, the perfect state. As more components are added, the system state may get worse or stay the same, but it cannot get any better because 0 is already the best state. Thus, it is a perfect mechanism for analysis of F systems, that is, systems whose state is defined in terms of failure. The more components such a system has, the lower the system reliability. Whenever a system has this property, the described MIS technique can be conveniently applied. This characteristic is confirmed by Chao and Fu [49], who state that the reliability Rl is a decreasing function of l, where l indicates the number of components in the subsystem.

This design is efficient for systems that are not very tolerant of component failures, in other words, where N is small. It is not efficient when N is large, say, very close to the number of components, n, in the system. This was illustrated in the previous section when we analyzed a parallel system. In a parallel system, the more components there are, the higher the system reliability. A parallel system is a G system (one whose state is defined in terms of system success): it works if at least one component works. A parallel system is the most tolerant of component failures; it fails only if all components are failed.
In this section, we modify the MIS technique for efficient analysis of systems that are tolerant of component failures. The states of the system now focus on system success. A state of the system indicates the number of components that are working in the system. Consider a system with N + 1 states, say 0, 1, . . . , N, where state N is the only working state while the others are failure states. We say that the working state is the absorbing state. Similarly, we can obtain a transition probability matrix for the system from one state to another when one more component is added. Then, the formula for reliability in the original MIS design becomes the formula for unreliability in the MIS newly defined in this section. We will illustrate this for the parallel system structure.

In a parallel system with n components, the system and any subsystem with l components (1 ≤ l ≤ n) may only be in states 0 and 1, where 0 indicates the failure state and 1 indicates the working state. Then the transition matrix can be expressed as

Λl = | ql  pl |        (5.51)
     | 0   1  |

Then, we have

∏_{l=1}^{n} Λl = | ∏_{l=1}^{n} ql   1 − ∏_{l=1}^{n} ql |
                 | 0                1                   |

With Π0 = (1, 0)^T and u = (1, 0)^T, we can obtain the following formulas for evaluation of parallel system unreliability and reliability:

Qn = ∏_{l=1}^{n} ql,   Rn = 1 − ∏_{l=1}^{n} ql.
Clearly, the newly formulated structure is much easier to use for parallel system structures. The MIS technique for both F and G systems will be further illustrated in later chapters of the book.
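A quick numerical sketch (our illustrative code) of the success-oriented chain of equation (5.51) confirms that, for a parallel system, the absorbing-state formulation needs only two states regardless of n:

```python
def parallel_g_mis(ps):
    """Success-oriented (G-system) chain: state 1 = working, absorbing."""
    a = [1.0, 0.0]                         # start with no working component seen
    for p in ps:
        a = [a[0] * (1.0 - p),             # still no working component
             a[0] * p + a[1]]              # first working component absorbs
    return a[1], a[0]                      # (R_n, Q_n)

r, q_sys = parallel_g_mis([0.9, 0.8, 0.7])   # Q_n = 0.1 * 0.2 * 0.3
```

Compare this with the (n + 1)-state failure-counting chain used for the same system in the previous section: the state space no longer grows with the number of components.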
5.7 DELTA–STAR AND STAR–DELTA TRANSFORMATIONS

In network reliability evaluation, parallel and series reductions should always be applied first. When no more series and parallel reductions are possible, other techniques should be used. As illustrated earlier, an application of the decomposition method may make further parallel and/or series reductions possible. Another method, the delta–star and star–delta transformations, may also make further parallel and/or series reductions possible. In this section, we introduce the delta–star and star–delta transformation techniques for two-terminal network reliability evaluation. It is assumed that the nodes of the network are perfect while the links are failure-prone unless specified otherwise. The failure-prone components are independent.

A delta structure has three nodes and three links that form a delta, or triangle, shape, as shown in Figure 5.5a. The failure-prone components in a delta structure are labeled 1, 2, and 3. The nodes of the delta structure, labeled A, B, and C, may be connected to other nodes of a larger network. A star structure has four nodes and three links that form a star shape, as shown in Figure 5.5b. The failure-prone components in a star structure are labeled I, II, and III. The node in the middle of the star structure is not connected to any node outside the star structure and is labeled O. The three end nodes of the star structure are labeled A, B, and C and may be connected to other nodes of a larger network.

FIGURE 5.5 (a) Delta and (b) star structures.

In a delta–star transformation, we replace the delta structure with the star structure, and the reliabilities of components I, II, and III in the star structure are expressed as functions of the reliabilities of components 1, 2, and 3 in the original delta structure. A delta–star transformation will be performed if it simplifies the original network. A delta–star transformation may make further delta–star transformations, star–delta transformations, parallel reductions, and series reductions possible. In a star–delta transformation, we replace the star structure with the delta structure, and the reliabilities of components 1, 2, and 3 in the delta structure are expressed as functions of the reliabilities of components I, II, and III in the original star structure. A star–delta transformation will be performed if it simplifies the original network. A star–delta transformation may make further star–delta transformations, delta–star transformations, parallel reductions, and series reductions possible. The transformation to be performed must produce a structure that is equivalent to the original structure. For the delta structure to be equivalent to the star structure in network reliability evaluation, we must preserve the point-to-point probabilities.

Notation
• A ∼ B: the event that node A is connected to node B
• A ∼ C: the event that node A is connected to node C
• B ∼ C: the event that node B is connected to node C
For the two structures to be equivalent in network reliability evaluation, it is necessary and sufficient to preserve the following four point-to-point probabilities [18, 89, 204]:

Pr(A ∼ B),   Pr(A ∼ C),   Pr(B ∼ C),   Pr({A ∼ B} ∪ {A ∼ C}).     (5.52)
All other point-to-point probabilities are automatically preserved by these four probabilities. In the following, we introduce these two transformation techniques under different assumptions.
5.7.1 Star or Delta Structure with One Input Node and Two Output Nodes

In this section, we consider a simpler version of the delta–star and star–delta transformations. It is assumed that one of the three nodes, A, B, or C, which has to be the same node in both structures, only receives input from outside of the structure while the other two nodes only send output to the rest of the network. For convenience, we will assume that A is the input node and B and C are the output nodes of both the delta and the star structures. Under this assumption, signals may flow from A to B and from A to C, and signals may flow between B and C in either direction. Under this assumption, we only need to preserve the following point-to-point probabilities [203]:

Pr(A ∼ B),   Pr(A ∼ C),   Pr({A ∼ B} ∪ {A ∼ C}).                  (5.53)

To preserve these probabilities in delta–star and star–delta transformations, we have the following equations:

Pr(A ∼ B) = p1 + q1 p2 p3 = pI pII,                                (5.54)
Pr(A ∼ C) = p2 + q2 p1 p3 = pI pIII,                               (5.55)
Pr({A ∼ B} ∪ {A ∼ C}) = 1 − q1 q2 = pI (1 − qII qIII).             (5.56)

Solving for pI, pII, and pIII in terms of p1, p2, and p3 in equations (5.54)–(5.56), we find the following equations for the delta–star transformation:

pI = αβ/γ,                                                         (5.57)
pII = γ/β,                                                         (5.58)
pIII = γ/α,                                                        (5.59)
where

α = p1 + q1 p2 p3,                                                 (5.60)
β = p2 + q2 p1 p3,                                                 (5.61)
γ = p1 p2 + q1 p2 p3 + q2 p1 p3.                                   (5.62)
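Equations (5.57)–(5.62) are straightforward to implement. The sketch below (our illustrative Python; the function name is an assumption) computes the star component reliabilities and verifies numerically that the three probabilities in (5.54)–(5.56) are preserved:

```python
def delta_to_star(p1, p2, p3):
    """Delta-star transformation, equations (5.57)-(5.62)."""
    q1, q2 = 1.0 - p1, 1.0 - p2
    alpha = p1 + q1 * p2 * p3                        # equation (5.60)
    beta = p2 + q2 * p1 * p3                         # equation (5.61)
    gamma = p1 * p2 + q1 * p2 * p3 + q2 * p1 * p3    # equation (5.62)
    return alpha * beta / gamma, gamma / beta, gamma / alpha

p1, p2, p3 = 0.9, 0.8, 0.7
pI, pII, pIII = delta_to_star(p1, p2, p3)

q1, q2 = 1 - p1, 1 - p2
qII, qIII = 1 - pII, 1 - pIII
# pI*pII, pI*pIII, and pI*(1 - qII*qIII) should match the left-hand
# sides of (5.54)-(5.56).
```

The identity pI (1 − qII qIII) = α + β − γ = 1 − q1 q2 holds algebraically, so the third check succeeds for any component reliabilities.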
Solving for p1, p2, and p3 in terms of pI, pII, and pIII in equations (5.54)–(5.56), we find the following equations for the star–delta transformation:

p1 = (a − c p3)/q3,                                                (5.63)
p2 = (b − c p3)/q3,                                                (5.64)
c(1 − c) p3² − (a + b)(1 − c) p3 + a + b − ab − c = 0,             (5.65)

where a = pI pII, b = pI pIII, and c = pI (1 − qII qIII). We can also express p3 as

p3 = [(a + b)(1 − c) ± √((a + b)²(1 − c)² − 4c(1 − c)(a + b − ab − c))] / [2c(1 − c)],   (5.66)
where 0 < p3 < 1. For the star–delta transformation, no simple closed-form solutions are available for the equivalent component reliabilities in the delta structure. One has to solve the quadratic equation (5.65) or use equation (5.66) to find p3. This shows that the star–delta transformation is not as easy to use as the delta–star transformation. We use the following example to illustrate the delta–star transformation.

Example 5.10 Consider the bridge network given in Figure 5.6. The pivotal decomposition method was used in Example 5.3 for system reliability evaluation. The expression of the system reliability is given in equation (5.3).
FIGURE 5.6 Bridge network with nodes labeled.
Components 1, 3, and 4 form a delta structure. Node A only receives input from outside the delta structure, and nodes B and C only send output to node 2 or 5. The condition for use of the delta–star transformation is satisfied. Equations (5.57)–(5.62) can be used directly except that p4 should be used in place of p2:

α = p1 + q1 p4 p3,
β = p4 + q4 p1 p3,
γ = p1 p4 + q1 p4 p3 + q4 p1 p3,
pI = αβ/γ = (p1 + q1 p4 p3)(p4 + q4 p1 p3)/(p1 p4 + q1 p4 p3 + q4 p1 p3),
pII = γ/β = (p1 p4 + q1 p4 p3 + q4 p1 p3)/(p4 + q4 p1 p3),
pIII = γ/α = (p1 p4 + q1 p4 p3 + q4 p1 p3)/(p1 + q1 p4 p3).
After this delta–star transformation, the system structure is shown in Figure 5.7. Applying series and parallel reductions, we obtain the following system reliability of the original bridge network:

Rs = pI [1 − (1 − pII p2)(1 − pIII p5)]
   = pI [1 − (1 − p2 γ/β)(1 − p5 γ/α)]
   = pI [1 − (β − p2 γ)(α − p5 γ)/(αβ)]
   = pI [1 − (αβ − α p2 γ − β p5 γ + p2 p5 γ²)/(αβ)]
   = (αβ/γ) × (α p2 γ + β p5 γ − p2 p5 γ²)/(αβ)
   = p2 α + p5 β − p2 p5 γ
   = p2 (p1 + q1 p4 p3) + p5 (p4 + q4 p1 p3) − p2 p5 (p1 p4 + q1 p4 p3 + q4 p1 p3).
FIGURE 5.7 Bridge network after delta–star transformation.
When the components in the original bridge network are i.i.d., we have

pI = (1 + p − p²)²/(3 − 2p),
pII = pIII = (3p − 2p²)/(1 + p − p²),
Rs = 2p² + 2p³ − 5p⁴ + 2p⁵,
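For the i.i.d. case, the whole of Example 5.10 can be replayed numerically. The sketch below (our illustrative code, not from the book) transforms the delta {1, 3, 4}, reduces the resulting series-parallel structure, and compares the result with the closed form above:

```python
def bridge_via_delta_star(p):
    """Bridge reliability via one delta-star transformation (i.i.d. case)."""
    q = 1.0 - p
    alpha = p + q * p * p                  # components 1, 4, 3 are i.i.d. here
    beta = alpha
    gamma = p * p + 2.0 * q * p * p
    pI, pII, pIII = alpha * beta / gamma, gamma / beta, gamma / alpha
    # node I in series with the parallel pair of paths (II, 2) and (III, 5)
    return pI * (1.0 - (1.0 - pII * p) * (1.0 - pIII * p))

p = 0.9
rs = bridge_via_delta_star(p)
rs_closed = 2*p**2 + 2*p**3 - 5*p**4 + 2*p**5
```

Agreement of the two values for any p is a useful sanity check when implementing the transformation inside a larger network-reduction routine.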
which is the same as the expression of the system reliability obtained with other system reliability evaluation methods.

In the above example, the delta–star transformation may also be applied to the delta structure formed by the components 2, 5, and 3, in which node D is the output node while nodes B and C are the input nodes to the identified delta structure. However, the star–delta transformation cannot be applied to the bridge structure because there are no star structures that satisfy the conditions for application of the transformation.

Consider the network diagram given in Figure 5.8. There are seven failure-prone components (links) while the nodes are failure free in this network. The delta–star transformation may be applied to the delta structure formed by the components 2, 3, and 4 or the delta structure formed by the components 4, 5, and 6. There is no need to apply the delta–star transformation to both of these delta structures. With only one such delta–star transformation, we are able to utilize parallel and series reductions to find the reliability of the network. One may also use the star–delta transformation for this network. The star structure formed by the components 1, 2, and 3 may be transformed into a delta structure. The star structure formed by the components 5, 6, and 7 may also be transformed into a delta structure. After both these transformations are performed, the transformed network diagram is shown in Figure 5.9. After parallel reductions are applied, we get a bridge network. Another delta–star transformation is needed, as illustrated in the previous example, to find the reliability of the resulting bridge network.

FIGURE 5.8 Network to which delta–star and star–delta transformation techniques may be applied.

FIGURE 5.9 Network in Figure 5.8 after two star–delta transformations.

In other
words, two star–delta transformations and one delta–star transformation are needed to find the reliability of the network shown in Figure 5.8. This shows that the star–delta transformation is not very efficient: as stated in the previous paragraph, a single delta–star transformation enables us to find the reliability of the same network.

The double-bridge network shown in Figure 5.10 has four delta structures, labeled D1, D2, D3, and D4, respectively. It is easy to see that node A is the input node to delta D1 and delta D2. However, nodes B and C are not strictly output nodes of delta D1, and nodes C and D are not strictly output nodes of delta D2. They may be either input nodes or output nodes depending on the status of the components in the system. It is also easy to see that node E is the output node for delta D3 and delta D4. However, nodes B and C are not strictly input nodes for delta D3, and nodes C and D are not strictly input nodes for delta D4 either. Thus, the transformations covered in this section cannot be applied to the double-bridge network.
FIGURE 5.10 Double-bridge network.
5.7.2 Delta Structure in Which Each Node May Be either an Input Node or an Output Node

We concentrate on the delta–star transformation in this section. In a large network, it may be easy to identify delta structures. However, it is often impossible for one of the nodes to be only an input node or only an output node for the identified delta structure. Each node is often either an input or an output node depending on the network requirement. In this case, the equations derived in the previous section cannot be used. When each node of the delta structure may be either an input node or an output node, we have to preserve all the probabilities in equation (5.52) in the delta–star transformation. If each structure has three failure-prone components, such a transformation is impossible except in some trivial cases [241]. To make the delta–star transformation possible, Rosenthal and Frisque [204] allow the center node in the star structure to be failure-prone too. In the following, we illustrate this approach.

Let pO be the reliability of node O in the star structure. All other nodes are perfect. To preserve the probabilities in equation (5.52), we obtain the following equations:

Pr(A ∼ B) = p1 + q1 p2 p3 = pI pO pII,                             (5.67)
Pr(A ∼ C) = p2 + q2 p1 p3 = pI pO pIII,                            (5.68)
Pr(B ∼ C) = p3 + q3 p1 p2 = pII pO pIII,                           (5.69)
Pr({A ∼ B} ∪ {A ∼ C}) = 1 − q1 q2 = pI pO (1 − qII qIII).          (5.70)
From these equations, we obtain the following probabilities for the star structure:

pI = (α + β − δ)/γ,                                                (5.71)
pII = (α + β − δ)/β,                                               (5.72)
pIII = (α + β − δ)/α,                                              (5.73)
pO = αβγ/(α + β − δ)²,                                             (5.74)

where

α = p1 + q1 p2 p3,                                                 (5.75)
β = p2 + q2 p1 p3,                                                 (5.76)
γ = p3 + q3 p1 p2,                                                 (5.77)
δ = 1 − q1 q2.                                                     (5.78)
FIGURE 5.11 Double-bridge network after two delta–star transformations.
Example 5.11 Consider the double-bridge network in Figure 5.10. The component reliabilities and unreliabilities pi and qi for 1 ≤ i ≤ 8 are known. Applying delta–star transformations to the delta structures D1 and D4, we obtain the network shown in Figure 5.11 with the following reliabilities of the newly generated components:

pI = (αI + βI − δI)/γI,   pII = (αI + βI − δI)/βI,
pIII = (αI + βI − δI)/αI,   pO = αI βI γI/(αI + βI − δI)²,

where

αI = p1 + q1 p2 p4,   βI = p2 + q2 p1 p4,   γI = p4 + q4 p1 p2,   δI = 1 − q1 q2,

and, for the components I′, II′, III′, and O′ generated from D4,

pI′ = (αIV + βIV − δIV)/γIV,   pII′ = (αIV + βIV − δIV)/βIV,
pIII′ = (αIV + βIV − δIV)/αIV,   pO′ = αIV βIV γIV/(αIV + βIV − δIV)²,
FIGURE 5.12 Double-bridge network after two delta–star transformations and series reductions.
where

αIV = p8 + q8 p7 p5,   βIV = p7 + q7 p8 p5,   γIV = p5 + q5 p8 p7,   δIV = 1 − q8 q7.

Applying series reductions to the network in Figure 5.11 generates the network diagram in Figure 5.12 with the following new component reliabilities:

p6-II = p6 pII,   pIII-III′ = pIII pIII′,   p3-II′ = p3 pII′.

The network in Figure 5.12 is very similar to a standard bridge network except that nodes O and O′ are failure-prone. The decomposition technique may be applied to this network for system reliability evaluation. We will decompose on the two failure-prone nodes simultaneously:

Rs = pO pO′ Rb + pO qO′ pI p6-II + qO pO′ p3-II′ pI′,

where Rb is the reliability of the standard bridge network, which can be calculated with equation (5.3). After relabeling the components and applying equation (5.3), we have

Rb = pIII-III′ (1 − qI q3-II′)(1 − q6-II qI′) + qIII-III′ [1 − (1 − pI p6-II)(1 − p3-II′ pI′)].

5.8 BOUNDS ON SYSTEM RELIABILITY

We have covered many methods for evaluation of the exact reliability or the exact unreliability of a system. For large systems, it may take a significant amount of time to find these exact measures. Very often, a bounding technique can provide approximations for exact system reliability or unreliability in a much shorter time. In this
section, we describe the methods for bounding system reliability or system unreliability.
5.8.1 IE Method

The IE method provides successive upper and lower bounds on system reliability when minimal paths are used and on system unreliability when minimal cuts are used. For highly reliable systems, the bounds with minimal cuts are much tighter than those with minimal paths. Since most of today's engineering systems are highly reliable, the IE method with minimal cuts is more useful than that with minimal paths. The following example illustrates this point.

Example 5.12 Consider the bridge structure again. Assume that components are i.i.d. with reliability p = 0.99. Compare the upper and lower bounds on system reliability and unreliability obtained with the IE method using minimal paths and minimal cuts.

Using equation (5.19), we find the exact system reliability and unreliability:

Rs = 2 × 0.99² + 2 × 0.99³ − 5 × 0.99⁴ + 2 × 0.99⁵ = 0.9997980498,
Qs = 1 − Rs = 0.0002029502.

With the IE method using minimal paths, we have the following bounds on system reliability:

Upper1 = S1 = 2p² + 2p³ = 3.9007980000,
Lower1 = S1 − S2 = 3.9007980000 − (5p⁴ + p⁵) = −1.8531721000,
Upper2 = Lower1 + S3 = −1.8531721000 + 4p⁵ = 1.9507881000,
Lower2 = Upper2 − S4 = 1.9507881000 − p⁵ = 0.9997980497.

Based on these calculations, we can see that the IE method using minimal paths does not provide any useful bounds for the bridge structure until the exact system reliability is obtained. An upper bound greater than 1 is useless and a lower bound less than 0 is useless because system reliability must be greater than 0 and less than 1. However, if we use the IE method with minimal cuts, we have the following results. From Example 5.6 and with q = 0.01,

V1 = 2q³ + 2q² = 0.0002020000,
V2 = q⁵ + 5q⁴ = 0.0000000501,
V3 = 4q⁵ = 4 × 10⁻¹⁰,
V4 = q⁵ = 10⁻¹⁰.
Thus, the bounds on system reliability are

Lower1 = 1 − V1 = 0.9997980000,
Upper1 = 1 − V1 + V2 = 0.9997980501,
Lower2 = 1 − V1 + V2 − V3 = 0.9997980497,
Upper2 = 1 − V1 + V2 − V3 + V4 = 0.9997980498.

5.8.2 SDP Method

The SDP method provides successive lower bounds on system reliability when minimal paths are used and successive upper bounds on system reliability when minimal cuts are used. Based on the algorithm developed and refined by Abraham [2], Locks [149], and Wilson [244], the minimal paths and minimal cuts are ordered from short to long. For systems with comparable component reliabilities, the probability that a shorter minimal path works (or a shorter minimal cut fails) is much higher than that for a longer minimal path (or a longer minimal cut). Thus, the SDP method usually provides tighter bounds on system reliability with fewer terms than the IE method.

Example 5.13 For the bridge structure in Example 5.12, derive the lower bounds on system reliability using the SDP method with minimal paths and the upper bounds on system reliability using the SDP method with minimal cuts. Compare these bounds with those developed with the IE method.

From Example 5.7,

U1 = p² = 0.9801000000,
U2 = p²q + p³q = 0.0195039900,
U3 = p³q² = 0.0000970299,
U4 = p³q² = 0.0000970299.

Thus,

Lower1 = U1 = 0.9801000000,
Lower2 = Lower1 + U2 = 0.9996039900,
Lower3 = Lower2 + U3 = 0.9997010199,
Lower4 = Lower3 + U4 = 0.9997980498.

From Example 5.8,

V1 = q² = 0.0001000000,
V2 = pq² + pq³ = 0.0000999900,
V3 = p²q³ = 0.0000009801,
V4 = p²q³ = 0.0000009801.

Thus,

Upper1 = 1 − V1 = 0.9999000000,
Upper2 = Upper1 − V2 = 0.9998000100,
Upper3 = Upper2 − V3 = 0.9997990299,
Upper4 = Upper3 − V4 = 0.9997980498.

It is also clear from this example that the SDP method with minimal cuts provides tighter bounds on system reliability for systems with highly reliable components.

5.8.3 Esary–Proschan (EP) Method

This method is based on Esary and Proschan [74]. Consider a system with l minimal paths. A minimal path works if all components in the minimal path work; it fails if at least one component in the path is failed. Let ρi represent the probability that all components in MPi work for i = 1, 2, . . . , l, and let Fi denote the event that minimal path MPi is failed. The system fails if and only if all minimal paths are failed. If these minimal paths have no components in common, the unreliability of the system is equal to the product of the probabilities that each minimal path is failed. This is only true of a parallel system: in a parallel system, each minimal path consists of one distinct component, so not only are the components independent, but the minimal paths are also independent. However, for a general system with more than one minimal path, at least one pair of minimal paths will usually have some components in common. The probability that all minimal paths fail is then usually larger than the product of the probabilities that each minimal path fails, simply because the minimal paths have components in common. This fact can be shown mathematically as follows:

Qs = Pr(F1 ∩ F2 ∩ · · · ∩ Fl)
   = Pr(F1) Pr(F2 | F1) · · · Pr(Fl | F1 F2 · · · Fl−1)
   ≥ Pr(F1) Pr(F2) · · · Pr(Fl)
   = (1 − ρ1)(1 − ρ2) · · · (1 − ρl).

We provide the following interpretation of the above inequality. The probability that MP2 is failed given that MP1 has failed is larger when these two minimal paths have common components than when they do not. The failure of the component that causes MP1 to fail may cause MP2 to fail at the same time if it is in both MP1 and MP2. Similarly, we have Pr(Fj | F1 F2 · · · Fj−1) ≥ Pr(Fj) for j > 2 whenever MPj has any component in common with any of the previous j − 1 minimal
184
GENERAL METHODS FOR SYSTEM RELIABILITY EVALUATION
paths, under the assumption that components are independent. These minimal paths are said to be associated because the failure of one minimal path increases the chance for the other minimal paths to fail. Two components are said to be associated if the failure of one component increases the failure probability of the other component. As a result, we have the following upper bound on system reliability: Upper = 1 − (1 − ρ1 )(1 − ρ2 ) · · · (1 − ρl ).
(5.79)
Applying similar arguments to minimal cuts, we are able to obtain a lower bound on the reliability of a system with m minimal cuts. A minimal cut fails if all components in the cut are failed. It works if at least one component in the cut works. For the system to work, we need to have all minimal cuts work. If two minimal cuts have components in common, the probability for both minimal cuts to work is higher than the product of the probabilities for each minimal cut to work. This is because the component that makes MCi work may simultaneously make MCj work if it is in both MCi and MCj for i ≠ j. These minimal cuts are also said to be associated because the failure of one minimal cut increases the chance for other minimal cuts to fail. Let γi be the probability that all components in MCi are failed and MCi be the event that the ith minimal cut works for 1 ≤ i ≤ m. Then, we have Rs = Pr(MC1 ∩ MC2 ∩ · · · ∩ MCm)
(5.80)
= Pr(MC1 ) Pr(MC2 | MC1 ) · · · Pr(MCm | MC1 MC2 · · · MCm−1 )
(5.81)
≥ Pr(MC1 ) Pr(MC2 ) · · · Pr(MCm )
(5.82)
= (1 − γ1 )(1 − γ2 ) · · · (1 − γm ).
(5.83)
As a result, we have the following lower bound on system reliability: Lower = (1 − γ1 )(1 − γ2 ) · · · (1 − γm ).
(5.84)
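As a quick numerical check, the EP bounds (5.79) and (5.84) can be sketched in a few lines of Python. This is a minimal sketch assuming independent components; the function name is ours, and the values fed in are the bridge-structure path and cut probabilities used in Example 5.14 below (two 2-component and two 3-component minimal paths and cuts, p = 0.99):

```python
# Esary-Proschan (EP) bounds, eqs. (5.79) and (5.84):
# Upper = 1 - prod(1 - rho_i), Lower = prod(1 - gamma_j).
from math import prod

def ep_bounds(rho, gamma):
    """rho[i]: Pr(all components of minimal path i work);
    gamma[j]: Pr(all components of minimal cut j fail)."""
    upper = 1 - prod(1 - r for r in rho)
    lower = prod(1 - g for g in gamma)
    return lower, upper

# Bridge structure with i.i.d. component reliability p = 0.99:
p, q = 0.99, 0.01
rho = [p**2, p**2, p**3, p**3]      # minimal path probabilities
gamma = [q**2, q**2, q**3, q**3]    # minimal cut probabilities
lower, upper = ep_bounds(rho, gamma)
print(round(lower, 10), round(upper, 10))  # 0.9997980104 0.9999996507
```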
Example 5.14 Find the lower and upper bounds of the bridge structure with i.i.d. components using the Esary–Proschan method. Compare these results with the bounds obtained with the IE and the SDP methods:

ρ1 = p^2, ρ2 = p^2, ρ3 = p^3, ρ4 = p^3.

Hence, Upper = 1 − (1 − p^2)^2 (1 − p^3)^2 = 0.9999996507, and

γ1 = q^2, γ2 = q^2, γ3 = q^3, γ4 = q^3.
So, Lower = (1 − q^2)^2 (1 − q^3)^2 = 0.9997980104.

5.8.4 Min–Max Bounds

Barlow and Proschan [22] provide the min–max bounds on the reliability of a coherent system using minimal paths and minimal cuts. Let l be the number of minimal paths and m the number of minimal cuts. Let ρi be the probability that minimal path i contains all working components for 1 ≤ i ≤ l. Let γi be the probability that minimal cut i contains all failed components for 1 ≤ i ≤ m. The components are independent. The following equations provide the upper and lower bounds on system reliability:

Lower = max_{1≤i≤l} ρi,   (5.85)
Upper = min_{1≤i≤m} {1 − γi}.   (5.86)
These bounds can be interpreted as follows. The system is more reliable than the most reliable minimal path because other minimal paths are redundant paths for the system. The system is less reliable than the least reliable minimal cut because other minimal cuts are considered to be connected in series with this least reliable minimal cut.

Example 5.15 Rework Example 5.12 with the min–max bounds:

ρ1 = p^2 = 0.9801000000,
ρ2 = p^2 = 0.9801000000,
ρ3 = p^3 = 0.9702990000,
ρ4 = p^3 = 0.9702990000.
Hence, Lower = max_{1≤i≤4} ρi = 0.9801000000, and

1 − γ1 = 1 − q^2 = 0.9999000000,
1 − γ2 = 1 − q^2 = 0.9999000000,
1 − γ3 = 1 − q^3 = 0.9999990000,
1 − γ4 = 1 − q^3 = 0.9999990000.

So, Upper = min_{1≤i≤4} {1 − γi} = 0.9999000000.
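The min–max bounds (5.85) and (5.86) reduce to one-line computations; a minimal sketch for the bridge values above:

```python
# Min-max bounds of Barlow and Proschan, eqs. (5.85)-(5.86):
# Lower = max_i rho_i, Upper = min_i (1 - gamma_i).
p, q = 0.99, 0.01
rho = [p**2, p**2, p**3, p**3]      # minimal path probabilities
gamma = [q**2, q**2, q**3, q**3]    # minimal cut probabilities
lower = max(rho)
upper = min(1 - g for g in gamma)
print(round(lower, 10), round(upper, 10))  # 0.9801 0.9999
```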
5.8.5 Modular Decompositions

To evaluate a large and complex reliability system, we often have to decompose it into subsystems. Each subsystem may be further decomposed into additional subsystems. This process is called modular decomposition, as defined in Chapter 4. The technique of modular decomposition can be used not only for evaluation of the exact system reliability but also to derive bounds on system reliability. A hierarchical approach may be used in deriving bounds on system reliability. Component reliabilities are assumed to be known. Starting from the lowest subsystem level, we can use one of the methods covered so far to derive the bounds on the reliability of each subsystem at the lowest level. Once these bounds are obtained, these subsystems at the lowest level are treated as supercomponents with known reliability bounds. Higher level subsystems consisting of these supercomponents are analyzed. Bounds on these higher level subsystems may be obtained with the bounding methods. This process is repeated until we reach the system level, wherein the upper and/or lower bounds on the system can be obtained from the bounds at its immediate subsystem levels.

Example 5.16 Use modular decompositions to find the lower and upper bounds on the reliability of the system in Figure 5.13. The bounds obtained for the bridge structure in Examples 5.12 to 5.15 are summarized below:

Method     Upper           Lower
IE         0.9997980501    0.9997980497
SDP        0.9997990299    0.9997010199
EP         0.9999996507    0.9997980104
min–max    0.9999000000    0.9801000000
Thus,

0.9997980497 ≤ R(module i) ≤ 0.9997980501,   i = 1, 2.

Therefore,

Lower = 0.9997980497^2 ≈ 0.9995961402,
Upper = 0.9997980501^2 ≈ 0.9995961410.
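A sketch of this combination step, assuming the two modules are independent and connected in series (the helper function name is ours, for illustration):

```python
# Bounds for a series arrangement of independent modules: multiply the
# module lower bounds to get the system lower bound, and the module
# upper bounds to get the system upper bound (Example 5.16 has two
# identical bridge modules).
from math import prod

def series_module_bounds(bounds):
    """bounds: list of (lower, upper) pairs, one per module."""
    return (prod(b[0] for b in bounds), prod(b[1] for b in bounds))

module = (0.9997980497, 0.9997980501)   # tightest bridge bounds (IE)
lower, upper = series_module_bounds([module, module])
```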
FIGURE 5.13 System block diagram for Example 5.16.
5.8.6 Notes The system reliability bounding methods are very simple. One may choose to obtain several upper bounds with different methods. The lowest upper bound may be selected to approximate system reliability. Similarly, several lower bounds may be obtained first and the highest lower bound may then be selected to approximate exact system reliability. In application of the modular decomposition method for bound evaluation, we have to use all lower bounds or exact reliabilities of the modules to find the lower bound on system reliability. Similarly, we need to use all upper bounds or exact reliabilities of the modules to find the upper bound on system reliability.
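The selection rule described here can be sketched directly, using the bridge bounds tabulated in Example 5.16 (the dictionary layout is ours, for illustration):

```python
# Selecting the tightest bounds across methods: take the largest lower
# bound and the smallest upper bound obtained by any method.
bounds = {                      # (lower, upper) per method, bridge example
    "IE":      (0.9997980497, 0.9997980501),
    "SDP":     (0.9997010199, 0.9997990299),
    "EP":      (0.9997980104, 0.9999996507),
    "min-max": (0.9801000000, 0.9999000000),
}
best_lower = max(b[0] for b in bounds.values())
best_upper = min(b[1] for b in bounds.values())
print(best_lower, best_upper)  # 0.9997980497 0.9997980501
```

For this example both winners come from the IE method, which is why Example 5.16 uses the IE bounds for the modules.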
6 GENERAL METHODOLOGY FOR SYSTEM DESIGN
Reliability is one of the most important considerations in engineering system design today. In optimal reliability design problems, the objective function may be maximization of system reliability, maximization of system availability, minimization of cost, or minimization of system life-cycle cost. The constraints may include budget restrictions, reliability requirements, and other considerations such as volume and weight. The design parameters or decision variables may be component reliability value, number of redundancies, or arrangement of known components. Reliability design problems can be formulated as optimization problems, and various optimization algorithms have been used to solve them (Tillman et al. [236], Kuo and Prasad [133], and Kuo et al. [132]). The major focus of recent work in the area of system reliability optimization is on the development of heuristic methods and metaheuristic algorithms for redundancy allocation problems. Little work is directed toward exact solutions for such problems. To the best of our knowledge, all of the reliability systems considered in this area belong to the class of coherent systems. The literature on reliability optimization methods can be classified into seven categories [133]:

1. Heuristics for redundancy allocation: special techniques developed for reliability problems.
2. Metaheuristic algorithms for redundancy allocation: perhaps the most attractive development in the 1990s.
3. Exact algorithms for redundancy allocation or reliability allocation: most are based on mathematical programming techniques, for example, the reduced gradient methods.
4. Heuristics for reliability redundancy allocation: a difficult but realistic situation in reliability optimization.
5. Multiobjective system reliability optimization: an important but not widely studied problem in reliability optimization.
6. Optimal assignment of interchangeable components: a unique scheme that often takes no effort.
7. Others: including decomposition, fuzzy apportionment, and effort function minimization.

A special class of reliability design problems is the allocation of reliability values to various components. Assignment algorithms have been used to solve this kind of problem. Because of the special characteristics of this class of problems, many interesting results have been reported. In this chapter, we will focus on the issue of optimal reliability allocation for various system reliability structures. We will only cover algorithms that do not rely on integer programming or nonlinear programming optimization techniques. Readers may refer to Kuo and Prasad [133] and Kuo et al. [132] for coverage of such optimization techniques. Concepts useful for optimal reliability allocation are covered in the first few sections. These concepts include component reliability importance, relative criticality, majorization and Schur convex functions, and pairwise rearrangement. Optimal reliability allocation for series structures, parallel structures, parallel–series structures, and series–parallel structures is then examined in detail. Finally, optimal reliability allocation results for two-stage systems are provided. Some system structures have invariant optimal designs while the optimal designs of other system structures depend on the values of component reliabilities. Some algorithms presented will provide an optimal allocation while others are heuristic algorithms that do not guarantee optimal solutions.

6.1 REDUNDANCY IN SYSTEM DESIGN

The most commonly used structures in reliability systems are the series and the parallel structures. Each of the components in a series system is essential for the function of the whole system.
For example, an automobile may be modeled as a series system with the following four major components: engine, transmission, steering, and braking. A computer system may be considered to be a series system with the following major components: CPU, motherboard, hard disk, disk controller, display card, monitor, keyboard, and mouse. The number of components needed in a series system depends on the function of the system. The function of each component is an essential part of the function of the whole system. The reliability of a series system is very much affected by the number of components in the system. The more components a series system has, the lower the reliability of the system is. Figure 6.1a shows the reliability of a series system with i.i.d. component reliability p = 0.9 as a function of the number of components n in the system. It can be seen from Figure 6.1a that the reliability of the series system decreases as n increases. Even when the reliability of each component is 0.9, the reliability of the series system becomes only about 0.2 when there are 15 such components in the system.

FIGURE 6.1 System reliability as a function of (a) system size n and (b) i.i.d. component reliability p.

As a result, significant effort has been made by researchers and designers to reduce the number of components that are connected in series. Integration has been used to combine the functions of several components. For example, the reliability of VCRs has been increased significantly because fewer moving parts are used in today’s designs. The reliability of a series system is smaller than that of the weakest component. When the components are i.i.d. with component reliability p, system reliability is smaller than p. Figure 6.1b shows the reliability of a five-component series system as a function of the i.i.d. component reliability p. The system reliability is very
close to zero when p is less than 0.6. For the system reliability to be close to 1, the component reliability must be much closer to 1. With new materials, advanced manufacturing technologies, and new designs, the reliabilities of components have been increasing steadily. However, after a certain limit, further increase of component reliability becomes very costly. An alternative method of increasing the reliability of a series system is to apply redundancy at the component level. This will produce parallel subsystems. A parallel system structure requires at least one component to work. As long as one component is working, the system is working. All other components are called redundant components. System reliability is higher than that of the best component. The more components a parallel system has, the higher the system reliability. Figure 6.1a shows the reliability of a parallel system with i.i.d. component reliability p = 0.5 as a function of the number of components in the system. It can be seen that the system reliability is virtually equal to 1 when n reaches 9, even when the component reliability is only 0.5. Actually, no matter how low the component reliability is, we can always achieve very high system reliability through redundancy. However, there is a diminishing return from each additional component as more components are used. In addition, there may be other concerns such as weight, volume, and costs that prevent us from using too many redundant components. Figure 6.1b shows the reliability of a five-component parallel system as a function of the i.i.d. component reliability p. From this figure, we see that the reliability of the parallel system is above 0.9 even when the component reliability is only about 0.4 and is virtually 1 when p is about 0.7. Redundancy is the most effective when applied at the lowest level in a hierarchical system. Consider a series system that has two i.i.d. components labeled 1 and 2. 
The reliability of each component is 0.9. The reliability of the series system with no redundancy is equal to 0.81. To increase the reliability of the system, we compare the following two alternative designs. Both designs have four i.i.d. components. Design 1 applies redundancy at the component level; namely, each component is allocated a redundancy. Design 1 is shown in Figure 6.2a. The reliability of the system following design 1 is equal to 0.9801. Design 2 applies redundancy at the subsystem level. The original system with two components in series is considered to be a subsystem. Another identical subsystem with two identical components connected in series is used as a redundant subsystem in design 2. Design 2 is shown in Figure 6.2b. The reliability of the system following design 2 is equal to 0.9639. This illustrates that whenever possible, redundancy should be applied at the lowest level in a hierarchical system. Kuo et al. [132] provide a mathematical proof for this result.
FIGURE 6.2 Different philosophies of using redundancy.
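The comparison between the two designs can be verified numerically; a minimal sketch with p = 0.9, as in the text:

```python
# Component-level vs. subsystem-level redundancy for two i.i.d.
# components in series with p = 0.9 (Figure 6.2).
p = 0.9
q = 1 - p

# Design 1: two parallel pairs in series -> (1 - q^2)^2
design1 = (1 - q**2) ** 2
# Design 2: two series subsystems in parallel -> 1 - (1 - p^2)^2
design2 = 1 - (1 - p**2) ** 2

print(round(design1, 4), round(design2, 4))  # 0.9801 0.9639
```

As expected, redundancy at the component level (design 1) yields the higher system reliability.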
6.2 MEASURES OF COMPONENT IMPORTANCE

6.2.1 Structural Importance

The structural importance of component i, denoted by I_i^S, is defined as

I_i^S = (1/2^(n−1)) Σ_{x^i} [φ(1_i, x^i) − φ(0_i, x^i)],   (6.1)
where x^i represents the component state vector with x_i removed and (1_i, x^i) represents the component state vector when component i is in state 1. In words, the structural importance of component i is the ratio between the number of component state vectors in which the state of component i dictates the state of the system and the total number of different component state vectors with n − 1 components in it. In equation (6.1), 2^(n−1) represents the total number of different component state vectors with n − 1 components in it, and the sum of [φ(1_i, x^i) − φ(0_i, x^i)] over all x^i counts the component state vectors in which the state of component i dictates the state of the system. The structural importance of a component actually measures the importance of the position of the component. It is independent of the reliability value of the component under consideration and also independent of the reliability of every other component. To apply the concept of structural importance in system design, we should allocate more reliable components to positions with higher structural importance. This concept is useful in the early stages of system design when data on component reliability values are unavailable. It may be used to allocate effort in development of components at different positions. The following example illustrates this point.

Example 6.1 Consider the system structure shown in Figure 6.3. Determine the structural importance of each component. In this example, there are three components. Components 2 and 3 are connected in parallel to form a subsystem and component 1 is connected in series with this subsystem. For component 1, x^1 may assume a vector in the set {(·, 1, 1), (·, 1, 0), (·, 0, 1), (·, 0, 0)}. For the first three vectors in this set, the system state is completely determined by the state of component 1, while for the last state vector, the state of component 1 does not affect the system state at all. The following equations show these observations:
FIGURE 6.3 System structure for Example 6.1.
φ(1, 1, 1) − φ(0, 1, 1) = 1,
φ(1, 1, 0) − φ(0, 1, 0) = 1,
φ(1, 0, 1) − φ(0, 0, 1) = 1,
φ(1, 0, 0) − φ(0, 0, 0) = 0.
As a result, we have

I_1^S = (1/2^2)(1 + 1 + 1 + 0) = 3/4.

For component 2, we have x^2 ∈ {(1, ·, 1), (1, ·, 0), (0, ·, 1), (0, ·, 0)} and

φ(1, 1, 1) − φ(1, 0, 1) = 0,
φ(1, 1, 0) − φ(1, 0, 0) = 1,
φ(0, 1, 1) − φ(0, 0, 1) = 0,
φ(0, 1, 0) − φ(0, 0, 0) = 0,

so that

I_2^S = (1/2^2)(0 + 1 + 0 + 0) = 1/4.

Similarly, we can find the structural importance of component 3 to be I_3^S = 1/4.
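Equation (6.1) can be checked by brute-force enumeration. The sketch below hard-codes the structure function of Figure 6.3 (component 1 in series with the parallel pair 2 and 3); the function names are ours:

```python
# Structural importance by direct enumeration of equation (6.1),
# for the structure of Figure 6.3: phi(x) = x1 AND (x2 OR x3).
from itertools import product

def phi(x):                        # structure function, x = (x1, x2, x3)
    return x[0] & (x[1] | x[2])

def structural_importance(i, n=3):
    total = 0
    for rest in product((0, 1), repeat=n - 1):
        x1 = list(rest); x1.insert(i, 1)   # component i forced to work
        x0 = list(rest); x0.insert(i, 0)   # component i forced to fail
        total += phi(tuple(x1)) - phi(tuple(x0))
    return total / 2 ** (n - 1)

print([structural_importance(i) for i in range(3)])  # [0.75, 0.25, 0.25]
```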
The calculated structural importances of the components indicate that component 1 occupies the most important position. As a result, more effort should be allocated to the enhancement of the reliability of component 1 than to each of the components 2 and 3. A component with a higher structural importance indicates that it occupies a more important position. Two components with equal structural importance occupy positions of equal importance in regard to their contribution toward system functioning. In a complex reliability system, many components will have identical structural importances. The components with higher structural importances should be allocated higher reliability values. However, structural importance cannot be used to allocate component reliability among components with equal structural importances. Take the bridge structure in Figure 4.6 as an example. Components 1, 2, 4, and 5 have the same structural importance of 3/8, while component 3 has a structural importance of 1/8. Based on these measures, given five component reliability values, component 3 should be assigned the lowest value for system reliability maximization. However, it is not clear how the remaining four reliability values should be assigned to the other four components.

6.2.2 Reliability Importance

The reliability importance, or Birnbaum importance (B-importance), of component i is defined as

I_i^B = ∂R_s(p)/∂p_i,   (6.2)
where p_i is the reliability of component i, p is the vector of component reliabilities, and R_s is the reliability of the system. In words, we can say that the B-importance of component i is equal to the amount of increase in system reliability when the reliability of component i is improved by one unit. Based on this interpretation and noting that 0 ≤ p_i ≤ 1, we can write the B-importance of component i in the form

I_i^B = R_s(1_i, p^i) − R_s(0_i, p^i),   (6.3)

where p^i represents the component reliability vector with the ith component removed. An alternative expression of the B-importance can be obtained from equation (6.3) and expressed as

I_i^B = E(φ(1_i, x^i) − φ(0_i, x^i)) = Pr(φ(1_i, x^i) − φ(0_i, x^i) = 1),   (6.4)
where φ is the structural function of the system. Thus, the B-importance of component i can be interpreted as the probability that component i is critical to the system’s function; that is, it dictates the state of the system. If we let p_j = 0.5 for all j ≠ i in equation (6.4), we have

I_i^B = E(φ(1_i, x^i) − φ(0_i, x^i)) = (1/2)^(n−1) Σ_{x^i} [φ(1_i, x^i) − φ(0_i, x^i)] = I_i^S.
This equation shows that when the components are i.i.d. with p = 0.5, the B-importance and the structural importance of each component are equal to each other.

Example 6.2 Consider n components that are labeled such that the component reliabilities are in nondecreasing order, that is, p_1 ≤ p_2 ≤ · · · ≤ p_n. If these components are connected in series, we can find that the B-importances of the components are

I_i^B = ∏_{j≠i} p_j,   i = 1, 2, . . . , n.

If the components are connected in parallel, the B-importances of the components are

I_i^B = ∏_{j≠i} (1 − p_j),   i = 1, 2, . . . , n.
In a series structure, the least reliable component has the highest B-importance. A series system is as strong as its weakest component. In a parallel structure, the most reliable component has the highest B-importance. A parallel system is as weak as its strongest component. In addition, we observe that the importance of component i is completely determined by the reliabilities of the other components. It is independent of the reliability value of component i itself. This is true in all systems; that is, the B-importance of the component is completely determined by the reliabilities of the components other than component i.
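A short sketch of the series and parallel formulas of Example 6.2, using four illustrative reliability values (the same ones used later in Example 6.4):

```python
# B-importance of component i: prod_{j != i} p_j in a series system,
# and prod_{j != i} (1 - p_j) in a parallel system (Example 6.2).
from math import prod

p = [0.8, 0.85, 0.9, 0.95]     # p1 <= p2 <= ... <= pn

series_IB = [prod(pj for j, pj in enumerate(p) if j != i)
             for i in range(len(p))]
parallel_IB = [prod(1 - pj for j, pj in enumerate(p) if j != i)
               for i in range(len(p))]

# Least reliable component is most important in series;
# most reliable component is most important in parallel.
assert max(range(4), key=lambda i: series_IB[i]) == 0
assert max(range(4), key=lambda i: parallel_IB[i]) == 3
```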
Based on B-importance, the most important component should be improved in order to achieve the highest improvement in system reliability. However, this may not be practical in the real world. For example, in a parallel system, the most reliable component may have a reliability of 0.9999. Further improvement of this component may be very costly. It is more economical to improve the reliability of a less reliable component for system reliability enhancement. The definition to be introduced in the next section relates the importance of a component to its own reliability value.

Example 6.3 Find the B-importances of the components in the bridge structure shown in Figure 4.6. The reliability of the bridge structure has been derived in Example 5.3 and is given below:

Rs = p3 (1 − q1 q4)(1 − q2 q5) + q3 [1 − (1 − p1 p2)(1 − p4 p5)].

The B-importances of the components are found with equation (6.2) as follows:

I_1^B = p3 q4 (1 − q2 q5) + p2 q3 (1 − p4 p5),
I_2^B = p3 q5 (1 − q1 q4) + p1 q3 (1 − p4 p5),
I_3^B = (1 − q1 q4)(1 − q2 q5) + (1 − p1 p2)(1 − p4 p5) − 1,
I_4^B = q1 p3 (1 − q2 q5) + q3 p5 (1 − p1 p2),
I_5^B = q2 p3 (1 − q1 q4) + q3 p4 (1 − p1 p2).

When the components are i.i.d. with reliability p, we have

I_1^B = pq(1 − q^2) + pq(1 − p^2) = pq(1 + 2pq),
I_2^B = pq(1 − q^2) + pq(1 − p^2) = pq(1 + 2pq),
I_3^B = (1 − q^2)(1 − q^2) − [1 − (1 − p^2)(1 − p^2)] = 2p^2 q^2,
I_4^B = qp(1 − q^2) + qp(1 − p^2) = pq(1 + 2pq),
I_5^B = qp(1 − q^2) + qp(1 − p^2) = pq(1 + 2pq).

We can see that when all components are i.i.d., components 1, 2, 4, and 5 have the same B-importance while component 3 has a lower B-importance. When components are not i.i.d., the ranking of B-importances will change as component reliabilities change.
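The i.i.d. expressions above can be cross-checked numerically. The sketch below takes p = 0.9 for illustration and verifies the component-3 importance against a finite-difference derivative of the closed-form Rs (since Rs is linear in p3, the derivative with respect to p3 is recovered exactly up to rounding):

```python
# B-importances for the i.i.d. bridge (Example 6.3): components 1, 2, 4, 5
# share pq(1 + 2pq), and component 3 has 2 p^2 q^2.
p = 0.9
q = 1 - p
I_edge = p * q * (1 + 2 * p * q)    # components 1, 2, 4, 5
I_3 = 2 * p**2 * q**2               # component 3

def Rs(p3):
    # Closed-form bridge reliability with all other components at p:
    # Rs = p3 (1 - q1 q4)(1 - q2 q5) + q3 [1 - (1 - p1 p2)(1 - p4 p5)]
    return p3 * (1 - q * q) ** 2 + (1 - p3) * (1 - (1 - p * p) ** 2)

h = 1e-4
I_3_numeric = (Rs(p + h) - Rs(p - h)) / (2 * h)   # central difference
assert abs(I_3 - I_3_numeric) < 1e-8
print(round(I_edge, 6), round(I_3, 6))  # 0.1062 0.0162
```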
The criticality importance of component i may be defined as the probability that component i is critical to system success and is in the working state, given that the system is in the working state. It represents the contribution of this component to the system’s working state given that the system is in the working state. We use I_i^CS to indicate the criticality importance of component i in terms of system success. Mathematically, I_i^CS is expressed as

I_i^CS = p_i [R_s(1_i, p^i) − R_s(0_i, p^i)] / R_s(p).   (6.5)
It can also be expressed as a function of the B-importance as follows:

I_i^CS = (p_i / R_s(p)) ∂R_s(p)/∂p_i = (p_i / R_s(p)) I_i^B.   (6.6)
The criticality importance in terms of system success measures the probability that component i is working and is the one that contributes to system success given that the system is working. The higher the value of I_i^CS, the higher the chance that component i is the one that contributes to system success. The criticality importance may also be defined in terms of system failure. We use I_i^CF to indicate the criticality importance of component i in terms of system failure. In this case, it measures the probability that component i is failed and is the one contributing to system failure given that the system is in the failed state, as expressed in

I_i^CF = q_i [Q_s(0_i, p^i) − Q_s(1_i, p^i)] / Q_s(p) = (q_i / Q_s(p)) I_i^B.   (6.7)
We can use the criticality importance, defined in terms of system failure, in fault diagnosis. When we find that the system is failed, the component with the highest I_i^CF value is most likely to be the one that has caused the system to fail. As a result, the component with the higher I_i^CF should be checked first in fault diagnosis. For the same system structure and the same set of component reliabilities, I_i^CS and I_i^CF often have different values for the same component i. The following example illustrates their differences.

Example 6.4 Consider four components with reliabilities 0.8, 0.85, 0.9, and 0.95. If they are connected in series, their criticality importances have been found to be 1 in terms of system success and 0.3472, 0.2451, 0.1543, and 0.0731, respectively, in terms of system failure. If the series system is working, the criticality importance of each component is the same. This means that every component is critical to system success in a series system. If the series system is failed, component 1 has the highest criticality importance while component 4 has the lowest criticality importance. The higher the component reliability value, the lower its criticality importance. This means that the failure of a series system is most likely caused by the least reliable component.
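The series-system numbers of Example 6.4 can be reproduced from equations (6.6) and (6.7); a minimal sketch:

```python
# Criticality importances for the four-component series system of
# Example 6.4, using eqs. (6.6) and (6.7).
from math import prod

p = [0.8, 0.85, 0.9, 0.95]
q = [1 - pi for pi in p]
Rs = prod(p)                  # series system reliability
Qs = 1 - Rs

# In a series system, I_i^B = prod_{j != i} p_j (Example 6.2).
IB = [prod(pj for j, pj in enumerate(p) if j != i) for i in range(4)]
ICS = [p[i] * IB[i] / Rs for i in range(4)]   # eq. (6.6)
ICF = [q[i] * IB[i] / Qs for i in range(4)]   # eq. (6.7)

print([round(x, 4) for x in ICS])  # [1.0, 1.0, 1.0, 1.0]
print([round(x, 4) for x in ICF])  # [0.3472, 0.2451, 0.1543, 0.0731]
```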
If the components are connected in parallel, their criticality importances have been found to be 0.0006, 0.0009, 0.0014, and 0.0029, respectively, in terms of system success and 1 in terms of system failure. If the parallel system is working, any component may be the one that is making the system work. It is most likely for the most reliable component to be the critical component. That is why component 4 has the highest criticality importance in terms of system success while component 1 has the lowest. If the parallel system is down, the criticality importance of each component in terms of system failure is the same and equal to 1. This is because every component has to fail for the system to fail. Each component is critical to system failure in a parallel system no matter what reliability value it takes.

6.2.4 Relative Criticality

Boland et al. [37] introduce the concept of the relative criticality of two components in a coherent system, which is useful in optimal system design. They also provide a method for determining the relative criticality of two components using minimal path sets and minimal cut sets. As usual, let x = (x_1, x_2, . . . , x_n) represent the states of the components and p be the corresponding component reliability vector. Let (0_i, 1_j, x^{ij}) be the vector x for which x_i = 0 and x_j = 1 and x^{ij} be the vector obtained by deleting x_i and x_j from x. We will interpret p^{ij} similarly.

Definition 6.1 (Boland et al. [37]) Component i is more critical than component j for the structure function φ, written as i >c j, if φ(1_i, 0_j, x^{ij}) ≥ φ(0_i, 1_j, x^{ij}) holds for all x^{ij} and strict inequality holds for some x^{ij}.

Theorem 6.1 (Boland et al. [37]) Let P_1, P_2, . . . , P_m be the minimal path sets of a coherent system with components 1, 2, . . . , n and structure function φ(x). Let A_i = {P_r : i ∈ P_r, r = 1, 2, . . . , m}. If A_j is a proper subset of A_i, written as
A_j ⊂ A_i, then i >c j. If the words “minimal path sets” are replaced by “minimal cut sets,” the same conclusion holds.

Example 6.5 Consider the system structure given in Figure 6.4. Examine the relative criticality of the components using Theorem 6.1.
FIGURE 6.4 System structure for Example 6.5.
First, we will illustrate Theorem 6.1 using minimal path sets:

P1 = {1, 2, 4}, P2 = {1, 2, 5}, P3 = {1, 3, 4}, P4 = {1, 3, 5}, P5 = {1, 3, 6};

A1 = {P1, P2, P3, P4, P5}, A2 = {P1, P2}, A3 = {P3, P4, P5},
A4 = {P1, P3}, A5 = {P2, P4}, A6 = {P5}.
We can see that A_i ⊂ A_1 for 2 ≤ i ≤ 6 and A_6 ⊂ A_3. Thus, component 1 is the most critical component and component 3 is more critical than component 6, or

1 >c 2, 3, 4, 5, 6   and   3 >c 6.
This means that to maximize system reliability, component 1 should be assigned the largest reliability while component 3 should be more reliable than component 6. We are unable to make other conclusions using minimal path sets. Now consider the minimal cut sets:

C1 = {1}, C2 = {2, 3}, C3 = {4, 5, 6}, C4 = {3, 4, 5};

A1 = {C1}, A2 = {C2}, A3 = {C2, C4}, A4 = {C3, C4}, A5 = {C3, C4}, A6 = {C3}.
From this, we can see that A_2 ⊂ A_3, A_6 ⊂ A_4, and A_6 ⊂ A_5. As a result, component 3 is more critical than component 2 and components 4 and 5 are more critical than component 6, or

3 >c 2   and   4, 5 >c 6.
To maximize system reliability, we would make sure that component 3 will be more reliable than component 2 and components 4 and 5 more reliable than component 6. Utilizing these results, we can generate the diagram in Figure 6.5 using the following conventions:
1. If there is an arrow from i to j, it means i >c j.
2. If i >c j and j >c k, then i >c k.
3. If there is a path following directed arrows from i to j, then i >c j.

FIGURE 6.5 Criticality diagram for system structure in Figure 6.4.

From the diagram in Figure 6.5, we can see that there are no arrows among components 3, 4, and 5, no arrows from components 4 or 5 to component 2, and no arrow from component 2 to component 6. This means that the relative criticality among these pairs of components cannot be determined with Theorem 6.1. Theorem 6.1 has to be used considering both minimal paths and minimal cuts in order to identify as many relative criticalities among components as possible. Boland et al. [37] also state the relationship between component criticality and component structural importance. If component i is more critical than component j, then component i also has a higher structural importance than component j. However, the reverse is not always true. Koutras et al. [122] propose another theorem for identification of additional criticality relationships. However, we do not find it to be any more powerful than Theorem 6.1.
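Theorem 6.1 is straightforward to automate. The sketch below (the helper name is ours) recovers the path-set and cut-set conclusions of Example 6.5 by testing proper-subset relations among the A_i sets:

```python
# Relative criticality via Theorem 6.1: if A_j (the set of minimal paths
# containing j) is a proper subset of A_i, then i is more critical than j.
# Minimal path and cut sets for the structure of Figure 6.4 (Example 6.5).
paths = [{1, 2, 4}, {1, 2, 5}, {1, 3, 4}, {1, 3, 5}, {1, 3, 6}]
cuts = [{1}, {2, 3}, {4, 5, 6}, {3, 4, 5}]

def criticality_pairs(sets, components):
    # A[i] = indices of the sets that contain component i
    A = {i: {k for k, s in enumerate(sets) if i in s} for i in components}
    return sorted((i, j) for i in components for j in components
                  if A[j] < A[i])        # '<' is proper-subset for sets

print(criticality_pairs(paths, range(1, 7)))
# [(1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (3, 6)]
print(criticality_pairs(cuts, range(1, 7)))
# [(3, 2), (4, 6), (5, 6)]
```

Combining both prints gives exactly the relations 1 >c 2, 3, 4, 5, 6; 3 >c 6; 3 >c 2; and 4, 5 >c 6 derived in the example.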
6.3 MAJORIZATION AND ITS APPLICATION IN RELIABILITY

In the search for optimal arrangements or allocations of components in a given system, the theory of majorization has been used by several authors. In this section, we provide an introduction to the concept of majorization and its applications in reliability theory.

6.3.1 Definition of Majorization

Given two vectors of dimension n, x = (x_1, x_2, . . . , x_n) and y = (y_1, y_2, . . . , y_n), we may intuitively say that elements of a vector x are less spread out than are the elements of a vector y. For example, if x = (3, 1, 2) and y = (2, 2, 2), then we will agree to say that x is more spread out than y. Majorization is one way to order the spread of two vectors. For an n-dimensional vector x = (x_1, x_2, . . . , x_n), let x_(1) ≥ x_(2) ≥ · · · ≥ x_(n) denote the elements of x in nonincreasing order. We call (x_(1), x_(2), . . . , x_(n)) a nonincreasing arrangement of x.

Definition 6.2 Consider two vectors of dimension n, x = (x_1, x_2, . . . , x_n) and y = (y_1, y_2, . . . , y_n). We say that vector x majorizes vector y (x ≻ y), or vector y is majorized by vector x (y ≺ x), if the following conditions are satisfied:
Σ_{i=1}^{h} x(i) ≥ Σ_{i=1}^{h} y(i)   for 1 ≤ h ≤ n − 1,   (6.8)

Σ_{i=1}^{n} x(i) = Σ_{i=1}^{n} y(i).   (6.9)

Example 6.6
Let x = (3, 1, 2) and y = (2, 2, 2). Then

x(1) = 3 > y(1) = 2,
x(1) + x(2) = 3 + 1 ≥ y(1) + y(2) = 2 + 2,
x(1) + x(2) + x(3) = 3 + 1 + 2 = y(1) + y(2) + y(3) = 2 + 2 + 2.

Hence x ≻m y.
When x ≻m y, we say that the elements in vector x are more “spread out” than those in vector y. In other words, the elements in vector x are more “different” from one another than those in vector y. The following chain illustrates this point:

(1, 0, . . . , 0) ≻m (1/2, 1/2, 0, . . . , 0) ≻m · · · ≻m (1/(n−1), . . . , 1/(n−1), 0) ≻m (1/n, . . . , 1/n).
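Conditions (6.8) and (6.9) are easy to check mechanically. The following sketch (the function name is ours) verifies the definition on Example 6.6 and on the chain above:

```python
def majorizes(x, y, tol=1e-12):
    # x majorizes y: equal totals, condition (6.9), and the partial sums of
    # the nonincreasing arrangements dominate, condition (6.8)
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    if len(xs) != len(ys) or abs(sum(xs) - sum(ys)) > tol:
        return False
    cx = cy = 0.0
    for a, b in zip(xs[:-1], ys[:-1]):
        cx, cy = cx + a, cy + b
        if cx < cy - tol:           # a partial-sum condition (6.8) fails
            return False
    return True

# Example 6.6: (3, 1, 2) majorizes (2, 2, 2), but not conversely
assert majorizes([3, 1, 2], [2, 2, 2])
assert not majorizes([2, 2, 2], [3, 1, 2])
```

Because the function sorts its arguments first, the ordering of the elements in the input vectors does not matter, mirroring the remark following the definition.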
Similarly, one can define majorization based on the nondecreasing arrangement of a vector. For details, refer to Marshall and Olkin [164]. Because the definition of majorization is based on either the nondecreasing arrangement of the vector elements or the nonincreasing arrangement of the vector elements, the ordering of the elements in the vectors does not affect majorization. For example, if x ≻m y, then any rearrangement of the elements of vector x majorizes any rearrangement of the elements in vector y. Thus x ≻m x. If x is a permutation of y, then x ≻m y and y ≻m x.

6.3.2 Schur Functions

To study Schur functions, we first define an order-preserving function.

Definition 6.3 Let a binary relation ≤ be reflexive and transitive on a set A. Such a set is called a preordered set. A real function f defined on a preordered set A ⊂ Rn is said to be order preserving (or monotonic, or isotonic) if

x ≤ y implies f(x) ≤ f(y),   x, y ∈ A,

where x ≤ y means xi ≤ yi for all i.
The function f(x) = x1 + x2 + · · · + xn for x ∈ Rn is an order-preserving function. The structure functions of coherent systems are order preserving. The reliability functions of coherent systems are order preserving.

Definition 6.4 A real-valued function f(x) defined on a preordered set A ⊂ Rn is said to be Schur convex (Schur concave) on the set A if

x ≺m y implies f(x) ≤ (≥) f(y),   x, y ∈ A.
If, in addition, we have f(x) < f(y) whenever x is not a permutation of y, then we say f(x) is strictly Schur convex on A. Thus a Schur function provides an inequality on the function values of two vectors. Before we provide several theorems and results that are useful to identify whether a function is a Schur function, we define a symmetric set and a symmetric function. A set A is a symmetric set if x ∈ A implies any permutation of x is also in A. A function f(x) is a symmetric function on set A if f(x) is the same for all permutations of x.

Theorem 6.2 Let A ⊂ R be an open interval and let f : An → R be continuously differentiable. The function f is Schur convex if and only if

1. f is symmetric on An and
2. for all i ≠ j and all x ∈ An,

(xi − xj) (∂f/∂xi − ∂f/∂xj) ≥ 0.
For a proof of Theorem 6.2, readers are referred to Marshall and Olkin [164].

Example 6.7 Consider a parallel system with n components. Let p = (p1, . . . , pn), where pi is the reliability of component i. Then the reliability of the parallel system, given by h(p) = 1 − ∏_{i=1}^{n} (1 − pi), is a Schur-convex function, because (1) h(p) is a symmetric function on (0, 1)^n and (2)

(pi − pj) (∂h/∂pi − ∂h/∂pj) = (pi − pj) [∏_{k≠i} (1 − pk) − ∏_{k≠j} (1 − pk)] ≥ 0.

Therefore, the reliability of a parallel system increases as the reliabilities of components are more spread out.

Example 6.8 Consider a k-out-of-n system. Let p = (p1, . . . , pn), where pi is the reliability of component i. Let Ri = − ln pi and let hk(p) be the system reliability given p. Suppose

R = (R1, R2, . . . , Rn) ≻m R′ = (R′1, R′2, . . . , R′n).
Using Theorem 6.2, one can show that − ln hk(p) = − ln hk(e^{−R1}, . . . , e^{−Rn}) is a Schur-convex function. It follows that

hk(p) ≥ hk(p′)   for k = 1, . . . , n − 1,
hn(p) = hn(p′),

where p′ = (e^{−R′1}, . . . , e^{−R′n}). Therefore, we can calculate a lower bound of the reliability of the k-out-of-n systems of unlike components in terms of the reliability of the k-out-of-n systems of like components.

Theorem 6.3 (Du and Hwang [69]) The following two conditions for a symmetric function f to be Schur convex are equivalent:

1. f(x) ≥ f(y) for all x and y such that x majorizes y and
2. f(c + u) ≥ f(c + v) for all c1 ≥ c2 ≥ · · · ≥ cn, u1 ≥ u2 ≥ · · · ≥ un, and v a permutation of u.

The theorems covered above and the following results are useful in determining whether a symmetric function is Schur convex [164]:

1. Whenever f(x) is Schur convex, −f(x) is Schur concave, and vice versa.
2. If I ⊂ R is an interval and g : I → R is a convex function, then f(x) = Σ_{i=1}^{n} g(xi) is Schur convex on I^n. For example, f(x) = x1² + x2² + · · · + xn² is Schur convex on Rn.
3. If f is symmetric and convex, then f is Schur convex.
4. If f is symmetric and convex in each pair of arguments with the other arguments fixed, then f is Schur convex.
5. If f is symmetric and if f(x1, s − x1, x3, . . . , xn) is convex in x1 for each fixed s, x3, . . . , xn, then f is Schur convex.
6. Let g be a continuous nonnegative function defined on an interval I ⊂ R. Then f(x) = ∏_{i=1}^{n} g(xi), x ∈ I^n, is Schur convex on I^n if and only if ln g is convex on I. The function is Schur concave if and only if ln g is concave.
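Example 6.7's parallel system gives a quick numerical illustration of these criteria: for two reliability vectors with the same total, the more spread-out one (which majorizes the other) yields the higher parallel-system reliability. A small sketch with illustrative numbers:

```python
from math import prod

def h_parallel(p):
    # reliability of a parallel system of independent components
    return 1 - prod(1 - pi for pi in p)

p_spread = (0.9, 0.5, 0.1)   # majorizes p_even (same total, 1.5)
p_even = (0.5, 0.5, 0.5)
# Schur convexity of h: the more spread-out vector gives the larger value
assert h_parallel(p_spread) >= h_parallel(p_even)
```

Here h(0.9, 0.5, 0.1) = 0.955 exceeds h(0.5, 0.5, 0.5) = 0.875, as the Schur convexity of h predicts.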
Example 6.9 Consider a series system with n stages. Suppose we allocate k1, k2, . . . , kn resources to the n stages. Let Xi(ki) be the lifetime of stage i given ki resources. Then the system lifetime given k1, . . . , kn is Xs(k1, . . . , kn) = min{X1(k1), . . . , Xn(kn)}. Let the corresponding system reliability be

Rs(t; k1, . . . , kn) = ∏_{i=1}^{n} F̄(t; ki).

If ln F̄(t; ki) is concave in ki for each t ≥ 0, then Rs is Schur concave in k = (k1, k2, . . . , kn). For any given two allocations k and k′, we have

k ≻m k′ implies Rs(t; k) ≤ Rs(t; k′).
Therefore, if a resource allocation is less spread out, it gives a larger system reliability.

6.3.3 L-Additive Functions

We first define a lattice set.

Definition 6.5 Let a binary relation ≤ be reflexive, transitive, and antisymmetric on a set A. Such a set is called a partially ordered set. A partially ordered set A is a lattice if x ∧ y ∈ A and x ∨ y ∈ A for each x, y ∈ A. Here, x ∧ y is the greatest lower bound of x and y and x ∨ y is the least upper bound of x and y.

For two vectors x, y ∈ Rn, let x ∧ y (x ∨ y) denote the vector of componentwise minima (maxima), that is,

x ∧ y = (min{x1, y1}, min{x2, y2}, . . . , min{xn, yn}),
x ∨ y = (max{x1, y1}, max{x2, y2}, . . . , max{xn, yn}).

Definition 6.6 A real-valued function f defined on the lattice Rn is said to be lattice superadditive (lattice subadditive) if

f(x ∨ y) + f(x ∧ y) ≥ (≤) f(x) + f(y)   (6.10)

for all x, y ∈ Rn. In the two-dimensional space, that is, n = 2, equation (6.10) states that a real function f defined on R² is lattice superadditive (L-superadditive) if, for all x1 ≤ y1 and x2 ≤ y2,

f(x1, x2) + f(y1, y2) − f(x1, y2) − f(y1, x2) ≥ 0.   (6.11)
If the reverse inequality holds, the function is called lattice subadditive (L-subadditive). More generally, a function f defined on Rn is L-superadditive if and only if f is L-superadditive in any pair of arguments with the others fixed. Block et al. [32] discuss L-superadditive and L-subadditive properties of structure functions. These properties are valid for both binary and multistate structure functions.

Theorem 6.4 (Block et al. [32]) A structure function is L-superadditive (L-subadditive) if and only if

φ(x ∨ y) − (φ(x) ∨ φ(y)) ≥ (≤) (φ(x) ∧ φ(y)) − φ(x ∧ y)   for all x and y.   (6.12)
Proof The proof follows directly from equation (6.10) and φ(x) + φ(y) = (φ(x) ∨ φ(y)) + (φ(x) ∧ φ(y)).

For a series system, we have φ(x) = min{x1, x2, . . . , xn}. Substituting this structure function in Theorem 6.4, we find that the right-hand side of equation (6.12) becomes 0, and we conclude that the structure function of a series system is L-superadditive. For a parallel system, we have φ(x) = max{x1, x2, . . . , xn}. Substituting this structure function in Theorem 6.4, we find that the left-hand side of equation (6.12) becomes 0, and we conclude that the structure function of a parallel system is L-subadditive.

If a structure function is monotone, then both the left-hand side and the right-hand side in equation (6.12) are nonnegative. The left-hand side of equation (6.12) becomes zero for a parallel system. As a result, the left-hand side measures how close the structure function of the system is to that of a parallel system. The right-hand side of the condition becomes zero for a series system. Thus, the right-hand side measures how close the structure of the system is to a series system. The series system and the parallel system structures can be considered to be the two extreme cases of system structures. Based on Theorem 6.4, if the structure function of a system is L-superadditive, the system under consideration is more like a series system than a parallel system. If the structure function of a system is L-subadditive, the system is more like a parallel system than a series system.

Theorem 6.5 (Block et al. [32]) (a) Let ψ(u, v) be L-superadditive and φ1(x) and φ2(y) be both nondecreasing or both nonincreasing functions. Then λ(x, y) = ψ(φ1(x), φ2(y)) is also L-superadditive. (b) If ψ(u) is nondecreasing convex and φ(x, y) is nondecreasing (or nonincreasing) L-superadditive, then λ(x, y) = ψ(φ(x, y)) is L-superadditive.

Proof We will prove (a) assuming φ1 and φ2 are both nonincreasing.
For any x1 ≤ x2 and y1 ≤ y2 , we have φ1 (x1 ) ≥ φ1 (x2 ) and φ2 (y1 ) ≥ φ2 (y2 ). Let u 1 = φ1 (x1 ), u 2 = φ1 (x2 ), v1 = φ2 (y1 ), and v2 = φ2 (y2 ). Then, we have u 1 ≥ u 2 and v1 ≥ v2 :
λ(x1, y1) + λ(x2, y2) − λ(x1, y2) − λ(x2, y1)
= ψ(φ1(x1), φ2(y1)) + ψ(φ1(x2), φ2(y2)) − ψ(φ1(x1), φ2(y2)) − ψ(φ1(x2), φ2(y1))
= ψ(u1, v1) + ψ(u2, v2) − ψ(u1, v2) − ψ(u2, v1)
≥ 0

since ψ(u, v) is L-superadditive. As a result, we have proved that λ(x, y) is L-superadditive. When both φ1 and φ2 are nondecreasing, (a) can be proved similarly.

We will prove (b) assuming that φ(x, y) is nondecreasing L-superadditive. For any x1 ≤ x2 and y1 ≤ y2, let u1 = φ(x1, y1), u2 = φ(x2, y1), u3 = φ(x1, y2), and u4 = φ(x2, y2). Because φ(x, y) is nondecreasing L-superadditive, we have

u1 ≤ u2 ≤ u4,   u1 ≤ u3 ≤ u4,
u1 + u4 − u2 − u3 ≥ 0,   that is,   u4 − u2 ≥ u3 − u1.   (6.13)

Because ψ(u) is nondecreasing and convex, we have

ψ(u1) ≤ ψ(u2) ≤ ψ(u4),   ψ(u1) ≤ ψ(u3) ≤ ψ(u4),
ψ(u2) ≤ a ψ(u1) + (1 − a) ψ(u4),   a = (u4 − u2)/(u4 − u1),
ψ(u3) ≤ b ψ(u1) + (1 − b) ψ(u4),   1 − b = (u3 − u1)/(u4 − u1).
It is apparent that a ≥ 1 − b due to condition (6.13): λ(x1 , y1 ) + λ(x2 , y2 ) − λ(x1 , y2 ) − λ(x2 , y1 ) = ψ(φ(x1 , y1 )) + ψ(φ(x2 , y2 )) − ψ(φ(x 1 , y2 )) − ψ(φ(x2 , y1 )) = ψ(u 1 ) + ψ(u 4 ) − ψ(u 2 ) − ψ(u 3 ) ≥ ψ(u 1 ) + ψ(u 4 ) − [aψ(u 1 ) + (1 − a) ψ(u 4 )] − [bψ(u 1 ) + (1 − b)ψ(u 4 )] = [ψ(u 4 ) − ψ(u 1 )] [a − (1 − b)] ≥ 0. As a result, λ(x, y) is L-superadditive. When φ(x, y) is nonincreasing L-superadditive, (b) may be proved in a similar way. Theorem 6.5 can be applied to the analysis of reliability systems. The following corollary is for systems that consist of subsystems connected in parallel or in series. It is based on Block et al. [32]. Corollary 6.1 Let φ1 (x1 , x2 , . . . , xk ) and φ2 (xk+1 , xk+2 , . . . , x n ) be both nondecreasing (or both nonincreasing) functions. (a) The function min{φ1 , φ2 } is
L-superadditive in the pairs xi and xj for 1 ≤ i ≤ k < j ≤ n. (b) The function max{φ1, φ2} is L-subadditive in the pairs xi and xj for 1 ≤ i ≤ k < j ≤ n. (c) If φ1 and φ2 are also nonnegative L-superadditive, then the product φ1 φ2 is also L-superadditive. (d) If φ(x1, x2, . . . , xn) is L-superadditive and fi(xi) for i = 1, 2, . . . , n are all nondecreasing (or all nonincreasing), then φ(f1(x1), f2(x2), . . . , fn(xn)) is L-superadditive. (e) If φ(x1, x2) is L-superadditive and f1(x1) is nonincreasing while f2(x2) is nondecreasing, then φ(f1, f2) is L-subadditive.

Based on this corollary, the function min{φ1, φ2} is L-superadditive only in the pairs xi and xj for 1 ≤ i ≤ k < j ≤ n, but not in general. The function max{φ1, φ2} is L-subadditive in the pairs xi and xj for 1 ≤ i ≤ k < j ≤ n, but not in general. In reliability terms, we have the following:

1. If two subsystems are both monotonic, the system constructed by connecting these two subsystems in series is L-superadditive in the pairs of components, one from each subsystem.
2. If two subsystems are both monotonic, the system constructed by connecting these two subsystems in parallel is L-subadditive in the pairs of components, one from each subsystem.
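These two facts, series (min) is L-superadditive and parallel (max) is L-subadditive, can be spot-checked numerically by sampling state vectors. The defect function below is our name for the left side of (6.10) minus the right side:

```python
import random

def l_defect(f, x, y):
    # f(x v y) + f(x ^ y) - f(x) - f(y); nonnegative where f is L-superadditive
    top = tuple(max(a, b) for a, b in zip(x, y))   # componentwise maxima, x v y
    bot = tuple(min(a, b) for a, b in zip(x, y))   # componentwise minima, x ^ y
    return f(top) + f(bot) - f(x) - f(y)

random.seed(0)
for _ in range(1000):
    x = tuple(random.random() for _ in range(4))
    y = tuple(random.random() for _ in range(4))
    assert l_defect(min, x, y) >= -1e-12   # series structure: L-superadditive
    assert l_defect(max, x, y) <= 1e-12    # parallel structure: L-subadditive
```

The built-in min and max play the role of the series and parallel structure functions here; the tolerances guard against floating-point round-off only.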
6.4 RELIABILITY IMPORTANCE IN OPTIMAL DESIGN

Relationships between reliability allocation and reliability importance are explored by Lin and Kuo [141]. An invariant optimal allocation is an allocation that depends only on the relative ordering, rather than the magnitude, of the component reliabilities. Lin and Kuo [141] have presented the following strong heuristic algorithm for obtaining an optimal component assignment for a general system structure using the reliability importance concept.

Algorithm of Lin–Kuo (LK) Heuristic For any two components i and j in a system, if Ii(p) ≥ Ij(p) for all p, the system is said to have a consistent B-importance ordering.

1. Sort the available component reliabilities in nonincreasing order, p(1) ≥ · · · ≥ p(n).
2. The initial condition of the system is given such that all components have the same lowest reliability p(n).
3. Let S1 = {1, . . . , n}.
4. Do loop, for k = 1, . . . , n − 1, where S1, S2, . . . , Sn are the sets of components that have not yet been assigned reliabilities:
(a) Compute the reliability importance of each component in Sk according to equation (6.3).
(b) Find the component ak that has the greatest reliability importance among the components in Sk.
(c) Assign p(k) to component ak.
(d) Let Sk+1 = Sk \ {ak}.
5. Stop.

The LK heuristic is a greedy heuristic, for which two theorems are presented below. Proofs of the theorems can be found in Lin and Kuo [141].

Theorem 6.6 If a system has an invariant optimal allocation (a1, . . . , an), the system must have the consistent B-importance ordering.

Theorem 6.7 If a system has an invariant optimal allocation, the solution generated by the LK heuristic must be the optimal allocation.
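A minimal sketch of the LK heuristic for a hypothetical three-component structure (one component in series with a parallel pair). We take equation (6.3) to be the Birnbaum reliability importance Ii(p) = h(1i, p) − h(0i, p); the structure, data, and function names here are ours, not from the text:

```python
def h(p):
    # hypothetical structure: component 0 in series with (1 parallel 2)
    return p[0] * (1 - (1 - p[1]) * (1 - p[2]))

def birnbaum(h, p, i):
    # Birnbaum importance: h with p_i = 1 minus h with p_i = 0
    hi, lo = list(p), list(p)
    hi[i], lo[i] = 1.0, 0.0
    return h(hi) - h(lo)

def lk_heuristic(h, reliabilities):
    n = len(reliabilities)
    ranked = sorted(reliabilities, reverse=True)   # step 1: p(1) >= ... >= p(n)
    p = [ranked[-1]] * n                           # step 2: all start at p(n)
    unassigned = set(range(n))                     # step 3
    for k in range(n - 1):                         # step 4
        a = max(unassigned, key=lambda i: birnbaum(h, p, i))
        p[a] = ranked[k]                           # assign p(k) to component a_k
        unassigned.remove(a)
    return p                                       # the last component keeps p(n)

result = lk_heuristic(h, [0.6, 0.9, 0.8])
# the series component (index 0) receives the largest reliability, 0.9
```

For this small structure, exhaustive enumeration of all assignments confirms that the greedy result achieves the maximum system reliability, in line with Theorem 6.7.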
6.5 PAIRWISE REARRANGEMENT IN OPTIMAL DESIGN

Boland et al. [37] use the concept of relative criticality of components and develop a monotonicity property of the reliability function under pairwise rearrangement of components. These concepts are useful for finding the optimal arrangement of components in coherent systems. Let π = (π1, π2, . . . , πn) denote a permutation of {1, 2, . . . , n} and πij denote the permutation obtained from π by interchanging πi and πj. Let π(p) denote the vector (pπ1, pπ2, . . . , pπn). The system reliability for a given permutation π is written as Rs(π(p)) = Eπ(p) φ(x).

Theorem 6.8 (Boland et al. [37]) Let π be any permutation such that πi < πj. Then Rs(πij(p)) ≥ Rs(π(p)) holds for all p satisfying 0 < pπi ≤ pπj < 1, with strict inequality for some p, if and only if i ≻c j holds.

For a proof of Theorem 6.8, readers are referred to Boland et al. [37]. Based on this theorem, if one component is more critical and less reliable than another component, then exchanging the reliability values of these two components will increase system reliability. In the optimal arrangement of the components in a system, an inadmissible permutation is defined as one in which a more critical component is assigned a smaller reliability. Such an inadmissible permutation may be eliminated by interchanging the pair of assigned reliabilities. Malon [162] calls this exchange the “exchange principle.” To further develop a procedure for optimal assignment of components, Boland et al. [37] use the concept of symmetric components. Two components are symmetric if the interchange of their reliabilities does not affect system reliability. The term permutation equivalent is also used: two components i and j are permutation equivalent under π if Rs(π(p)) = Rs(πij(p)).
Here is the procedure that may be applied for optimal assignment of reliabilities to components [37]:

1. Eliminate all inadmissible permutations through pairwise interchange.
2. Delete all but one of the equivalent permutations.
3. Exhaustively evaluate all the remaining permutations to determine the optimal one.

Example 6.10 Consider the system structure given in Figure 6.6. The seven components in the system are labeled 1–7. Suppose that the seven reliability values to be assigned to these components have been sorted into nondecreasing order as follows: p1 ≤ p2 ≤ · · · ≤ p7. Our objective is to assign these component reliability values to the components in the system so that the system reliability is maximized. Let πi ∈ {1, 2, . . . , 7} indicate that pπi is assigned to component i. Each assignment π = (π1, π2, . . . , π7) is a permutation of the integers in the set {1, 2, . . . , 7}. Using Theorem 6.1, we can easily verify the following ordering of relative criticalities:

1 ≻c j,   j = 2, 3, 4, 5, 6, 7,
2 ≻c j,   j = 3, 4,
5 ≻c j,   j = 6, 7.
Components 3 and 4 are permutation equivalent. Components 6 and 7 are permutation equivalent. Modules {2, 3, 4} and {5, 6, 7} are permutation equivalent.
FIGURE 6.6 System structure for Example 6.10.
Based on these findings, there are only three permutations to be evaluated: (7, 6, 5, 4, 3, 2, 1), (7, 6, 4, 3, 5, 2, 1), and (7, 6, 4, 3, 2, 5, 1).
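The exchange principle behind these eliminations can be spot-checked numerically. Reading Figure 6.6 as component 1 in series with the two branches 2–(3 ∥ 4) and 5–(6 ∥ 7) (our reading of the diagram), giving the more critical component 2 the larger of two candidate reliabilities never lowers Rs:

```python
import random

def rs(p):
    # p maps component label 1..7 to its reliability, per our reading of Figure 6.6
    a = p[2] * (1 - (1 - p[3]) * (1 - p[4]))   # branch: 2 in series with (3 || 4)
    b = p[5] * (1 - (1 - p[6]) * (1 - p[7]))   # branch: 5 in series with (6 || 7)
    return p[1] * (1 - (1 - a) * (1 - b))      # component 1 in series with the branches

random.seed(2)
for _ in range(1000):
    p = {i: random.uniform(0.05, 0.95) for i in range(1, 8)}
    if p[2] > p[3]:
        p[2], p[3] = p[3], p[2]      # start from the inadmissible order p2 <= p3
    before = rs(p)
    p[2], p[3] = p[3], p[2]          # exchange principle: swap the assigned values
    assert rs(p) >= before - 1e-12   # Theorem 6.8: the swap cannot lower Rs
```

The same check can be repeated for the pairs (1, j) and (5, 6), (5, 7), since those criticality relations were established above.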
6.6 OPTIMAL ARRANGEMENT FOR SERIES AND PARALLEL SYSTEMS

Derman et al. [60] consider the problem of the optimal assembly of components into series systems in order to maximize the expected number of such assembled systems that will function satisfactorily. Each series system consists of n components, each of which is of a different type. There are n types of components with k components of each type available; thus, there is a total of nk components available. The reliabilities of the k components of type m are ordered such that p1m ≤ p2m ≤ · · · ≤ pkm (1 ≤ m ≤ n). We need to partition these nk components into k groups with n components of different types in each group. The n components in each group form a series system. The question is how to construct these k systems from the available components such that the expected number of systems, E(N), that will function satisfactorily is maximized. The answer is that E(N) is maximized if the best of each type is put in one system, the remaining best of each type is put into the next system, and, finally, the worst of each type is put into the last system. In other words, E(N) is maximized if system j has components with reliabilities pj1, pj2, . . . , pjn. This assignment is optimal no matter what the actual values of the reliabilities are; it depends only on the ordering of the reliabilities. Such an optimal assignment or optimal arrangement is called an invariant optimal arrangement (or assignment or design).

Derman et al. [61] consider another optimal system assembly problem wherein each of the k systems to be constructed is a parallel system. Call these k systems S1, S2, . . . , Sk. System Sj has nj components (1 ≤ j ≤ k) and the values of nj for j = 1, 2, . . . , k do not have to be the same. The available components may be assumed to be of the same type.
There is a total of n = Σ_{j=1}^{k} nj components available for selection and the reliabilities of these n components are ordered as p1 ≤ p2 ≤ · · · ≤ pn. Under these conditions, if a partition can be found that makes the reliabilities of these k systems as close to one another as possible, then E(N) is maximized. If one knows only the ordering of the available components, an invariant optimal partition or arrangement of the components does not exist. A heuristic algorithm is provided to seek better partitions via pairwise interchange of components. The objective here is to find partitions that tend to equalize the unreliabilities of the parallel systems. Suppose that we start with a partition S1, S2, . . . , Sk. Use Qj to represent the unreliability of the system Sj (1 ≤ j ≤ k):

Qj = ∏_{i∈Sj} qi,

where qi = 1 − pi for 1 ≤ i ≤ n. Naturally these Qj's are not all equal to one another; otherwise we have found the optimal partition. To improve on the current partition, we can choose the two partitions with the largest and the smallest Qj values, say, S1 with Q1 = min_{1≤j≤k} Qj and Sk with Qk = max_{1≤j≤k} Qj. Now pick a component from S1 that has unreliability qs (s ∈ S1) and a component from Sk that has unreliability qt (t ∈ Sk) such that qs < qt, or s > t. If the inequality

| (qs/qt) Qk − (qt/qs) Q1 | < | Qk − Q1 |

is satisfied, then the interchange of the two components qs of system S1 and qt of system Sk will improve the current partition [61]. Applying this heuristic will not guarantee optimal assignment. However, this heuristic does provide a mechanism for improving a given design using pairwise interchange of components.
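A sketch of one interchange step of this heuristic; the test applied is exactly the displayed inequality, and the function names and data are ours:

```python
from math import prod

def unreliability(group, q):
    # Q_j: product of component unreliabilities in one parallel system
    return prod(q[i] for i in group)

def interchange_step(groups, q):
    # one improvement pass: swap a component between the least and most
    # unreliable systems when the inequality predicts a smaller spread
    Qs = [unreliability(g, q) for g in groups]
    lo, hi = Qs.index(min(Qs)), Qs.index(max(Qs))   # S1 and Sk in the text
    Q1, Qk = Qs[lo], Qs[hi]
    for s in list(groups[lo]):
        for t in list(groups[hi]):
            if q[s] < q[t] and abs(q[s] * Qk / q[t] - q[t] * Q1 / q[s]) < abs(Qk - Q1):
                groups[lo].remove(s); groups[lo].append(t)
                groups[hi].remove(t); groups[hi].append(s)
                return True
    return False

q = {0: 0.5, 1: 0.4, 2: 0.1, 3: 0.05}   # illustrative component unreliabilities
groups = [[0, 1], [2, 3]]               # two parallel systems
interchange_step(groups, q)             # swaps components 2 and 0, equalizing the Q_j's
```

On these numbers the step reduces the spread |Qk − Q1| from 0.195 to 0.015; repeated application stops once no pair satisfies the inequality, which matches the book's caveat that the heuristic improves but does not guarantee optimality.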
6.7 OPTIMAL ARRANGEMENT FOR SERIES–PARALLEL SYSTEMS

A series–parallel system has k subsystems connected in parallel, while subsystem j (1 ≤ j ≤ k) has nj components connected in series. The reliability block diagram of a series–parallel system is given in Figure 4.16. El-Neweihi et al. [72] consider the optimal assignment of components to such a series–parallel system. Call the subsystems S1, S2, . . . , Sk. Without loss of generality, assume n1 ≤ n2 ≤ · · · ≤ nk, where nj is the number of components in the subsystem Sj (1 ≤ j ≤ k). There is a total of n components (n = n1 + n2 + · · · + nk) available for selection.

Theorem 6.9 (El-Neweihi et al. [72]) Assume n1 ≤ n2 ≤ · · · ≤ nk. The reliability of the series–parallel system is maximized when the n1 most reliable components are assigned to S1, the next n2 most reliable components are assigned to S2, and, finally, the last nk components are assigned to Sk.

Proof For a given allocation of components, the reliability of a series–parallel system can be expressed as

Rs = 1 − ∏_{j=1}^{k} (1 − ∏_{i∈Sj} pi).

Let ti = − ln pi and call it the a-hazard of the ith component (1 ≤ i ≤ n). Let zj = Σ_{i∈Sj} ti be the a-hazard of the subsystem Sj (1 ≤ j ≤ k) and call z = (z1, z2, . . . , zk) the subsystem a-hazard vector. We have zj > 0 (since 0 < pi < 1) for 1 ≤ i ≤ n and 1 ≤ j ≤ k. Apparently z is completely determined by the assignment of components. The reliability of the series–parallel system, Rs, can be expressed as a function of z:
Rs = g(z) = 1 − ∏_{j=1}^{k} (1 − e^{−zj}).
It is obvious that g(z) is a symmetric function of z; that is, g(z) is the same for all permutations of the same vector z. If we can prove

(zi − zj) (∂g/∂zi − ∂g/∂zj) ≥ 0   for all i ≠ j,

we can conclude that g(z) is Schur convex in z based on Theorem 6.2:

∂g/∂zi = −e^{−zi} ∏_{l=1, l≠i}^{k} (1 − e^{−zl}),
∂g/∂zj = −e^{−zj} ∏_{l=1, l≠j}^{k} (1 − e^{−zl}),

(zi − zj) (∂g/∂zi − ∂g/∂zj) = (zi − zj) (e^{−zj} − e^{−zi}) ∏_{l=1, l≠i,j}^{k} (1 − e^{−zl})
= A (zi − zj) (e^{−zj} − e^{−zi}) ≥ 0,
where A = ∏_{l=1, l≠i,j}^{k} (1 − e^{−zl}) ≥ 0. As a result, g(z) must be Schur convex in z (refer to Section 6.3). Now if we can find an allocation that majorizes every other possible allocation, it will maximize g(z).

Consider the allocation proposed in Theorem 6.9. Let z* be the subsystem a-hazard vector of this allocation. Let z be the subsystem a-hazard vector of any other allocation. Let z[1] ≤ z[2] ≤ · · · ≤ z[k] be a rearrangement of z1, z2, . . . , zk. Since nk ≥ nk−1 ≥ · · · ≥ n1, we can easily verify the following:

z*k ≥ z*k−1 ≥ · · · ≥ z*1,
z*k ≥ z[k],
z*k + z*k−1 ≥ z[k] + z[k−1],
z*k + z*k−1 + z*k−2 ≥ z[k] + z[k−1] + z[k−2],
. . .
Σ_{j=1}^{k} z*j = Σ_{j=1}^{k} z[j],

where the final equality holds because every allocation assigns all n components. Thus, we have z* ≻m z, and the allocation z* maximizes the system reliability.

The assignment philosophy stated in Theorem 6.9 is the same as the one discussed in Section 6.6, which is attributable to Derman et al. [60] for the optimal assembly of
individual series systems. If we treat the k subsystems in the series–parallel system as k separate and independent systems, the optimal assignment given in Theorem 6.9 will actually maximize E(N). If the lifetimes of the available components are stochastically ordered, then the proposed arrangement will make the system lifetime and the system state stochastically the largest. This result may also be extended to multistate series–parallel systems, to be discussed in Chapter 12.

Prasad et al. [197] extend the optimal allocation problem to a more general case wherein each position has a probability of being shock free. If a position experiences a shock, the component at that position will fail as a consequence of the shock. The system consists of k subsystems S1, S2, . . . , Sk and the subsystem Sj requires nj components (1 ≤ j ≤ k). The n1 positions of the subsystem S1 are labeled 1, 2, . . . , n1, the n2 positions of the subsystem S2 as n1 + 1, n1 + 2, . . . , n1 + n2, and, finally, the nk positions of the subsystem Sk as n1 + n2 + · · · + nk−1 + 1, . . . , n1 + n2 + · · · + nk. The n components available for assignment are labeled 1, 2, . . . , n (n = n1 + n2 + · · · + nk). We use pj to represent the reliability of component j (1 ≤ j ≤ n). Without loss of generality, we also assume that the component reliabilities available for assignment have been arranged in nondecreasing order of their values, namely,

p1 ≤ p2 ≤ · · · ≤ pn.   (6.14)
Each available component will be assigned to a unique position in the system. If we use wi to represent the index of the component that is assigned to position i, a complete assignment is a permutation (w1, w2, . . . , wn) of the component indexes 1, 2, . . . , n. The reliability of the component at a certain position is determined by the inherent quality of the component and the shock-free probability of the position. Let pij represent the working probability of the component j that is assigned to the position i. Then, the system reliability is

Rs(w) = 1 − ∏_{j=1}^{k} (1 − ∏_{i=m_{j−1}+1}^{m_j} p_{i wi}),   (6.15)

where m0 ≡ 0 and mj = n1 + n2 + · · · + nj for 1 ≤ j ≤ k. If we use ri to represent the probability that the position i is shock free and assume that the shock event is independent of the inherent component failure, we can write pij = ri pj. The system reliability for the assignment w = (w1, w2, . . . , wn) can then be written as

Rs(w) = 1 − ∏_{j=1}^{k} (1 − αj ∏_{i=m_{j−1}+1}^{m_j} p_{wi}),   (6.16)

where αj = ∏_{i=m_{j−1}+1}^{m_j} ri. We can use a partition A = {A1, A2, . . . , Ak} to represent an allocation, where Aj (1 ≤ j ≤ k) is the set of components allocated to the subsystem Sj. The system reliability for the allocation A can be expressed as

Rs(A) = 1 − ∏_{j=1}^{k} (1 − αj ∏_{i∈Aj} pi).   (6.17)
Now define ti = − ln pi for 1 ≤ i ≤ n, dj = − ln αj, and zj = dj + Σ_{i∈Aj} ti for 1 ≤ j ≤ k. Then z = (z1, z2, . . . , zk) is called the subsystem a-hazard vector and zj is the a-hazard value of the subsystem Sj. The system reliability for the partition A can then be expressed in terms of z as follows:

Rs(z) = 1 − ∏_{j=1}^{k} (1 − e^{−zj}).   (6.18)
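The change of variables behind (6.18) can be sanity-checked numerically: computing Rs directly from (6.16) and via the a-hazards of (6.18) must agree for every assignment. A sketch with illustrative data (sizes, probabilities, and function names are ours):

```python
from math import exp, log, prod

sizes = [1, 2]           # k = 2 subsystems with n1 = 1 and n2 = 2 positions
r = [0.99, 0.98, 0.97]   # shock-free probabilities r_i, one per position
p = [0.7, 0.8, 0.9]      # component reliabilities, p1 <= p2 <= p3

def rs_direct(order):
    # equation (6.16): position i receives component order[i]
    out, start = 1.0, 0
    for n in sizes:
        idx = range(start, start + n)
        alpha = prod(r[i] for i in idx)                    # alpha_j
        out *= 1 - alpha * prod(p[order[i]] for i in idx)
        start += n
    return 1 - out

def rs_hazard(order):
    # equation (6.18): Rs = 1 - prod(1 - exp(-z_j)), z_j = d_j + sum of t_i
    out, start = 1.0, 0
    for n in sizes:
        idx = range(start, start + n)
        d = -log(prod(r[i] for i in idx))                  # d_j = -ln alpha_j
        z = d + sum(-log(p[order[i]]) for i in idx)        # t_i = -ln p_i
        out *= 1 - exp(-z)
        start += n
    return 1 - out

assert abs(rs_direct((0, 1, 2)) - rs_hazard((0, 1, 2))) < 1e-12
```

Since e^{−zj} = αj ∏ p_{wi} by construction, the two functions are algebraically identical; the assertion only confirms that the bookkeeping of positions and subsystems matches.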
We have proved in Theorem 6.9 that Rs(z) is Schur convex in z.

Basic Reordering Operation Prasad et al. [197] provide a definition of the basic reordering operation (BRO), which may generate a better allocation by interchanging some components between two subsystems. Consider a current allocation A = (A1, A2, . . . , Ak) and two subsystems Sr with Ar and Ss with As (r < s). Let (z1, z2, . . . , zk) be the subsystem a-hazard vector corresponding to A.

1. Arrange the components in Ar ∪ As in increasing order of their indexes (this will produce a decreasing order of ti's for i ∈ Ar ∪ As). There are nr + ns components in Ar ∪ As.
2. Let Gr be the set of the first nr components in the rearranged Ar ∪ As and yr = dr + Σ_{i∈Gr} ti. Thus, Gr contains the nr components with lower indexes and lower reliabilities. Let Gs be the set of the first ns components in the rearranged Ar ∪ As and ys = ds + Σ_{i∈Gs} ti. Thus, Gs contains the ns components with lower indexes and lower reliabilities.
3. Compare yr and ys. If yr ≥ ys, let A′r = Gr and A′s = (Ar ∪ As)\Gr. The set A′s contains the remaining components in the rearranged Ar ∪ As after the components in Gr have been removed. If ys > yr, let A′s = Gs and A′r = (Ar ∪ As)\Gs.

Now we have obtained A′s in the position of As and A′r in the position of Ar. Let A′j = Aj for j ≠ r and j ≠ s (1 ≤ j ≤ k). Then we have a new complete allocation A′, and let z′ = (z′1, z′2, . . . , z′k) be the subsystem a-hazard vector for the new allocation A′. Based on the above manipulations, we know that (z′r, z′s) majorizes (zr, zs). As a result, z′ majorizes z. Since Rs(z) is Schur convex, we have Rs(z′) ≥ Rs(z). In other words, allocation A′ is at least as good as A. Reallocation of the components in Ar and As as just described is called the basic reordering operation on subsystems Sr and Ss.

Prasad et al. [197] define an allocation A to be ordered if there exists a permutation v = (v1, v2, . . . , vk) of (1, 2, . . . , k) such that
Av1 = {1, 2, . . . , nv1},
Av2 = {nv1 + 1, nv1 + 2, . . . , nv1 + nv2},
. . .
Avk = {Σ_{j=1}^{k−1} nvj + 1, . . . , Σ_{j=1}^{k} nvj}.
In words, an ordered allocation starts with the worst component, assigns the nv1 least reliable components to the same subsystem v1, the next nv2 least reliable components to the subsystem v2, and, finally, the remaining nvk components to the subsystem vk. A totally ordered allocation is an ordered allocation that also meets the following requirement: the subsystem with more reliable components should also be a more reliable subsystem, or at least not a less reliable one. Prasad et al. [197] prove that there exists an optimal allocation that is totally ordered. With this result, one needs to consider only totally ordered allocations in order to obtain an optimal allocation. They then provide two algorithms to generate totally ordered allocations. Algorithm 1 picks the subsystem that would have the lowest reliability if the worst remaining components were assigned to it, while Algorithm 2 picks the subsystem that would have the highest reliability if the best remaining components were assigned to it. Both algorithms are listed later in this section. They then provide a theorem stating that if the two algorithms give the same totally ordered allocation, then this allocation must be optimal. If any one of the following conditions is satisfied, Algorithms 1 and 2 will yield the same allocation:

1. n1 = n2 = · · · = nk,
2. α1 = α2 = · · · = αk, or
3. nr ≤ ns whenever αr ≥ αs.

A subsystem Sr is preferred to another subsystem Ss in the allocation of less reliable components if the following dominance condition is satisfied:

nr ≥ ns and αr ≤ αs.   (6.19)

If nr = ns, αr = αs, and r < s, we say that subsystem Sr is preferred to subsystem Ss to avoid a tie. Using this dominance condition, we can check whether any current allocation satisfies it. The set {(r, s) : Sr is preferred to Ss} is called the preference relation set. If Sr is preferred to Ss and Sr currently has more reliable components assigned to it, we can then group the components in Sr and Ss together, pick the less reliable components in this group, and assign them to Sr. Thus, only allocations with this dominance condition satisfied need to be evaluated for optimality. In other words, we need to consider only nondominated totally ordered (NDTO) allocations.

Admissibility condition: Let (u1, u2, . . . , ur−1) be a partial sequence of elements of 1, 2, . . . , k, where r < k and
OPTIMAL ARRANGEMENT FOR SERIES–PARALLEL SYSTEMS
215
L = {1, 2, . . . , k}\{u 1 , u 2 , . . . , u r −1 }. An element v ∈ L is said to be admissible at stage r with respect to (u 1 , u 2 , . . . , u r −1 ) if the following condition is satisfied for v: m+n ur−1 +n v
m+n ur−1
.
αu r−1
i=m+1
pi ≤ αv
.
pi ,
(6.20)
i=m+n ur−1 +1
where m = n_{u_1} + n_{u_2} + · · · + n_{u_{r−2}} . Any element in {1, 2, . . . , k} is admissible at stage 1. This admissibility condition is meant to be used in the generation of totally ordered allocations.

We now examine what the admissibility condition means. A complete allocation (u_1 , u_2 , . . . , u_k ) indicates that subsystem u_1 gets the n_{u_1} smallest reliabilities while subsystem u_k gets the n_{u_k} largest reliabilities. If we have a partial assignment of the lowest component reliabilities to the first r − 1 subsystems, namely, u_1 , u_2 , . . . , u_{r−1} , the admissibility condition says that the next subsystem u_r , which is to be allocated the worst remaining reliabilities, must be at least as reliable as subsystem u_{r−1} . If this admissibility condition is applied in the generation of totally ordered allocations, all allocations will have u_1 , u_2 , . . . , u_k with nondecreasing subsystem reliabilities. Those allocations that violate this admissibility condition cannot be optimal allocations. If Algorithms 1 and 2 yield the same allocation, we have found the optimal allocation. Otherwise, we need to search for an optimal NDTO allocation using Algorithm 3.

Algorithm 1 (Prasad et al. [197])

Step 0. Initialize m = 0, I = {1, 2, . . . , k}, and h = 1.
Step 1. Choose r ∈ I such that

α_r ∏_{i=m+1}^{m+n_r} p_i = min_{j∈I} α_j ∏_{i=m+1}^{m+n_j} p_i .
Step 2. Set u_h = r and h = h + 1. If h > k, go to step 4. Otherwise, go to step 3.
Step 3. Set I = I \{r} and m = m + n_r and go to step 1.
Step 4. Stop. Here, u = (u_1 , u_2 , . . . , u_k ) is a totally ordered allocation.

Algorithm 2 (Prasad et al. [197])

Step 0. Initialize m = n, I = {1, 2, . . . , k}, and h = k.
Step 1. Choose s ∈ I such that

α_s ∏_{i=m−n_s+1}^{m} p_i = max_{j∈I} α_j ∏_{i=m−n_j+1}^{m} p_i .
Step 2. Set v_h = s and h = h − 1. If h = 0, go to step 4. Otherwise, go to step 3.
Step 3. Set I = I \{s} and m = m − n_s and go to step 1.
Step 4. Stop. Here, v = (v_1 , v_2 , . . . , v_k ) is a totally ordered allocation.

Algorithm 3 (Prasad et al. [197])

Step 0. Take an NDTO allocation u = (u_1 , u_2 , . . . , u_k ). Renumber the subsystems such that the totally ordered allocation under consideration is (1, 2, . . . , k); in other words, subsystem 1 now has the least reliable components while subsystem k has the most reliable components. Rederive the preference relation set. Set u = v = (1, 2, . . . , k). Set u∗ = u and R∗ = R(1, 2, . . . , k).
Step 1. Set s = k − 1, b = v_s , L = {v_k }, and I = {h : h ∈ L , h > b}.
Step 2. If I = ∅, set L = L ∪ {b} and go to step 3. Otherwise, take the smallest element c from I . If there exists an element g in L such that S_g is preferred to S_c , set I = I \{c} and repeat step 2. Otherwise, check whether c is admissible at stage s with respect to (v_1 , v_2 , . . . , v_{s−1} ). If it is admissible, set v_s = c, L = (L ∪ {b})\{c}, and go to step 4. Otherwise, set I = I \{c} and repeat step 2.
Step 3. Set s = s − 1. If s = 0, go to step 6. Otherwise, set b = v_s and I = {h : h ∈ L , h > b} and go to step 2.
Step 4. Set s = s + 1, take the smallest element d in L that is admissible at stage s with respect to (v_1 , v_2 , . . . , v_{s−1} ), set v_s = d, and go to step 5. If no such d is available, go to step 3.
Step 5. If s < k, set L = L\{d} and go to step 4. Otherwise, evaluate R_s (v_1 , v_2 , . . . , v_k ). If R_s (v_1 , v_2 , . . . , v_k ) > R∗ , set u∗ = v and R∗ = R_s (v_1 , v_2 , . . . , v_k ). Set u = v and go to step 1.
Step 6. Stop. The NDTO allocation u∗ = (u_1 , u_2 , . . . , u_k ) is optimal.

Example 6.11 Consider a series–parallel system with 4 subsystems and a total of 26 components. Subsystem j is a series subsystem with 6, 4, 7, and 9 components for j = 1, 2, 3, 4, respectively.
The shock-free probability for each component in subsystem 1 is 0.9537; in subsystem 2, it is 0.8552; in subsystem 3, it is 0.9968; and in subsystem 4, it is 0.9774. Using the notation defined in this section, we have

k = 4,  n = 26,  n_1 = 6,  n_2 = 4,  n_3 = 7,  n_4 = 9,

r_i = 0.9537 for 1 ≤ i ≤ 6,  0.8552 for 7 ≤ i ≤ 10,  0.9968 for 11 ≤ i ≤ 17,  0.9774 for 18 ≤ i ≤ 26.

We can calculate the α_j value for j = 1, 2, 3, 4 as follows:
α_1 = ∏_{i=1}^{6} r_i = 0.9537^6 ≈ 0.7524,   α_2 = ∏_{i=7}^{10} r_i = 0.8552^4 ≈ 0.5349,

α_3 = ∏_{i=11}^{17} r_i = 0.9968^7 ≈ 0.9778,   α_4 = ∏_{i=18}^{26} r_i = 0.9774^9 ≈ 0.8140.
Let the available component reliabilities sorted in ascending order be 0.45, 0.46, 0.48, 0.49, 0.52, 0.52, 0.52, 0.57, 0.59, 0.60, 0.62, 0.62, 0.62, 0.64, 0.65, 0.65, 0.76, 0.79, 0.81, 0.86, 0.87, 0.88, 0.90, 0.97, 0.99, 0.99.

Algorithm 1

Step 0. The available reliabilities are in nondecreasing order already. We also have m = 0, I = {1, 2, 3, 4}, and h = 1.
Step 1. Calculate the following for each subsystem in set I:

α_1 ∏_{i=1}^{n_1} p_i ≈ 0.0099,   α_2 ∏_{i=1}^{n_2} p_i ≈ 0.0260,   α_3 ∏_{i=1}^{n_3} p_i ≈ 0.0067,   α_4 ∏_{i=1}^{n_4} p_i ≈ 0.0019.
Subsystem 4 has the smallest indicator value, that is, r = 4. Thus, the nine smallest reliability values are assigned to the components in subsystem 4.
Step 2. Set u_1 = 4 and h = 2. Since h < k = 4, go to step 3.
Step 3. Set I = {1, 2, 3} and m = n_4 = 9. Go to step 1.
Step 1. Calculate the following for each subsystem in set I:

α_1 ∏_{i=10}^{9+n_1} p_i ≈ 0.0448,   α_2 ∏_{i=10}^{9+n_2} p_i ≈ 0.0765,   α_3 ∏_{i=10}^{9+n_3} p_i ≈ 0.0378.
Subsystem 3 has the smallest indicator value, that is, r = 3. Thus, the next seven smallest reliability values are assigned to the components in subsystem 3.
Step 2. Set u_2 = 3 and h = 3. Since h < 4, go to step 3.
Step 3. Set I = {1, 2} and m = 9 + n_3 = 16. Go to step 1.
Step 1. Calculate the following for each subsystem in set I:

α_1 ∏_{i=17}^{16+n_1} p_i ≈ 0.2409,   α_2 ∏_{i=17}^{16+n_2} p_i ≈ 0.2237.
Subsystem 2 has a smaller indicator value, that is, r = 2. Thus, the next four smallest reliability values are assigned to the components in subsystem 2.
Step 2. Set u_3 = 2 and h = 4. Since h = 4, go to step 4.
Step 4. Set I = {1}, m = 20, and u_4 = 1. Then

α_1 ∏_{i=21}^{26} p_i ≈ 0.4929.
We have obtained a totally ordered allocation u = (u_1 , u_2 , u_3 , u_4 ) = (4, 3, 2, 1). This allocation dictates that the 9 smallest reliability values are allocated to the components in subsystem 4, the next 7 smallest reliability values to subsystem 3, the next 4 smallest reliability values to subsystem 2, and the remaining 6 reliability values to subsystem 1. The allocation results are tabulated below:

Iteration    Subsystem      Subsystem      Allocated                                              Subsystem
Number, i    Assigned, u_i  Size, n_{u_i}  Reliabilities                                          Reliability, R_{u_i}
1            4              9              0.45, 0.46, 0.48, 0.49, 0.52, 0.52, 0.52, 0.57, 0.59   0.0019
2            3              7              0.60, 0.62, 0.62, 0.62, 0.64, 0.65, 0.65               0.0378
3            2              4              0.76, 0.79, 0.81, 0.86                                 0.2237
4            1              6              0.87, 0.88, 0.90, 0.97, 0.99, 0.99                     0.4929

The corresponding system reliability is R_s = 1 − Q_1 Q_2 Q_3 Q_4 ≈ 0.6219.
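The two generation procedures can be reproduced with a short script. The sketch below is not from the book; it assumes the sorted reliability list and the α_j values of Example 6.11 and implements Algorithms 1 and 2 directly from their step descriptions.

```python
from math import prod

# Example 6.11 data: component reliabilities in ascending order,
# subsystem sizes n_j, and shock-free factors alpha_j.
p = [0.45, 0.46, 0.48, 0.49, 0.52, 0.52, 0.52, 0.57, 0.59, 0.60,
     0.62, 0.62, 0.62, 0.64, 0.65, 0.65, 0.76, 0.79, 0.81, 0.86,
     0.87, 0.88, 0.90, 0.97, 0.99, 0.99]
n = {1: 6, 2: 4, 3: 7, 4: 9}
alpha = {1: 0.7524, 2: 0.5349, 3: 0.9778, 4: 0.8140}

def algorithm1(p, n, alpha):
    """Give the worst remaining block to the subsystem that would
    thereby become least reliable (Algorithm 1)."""
    m, I, u = 0, set(n), []
    while I:
        r = min(I, key=lambda j: alpha[j] * prod(p[m:m + n[j]]))
        u.append(r)
        I.remove(r)
        m += n[r]
    return tuple(u)

def algorithm2(p, n, alpha):
    """Give the best remaining block to the subsystem that would
    thereby become most reliable (Algorithm 2)."""
    m, I, v = len(p), set(n), []
    while I:
        s = max(I, key=lambda j: alpha[j] * prod(p[m - n[j]:m]))
        v.append(s)                      # fills v_k, v_{k-1}, ..., v_1
        I.remove(s)
        m -= n[s]
    return tuple(reversed(v))

print(algorithm1(p, n, alpha))           # (4, 3, 2, 1), as derived above
print(algorithm2(p, n, alpha))           # (4, 1, 2, 3)
```

Since the two algorithms return different totally ordered allocations here, the optimality theorem does not apply, and Algorithm 3 is needed, as the remainder of the example shows.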
Algorithm 1 picks the subsystem that will have the smallest subsystem reliability if the worst available components are allocated to it. In contrast, Algorithm 2 picks the subsystem that will have the largest subsystem reliability if the best available components are assigned to it. We now illustrate how Algorithm 2 finds a totally ordered allocation.

Algorithm 2 At the beginning, there are m = 26 available reliabilities, I = {1, 2, 3, 4}, and h = 4. Now we compare the subsystem reliabilities R_j (1 ≤ j ≤ 4) if the best reliabilities are assigned to each subsystem:

• If the n_1 (= 6) best components are assigned to subsystem 1: R_1 = α_1 ∏_{i=m−n_1+1}^{m} p_i ≈ 0.4929.
• If the n_2 (= 4) best components are assigned to subsystem 2: R_2 = α_2 ∏_{i=m−n_2+1}^{m} p_i ≈ 0.4577.
• If the n_3 (= 7) best components are assigned to subsystem 3: R_3 = α_3 ∏_{i=m−n_3+1}^{m} p_i ≈ 0.5509.
• If the n_4 (= 9) best components are assigned to subsystem 4: R_4 = α_4 ∏_{i=m−n_4+1}^{m} p_i ≈ 0.2934.
Since R_3 is the largest, our conclusion is to allocate the 7 best reliabilities to subsystem 3, that is, v_4 = 3 and R_3 = 0.5509. Now there are m = 26 − 7 = 19 reliabilities left to be allocated to subsystems 1, 2, and 4 (I = {1, 2, 4}, h = 3). Among the remaining 19 reliabilities, if the best ones are allocated to each of the remaining three subsystems, the subsystem reliabilities can be compared:

• If the n_1 (= 6) best components are assigned to subsystem 1: R_1 = α_1 ∏_{i=m−n_1+1}^{m} p_i ≈ 0.0989.
• If the n_2 (= 4) best components are assigned to subsystem 2: R_2 = α_2 ∏_{i=m−n_2+1}^{m} p_i ≈ 0.1691.
• If the n_4 (= 9) best components are assigned to subsystem 4: R_4 = α_4 ∏_{i=m−n_4+1}^{m} p_i ≈ 0.0255.
Since R_2 is the largest, our conclusion is to allocate the 4 best remaining reliabilities to subsystem 2, that is, v_3 = 2 and R_2 = 0.1691. Now there are m = 19 − 4 = 15 reliabilities left to be allocated to subsystems 1 and 4 (I = {1, 4}, h = 2). Among the remaining 15 reliabilities, if the best ones are allocated to subsystem 1 or subsystem 4, the subsystem reliabilities can be compared:

• If the n_1 (= 6) best components are assigned to subsystem 1: R_1 = α_1 ∏_{i=m−n_1+1}^{m} p_i ≈ 0.0448.
• If the n_4 (= 9) best components are assigned to subsystem 4: R_4 = α_4 ∏_{i=m−n_4+1}^{m} p_i ≈ 0.0085.

Since R_1 is the larger one, our conclusion is to allocate the 6 best remaining reliabilities to subsystem 1, that is, v_2 = 1 and R_1 = 0.0448. Now there are m = 15 − 6 = 9 reliabilities left, which should be allocated to the remaining subsystem 4. The corresponding reliability of subsystem 4 is

R_4 = α_4 ∏_{i=m−n_4+1}^{m} p_i ≈ 0.0019.
The resulting allocation is (v_1 , v_2 , v_3 , v_4 ) = (4, 1, 2, 3) and the corresponding system reliability is R_s = 1 − (1 − R_1 )(1 − R_2 )(1 − R_3 )(1 − R_4 ) ≈ 0.6442. The allocation results with Algorithm 2 are tabulated below:

Iteration    Subsystem    Subsystem    Allocated                                              Subsystem
Number, i    Assigned     Size         Reliabilities                                          Reliability
1            3            7            0.86, 0.87, 0.88, 0.90, 0.97, 0.99, 0.99               0.5509
2            2            4            0.65, 0.76, 0.79, 0.81                                 0.1691
3            1            6            0.60, 0.62, 0.62, 0.62, 0.64, 0.65                     0.0448
4            4            9            0.45, 0.46, 0.48, 0.49, 0.52, 0.52, 0.52, 0.57, 0.59   0.0019
Since the two allocations obtained with Algorithms 1 and 2 are different, we cannot conclude if the optimal allocation has been obtained or not. We have to use Algorithm 3 to find all NDTO allocations.

Algorithm 3 Take the NDTO allocation obtained with Algorithm 2, that is, u = (4, 1, 2, 3). Based on this vector u, subsystem 4 gets the lowest component reliabilities and subsystems 1, 2, and 3 are allocated the remaining component reliabilities from low to high. Now relabel these subsystems so that the subsystems 4, 1, 2, and 3 become subsystems 1, 2, 3, and 4, respectively. With these new labels, the assignment obtained with Algorithm 2 becomes u = (1, 2, 3, 4). The following table shows the old subsystem labels, the new subsystem labels, the α_j value for each subsystem, and the reliability of each subsystem based on the new labels and the assignment u = (1, 2, 3, 4):

New Subsystem    Old Subsystem    Number of          α_j       R_j
Label, j         Label            Components, n_j
1                4                9                  0.8140    0.0019
2                1                6                  0.7524    0.0448
3                2                4                  0.5349    0.1691
4                3                7                  0.9778    0.5509
Let v = u = (1, 2, 3, 4). The system reliability for this allocation is R∗ = 0.6442. We need to find the preference relation set from the above table. Using the new subsystem labels and examining the table above, we see that subsystem 1 is preferred to subsystem 4 (because n_1 > n_4 and α_1 < α_4 ). This preference relationship indicates that subsystem 1 should always be assigned lower reliabilities than subsystem 4. In other words, the permutations of the current vector u should always have 1 preceding 4. No other preference relationships exist. Considering this preference relationship, we can see that only the following permutations need to be considered further: (1, 2, 3, 4), (1, 3, 2, 4), (2, 1, 3, 4), (2, 3, 1, 4), (3, 1, 2, 4), (3, 2, 1, 4), (1, 2, 4, 3), (1, 4, 2, 3), (2, 1, 4, 3), (1, 3, 4, 2), (1, 4, 3, 2), (3, 1, 4, 2).

After we have eliminated some permutations using the dominance conditions, we can now use the admissibility condition to eliminate some of these remaining permutations. Admissibility specifies that after a few subsystems have been allocated the lowest available reliabilities, the next subsystem, which is to be allocated the lowest remaining reliabilities, should be more reliable than the subsystem that has just been allocated. In the following, we illustrate one iteration of Algorithm 3:

Step 0. Start with the NDTO allocation u = v = (1, 2, 3, 4) with u∗ = v and R∗ = 0.6442.
Step 1. s = 3, b = v_3 = 3, L = {v_4 } = {4}, I = {4}.
Step 2. c = 4. We need to check if subsystem 4 is admissible at stage 3 with respect to the partial allocation (1, 2):

R_4 = α_4 ∏_{i=16}^{22} p_i ≈ 0.2035 > R_2 = 0.0448.

Thus, subsystem 4 is admissible at stage 3, v_3 = 4, L = {3}.
Step 4. s = 4, d = 3. We need to see if subsystem 3 is admissible at stage 4 with respect to the partial allocation (1, 2, 4):

R_3 = α_3 ∏_{i=23}^{26} p_i ≈ 0.4577 > R_4 = 0.2035.

Yes, it is admissible and v_4 = 3.
Step 5. We have found another NDTO allocation v = (1, 2, 4, 3):

R_s ((1, 2, 4, 3)) = 1 − (1 − 0.0019)(1 − 0.0448)(1 − 0.2035)(1 − 0.4577) ≈ 0.5882 < 0.6442.

This NDTO allocation results in a lower system reliability than the best allocation so far. Set u = v = (1, 2, 4, 3) and go back to step 1.
TABLE 6.1 All NDTO Allocations Generated with Algorithm 3 for Example 6.11

Allocation
(u_1 , u_2 , u_3 , u_4 )    R_{u_1}    R_{u_2}    R_{u_3}    R_{u_4}    R_s
(1, 2, 3, 4)                0.0019     0.0448     0.1691     0.5510     0.6442
(1, 2, 4, 3)                0.0019     0.0448     0.2035     0.4577     0.5882
(1, 3, 2, 4)                0.0019     0.0765     0.0989     0.5509     0.6270
(1, 3, 4, 2)                0.0019     0.0765     0.1106     0.4929     0.5843
(1, 4, 2, 3)                0.0019     0.0378     0.2409     0.4577     0.6047
(1, 4, 3, 2)                0.0019     0.0378     0.2237     0.4929     0.6219
If we continue to use Algorithm 3, we will find and evaluate all NDTO allocations. The system reliabilities of these allocations are compared and eventually the optimal allocation can be found. For this example problem, all the NDTO allocations generated and evaluated by Algorithm 3 are listed in Table 6.1. From Table 6.1, we can see that all allocations meet the requirement Ru 1 ≤ Ru 2 ≤ Ru 3 ≤ Ru 4 and the optimal allocation is (1, 2, 3, 4). Using the optimal allocation, we should assign the lowest remaining reliabilities to subsystems 1 (with nine components), 2 (with six components), 3 (with four components), and 4 (with seven components) in this order. The highest system reliability achievable is 0.6442. Remember that the optimal allocation refers to the new subsystem labels.
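Because an optimal allocation is known to be totally ordered, the optimum reported in Table 6.1 can be double-checked by brute force over all 24 orderings of the (relabeled) subsystems. This check is not part of the original text; it assumes the data of Example 6.11 under the new subsystem labels.

```python
from itertools import permutations
from math import prod

p = [0.45, 0.46, 0.48, 0.49, 0.52, 0.52, 0.52, 0.57, 0.59, 0.60,
     0.62, 0.62, 0.62, 0.64, 0.65, 0.65, 0.76, 0.79, 0.81, 0.86,
     0.87, 0.88, 0.90, 0.97, 0.99, 0.99]
n = {1: 9, 2: 6, 3: 4, 4: 7}                       # new subsystem labels
alpha = {1: 0.8140, 2: 0.7524, 3: 0.5349, 4: 0.9778}

def rs(order):
    """System reliability when the worst block of components goes to
    order[0], the next block to order[1], and so on."""
    m, q = 0, 1.0
    for j in order:
        q *= 1.0 - alpha[j] * prod(p[m:m + n[j]])
        m += n[j]
    return 1.0 - q

best = max(permutations(n), key=rs)
print(best, round(rs(best), 4))                    # (1, 2, 3, 4) 0.6442
```

The enumeration confirms both the optimal ordering and the system reliability 0.6442 found by the NDTO search.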
6.8 OPTIMAL ARRANGEMENT FOR PARALLEL–SERIES SYSTEMS

A parallel–series system has k subsystems connected in series while subsystem j (1 ≤ j ≤ k) has n_j components connected in parallel. The reliability block diagram of a parallel–series system is given in Figure 4.17. The optimal assignment of reliabilities to a parallel–series system is similar to the study of parallel systems covered in Section 6.6. The optimal assignment cannot be determined if only the ordering of the component reliabilities is known. Generally speaking, the optimal assignment depends on not only the ordering but also the actual values of the component reliabilities [72]. An exception exists when each subsystem has only two components. In this case, an invariant optimal design exists that maximizes system reliability [25, 72, 197]. The invariant optimal design in this special case is to pair up the best and the worst remaining components and assign each pair to a subsystem.

Call the subsystems in the parallel–series system C_1 , C_2 , . . . , C_k . The subsystem C_j has n_j components connected in parallel (1 ≤ j ≤ k). Suppose that there are n components available for selection (n = n_1 + n_2 + · · · + n_k ) with component reliabilities p_1 ≤ p_2 ≤ · · · ≤ p_n . Without loss of generality, assume that n_1 ≤ n_2 ≤ · · · ≤ n_k . The reliability of the system is minimized if we allocate the n_k best components to the subsystem C_k , the next n_{k−1} best components to the subsystem
C_{k−1} , and finally, the last n_1 components to the subsystem C_1 . On the other hand, the reliability of the system would be maximized if the components are allocated such that the reliabilities of the subsystems are as close to one another as possible [72]. The integer programming technique is suggested by El-Neweihi et al. [72] to find the optimal assignment that depends on component reliability values. Define t_i = − ln(1 − p_i ) for 1 ≤ i ≤ n and z_j = Σ_{i∈C_j} t_i for 1 ≤ j ≤ k. The reliability function of the system can be expressed as

R_s = g(z) = ∏_{j=1}^{k} (1 − e^{−z_j} ).
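As a quick numerical illustration (not from the text) of why balanced a-hazards are preferable, a two-subsystem check shows that g is larger when the same total hazard is split evenly:

```python
from math import exp

def g(z):
    """Parallel-series system reliability as a function of the
    subsystem a-hazards z_j."""
    r = 1.0
    for zj in z:
        r *= 1.0 - exp(-zj)     # subsystem j is parallel: R_j = 1 - e^{-z_j}
    return r

print(round(g([2.0, 2.0]), 4))  # balanced split of total hazard 4 -> 0.7476
print(round(g([3.0, 1.0]), 4))  # spread split of the same total  -> 0.6006
```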
Using Theorem 6.2, we can verify that g(z) is Schur concave in z. As a result, if z∗ majorizes z, z∗ would minimize system reliability. On the other hand, if z∗ is majorized by every other allocation, z∗ would maximize system reliability. Thus, we would like to find the allocation with a z∗ vector that is majorized by every other possible allocation. Recall that the vector that is majorized has elements that are closer to one another. Using the principle that the reliabilities of the subsystems in a parallel–series system should be as close to one another as possible in order to maximize system reliability, Baxter and Harche [25] provide the following heuristic for optimal component assignment.

Baxter and Harche Heuristic

Step 0. The available components are ordered in ascending order of their reliabilities, that is, p_1 ≤ p_2 ≤ · · · ≤ p_n . The subsystems are ordered in ascending order of their sizes, that is, n_1 ≤ n_2 ≤ · · · ≤ n_k . In the following assignment, whenever subsystem i has been allocated n_i components, it is skipped in subsequent allocations.
Step 1. Starting from the most reliable component, allocate one component to each of subsystems 1, 2, . . . , k in this order.
Step 2. Starting from the most reliable remaining component, allocate one component to each of subsystems k, k − 1, . . . , 1 in this order.
Step 3. Evaluate the reliabilities of the subsystems with the allocated components, that is, R_i = 1 − ∏_{j∈C_i} q_j for i = 1, 2, . . . , k, where C_i represents the set of components that have been allocated to subsystem i so far. Rearrange the subsystems in ascending order of their reliability values. Starting from the most reliable remaining component, allocate one component to each of the reordered subsystems.
Step 4. If every subsystem is full, stop. Otherwise, repeat step 3.

This heuristic is called the top-down heuristic because it always uses the best remaining component in an assignment. Correspondingly, there is a bottom-up
heuristic that always uses the worst remaining component in an assignment. Baxter and Harche [25] comment that the bottom-up approach does not work as well, probably because in a parallel system the most reliable component has the highest B-importance measure. Neither heuristic guarantees the optimal allocation. Baxter and Harche [25] also provide bounds on the absolute error and relative error of the heuristic solution in comparison with the real optimal solution.

Example 6.12 Consider a parallel–series system with n_1 = n_2 = n_3 = 4. There are 12 components to be allocated to the system. The reliabilities of these components have been ordered in ascending order as follows:

i      1      2      3      4      5      6      7      8      9      10     11     12
p_i    0.51   0.55   0.58   0.62   0.65   0.68   0.73   0.75   0.80   0.92   0.93   0.97
for i = 1, . . . , 12. We illustrate the use of the Baxter and Harche heuristic in allocation of these component reliabilities to the system:

Step 0. The component reliabilities and the subsystems have been properly ordered.
Step 1. Allocate components 12, 11, and 10 to subsystems 1, 2, and 3, respectively.
Step 2. Allocate components 9, 8, and 7 to subsystems 3, 2, and 1, respectively.
Step 3. Now C_1 = {12, 7}, C_2 = {11, 8}, and C_3 = {10, 9}. Then

R_1 = 1 − q_{12} q_7 = 0.9919,   R_2 = 1 − q_{11} q_8 = 0.9825,   R_3 = 1 − q_{10} q_9 = 0.9840.

The subsystems are ordered as 2, 3, and 1 in ascending order of the R_i values. Allocate components 6, 5, and 4 to subsystems 2, 3, and 1, respectively.
Step 3. Now C_1 = {12, 7, 4}, C_2 = {11, 8, 6}, and C_3 = {10, 9, 5}. Then

R_1 = 1 − q_{12} q_7 q_4 ≈ 0.9969,   R_2 = 1 − q_{11} q_8 q_6 = 0.9944,   R_3 = 1 − q_{10} q_9 q_5 = 0.9944.

The subsystems are ordered as 2, 3, and 1 in ascending order of the R_i values. Allocate components 3, 2, and 1 to subsystems 2, 3, and 1, respectively.
Step 4. Every subsystem is full. Now we have C_1 = {12, 7, 4, 1}, C_2 = {11, 8, 6, 3}, and C_3 = {10, 9, 5, 2}. The subsystem reliabilities are

R_1 = 1 − q_{12} q_7 q_4 q_1 ≈ 0.9985,   R_2 = 1 − q_{11} q_8 q_6 q_3 ≈ 0.9976,   R_3 = 1 − q_{10} q_9 q_5 q_2 ≈ 0.9975.
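The allocation steps above can be reproduced with a small script. The sketch below is an illustrative reimplementation of the top-down heuristic, not the authors' code; Python's stable sort reproduces the tie-break between subsystems 2 and 3 used above.

```python
def top_down(p, sizes):
    """Baxter-Harche top-down heuristic; p[i-1] is the reliability of
    component i, sizes[j] the capacity of subsystem j+1."""
    k = len(sizes)
    C = [[] for _ in range(k)]
    # Component indices 1..n, best first.
    avail = sorted(range(1, len(p) + 1), key=lambda i: p[i - 1], reverse=True)

    def rel(c):                      # reliability of a parallel subsystem
        q = 1.0
        for i in c:
            q *= 1.0 - p[i - 1]
        return 1.0 - q

    for j in range(k):               # step 1: subsystems 1..k
        C[j].append(avail.pop(0))
    for j in reversed(range(k)):     # step 2: subsystems k..1
        C[j].append(avail.pop(0))
    while avail:                     # steps 3-4: fill least reliable first
        for j in sorted(range(k), key=lambda j: rel(C[j])):
            if avail and len(C[j]) < sizes[j]:
                C[j].append(avail.pop(0))
    return C

p = [0.51, 0.55, 0.58, 0.62, 0.65, 0.68, 0.73, 0.75, 0.80, 0.92, 0.93, 0.97]
print(top_down(p, [4, 4, 4]))   # [[12, 7, 4, 1], [11, 8, 6, 3], [10, 9, 5, 2]]
```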
The reliability of the system is R_s = R_1 R_2 R_3 ≈ 0.9936. Though the above heuristic may be applied to parallel–series systems with different n_j values for 1 ≤ j ≤ k, we believe that it is most effective when n_j is constant. The assignment obtained with the Baxter and Harche heuristic may be further improved by the next algorithm, by Prasad and Raghavachari [198].

Prasad and Raghavachari Heuristic

Step 0. Start with a current allocation, C = (C_1 , C_2 , . . . , C_k ). Determine the a-hazard vector (z_1 , z_2 , . . . , z_k ) for the k subsystems based on the current allocation. For subsystem j (1 ≤ j ≤ k), its a-hazard is z_j = Σ_{i∈C_j} [− ln(1 − p_i )]. Set index INCR = 1. Define t_i = − ln(1 − p_i ) for i = 1, 2, . . . , n.
Step 1. If INCR = 0, take the current allocation (C_1 , C_2 , . . . , C_k ) as the best one and stop. Otherwise, set g = 1, j = 2, and INCR = 0.
Step 2. If g = k, go to step 1. Otherwise, go to step 3.
Step 3. If z_j > z_g , find u ∈ C_g and v ∈ C_j , if they exist, such that t_v > t_u and

|(t_v − t_u ) − (z_j − z_g )/2| = min { |(t_c − t_b ) − (z_j − z_g )/2| : b ∈ C_g , c ∈ C_j , 0 < (t_c − t_b ) < (z_j − z_g ) },

and set the following:

C_g = (C_g ∪ {v})\{u},   C_j = (C_j ∪ {u})\{v},   z_g = z_g + t_v − t_u ,   z_j = z_j + t_u − t_v ,   INCR = 1.

If z_j < z_g , find u ∈ C_g and v ∈ C_j , if they exist, such that t_v < t_u and

|(t_u − t_v ) − (z_g − z_j )/2| = min { |(t_b − t_c ) − (z_g − z_j )/2| : b ∈ C_g , c ∈ C_j , 0 < (t_b − t_c ) < (z_g − z_j ) },

and set the following:

C_g = (C_g ∪ {v})\{u},   C_j = (C_j ∪ {u})\{v},   z_g = z_g + t_v − t_u ,   z_j = z_j + t_u − t_v ,   INCR = 1.

Set j = j + 1. If j ≤ k, repeat step 3. Otherwise, set g = g + 1 and j = g + 1 and go to step 2.
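One pass of step 3 for the pair (g, j) = (1, 2) can be sketched as follows. This is an illustrative reimplementation (not the book's code) starting from the allocation of Example 6.12; it anticipates the hand computation in Example 6.13 below.

```python
from math import log

p = {i: v for i, v in enumerate(
    [0.51, 0.55, 0.58, 0.62, 0.65, 0.68, 0.73, 0.75, 0.80, 0.92, 0.93, 0.97], 1)}
t = {i: -log(1.0 - p[i]) for i in p}            # t_i = -ln(1 - p_i)

C1, C2 = [12, 7, 4, 1], [11, 8, 6, 3]
z1, z2 = sum(t[i] for i in C1), sum(t[i] for i in C2)   # about 6.4968, 6.0525

# z2 < z1: swap u in C1 for v in C2 (with t_v < t_u) to best balance z1, z2.
half = (z1 - z2) / 2.0
cands = [(abs((t[b] - t[c]) - half), b, c)
         for b in C1 for c in C2 if 0.0 < t[b] - t[c] < z1 - z2]
_, u, v = min(cands)
C1[C1.index(u)], C2[C2.index(v)] = v, u
print(u, v, C1, C2)    # 7 6 [12, 6, 4, 1] [11, 8, 7, 3]
```

The swap of components 7 and 6 is exactly the interchange derived by hand in Example 6.13.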
Example 6.13 Use the heuristic by Prasad and Raghavachari [198] to improve the allocation obtained with the heuristic by Baxter and Harche in Example 6.12. The current allocation is C_1 = (12, 7, 4, 1), C_2 = (11, 8, 6, 3), and C_3 = (10, 9, 5, 2).

Step 0. j = 1 and INCR = 1. The following table summarizes the data for the current allocation:

Subsystem C_1 , z_1 = 6.4968     Subsystem C_2 , z_2 = 6.0525     Subsystem C_3 , z_3 = 5.9835
i    p_i   q_i   t_i             i    p_i   q_i   t_i             i    p_i   q_i   t_i
12   0.97  0.03  3.5066          11   0.93  0.07  2.6593          10   0.92  0.08  2.5257
7    0.73  0.27  1.3093          8    0.75  0.25  1.3863          9    0.80  0.20  1.6094
4    0.62  0.38  0.9676          6    0.68  0.32  1.1394          5    0.65  0.35  1.0498
1    0.51  0.49  0.7133          3    0.58  0.42  0.8675          2    0.55  0.45  0.7985
Step 1. g = 1, j = 2, INCR = 0.
Step 2. g ≠ k, so go to step 3.
Step 3. z_j = z_2 = 6.0525 < z_g = z_1 = 6.4968; z_g − z_j = 0.4443; (z_g − z_j )/2 ≈ 0.2222. The candidate pairs are

b = 7, c = 6:   t_b − t_c = 0.1699,   |(t_b − t_c ) − (z_g − z_j )/2| = 0.0523,
b = 7, c = 3:   t_b − t_c = 0.4418,   |(t_b − t_c ) − (z_g − z_j )/2| = 0.2196,
b = 4, c = 3:   t_b − t_c = 0.1001,   |(t_b − t_c ) − (z_g − z_j )/2| = 0.1221.

The minimum of these three values is 0.0523. Thus, u = 7, v = 6, C_g = C_1 = (12, 6, 4, 1), C_j = C_2 = (11, 8, 7, 3), z_g = z_1 = 6.3269, z_j = z_2 = 6.2224, INCR = 1, and j = 3. We need to go to step 3 again. At the end of the current step 3, a component in C_1 has been swapped with a component in C_2 while C_3 is kept the same. The modified C_1 and C_2 have closer z_1 and z_2 values now. This means that the reliabilities of the first two subsystems are closer to each other after the interchange of two components between subsystems 1 and 2. The reliability of the new allocation, that is, C_1 = (12, 6, 4, 1), C_2 = (11, 8, 7, 3), and C_3 = (10, 9, 5, 2), can be calculated as

R_1 ≈ 0.9982,   R_2 ≈ 0.9980,   R_3 ≈ 0.9975,   R_s = R_1 R_2 R_3 ≈ 0.9937.
If we continue to use the Prasad–Raghavachari heuristic [198], we eventually reach an allocation such that no further improvements can be made. This is left to the reader as an exercise.
6.9 TWO-STAGE SYSTEMS

Malon [162] studied the optimal allocation of components in two-stage coherent systems. A two-stage coherent system is built with identically structured modules or subsystems. Each module consists of a number of components. The objective is to assign the required number of different components among these modules so as to maximize system reliability. An example of such a two-stage system is the parallel–series system shown in Figure 4.16. In this system, the m modules are connected in parallel while module i has n_i components connected in series. The series–parallel system shown in Figure 4.17 is another special case of a two-stage system. Consider the bridge structure in Figure 4.6. If each of the five components in Figure 4.6 can be further decomposed into a series structure with a few subcomponents, the bridge structure can be treated as a two-stage system. If each of the five components can be decomposed into a bridge structure with five subcomponents, the bridge structure can also be treated as a two-stage system.

Definition 6.7 (Malon [162]) Greedy module assembly rule: Construct one module with the best available components, construct another module with the best remaining components, and so on.

The greedy module assembly rule generates an ordered allocation as defined by Prasad et al. [197] and an ordered partition as defined by Hwang [103]. However, Malon [162] provides the most general results on the application of this rule. Let R = (R_1 , R_2 , . . . , R_k ) be an arbitrary vector defined in [0, 1]^k . Consider a coherent system of k subsystems whose reliability function is given by R_s (R) when subsystem i has reliability R_i (1 ≤ i ≤ k). Let Π denote the set of permutations of {1, 2, . . . , k}, and let σ = (σ(1), σ(2), . . . , σ(k)) be an arbitrary permutation in the set Π. The following theorem and corollary are provided for development of the main results.

Theorem 6.10 (Malon [162])
The function f : (−∞, 0]^k → [0, 1] defined by

f (y_1 , y_2 , . . . , y_k ) = max_{σ∈Π} R_s (e^{y_{σ(1)}} , e^{y_{σ(2)}} , . . . , e^{y_{σ(k)}} )

is Schur convex in y.

Proof (Malon [162]) Since the value of f is equal to the highest possible reliability value of the system when all permutations of y = (y_1 , y_2 , . . . , y_k ) are considered, f is a symmetric function. Based on Theorem 6.2, it is enough to show that

(y_i − y_j ) ( ∂f/∂y_i − ∂f/∂y_j ) ≥ 0

for all i, j ∈ {1, 2, . . . , k}. Suppose that R_s (e^{y_{σ(1)}} , e^{y_{σ(2)}} , . . . , e^{y_{σ(k)}} ) is maximized at σ = σ∗ . Then f (y_1 , y_2 , . . . , y_k ) = R_s (e^{y_{σ∗(1)}} , e^{y_{σ∗(2)}} , . . . , e^{y_{σ∗(k)}} ), where σ∗(i) represents the index of the module that is assigned to position i. We will use θ_i =
σ∗^{−1}(i) to represent the position that the ith module is assigned to. Let p∗ = (e^{y_{σ∗(1)}} , e^{y_{σ∗(2)}} , . . . , e^{y_{σ∗(k)}} ). Then

∂f/∂y_i = ∂/∂y_i R_s (e^{y_{σ∗(1)}} , e^{y_{σ∗(2)}} , . . . , e^{y_{σ∗(k)}} )
        = ∂/∂y_i [ e^{y_i} e^{y_j} R_s (1_{θ_i} , 1_{θ_j} , p∗ ) + e^{y_i} (1 − e^{y_j} ) R_s (1_{θ_i} , 0_{θ_j} , p∗ )
                   + (1 − e^{y_i} ) e^{y_j} R_s (0_{θ_i} , 1_{θ_j} , p∗ ) + (1 − e^{y_i} )(1 − e^{y_j} ) R_s (0_{θ_i} , 0_{θ_j} , p∗ ) ]
        = e^{y_i} e^{y_j} R_s (1_{θ_i} , 1_{θ_j} , p∗ ) + e^{y_i} (1 − e^{y_j} ) R_s (1_{θ_i} , 0_{θ_j} , p∗ )
          − e^{y_i} e^{y_j} R_s (0_{θ_i} , 1_{θ_j} , p∗ ) − e^{y_i} (1 − e^{y_j} ) R_s (0_{θ_i} , 0_{θ_j} , p∗ ).

A similar computation for ∂f/∂y_j and some algebraic simplification yield

(y_i − y_j ) ( ∂f/∂y_i − ∂f/∂y_j ) = (y_i − y_j ) { e^{y_i} [ R_s (1_{θ_i} , 0_{θ_j} , p∗ ) − R_s (0_{θ_i} , 0_{θ_j} , p∗ ) ]
                                                  − e^{y_j} [ R_s (0_{θ_i} , 1_{θ_j} , p∗ ) − R_s (0_{θ_i} , 0_{θ_j} , p∗ ) ] }.

If y_i > y_j , then e^{y_i} > e^{y_j} and hence R_s (1_{θ_i} , 0_{θ_j} , p∗ ) > R_s (0_{θ_i} , 1_{θ_j} , p∗ ) by Theorem 6.8, since otherwise the configuration, which is presumed to be optimal, could be improved by interchanging the modules in positions θ_i and θ_j . The same argument for the case y_i < y_j establishes that the product (y_i − y_j )(∂f/∂y_i − ∂f/∂y_j ) is always nonnegative. This completes the proof.

Theorem 6.10 tells us that the reliability of the optimal configuration of any coherent system is Schur convex as a function of the logarithms of its subsystem reliabilities.

Consider a system with k subsystems. Each subsystem consists of L types of components, m_l of type l, l ∈ {1, 2, . . . , L}. All components in each subsystem are arranged in series; in other words, each subsystem is a series structure. Constructing such a system is equivalent to partitioning k·m_l components of type l into k sets A_{l1} , A_{l2} , . . . , A_{lk} , each of size m_l , l ∈ {1, 2, . . . , L}. The components in sets A_{1j} , A_{2j} , . . . , A_{Lj} are connected in series to form subsystem j for 1 ≤ j ≤ k. The greedy partition, or ordered partition, will assign the best components of every type to a single module, the remaining best components of every type to another module, and so on. Denote the greedy partition by A∗ = (A∗_{lj} ), 1 ≤ l ≤ L and 1 ≤ j ≤ k.
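The greedy partition can be checked numerically on a toy instance. The sketch below is an illustration, not Malon's construction: it builds three two-component series modules from one component of each of two types and confirms by enumeration that the greedy pairing maximizes the reliability of a parallel organizing structure.

```python
from itertools import permutations

a = [0.9, 0.7, 0.5]     # type-1 components, one per module
b = [0.95, 0.8, 0.6]    # type-2 components, one per module

def parallel_rs(pairs):
    """Modules are series pairs installed in a parallel organizing
    structure, so installation order does not matter here."""
    q = 1.0
    for ra, rb in pairs:
        q *= 1.0 - ra * rb
    return 1.0 - q

best = max(permutations(b), key=lambda perm: parallel_rs(zip(a, perm)))
print(best)             # (0.95, 0.8, 0.6): best paired with best, i.e., greedy
```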
Theorem 6.11 (Malon [162]) The reliability of a system consisting of series subsystems is maximized when the modules are constructed via ordered partitions of every type of component and then installed in the system in an optimal way; in other words, greedily assembled modules, properly installed, maximize the reliability of such systems.

This theorem also applies when only one type of component is used and the total number of components is distributed among the subsystems, wherein each subsystem may require a different number of components. This is the situation studied by a few other researchers and reported earlier in this chapter, for example, the parallel–series structure.

Theorem 6.12 (Malon [162]) Series structures are the only subsystem structures for which greedy assembly is always appropriate.

Malon [162] also provides arguments on proving the main results without using the theory of majorization. The main result is that greedily assembled modules properly installed in the system will maximize system reliability if the subsystems are series structures. He uses the exchange principle based on the result stated in Theorem 6.8. First assume that we assemble the modules in any manner and install them in a system in an arbitrary way. Consider the modules M_i and M_j in positions i and j. If position i has a higher relative criticality than position j, use the exchange principle based on Theorem 6.8 to exchange the components in the module at position i with the components in the module at position j such that every component at position i is at least as reliable as every component at position j. Every exchange of a pair of components at positions i and j based on the exchange principle will improve or, at least, not reduce system reliability. A sequence of at most n − 1 such exchanges will put the best components into a single module while increasing system reliability. At most n − 2 additional exchanges will gather the best remaining components into another single module, and so on. This process will transform any initial configuration into a system of greedily assembled modules after at most n(n − 1)/2 exchanges, while improving the system reliability at every step.

To see the connection between the use of the exchange principle and the Schur convexity, consider a Schur-convex function f : R^n → R and x ∈ R^n . A Schur-convex function f (x) is increased if x is more spread.
Suppose x_i > x_j; one can increase (or at least not decrease) the value of the function by incrementing x_i and simultaneously decrementing x_j by the same amount. Since the system reliability, as a function of the logarithms of the module reliabilities when the modules are optimally arranged, is Schur convex, incrementing log x_i and decrementing log x_j by the same amount increases system reliability when log x_i > log x_j, that is, when module i is more reliable than module j. Note that incrementing log x_i and decrementing log x_j by the same amount is equivalent to multiplying x_i and dividing x_j by a constant value α > 1. If the modules have series structures, then this is precisely what happens when a component in the ith module is exchanged for a more reliable component in the jth module. When the respective component reliabilities are p_i and p_j with p_i < p_j, the exchange multiplies the reliability of the module containing p_i by p_j/p_i > 1 and divides the reliability of the other module by the same amount.

Baxter and Harche [24] provide another study of the optimal allocation of components to two-stage systems. They use the concepts of superadditivity and subadditivity to explain the results given by Malon [162] and also provide some new results using duality. Let C = {1, 2, . . . , n} be the set of components in a system, x be the component state vector, and φ be the structure function of the system. The system is then referred
to as (C, φ). We define (A, g) as a module of the system if A ⊂ C and if there exists a structure function χ such that φ(x) = χ(g(x_A), x_{A^c}) for all x, where x_A denotes {x_i | i ∈ A} and A^c denotes the complement of the set A. We call χ the organizing structure. If there exist modules (A_1, g_1), (A_2, g_2), . . . , (A_k, g_k) such that φ(x) = χ(g_1(x_{A_1}), g_2(x_{A_2}), . . . , g_k(x_{A_k})) for all x, we say that (C, φ) has the modular decomposition {(A_1, g_1), (A_2, g_2), . . . , (A_k, g_k)}. It is assumed that such a modular decomposition is available for the system studied. We further assume that |A_i| = n/k and g_i = g for all i. Now we classify the structure function of the system under study into two categories. We say that φ ∈ S_1 if the organizing structure is Schur convex and the g_i's are L-superadditive, and φ ∈ S_2 if the organizing structure is Schur concave and the g_i's are L-subadditive.

Theorem 6.13 (Baxter and Harche [24]) If φ ∈ S_1, the greedy algorithm yields an optimal configuration of the system (C, φ).

Malon [162] proves that the greedy algorithm maximizes the reliability of a binary system if and only if the modules are in series. This is because the series system is the only binary structure that is L-superadditive. If each module has only two components and there are k modules in the system, then the balanced algorithm yields an optimal configuration of the system (C, φ) under the condition that φ ∈ S_2. If the modules have more than two components, there is no invariant optimal allocation. The optimal allocation depends on the values of the component reliabilities. Heuristics or confined searches have to be used.

Let S_3 (S_4) denote the class of structure functions φ that are both Schur convex and L-superadditive (Schur concave and L-subadditive). Baxter and Harche [24] also state the dual relationship between the systems in S_3 and S_4 and the greedy and balanced algorithms in the following theorem.
Theorem 6.14 (Baxter and Harche [24]) Suppose that φ ∈ S_3 ∪ S_4. Then φ is optimally configured by the greedy algorithm if and only if its dual φ^D is optimally configured by the balanced algorithm.

6.10 SUMMARY

In this chapter, we have reviewed several useful concepts for reliability allocation. These concepts, such as B-importance, majorization, Schur-convex functions, and relative criticality, will be used in later chapters when we discuss other system structures. As we mentioned at the beginning of this chapter, we have avoided covering the applications of integer programming, nonlinear programming, and other optimization techniques for optimal reliability allocation. Readers may refer to Kuo and Prasad [133] and Kuo et al. [132] for discussions of applications of these optimization techniques in optimal reliability system design.
7 THE k-OUT-OF-n SYSTEM MODEL
An n-component system that works (or is “good”) if and only if at least k of the n components work (or are good) is called a k-out-of-n:G system. An n-component system that fails if and only if at least k of the n components fail is called a k-out-of-n:F system. Based on these two definitions, a k-out-of-n:G system is equivalent to an (n − k + 1)-out-of-n:F system. The term k-out-of-n system is often used to indicate either a G system or an F system or both. Since the value of n is usually larger than the value of k, redundancy is generally built into a k-out-of-n system. Both parallel and series systems are special cases of the k-out-of-n system. A series system is equivalent to a 1-out-of-n:F system and to an n-out-of-n:G system, while a parallel system is equivalent to an n-out-of-n:F system and to a 1-out-of-n:G system.

The k-out-of-n system structure is a very popular type of redundancy in fault-tolerant systems. It finds wide applications in both industrial and military systems. Fault-tolerant systems include the multidisplay system in a cockpit, the multiengine system in an airplane, and the multipump system in a hydraulic control system. For example, it may be possible to drive a car with a V8 engine if only four cylinders are firing. However, if fewer than four cylinders fire, then the automobile cannot be driven. Thus, the functioning of the engine may be represented by a 4-out-of-8:G system. The system is tolerant of failures of up to four cylinders for minimal functioning of the engine. In a data processing system with five video displays, a minimum of three operable displays may be sufficient for full data display. In this case the display subsystem behaves as a 3-out-of-5:G system. In a communications system with three transmitters, the average message load may be such that at least two transmitters must be operational at all times or critical messages may be lost. Thus, the transmission subsystem functions as a 2-out-of-3:G system. Systems with spares may also be represented by the k-out-of-n system model. In the case of an automobile with four tires, for example, the vehicle is usually equipped with one additional spare tire. Thus, the vehicle can be driven as long as at least 4 out of 5 tires are in good condition.

Among the applications of the k-out-of-n system model, the design of electronic circuits such as very large scale integrated (VLSI) circuits and the automatic repair of faults in an on-line system are the most conspicuous. This type of system demonstrates what is called voting redundancy. In such a system, several parallel outputs are channeled through a decision-making device that provides the required system function as long as at least a predetermined number k of the n parallel outputs are in agreement.

In this chapter, we provide detailed coverage of reliability evaluation of k-out-of-n systems. Methods for finding both the exact and the approximate system reliability values are introduced. The performance measures of both nonrepairable and repairable k-out-of-n systems are addressed. In addition, the weighted k-out-of-n system model is discussed in this chapter. In our discussions, it is assumed that the components work independently of one another unless otherwise specified.
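The G/F equivalence stated above can be checked exhaustively for small systems. The following sketch (Python, not from the book) enumerates all component-state vectors for n = 5 and confirms that a k-out-of-n:G system coincides with an (n − k + 1)-out-of-n:F system:

```python
from itertools import product

def works_G(states, k):
    # k-out-of-n:G: the system works iff at least k components work
    return sum(states) >= k

def fails_F(states, k):
    # k-out-of-n:F: the system fails iff at least k components fail
    return states.count(0) >= k

n = 5
for k in range(1, n + 1):
    for states in product((0, 1), repeat=n):
        # a k-out-of-n:G system is an (n - k + 1)-out-of-n:F system
        assert works_G(states, k) == (not fails_F(states, n - k + 1))
print("equivalence verified for n =", n)
```

The check also covers the two special cases mentioned above: k = n gives the series system (1-out-of-n:F) and k = 1 gives the parallel system (n-out-of-n:F).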
7.1 SYSTEM RELIABILITY EVALUATION

In this section, we concentrate on techniques for reliability evaluation of k-out-of-n:G systems. The k-out-of-n:G system with i.i.d. components is first studied. Several approaches for system reliability evaluation, when the components are not necessarily s-identical, are then introduced in detail. Finally, bounds on system reliability, when components are not necessarily s-independent, are discussed.

Notation

• n: number of components in the system
• k: minimum number of components that must work for the k-out-of-n:G system to work
• p_i: reliability of component i, i = 1, 2, . . . , n
• p: reliability of each component when all components are i.i.d.
• q_i: unreliability of component i, q_i = 1 − p_i, i = 1, 2, . . . , n
• q: unreliability of each component when all components are i.i.d., q = 1 − p
• R_e(k, n): probability that exactly k out of the n components are working
• R(k, n): reliability of a k-out-of-n:G system, or probability that at least k out of the n components are working, where 0 ≤ k ≤ n and both k and n are integers
• Q(k, n): unreliability of a k-out-of-n:G system, or probability that fewer than k out of the n components are working, where 0 ≤ k ≤ n and both k and n are integers; Q(k, n) = 1 − R(k, n)
7.1.1 The k-out-of-n:G System with i.i.d. Components

In a k-out-of-n:G system with i.i.d. components, the number of working components follows the binomial distribution with parameters n and p. Thus, we have

Pr(exactly i components work) = \binom{n}{i} p^i q^{n-i},   i = 0, 1, 2, . . . , n.   (7.1)

The reliability of the system is equal to the probability that the number of working components is greater than or equal to k:

R(k, n) = \sum_{i=k}^{n} \binom{n}{i} p^i q^{n-i}.   (7.2)

Equation (7.2) is an explicit formula that can be used for reliability evaluation of the k-out-of-n:G system. If we apply pivotal decomposition to component n or directly use equation (7.26) developed by Rushdi [208], the system reliability of a k-out-of-n:G system with i.i.d. components can be expressed as

R(k, n) = p R(k − 1, n − 1) + (1 − p) R(k, n − 1)
        = p (R(k − 1, n − 1) − R(k, n − 1)) + R(k, n − 1)
        = p Pr(exactly k − 1 out of n − 1 components work) + R(k, n − 1)
        = \binom{n-1}{k-1} p^k q^{n-k} + R(k, n − 1).   (7.3)

Rearranging the terms in equation (7.3), we obtain the expression

R(k, n) − R(k, n − 1) = \binom{n-1}{k-1} p^k q^{n-k}   for n ≥ k.   (7.4)

Equation (7.4) represents the improvement in system reliability obtained by increasing the number of components in the system from n − 1 to n. As n increases, this improvement in system reliability becomes smaller. Thus, there is an optimal design issue of determining the system size n, which will be addressed later. Equation (7.3) can be used recursively for system reliability evaluation with the boundary condition

R(k, n) = 0   for n < k.   (7.5)

Using equation (7.4) and the boundary condition given in equation (7.5), we can express the reliability of a k-out-of-n:G system as follows:

R(k, n) = \sum_{i=k}^{n} [R(k, i) − R(k, i − 1)] = p^k \sum_{i=k}^{n} \binom{i-1}{k-1} q^{i-k}.   (7.6)
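Equations (7.2) and (7.3) are easy to check numerically. The sketch below (Python, not part of the original text) evaluates the closed form (7.2) and the recursion (7.3) with boundary condition (7.5) and confirms that they agree:

```python
from math import comb

def rel_iid(k, n, p):
    """R(k, n) for i.i.d. components, equation (7.2)."""
    q = 1.0 - p
    return sum(comb(n, i) * p**i * q**(n - i) for i in range(k, n + 1))

def rel_iid_rec(k, n, p):
    """R(k, n) via the recursion (7.3) with boundary condition (7.5)."""
    if n < k:
        return 0.0          # equation (7.5)
    if k == 0:
        return 1.0          # at least zero components always work
    return comb(n - 1, k - 1) * p**k * (1 - p)**(n - k) + rel_iid_rec(k, n - 1, p)

# 2-out-of-3:G with p = 0.9: 3(0.9^2)(0.1) + 0.9^3 = 0.972
print(f"{rel_iid(2, 3, 0.9):.3f}")  # → 0.972
```

The successive terms added by `rel_iid_rec` are exactly the increments of equation (7.4), so the recursion also shows how the marginal gain from each extra component shrinks as n grows.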
From the equations for system reliability given above, we can see that the reliability of a k-out-of-n:G system with i.i.d. components is a function of n, k, and p. An increase in n or p (or both) or a decrease in k will increase the system's reliability. Equation (7.4) represents the increase in system reliability obtained by increasing the number of components in the system from n − 1 to n. In the following, we give an expression for the increase in system reliability for each unit of decrease in k:

R(k, n) = Pr(at least k components work)
        = Pr(at least k − 1 components work) − Pr(exactly k − 1 components work)
        = R(k − 1, n) − \binom{n}{k-1} p^{k-1} q^{n-k+1}.   (7.7)

Or, equivalently, we have

R(k − 1, n) − R(k, n) = \binom{n}{k-1} p^{k-1} q^{n-k+1}.   (7.8)

With the various expressions of R(k, n) derived so far, we can easily find expressions for the unreliability Q(k, n) of the k-out-of-n:G system. For example, the following is obvious from equation (7.2):

Q(k, n) = 1 − R(k, n) = \sum_{i=0}^{k-1} \binom{n}{i} p^i q^{n-i}.   (7.9)

To find an expression for the sensitivity of system reliability to component reliability in this i.i.d. case, we can take the first derivative of R(k, n) with respect to p. Using equation (7.6), we have

dR(k, n)/dp = k \binom{n}{k} p^{k-1} q^{n-k}.   (7.10)

Exercises

1. Verify equation (7.10).
2. Find similar expressions of R(k, n) or Q(k, n) and other measures for the k-out-of-n:F systems.
3. Analyze the performance of a 3-out-of-6:G system with p_1 = 0.5, p_2 = 0.6, p_3 = 0.7, p_4 = 0.8, p_5 = 0.9, and p_6 = 0.95.

7.1.2 The k-out-of-n:G System with Independent Components

For k-out-of-n:G systems with components whose reliabilities are not necessarily identical, we can use the concept of minimal path sets to evaluate system reliability. However, more efficient algorithms for reliability evaluation of such systems were
reported by Barlow and Heidtmann [20] and Rushdi [208]. These two algorithms have the same complexity, O(k(n − k + 1)). The use of Markov chain imbeddable structures confirms the same result. Belfore [26] uses the fast Fourier transform (FFT) and proposes an O(n(log₂ n)²) algorithm for reliability evaluation of k-out-of-n:G systems. In this section, we illustrate the use of minimal path sets in system reliability evaluation. In addition, we illustrate these other approaches to deriving efficient algorithms for reliability evaluation of k-out-of-n:G systems.

Minimal Path Sets or Minimal Cut Sets Approach

As discussed earlier, the reliability of any system is equal to the probability that at least one of the minimal path sets works. The unreliability of the system is equal to the probability that at least one minimal cut set is failed. For a minimal path set to work, each component in the set must work. For a minimal cut set to fail, all components in the set must fail. In a k-out-of-n:G system, there are \binom{n}{k} minimal path sets and \binom{n}{n-k+1} minimal cut sets. Each minimal path set contains exactly k different components and each minimal cut set contains exactly n − k + 1 components. Thus, all minimal path sets and minimal cut sets are known. The question remaining to be answered is how to find the probability that at least one of the minimal path sets contains all working components or the probability that at least one minimal cut set contains all failed components.

The IE method can be used for reliability evaluation of a k-out-of-n:G system since all the minimal path sets and minimal cut sets are known. The IE method has the disadvantage of involving many canceling terms. Heidtmann [92] and McGrady [165] provide improved versions of the IE method for reliability evaluation of the k-out-of-n:G system. In their improved algorithms, the canceling terms are eliminated. However, both algorithms are still enumerative in nature.
For example, the formula provided by Heidtmann [92] using minimal path sets is as follows:

R(k, n) = \sum_{i=k}^{n} (−1)^{i-k} \binom{i-1}{k-1} \sum_{j_1 < j_2 < ··· < j_i} \prod_{l=1}^{i} p_{j_l}.   (7.11)
In this equation, for each fixed i value, the inner summation term gives us the probability that i components are working properly, regardless of whether the other n − i components are working or not. The total number of terms to be summed together in the inner summation series is equal to \binom{n}{i}. If all the components are i.i.d., equation (7.11) gives another formula for reliability evaluation of a k-out-of-n:G system with i.i.d. components:

R(k, n) = \sum_{i=k}^{n} (−1)^{i-k} \binom{i-1}{k-1} \binom{n}{i} p^i.   (7.12)

Equation (7.12) is apparently not as efficient as those given in Section 7.1.1. The SDP method can also be used for reliability evaluation of the k-out-of-n:G systems. Like the improved IE method given in equation (7.11), the SDP method is also easy to use for the k-out-of-n:G systems. However, we will see later that there are much more efficient methods than the IE (and its improved version) and the SDP
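Formula (7.11) can be coded directly and cross-checked against a brute-force enumeration over all component-state vectors. The following sketch (Python; not from the book, with component reliabilities chosen only for illustration) does exactly that:

```python
from itertools import combinations, product
from math import comb, prod

def rel_heidtmann(k, p):
    """R(k, n) by the improved IE formula (7.11); p lists the component reliabilities."""
    n = len(p)
    total = 0.0
    for i in range(k, n + 1):
        # inner sum of equation (7.11): products over all i-subsets of components
        inner = sum(prod(p[j] for j in sub) for sub in combinations(range(n), i))
        total += (-1) ** (i - k) * comb(i - 1, k - 1) * inner
    return total

def rel_brute(k, p):
    """Direct enumeration of all 2^n component-state vectors, for comparison."""
    n = len(p)
    return sum(
        prod(p[j] if s else 1 - p[j] for j, s in enumerate(states))
        for states in product((0, 1), repeat=n)
        if sum(states) >= k
    )

p = [0.95, 0.90, 0.85, 0.80, 0.75]   # illustrative values only
print(abs(rel_heidtmann(3, p) - rel_brute(3, p)) < 1e-12)  # → True
```

Note that the two routines have very different costs: the brute force is O(2^n), while (7.11) is still enumerative over subsets, which is exactly the inefficiency that the algorithms presented later avoid.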
method for evaluating k-out-of-n:G systems. In the following, we present an example to illustrate the use of minimal path sets with the IE method, Heidtmann's improved IE method, and the SDP method for reliability evaluation of a 2-out-of-4:G system.

Example 7.1 Evaluate the reliability of a 2-out-of-4:G system with p_1 = 0.91, p_2 = 0.92, p_3 = 0.93, and p_4 = 0.94.

The number of minimal path sets is equal to \binom{4}{2} = 6. We will use S_i to represent the ith minimal path set as listed below:

S_1 = x_1 x_2,   S_2 = x_1 x_3,   S_3 = x_1 x_4,   S_4 = x_2 x_3,   S_5 = x_2 x_4,   S_6 = x_3 x_4.
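As a quick numeric cross-check of the calculations that follow, the exact system reliability can be obtained by enumerating all 2⁴ component-state vectors (a Python sketch, not part of the original text):

```python
from itertools import product
from math import prod

p = [0.91, 0.92, 0.93, 0.94]

# the 2-out-of-4:G system works whenever at least 2 components work
r = sum(
    prod(p[j] if s else 1 - p[j] for j, s in enumerate(states))
    for states in product((0, 1), repeat=4)
    if sum(states) >= 2
)
print(f"{r:.6f}")  # → 0.998441
```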
With the IE method, we can calculate the system reliability as follows:

R(2, 4) = Pr(S_1 ∪ S_2 ∪ S_3 ∪ S_4 ∪ S_5 ∪ S_6)
= Pr(S_1) + Pr(S_2) + Pr(S_3) + Pr(S_4) + Pr(S_5) + Pr(S_6)
− Pr(S_1 S_2) − Pr(S_1 S_3) − Pr(S_1 S_4) − Pr(S_1 S_5) − Pr(S_1 S_6) − Pr(S_2 S_3) − Pr(S_2 S_4) − Pr(S_2 S_5) − Pr(S_2 S_6) − Pr(S_3 S_4) − Pr(S_3 S_5) − Pr(S_3 S_6) − Pr(S_4 S_5) − Pr(S_4 S_6) − Pr(S_5 S_6)
+ Pr(S_1 S_2 S_3) + Pr(S_1 S_2 S_4) + Pr(S_1 S_2 S_5) + Pr(S_1 S_2 S_6) + Pr(S_1 S_3 S_4) + Pr(S_1 S_3 S_5) + Pr(S_1 S_3 S_6) + Pr(S_1 S_4 S_5) + Pr(S_1 S_4 S_6) + Pr(S_1 S_5 S_6) + Pr(S_2 S_3 S_4) + Pr(S_2 S_3 S_5) + Pr(S_2 S_3 S_6) + Pr(S_2 S_4 S_5) + Pr(S_2 S_4 S_6) + Pr(S_2 S_5 S_6) + Pr(S_3 S_4 S_5) + Pr(S_3 S_4 S_6) + Pr(S_3 S_5 S_6) + Pr(S_4 S_5 S_6)
− Pr(S_1 S_2 S_3 S_4) − Pr(S_1 S_2 S_3 S_5) − Pr(S_1 S_2 S_3 S_6) − Pr(S_1 S_2 S_4 S_5) − Pr(S_1 S_2 S_4 S_6) − Pr(S_1 S_2 S_5 S_6) − Pr(S_1 S_3 S_4 S_5) − Pr(S_1 S_3 S_4 S_6) − Pr(S_1 S_3 S_5 S_6) − Pr(S_1 S_4 S_5 S_6) − Pr(S_2 S_3 S_4 S_5) − Pr(S_2 S_3 S_4 S_6) − Pr(S_2 S_3 S_5 S_6) − Pr(S_2 S_4 S_5 S_6) − Pr(S_3 S_4 S_5 S_6)
+ Pr(S_1 S_2 S_3 S_4 S_5) + Pr(S_1 S_2 S_3 S_4 S_6) + Pr(S_1 S_2 S_3 S_5 S_6) + Pr(S_1 S_2 S_4 S_5 S_6) + Pr(S_1 S_3 S_4 S_5 S_6) + Pr(S_2 S_3 S_4 S_5 S_6)
− Pr(S_1 S_2 S_3 S_4 S_5 S_6)
≈ 0.998467.

With equation (7.11), we have

R(2, 4) = \sum_{i=2}^{4} (−1)^{i-2} \binom{i-1}{1} \sum_{j_1 < j_2 < ··· < j_i} \prod_{l=1}^{i} p_{j_l}
= (p_1 p_2 + p_1 p_3 + p_1 p_4 + p_2 p_3 + p_2 p_4 + p_3 p_4) − 2(p_1 p_2 p_3 + p_1 p_2 p_4 + p_1 p_3 p_4 + p_2 p_3 p_4) + 3 p_1 p_2 p_3 p_4
≈ 0.998441.

With the SDP method, we have

R(2, 4) = Pr(S_1 ∪ S_2 ∪ S_3 ∪ S_4 ∪ S_5 ∪ S_6)
= Pr(S_1) + Pr(S̄_1 S_2) + Pr(S̄_1 S̄_2 S_3) + Pr(S̄_1 S̄_2 S̄_3 S_4) + Pr(S̄_1 S̄_2 S̄_3 S̄_4 S_5) + Pr(S̄_1 S̄_2 S̄_3 S̄_4 S̄_5 S_6)
= Pr(x_1 x_2) + Pr(x_1 x̄_2 x_3) + Pr(x_1 x̄_2 x̄_3 x_4) + Pr(x̄_1 x_2 x_3) + Pr(x̄_1 x_2 x̄_3 x_4) + Pr(x̄_1 x̄_2 x_3 x_4)
≈ 0.998441.

It is clear that the IE method involves much more calculation than either the improved IE method or the SDP method. Because there are many canceling terms in the IE method, the round-off errors are obvious in its final result.

Generating Function Approach by Barlow and Heidtmann

Barlow and Heidtmann [20] present two BASIC programs for reliability evaluation of k-out-of-n:G systems with independent components. The first program uses the following generating function and its expanded form:
g_n(z) = \prod_{i=1}^{n} (q_i + p_i z) = \sum_{i=0}^{n} R_e(i, n) z^i,   (7.13)
where z is a dummy variable. As we have defined in the notation, R_e(i, n) represents the probability that there are exactly i working components in the system. Through examination of the expanded form of g_n(z), we find that R_e(i, n) also represents the coefficient of z^i in the generating function. The BASIC program computes all R_e(i, j) entries recursively. Rushdi [208] provides better explanations of this algorithm. In fact, the algorithm relies on the equation

R(k, n) = \sum_{i=k}^{n} R_e(i, n),   (7.14)

which is obvious based on the definition of a k-out-of-n:G system. The algorithm obtains R_e(i, n) through the recursive relation

R_e(i, j) = q_j R_e(i, j − 1) + p_j R_e(i − 1, j − 1),   0 ≤ i ≤ n,  0 ≤ j ≤ n,   (7.15)

with the boundary conditions
R_e(−1, j) = R_e(j + 1, j) = 0   for j = 0, 1, 2, . . . ,   (7.16)
R_e(0, 0) = 1.   (7.17)
To derive this recursive relation, first construct the following generating function:

g_{j-1}(z) = \prod_{i=1}^{j-1} (q_i + p_i z) = \sum_{i=0}^{j-1} R_e(i, j − 1) z^i.   (7.18)

Since g_j(z) = (q_j + p_j z) g_{j-1}(z), a comparison of the coefficients of z^i on both sides of the equation

\sum_{i=0}^{j} R_e(i, j) z^i = (q_j + p_j z) \sum_{i=0}^{j-1} R_e(i, j − 1) z^i = \sum_{i=0}^{j} [q_j R_e(i, j − 1) + p_j R_e(i − 1, j − 1)] z^i   (7.19)

leads to equation (7.15).

To find the computational complexity of this algorithm, we examine the number of entries R_e(i, j) that must be calculated with equation (7.15) utilizing the boundary conditions in equations (7.16) and (7.17). As shown in Figure 7.1, the total number of entries is equal to (n − k + 1)(k + 1) − 1 + (n − k)²/2. Each such entry requires three basic arithmetic operations (two multiplications and one addition). We then need to use equation (7.14) to find the system reliability, which requires n − k basic arithmetic operations. As a result, the total number of basic arithmetic operations required is equal to

3[(n − k + 1)(k + 1) − 1 + (n − k)²/2] + n − k = (n − k)(1.5n + 1.5k + 4) + 3k.

From this expression, we can see that the computational complexity of the algorithm is O(n²) when k is small (close to 1) and O(n) when k is large (close to n). Generally speaking, the complexity of this algorithm is O(n²).

The number of arithmetic operations required for system reliability evaluation can be reduced by noting that we are only interested in finding the probability that at least k components are working. Thus, the calculation of some R_e(i, j) entries can be avoided. The second BASIC program by Barlow and Heidtmann [20] avoids calculating these entries and requires only 3k(n − k + 1) arithmetic operations. This computational complexity is also achieved by the algorithm proposed by Rushdi [208]. We will present Rushdi's algorithm in the following section.
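A compact implementation of the recursion (7.15) with boundary conditions (7.16) and (7.17), summed as in (7.14), might look as follows (a Python sketch rather than the original BASIC; the sample component reliabilities are the ones used later in Tables 7.2 and 7.3):

```python
def rel_gf(k, p):
    """R(k, n) via the generating-function recursion (7.15)-(7.17).

    coeff[i] holds R_e(i, j) after component j has been processed."""
    n = len(p)
    coeff = [1.0]                      # g_0(z) = 1, i.e. R_e(0, 0) = 1
    for j in range(1, n + 1):
        pj, qj = p[j - 1], 1.0 - p[j - 1]
        new = [0.0] * (j + 1)
        for i in range(j + 1):
            # equation (7.15); out-of-range entries are 0 by (7.16)
            new[i] = qj * (coeff[i] if i < j else 0.0) \
                   + pj * (coeff[i - 1] if i >= 1 else 0.0)
        coeff = new
    return sum(coeff[i] for i in range(k, n + 1))   # equation (7.14)

p = [0.9 - 0.01 * j for j in range(8)]   # p_j = 0.9 - 0.01(j - 1), j = 1, ..., 8
print(f"{rel_gf(5, p):.6f}")  # → 0.985480
```

This version computes all R_e(i, j), i.e., the full polynomial, and therefore has the O(n²) behavior discussed above; the 3k(n − k + 1) variant simply skips the entries outside the parallelogram of Figure 7.1.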
[Figure 7.1: The R_e(i, j) entries to be calculated with the algorithm by Barlow and Heidtmann.]
Exercises

1. Consider a system with n = 5 components. Verify that the coefficient of z^i for i = 0, 1, . . . , 5 does represent the probability that there are exactly i working components in the system.
2. Compute the reliability and unreliability of a 3-out-of-8:G system with the algorithm given in this section.
3. Use the generating function approach to derive a similar algorithm for the k-out-of-n:F system.

Symmetric Switching Function Approach by Rushdi

This approach starts with an analysis of the structure function of the k-out-of-n:G system. The structure function φ(x) of a k-out-of-n:G system is symmetric based on Definition 4.4. It can only take two possible values (0 or 1) under the binary assumption of component and system states, like an on–off switch. This is why we name this approach the symmetric switching function approach. In this section, x_i indicates the state of component i and S(k, n), instead of φ(x), indicates the structure function of the system. Both x_i and S(k, n) are binary variables with a value of 1 indicating the working state and 0 indicating the failed state. The complements of these variables are represented by x̄_i and S̄(k, n), respectively. Based on these definitions of S(k, n) and S̄(k, n), we have the following expressions for the system reliability and unreliability:

R(k, n) = Pr(S(k, n) = 1),   Q(k, n) = Pr(S̄(k, n) = 1).
To find an expression of the system state, we can use pivotal decomposition on the nth component, as shown below:

S(k, n) = x_n S(k − 1, n − 1) + x̄_n S(k, n − 1),   (7.20)
S̄(k, n) = x_n S̄(k − 1, n − 1) + x̄_n S̄(k, n − 1).   (7.21)

Based on these two equations, the state of a k-out-of-n:G system can be expressed as a function of the states of two subsystems with the same n − 1 components. However, the minimum numbers of components required for these two subsystems to work are different. One requires at least k components to work while the other requires at least k − 1 components to work. These two subsystems with n − 1 components can be further decomposed on the last component, namely component n − 1, until we reach some boundary conditions. Thus, an iterative expression can be used to describe this decomposition process. Consider a system with j components that requires at least i components to work for the system to work. We have the following equations to express the state of such a system as a function of the states of two subsystems:

S(i, j) = x_j S(i − 1, j − 1) + x̄_j S(i, j − 1),   (7.22)
S̄(i, j) = x_j S̄(i − 1, j − 1) + x̄_j S̄(i, j − 1),   (7.23)
where i may take any integer value from 1 to k and j may take values from 0 to n. The following boundary conditions are needed for equations (7.22) and (7.23):

S(0, j) = S̄(j + 1, j) = 1,   (7.24)
S(j + 1, j) = S̄(0, j) = 0.   (7.25)

Equations (7.22) and (7.23) are in pivotal decomposition form. Because of the assumption that the components are s-independent, they can be immediately converted to the following algebraic reliability expressions:

R(i, j) = p_j R(i − 1, j − 1) + q_j R(i, j − 1),   (7.26)
Q(i, j) = p_j Q(i − 1, j − 1) + q_j Q(i, j − 1).   (7.27)

Equations (7.26) and (7.27) are recursive relations that are valid for 1 ≤ i ≤ k. Their boundary conditions can be directly obtained from equations (7.24) and (7.25) as follows:

R(0, j) = Q(j + 1, j) = 1,   (7.28)
R(j + 1, j) = Q(0, j) = 0.   (7.29)
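Relations (7.26), (7.28), and (7.29) translate directly into a recursive program. A minimal sketch in Python (the memoized recursion and the sample component reliabilities p_j = 0.9 − 0.01(j − 1) are illustrative choices, not the book's code):

```python
from functools import lru_cache

p = [0.9 - 0.01 * j for j in range(8)]   # p_j = 0.9 - 0.01(j - 1), j = 1, ..., 8

@lru_cache(maxsize=None)
def R(i, j):
    """Equation (7.26) with boundary conditions (7.28) and (7.29)."""
    if i == 0:
        return 1.0      # R(0, j) = 1
    if i > j:
        return 0.0      # R(j + 1, j) = 0
    return p[j - 1] * R(i - 1, j - 1) + (1 - p[j - 1]) * R(i, j - 1)

print(f"{R(5, 8):.6f}")  # → 0.985480 (reliability of the 5-out-of-8:G system)
```

The memoization is what keeps the recursion at O(k(n − k + 1)) evaluations rather than exponential; without it, the two-branch recursion would revisit the same (i, j) pairs repeatedly.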
Solutions for the reliability R(k, n) or the unreliability Q(k, n) are easily obtained by programming in languages that allow a program to call itself recursively. However, a closer look at the recursive relations (7.26) and (7.27) reveals that they can be easily represented by what is called a signal flow graph (SFG). As an illustration, Figure 7.2 shows the SFG for the computation of R(3, 7). In Figure 7.2, a node at position (i, j) represents R(i, j). The black nodes in the first row with i = 0 are “source” nodes with values of 1, that is, R(0, j) = 1. The white nodes at i = j + 1 are source nodes with zero values, that is, R(j + 1, j) = 0 for j ≥ 0. The values at
[Figure 7.2: Signal flow graph for obtaining R(3, 7) and Q(3, 7).]
other nodes, say (i, j), have to be calculated by adding the product of the immediate top-left entry and p_j to the product of the immediate left entry and q_j. The same graph in Figure 7.2 can also be used for the computation of Q(3, 7) provided that the graph nodes (i, j) are understood to represent the unreliabilities Q(i, j) instead of the reliabilities R(i, j), and the two types of source nodes interchange their values; that is, the black nodes at i = 0 become zero values [Q(0, j) = 0] and the white nodes at i = j + 1 become unity values [Q(j + 1, j) = 1].

The algorithm proceeds efficiently by directly constructing (i.e., computing the element values of) the parallelogram with corners (1, 1), (1, n − k + 1), (k, k), and (k, n). The number of elements in the parallelogram is k(n − k + 1). Each element of the parallelogram requires three arithmetic operations (namely, one multiplication and two additions) for its evaluation. This can be easily seen by invoking the relation q_j = 1 − p_j to simplify (7.26) and (7.27) into the following forms:

R(i, j) = p_j R(i − 1, j − 1) + (1 − p_j) R(i, j − 1) = R(i, j − 1) + p_j (R(i − 1, j − 1) − R(i, j − 1)),   (7.30)
Q(i, j) = (1 − q_j) Q(i − 1, j − 1) + q_j Q(i, j − 1) = Q(i − 1, j − 1) + q_j (Q(i, j − 1) − Q(i − 1, j − 1)).   (7.31)
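The in-place form (7.30) also shows how the parallelogram can be evaluated columnwise with only k + 1 stored scalars. A sketch of this columnwise scheme in Python (illustrative, using the sample reliabilities p_j = 0.9 − 0.01(j − 1)):

```python
def rel_inplace(k, p):
    """Columnwise evaluation of equation (7.30) keeping only k + 1 scalars."""
    R = [1.0] + [0.0] * k                   # R[i] = R(i, 0): R(0,0) = 1, R(i,0) = 0 for i >= 1
    for j, pj in enumerate(p, start=1):
        for i in range(min(k, j), 0, -1):   # descend so R[i-1] still holds column j - 1
            R[i] += pj * (R[i - 1] - R[i])  # equation (7.30)
    return R[k]

p = [0.9 - 0.01 * j for j in range(8)]
print(f"{rel_inplace(5, p):.6f}")  # → 0.985480
```

Updating i from high to low is the key design choice: it guarantees that R[i − 1] still belongs to the previous column when R[i] is overwritten, which is exactly the memory-saving discipline described next.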
This means that the algorithm by Rushdi requires 3k(n − k + 1) arithmetic operations, and its computational complexity can be written as O(k(n − k + 1)). Computation of the R(i, j) or Q(i, j) entries shown in Figure 7.2 can be processed by row, by column, or even diagonally. However, to minimize the memory requirements, this is done columnwise for the R(3, 7) case, with due attention paid to the parallelogram boundaries. In this case the algorithm requires memory storage of k + 1 = 4 scalars, in addition to the memory needed to store p_i for 1 ≤ i ≤ n. The additional storage requirement for any such problem is min{k + 1, n − k} by calculating columnwise (if k is smaller) or rowwise (if n − k + 1 is smaller). It is also interesting to note that this algorithm has the same computational complexity for its reliability and unreliability evaluations.

Detailed comparisons of the time and memory requirements of the algorithms by Sarje and Prasad [215], Rushdi [208], and Barlow and Heidtmann [20] are conducted by Rushdi [209]. The results are shown in Table 7.1. The time requirement is measured by the number of multiplications, additions, and array references. Table 7.1 shows that the algorithms by Barlow and Heidtmann and by Rushdi are computationally more efficient and require less memory than the algorithm by Sarje and Prasad. Risse [202] and Pham and Upadhyaya [194] also give detailed comparisons of such algorithms that generally agree with this result.

Sample outputs of the algorithm by Rushdi for reliability and unreliability evaluations of a 5-out-of-8:G system are shown in Tables 7.2 and 7.3, respectively. In both tables, we have assumed that the component reliabilities are p_j = 0.9 − 0.01(j − 1) for j = 1, 2, . . . , 8.

Another advantage of the algorithms by Barlow and Heidtmann [20] and Rushdi [208] is worth noting. All the intermediate entries needed for calculating R(k, n) or
TABLE 7.1 Comparison of Time and Space Complexities

                                          Sarje and           Rushdi [208]       Barlow and
Algorithm                                 Prasad [215]                           Heidtmann [20]

Temporal complexity
  Multiplications                         4k(n−k) + 4         k(n−k+1)           (k+1)(n−k+1) − 1
  Additions                               2k(n−k+1) + n       2k(n−k+1)          (2k+1)(n−k+1) − 2
  References to one-dimensional arrays    4k(n−k) + k         4k(n−k+1) + 1      (4k+3)(n−k+1) − 3
  References to two-dimensional arrays    4k(n−k) + 2k + 6    0                  0

Spatial complexity
  Memory requirements                     3(n−k+1) + 2n       min(k+1, n−k+2)    k + 2

Source: Rushdi [209].
TABLE 7.2 Calculating R(5, 8) of a 5-out-of-8:G System with the Symmetric Switching Function Approach

 i\j    0    1          2          3          4          5          6          7          8
 0      1    1          1          1
 1      0    0.900000   0.989000   0.998680   0.999828
 2           0          0.801000   0.966440   0.994489   0.999081
 3                      0          0.704480   0.932437   0.985802   0.997089
 4                                 0          0.613246   0.887750   0.971094   0.992930
 5                                            0          0.527391   0.833697   0.949110   0.985480
TABLE 7.3 Calculating Q(5, 8) of a 5-out-of-8:G System with the Symmetric Switching Function Approach

 i\j    0    1          2          3          4          5          6          7          8
 0      0    0          0          0
 1      1    0.100000   0.011000   0.001320   0.000172
 2           1          0.199000   0.033560   0.005511   0.000919
 3                      1          0.295120   0.067563   0.014198   0.002911
 4                                 1          0.386754   0.112250   0.028906   0.007070
 5                                            1          0.472609   0.166303   0.050890   0.014520
Q(k, n) are meaningful numbers that represent R(i, j) or Q(i, j) for 1 ≤ i ≤ k and i ≤ j ≤ n − k + i. These numbers are available to the reliability engineer at no extra cost and can enable one to make a valid economic assessment of redundancy. For example, row 5 in Table 7.2 represents the reliability R(5, j), where j varies from 5 to 8. The incremental system reliability achieved by increasing the system size from j to j + 1 is

∆_j R(5, j) = R(5, j + 1) − R(5, j).   (7.32)
The economic equivalence of this incremental reliability can, therefore, be estimated and compared to the cost of adding an additional component, thereby obtaining the optimal number of components for the 5-out-of-j system.

Exercises

1. Compute the reliability of a 2-out-of-6:G system with p_i = 0.6 + 0.06i for i = 1, 2, . . . , 6.
2. How do you use the algorithm covered in this section to evaluate the reliability of a k-out-of-n:F system?
3. What can you conclude through examination of the entries in the same column in Table 7.2 or 7.3?

MIS Technique

To imbed the k-out-of-n:G system into the Markov chain following Definition 5.1, we can define the state space as S = {s_0, s_1, . . . , s_k} = {0, 1, . . . , k}, the partition as S_i = {i} for i = 0, 1, . . . , k (here, N = m = k), and the Markov chain {Y_l, l ≥ 0} as

1. Y_l = i if exactly i of the components 1, 2, . . . , l are working (0 ≤ i < k) and
2. Y_l = k if at least k of the components 1, 2, . . . , l are working.

Recall that p_{ij} represents the probability for the Markov chain to make a transition from state i to state j. The transition matrix of the Markov chain is
    Λl = (pij)(k+1)×(k+1) =

        | ql   pl                      |
        |      ql   pl                 |
        |           .    .             |
        |             .    .           |
        |                 ql   pl      |
        |                      1       | (k+1)×(k+1)    (7.33)

where i, j = 0, 1, . . . , k. Unmarked entries of the matrix are all equal to zero. This transition matrix provides the probabilities for the system with one more component, namely component l, to be in state j (0 ≤ j ≤ k) given that the system with l − 1 components is in state i (0 ≤ i ≤ k). In the k-out-of-n:G system, we are interested in knowing whether the number of working components has reached or exceeded k. That is why the system state space includes 0, 1, . . . , k and represents the progressive increase in the number of working components as the system size increases. When the system size reaches n, the probability that the system is in state k is the reliability of the system. Making use of the transition probability matrix given in equation (7.33) and noting N = m = k, we obtain the following recursive equations with Theorem 5.2:
    a0(l) = ql a0(l − 1),                        l ≥ 1,                   (7.34)
    aj(l) = pl aj−1(l − 1) + ql aj(l − 1),       1 ≤ j < k, j ≤ l ≤ n,    (7.35)
    ak(l) = pl ak−1(l − 1) + ak(l − 1),          k ≤ l ≤ n,               (7.36)

where aj(l) is the probability that there are exactly j working components in a system with l components for 0 ≤ j < k and ak(l) is the probability that there are at least k working components in the l-component subsystem. The following boundary conditions are immediate:

    a0(0) = 1,                 (7.37)
    aj(0) = 0,    j > 0,       (7.38)
    aj(l) = 0,    l < j.       (7.39)
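Recursions (7.34)–(7.36) with boundary conditions (7.37)–(7.39) translate directly into a short runnable sketch (the function name is ours):

```python
def kofn_G_reliability(k, n, p):
    """R(k, n) = a_k(n) for a k-out-of-n:G system; p[l-1] is p_l."""
    a = [1.0] + [0.0] * k                    # boundary (7.37)-(7.38)
    for l in range(1, n + 1):
        pl, ql = p[l - 1], 1.0 - p[l - 1]
        new = [0.0] * (k + 1)
        new[0] = ql * a[0]                   # equation (7.34)
        for j in range(1, k):
            new[j] = pl * a[j - 1] + ql * a[j]   # equation (7.35)
        new[k] = pl * a[k - 1] + a[k]        # equation (7.36): state k is absorbing
        a = new
    return a[k]
```

Applied to the component reliabilities of Example 7.2 below, it reproduces the final entry of Table 7.4 to within the printed rounding.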
The reliability of the system is R(k, n) = ak(n). The recursive equations (7.34)–(7.36) can also be represented by a signal flow diagram similar to the one shown in Figure 7.2. These recursive equations have the same iterative structure as the algorithms given by Barlow and Heidtmann [20] and Rushdi [208]. The computational complexity of the recursive equations (7.34)–(7.36) is ∼3k(n − k + 1), or O(k(n − k + 1)).

Example 7.2 Consider a 5-out-of-8:G system with component reliabilities pl = 0.9 − 0.01(l − 1) for l = 1, 2, . . . , 8. Table 7.4 lists the results using the MIS approach. Compare this table with Table 7.2. From Table 7.4, we see that the MIS approach not only provides the reliability of a k-out-of-n:G system but also the probabilities that there are at least k working components in the l-component subsystem for l = k, k + 1, . . . , n. In addition, we also know the probabilities that there are exactly j working components in the l-component subsystems for j = 0, 1, . . . , k − 1 and l = j, j + 1, . . . , j + n − k. For example, from the column with l = 6 in Table 7.4, we see that in the six-component subsystem, the probability that there are exactly three working compo-
TABLE 7.4 Reliability Evaluation of 5-out-of-8:G System with MIS Approach

   l     0   1          2          3          4          5          6          7          8
   pl        0.90       0.89       0.88       0.87       0.86       0.85       0.84       0.83
   ql        0.10       0.11       0.12       0.13       0.14       0.15       0.16       0.17
 j = 0   1   0.100000   0.011000   0.001320
 j = 1   0   0.900000   0.188000   0.032240   0.005340
 j = 2       0          0.801000   0.261560   0.062052   0.013280
 j = 3                  0          0.704880   0.319192   0.098052   0.025996
 j = 4                             0          0.613246   0.360360   0.137392   0.043802
 j = 5                                        0          0.527391   0.833697   0.949111   0.985482
nents is 0.025996, that there are exactly four working components is 0.137392, and that there are exactly five working components is 0.833697. From these entries, we can also find that the probability that there are at least three working components in the six-component subsystem is equal to 0.025996 + 0.137392 + 0.833697 = 0.997085.

Fast Fourier Transform Method by Belfore Belfore [26] uses the generating function approach as developed by Barlow and Heidtmann [20] and applies the FFT in computation of the products of the generating functions. An algorithm for reliability evaluation of k-out-of-n:G systems results from such a combination that has a computational complexity of O(n(log2 n)²). In the following, we explain this FFT approach. Consider the following generating function:

    gn(z) = ∏_{i=1}^{n} (pi + qi z).    (7.40)
It is a polynomial function of variable z. The coefficient of the term z^i in this polynomial function represents the probability that exactly i components are failed and thus the other n − i components are working. By factoring out the products of the component reliabilities, we can express equation (7.40) in the form

    gn(z) = Pπ ∏_{i=1}^{n} (1 + ai z) = Pπ (1 + A(z)),    (7.41)

where

    Pπ = p1 p2 · · · pn,
    ai = qi/pi    for i = 1, 2, . . . , n,
    A(z) = b1 z + b2 z² + · · · + bn z^n,
    bi = Σ_{1 ≤ j1 < j2 < · · · < ji ≤ n} a_{j1} a_{j2} · · · a_{ji}    for i = 1, 2, . . . , n.
Using the form of the generating function in equation (7.41) rather than the one in equation (7.40) results in fewer computations because the multiplications by 1 are implicit and the increases in FFT sizes are delayed since the FFT is applied to A(z) [26]. Once the coefficients of A(z), or bi for i = 1, 2, . . . , n, are calculated, we can find the reliability of a k-out-of-n:G system with the following equation:

    R(k, n) = Pπ ( 1 + Σ_{i=1}^{n−k} bi ).    (7.42)
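Equations (7.40)–(7.42) can be exercised with a short sketch (names are ours); the coefficients bi follow the elementary-symmetric-function recurrence that the BH algorithm quoted later in this section implements:

```python
def b_coefficients(a):
    """b[i-1] = b_i, the i-th elementary symmetric function of the a_j's
    (the same recurrence as the BH algorithm of this section)."""
    b = [0.0] * len(a)
    b[0] = a[0]
    for i in range(1, len(a)):
        b[i] = b[i - 1] * a[i]
        for j in range(i - 1, 0, -1):
            b[j] += b[j - 1] * a[i]
        b[0] += a[i]
    return b

def reliability_from_gf(k, n, p):
    # a_i = q_i / p_i as in (7.41), then equation (7.42)
    Ppi = 1.0
    for pi in p:
        Ppi *= pi
    b = b_coefficients([(1.0 - pi) / pi for pi in p])
    return Ppi * (1.0 + sum(b[: n - k]))
```

For the 5-out-of-8:G system of Example 7.2 this agrees with the MIS result to within rounding.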
Suppose, for ease of explanation, that n is a power of 2. Define A(z) as an nth-order polynomial function of z. The term 1 + A(z) in equation (7.41) can be viewed as a product of two generating functions, 1 + A1(z) and 1 + A2(z), where A1(z) and A2(z) are (n/2)th-order polynomial functions of z. We have 1 + A(z) = [1 + A1(z)][1 + A2(z)] = 1 + A1(z) + A2(z) + A1(z)A2(z). As a result,

    A(z) = A1(z) + A2(z) + A1(z)A2(z).    (7.43)

To find A(z), first we need to compute the product of two (n/2)th-order polynomial functions, A1(z) and A2(z). In turn A1(z) and A2(z) each is equal to the sum of two lower order [(n/4)th-order] polynomial functions plus their product. This process can be repeated until the order of the polynomial functions is low enough so that the exact values of its coefficients become apparent. Finding the expression of the product of two (n/2)th-order polynomial functions A1(z) and A2(z) is equivalent to finding the coefficients of the resulting polynomial function. This can be achieved using FFT. First, we define a discrete function corresponding to Ai(z) (i = 1, 2). This discrete function takes values of the coefficients of Ai(z) (i = 1, 2) over the definition domain of {0, 1, 2, . . . , (n/2) − 1}. If A1(z) and A2(z) are in the forms

    A1(z) = c1 z + c2 z² + · · · + cn/2 z^{n/2},
    A2(z) = d1 z + d2 z² + · · · + dn/2 z^{n/2},

the two discrete functions are in the forms

    f1(x) = c_{x+1} if x = 0, 1, 2, . . . , n/2 − 1, and 0 otherwise,
    f2(x) = d_{x+1} if x = 0, 1, 2, . . . , n/2 − 1, and 0 otherwise.

The convolution of these two discrete functions is given by

    f(x) = f1(x) ∗ f2(x) ≡ Σ_{y=0}^{n/2−1} f1(x − y) f2(y),
where ∗ denotes the convolution operator. We have to note that the definition domain of the resulting function, f(x), is {0, 1, 2, . . . , n − 1}. This is in agreement with the product of two (n/2)th-order polynomial functions being an nth-order polynomial function. The values of f(x) for x ∈ {0, 1, 2, . . . , n − 1} provide the coefficients of the resulting polynomial function, namely A1(z)A2(z). We can then use equation (7.43) to find the coefficients of A(z). However, we need an efficient method to find the convolution of two discrete functions. Based on the convolution theorem in Fourier theory, the Fourier transform of the convolution of two functions is equal to the product of the Fourier transforms of these two individual functions. Thus, to find the coefficients of the product of two polynomial functions, we can first find the FFT of the two discrete functions corresponding to the two polynomial functions, multiply these two FFTs in the frequency domain, and finally conduct the inverse FFT on the resulting product to obtain the desired coefficients of the resulting polynomial function. For details on FFT, readers are referred to Bracewell [41]. Using the generating function form as shown in equation (7.41) and assuming n is a power of 2, the number of operations required to multiply 1 + A1(z) and 1 + A2(z), where A1(z) and A2(z) each is an (n/2)th-order polynomial function, using FFT [26] is

    T(n) = 15n log2(n) + 11n − 2.    (7.44)
Apparently, large overheads are involved in the FFT approach. Thus, it is not efficient to use this approach for small n values. When n is small, we can use the algorithm provided by Barlow and Heidtmann [20] to directly find the coefficients of the generating function. Belfore [26] shows that the FFT approach is more efficient than the algorithm by Barlow and Heidtmann (BH) when n is larger than 512. The following algorithm based on Barlow and Heidtmann [20] is used to compute the coefficients of the generating functions for small n values:

    BH(n, a[1 : n], Az[1 : n])
      integer i, j;
      Az[1] = a[1];
      For i = 2 To n By 1 Do
        Az[i] = Az[i − 1] ∗ a[i];
        For j = i − 1 To 2 By −1 Do
          Az[j] = Az[j] + Az[j − 1] ∗ a[i];
        EndFor
        Az[1] = Az[1] + a[i];
      EndFor
      Return Az[1 : n];
    End

In the BH algorithm shown above, a[1 : n] is an array of size n holding the ratios qi/pi and Az[1 : n] is an array of size n holding the coefficients of z^i in the resulting generating function for i = 1, 2, . . . , n. The sizes of these arrays are determined by the calling algorithm through the argument n. The following algorithm is used to calculate the coefficients of the generating function shown in equation (7.41) for large n values:
    GF FFT(n, a[1 : n], Az[1 : n])
      If n <= threshold Then
        Call BH(n, a, Az);
      Else
        Call GF FFT(n/2, a[1 : n/2], Az[1 : n/2]);
        Call GF FFT(n − n/2, a[n/2 + 1 : n], Az[n/2 + 1 : n]);
        FFTsize = 2^liub(log2 n);
        Initialize temp1, temp2;
        For i = 1 To n/2 By 1 Do
          temp1[i].real = Az[i];
        EndFor
        For i = n/2 + 1 To n By 1 Do
          temp2[i − n/2].real = Az[i];
        EndFor
        Compute FFT of temp1;
        Compute FFT of temp2;
        For i = 1 To FFTsize By 1 Do
          temp1[i] = temp1[i] ∗ temp2[i];
        EndFor
        Compute the inverse FFT of temp1 and assign it to Itemp1;
        Az[1] = Az[1] + Az[n/2 + 1];
        For i = 2 To n/2 By 1 Do
          Az[i] = Az[i] + Az[n/2 + i] + Itemp1[i − 1];
        EndFor
        For i = n/2 + 1 To n − n/2 By 1 Do
          Az[i] = Az[n/2 + i] + Itemp1[i − 1];
        EndFor
        For i = n − n/2 + 1 To n By 1 Do
          Az[i] = Itemp1[i − 1];
        EndFor
      EndIf
      Return Az[1 : n];
    End
In this algorithm, n/2 is defined to be the largest integer less than or equal to n/2 when n is not divisible by 2; liub indicates the lowest integer upper bound; temp1 and temp2 are complex variable arrays, both of size FFTsize; a[n/2 + 1 : n] indicates the subarray of a[1 : n] occupying positions n/2 + 1 through n; and the calling algorithm defines the sizes of the arrays in the arguments. The threshold value is specified by the user so that when n is smaller than the threshold, the BH algorithm is used for calculating the coefficients of the generating function. The system reliability of a k-out-of-n:G system can be calculated with the following algorithm according to equation (7.42):
    R sys FFT(n, k, p[1 : n], q[1 : n])
      Ppi = 1;
      For i = 1 To n By 1 Do
        Ppi = Ppi ∗ p[i];
        a[i] = q[i]/p[i];
      EndFor
      Call GF FFT(n, a[1 : n], Az[1 : n]);
      Rel = 1;
      For i = 1 To n − k By 1 Do
        Rel = Rel + Az[i];
      EndFor
      Rel = Ppi ∗ Rel;
      Return Rel;
    End

The lower and upper bounds on the complexity of the algorithm R sys FFT for a threshold value of 2 are given by Belfore [26] as

    Tlower(n) = (1/2) × 15n(log2 n)² + (1/2) × 37n log2 n − 23n + 2,    (7.45)
    Tupper(n) = 15n(log2 n)² + 67n log2 n + 6n + 2.    (7.46)
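The divide-and-conquer split of equation (7.43) that GF FFT exploits can be sketched with a plain convolution standing in for the FFT-based product (names are ours; this reproduces the recursion's structure but not its O(n(log2 n)²) cost):

```python
def gf_coefficients(a):
    """Coefficients b_1..b_m of A(z), where 1 + A(z) = prod_i (1 + a_i z),
    via the split A = A1 + A2 + A1*A2 of equation (7.43).
    b_i is stored at index i - 1 (coefficient of z^i)."""
    m = len(a)
    if m == 1:
        return [a[0]]
    A1 = gf_coefficients(a[: m // 2])
    A2 = gf_coefficients(a[m // 2:])
    out = [0.0] * m
    for i, c in enumerate(A1):       # A1(z) term
        out[i] += c
    for i, c in enumerate(A2):       # A2(z) term
        out[i] += c
    for i, c in enumerate(A1):       # A1(z)*A2(z): z^(i+1) * z^(j+1)
        for j, d in enumerate(A2):
            out[i + j + 1] += c * d  # plain convolution in place of FFT
    return out
```

Feeding the resulting coefficients into equation (7.42) gives the same reliability as the direct BH-style recurrence.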
For simplicity, we can say that the FFT approach has a time complexity of O(n(log2 n)²).

7.1.3 Bounds on System Reliability

When the components in a k-out-of-n system are s-independent, the algorithms presented in Section 7.1.2 are quite efficient for evaluation of exact system reliability. However, the components in the system may not be independent in some cases. To add to the difficulty, the way in which the components are dependent on each other may not be completely understood. In this section, we discuss system reliability approximation when components are not necessarily independent.

Associated Components As introduced earlier, the concept of association indicates that two random variables have nonnegative covariance. In this case, we may use the theorem given in Barlow and Proschan [22] to find upper and lower bounds on the system reliability of k-out-of-n:G systems. Let P1, P2, . . . , Pr represent the minimal path sets. We have r = (n choose k), and there are k components in each of these path sets. Let K1, K2, . . . , Kt represent the minimal cut sets. Then t = (n choose n − k + 1), and there are n − k + 1 components in each of these minimal cut sets. The following bounds on system reliability are given by
Barlow and Proschan [22]:

    max_{1 ≤ i ≤ r} ∏_{j∈Pi} pj ≤ Rs ≤ min_{1 ≤ i ≤ t} [ 1 − ∏_{j∈Ki} (1 − pj) ].    (7.47)
Unspecified Dependence of Components Without making any assumptions on how components are dependent on one another, Lipow [144] provides a simple formula for the lower bound of the reliability of a k-out-of-n:G system:

    Rs ≥ max_{1 ≤ i ≤ r} ( Σ_{j∈Pi} pj − k + 1 ),    (7.48)

where r = (n choose k) and Pi is the ith minimal path set. This formula was derived from the IE method for system reliability evaluation using minimal path sets. It is useful only when the component reliabilities are close to 1 and k is not too large.

Exercises

1. Analyze the closeness of the bounds given in (7.47) to exact system reliability for a k-out-of-n:G system. Consider cases when component reliabilities are high and low.
2. Develop an upper bound for system reliability using the IE method when component dependency is unspecified. Under what conditions will this bound be close to the exact system reliability?
3. Analyze the closeness of the bound given in (7.48) to exact system reliability for a k-out-of-n:G system. Consider cases when component reliabilities are high and low.
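Because every k-subset of components is a minimal path set and every (n − k + 1)-subset is a minimal cut set, bounds (7.47) and (7.48) for a k-out-of-n:G system reduce to sorting the component reliabilities. A sketch (function names are ours):

```python
def associated_bounds(k, n, p):
    """Bounds (7.47): the best minimal path set is the k most reliable
    components; the worst minimal cut set is the n - k + 1 least reliable."""
    ps = sorted(p, reverse=True)
    lower = 1.0
    for pj in ps[:k]:
        lower *= pj
    cut = 1.0
    for pj in ps[k - 1:]:          # the n - k + 1 smallest reliabilities
        cut *= (1.0 - pj)
    return lower, 1.0 - cut

def lipow_lower_bound(k, n, p):
    # Bound (7.48): sum over the best minimal path set, minus k - 1
    return sum(sorted(p, reverse=True)[:k]) - k + 1
```

For a 2-out-of-3:G system with all pi = 0.9 the exact reliability is 3(0.9²)(0.1) + 0.9³ = 0.972, which indeed lies between the bounds 0.81 and 0.99, while Lipow's bound gives 0.8.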
7.2 RELATIONSHIP BETWEEN k-OUT-OF-n G AND F SYSTEMS

In the previous section, we illustrated different approaches for reliability evaluation of k-out-of-n:G systems. Exercises were given for following similar approaches to derive algorithms for reliability evaluation of k-out-of-n:F systems. In this section, we provide a formal discussion of the relationship between k-out-of-n G and F systems and how reliability evaluation algorithms for these two types of systems are closely related.

7.2.1 Equivalence between k-out-of-n:G and (n − k + 1)-out-of-n:F Systems

Based on the definitions of these two types of systems, a k-out-of-n:G system is equivalent to an (n − k + 1)-out-of-n:F system. Similarly, a k-out-of-n:F system is equivalent to an (n − k + 1)-out-of-n:G system. This means that provided the systems have the same set of component reliabilities, the reliability of a k-out-of-n:G system
is equal to the reliability of an (n − k + 1)-out-of-n:F system and the reliability of a k-out-of-n:F system is equal to the reliability of an (n − k + 1)-out-of-n:G system. As a result, we can use the algorithms that have been covered in the previous section for the k-out-of-n:G systems in reliability evaluation of the k-out-of-n:F systems. The procedure is simple and is outlined below:

Procedure for Using Algorithms for the G Systems in Reliability Evaluation of the F Systems Utilizing the Equivalence Relationship
1. Given k, n, p1, p2, . . . , pn for a k-out-of-n:F system.
2. Calculate k1 = n − k + 1.
3. Use k1, n, p1, p2, . . . , pn to calculate the reliability of a k1-out-of-n:G system. This reliability is also the reliability of the original k-out-of-n:F system.

7.2.2 Dual Relationship between k-out-of-n G and F Systems

Barlow and Proschan [22] provide the following definition of a dual structure.

Definition 7.1 Given a structure φ, its dual structure φD is given by

    φD(x) = 1 − φ(1 − x),    (7.49)

where 1 − x = (1 − x1, 1 − x2, . . . , 1 − xn). With a simple variable substitution of 1 − x for x, we have the equation

    φD(1 − x) = 1 − φ(x).    (7.50)
We can interpret equation (7.50) as follows. Given a primal system with component state vector x and the system state represented by φ(x), the state of the dual system is equal to 1 − φ(x) if the component state vector for the dual system can be expressed as 1 − x. In the binary system context, each component and the system may only be in two possible states, either working or failed. We will say that two components with different states have opposite states. For example, if component 1 is in state 1 and component 2 is in state 0, components 1 and 2 have opposite states. Suppose a system (called system 1) has component state vector x and system state φ(x). Consider another system (called system 2) with the same number of components as system 1. If each component in system 2 has the opposite state of the corresponding component in system 1 and the state of system 2 becomes the opposite of the state of system 1, then system 1 and system 2 are duals of each other. Now we examine the k-out-of-n G and F systems. Suppose that in the k-out-of-n:G system, there are exactly j working components and the system is working (in other words, j ≥ k). Now assume that there are exactly j failed components in the k-out-of-n:F system. Since j ≥ k, the k-out-of-n:F system must be in the failed state. If j < k, the k-out-of-n:G system is failed, and at the same time the k-out-of-n:F system is working. Thus, the k-out-of-n G and F systems are duals of each
other. Using the equivalence relationship described in the previous section, we can also say that the dual of a k-out-of-n:G system is an (n − k + 1)-out-of-n:G system. Similarly, we can say that a k-out-of-n:F system is the dual of an (n − k + 1)-out-of-n:F system. These dual and equivalence relationships between the k-out-of-n G and F systems are summarized below:

1. A k-out-of-n:G system is equivalent to an (n − k + 1)-out-of-n:F system.
2. A k-out-of-n:F system is equivalent to an (n − k + 1)-out-of-n:G system.
3. The dual of a k-out-of-n:G system is a k-out-of-n:F system.
4. The dual of a k-out-of-n:G system is an (n − k + 1)-out-of-n:G system.
5. The dual of a k-out-of-n:F system is a k-out-of-n:G system.
6. The dual of a k-out-of-n:F system is an (n − k + 1)-out-of-n:F system.
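Relationships 3 and 4 can be verified exhaustively from Definition 7.1 by enumerating all component state vectors of a small system (function names are ours):

```python
from itertools import product

def phi_G(k, n, x):
    # k-out-of-n:G structure function: works iff at least k components work
    return 1 if sum(x) >= k else 0

def phi_F(k, n, x):
    # k-out-of-n:F structure function: fails iff at least k components fail
    return 0 if n - sum(x) >= k else 1

def dual_value(phi, x):
    # Definition 7.1: phi_D(x) = 1 - phi(1 - x)
    return 1 - phi(tuple(1 - xi for xi in x))

n, k = 5, 3
for x in product((0, 1), repeat=n):
    d = dual_value(lambda y: phi_G(k, n, y), x)
    assert d == phi_F(k, n, x)            # relationship 3
    assert d == phi_G(n - k + 1, n, x)    # relationship 4
```

The loop passes for every state vector, confirming both dual relationships for this small system.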
Using the dual relationship, we can summarize the following procedure for reliability evaluation of the dual system if the available algorithms are for the primal system:

Procedure for Using Algorithms for the G Systems in Reliability Evaluation of the F Systems Utilizing the Dual Relationship
1. Given k, n, p1, p2, . . . , pn for a k-out-of-n:F system.
2. Calculate qi = 1 − pi for i = 1, 2, . . . , n.
3. Treat qi as the reliability of component i in a k-out-of-n:G system and use the algorithms for the G system discussed in the previous section to evaluate the reliability of the G system.
4. Subtract the calculated reliability of the G system from 1 to obtain the reliability of the original k-out-of-n:F system.

Using the dual relationship, we can also obtain algorithms for k-out-of-n:F system reliability evaluation from those developed for the k-out-of-n:G systems. We only need to change reliability measures to unreliability measures and vice versa. Take the algorithm developed by Rushdi [208] as an example. The formulas for reliability and unreliability evaluation of a k-out-of-n:G system are given in equations (7.26) and (7.27) with boundary conditions in equations (7.28) and (7.29). By changing R(i, j) to Q(i, j), Q(i, j) to R(i, j), pi to qi, and qi to pi in those four equations, we obtain the following equations for reliability and unreliability evaluation of a k-out-of-n:F system:

    QF(i, j) = qj QF(i − 1, j − 1) + pj QF(i, j − 1),    (7.51)
    RF(i, j) = qj RF(i − 1, j − 1) + pj RF(i, j − 1),    (7.52)

with the boundary conditions

    QF(0, j) = RF(j + 1, j) = 1,    (7.53)
    QF(j + 1, j) = RF(0, j) = 0.    (7.54)
To avoid confusion, the subscript F is added to indicate that these measures are for the F system. Similar steps can be applied to other algorithms for the G systems to derive the corresponding algorithms for the F systems. It is because of such close relationships between the k-out-of-n G and F systems that we often refer to them collectively as the k-out-of-n systems.

Example 7.3 Consider a k-out-of-n:F system with k = 3, n = 7, and pi = 0.8 + 0.02i for i = 1, . . . , 7. Use the two procedures listed in Section 7.2 to evaluate the reliability of the system.

The 3-out-of-7:F system is equivalent to a 5-out-of-7:G system with the same set of components. Table 7.5 lists the calculations needed to find the reliability of the 5-out-of-7:G system. The reliability of the 5-out-of-7:G system is found to be 0.959836, which is equal to the reliability of the original 3-out-of-7:F system. Each entry in Table 7.5 can be interpreted in terms of either a k-out-of-n:G subsystem or an (n − k + 1)-out-of-n:F subsystem. For example, 0.853982 in the column labeled 5 and the row labeled 4 represents the reliability of a 4-out-of-5:G subsystem and at the same time the reliability of the equivalent 2-out-of-5:F subsystem.

We can also use the dual relationship between a k-out-of-n:F system and a k-out-of-n:G system. From the given pi values, calculate all qi = 1 − pi for i = 1, 2, . . . , 7. Treat these qi's as component reliability values and apply the formulas for the k-out-of-n:G system. Table 7.6 lists these calculations. The rightmost entry at the bottom row in Table 7.6 is the reliability of the 3-out-of-7:G system. To find the reliability of the original 3-out-of-7:F system, we need to subtract this value from 1: RF(3, 7) = 1 − 0.040164 = 0.959836.

TABLE 7.5 Reliability Evaluation of Equivalent 5-out-of-7:G System

          n
 k      0   1          2          3          4          5          6          7
 0      1   1          1
 1      0   0.820000   0.971200   0.995968
 2          0          0.688800   0.931664   0.998252
 3                     0          0.592368   0.890948   0.978521
 4                                0          0.521284   0.853982   0.968558
 5                                           0          0.469156   0.823196   0.959836
TABLE 7.6 Reliability Evaluation of Dual System: 3-out-of-7:G System

          n
 k      0   1          2          3          4          5          6          7
 0      1   1          1          1          1
 1      0   0.180000   0.311200   0.407632   0.478716   0.530845
 2          0          0.028800   0.068336   0.109052   0.146018   0.176804
 3                     0          0.004032   0.011748   0.021479   0.031442   0.040164
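Both procedures of this section can be applied to Example 7.3 with a short sketch (function names are ours); the two routes agree to within floating-point error:

```python
def kofn_G(k, n, p):
    # generic k-out-of-n:G evaluator (the MIS-style recursion of Section 7.1.2)
    a = [1.0] + [0.0] * k
    for l in range(1, n + 1):
        pl = p[l - 1]
        new = [0.0] * (k + 1)
        new[0] = (1.0 - pl) * a[0]
        for j in range(1, k):
            new[j] = pl * a[j - 1] + (1.0 - pl) * a[j]
        new[k] = pl * a[k - 1] + a[k]
        a = new
    return a[k]

def kofn_F_by_equivalence(k, n, p):
    # Procedure 1: a k-out-of-n:F system equals an (n-k+1)-out-of-n:G system
    return kofn_G(n - k + 1, n, p)

def kofn_F_by_duality(k, n, p):
    # Procedure 2: treat q_i as component reliabilities of a k-out-of-n:G
    # system, then subtract the result from 1
    return 1.0 - kofn_G(k, n, [1.0 - pi for pi in p])

p = [0.8 + 0.02 * i for i in range(1, 8)]   # Example 7.3 components
r1 = kofn_F_by_equivalence(3, 7, p)
r2 = kofn_F_by_duality(3, 7, p)
assert abs(r1 - r2) < 1e-9
```

Both routes reproduce the tabled value 0.959836 to within the printed rounding.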
Exercises

1. Derive the formulas for reliability evaluation of a k-out-of-n:F system based on the algorithm by Barlow and Heidtmann.
2. Derive the formulas for reliability evaluation of a k-out-of-n:F system based on the MIS approach.
3. Compute the reliabilities of the k-out-of-n F and G systems with k = 2, 3, 4 and n = 5, 6, 7.
7.3 NONREPAIRABLE k-OUT-OF-n SYSTEMS

In the previous sections, we have discussed the so-called static properties of k-out-of-n systems. Reliability has not been expressed as a function of time. But, in fact, reliability and other performance measures of any system are functions of time. Starting from this section, we will provide stochastic analyses of the k-out-of-n systems. In reality, it is sometimes impossible to repair a system until its mission is complete. In this case, the reliability of the system is a decreasing function of time. In this section, we examine the performance measures of nonrepairable k-out-of-n systems.

Notation

• Ti: lifetime of component i, a random variable
• Ts: lifetime of the system, a random variable
• Ri(t): Pr(Ti ≥ t), reliability function of component i
• R(t): reliability function of each component when components are i.i.d.
• Fi(t): 1 − Ri(t), CDF or unreliability function of component i
• F(t): CDF or unreliability function of each component when components are i.i.d.
• fi(t): pdf of the lifetime of component i
• f(t): pdf of the lifetime of each component when components are i.i.d.
• hi(t): failure rate function of component i
• h(t): failure rate function of each component when components are i.i.d.
• Rs(t): reliability function of the system
• Fs(t): 1 − Rs(t), CDF or unreliability function of the system
• fs(t): pdf of the lifetime of the system
• hs(t): failure rate function of the system
• R(t; k, n): reliability function of the k-out-of-n:G system
• MTTFs: mean time to failure of the system
• MTTF(k, n): mean time to failure of a k-out-of-n:G system
7.3.1 Systems with i.i.d. Components

When the components in a k-out-of-n:G system are i.i.d., the reliability function of the system can be expressed as

    Rs(t) = Σ_{i=k}^{n} (n choose i) R(t)^i F(t)^{n−i}.    (7.55)

This equation is directly obtained from equation (7.2) by replacing p with R(t) and q with F(t). Similarly, the CDF of the system lifetime is given by

    Fs(t) = 1 − Rs(t) = Σ_{i=0}^{k−1} (n choose i) R(t)^i F(t)^{n−i}.    (7.56)

The pdf of the system lifetime is then

    fs(t) = dFs(t)/dt = k (n choose k) f(t) F(t)^{n−k} R(t)^{k−1}.    (7.57)

Usually, as the system is used, the components in the system will fail one by one. The system is failed as soon as the (n − k + 1)th component is failed. If we use ti to indicate the lifetime of component i, the system lifetime is then equal to the (n − k + 1)th smallest ti. The expected lifetime of the system, or mean time to failure, can be evaluated using the standard equation

    MTTFs = ∫_0^∞ t fs(t) dt = ∫_0^∞ Rs(t) dt.    (7.58)

In the following, we first illustrate that when all i.i.d. components have IFR or even constant failure rates, the system has IFR. No specific component lifetime distributions are assumed. Since Rs(t) = ∫_t^∞ fs(x) dx,

    1/hs(t) = Rs(t)/fs(t) = ∫_t^∞ f(x) F(x)^{n−k} R(x)^{k−1} dx / [ f(t) F(t)^{n−k} R(t)^{k−1} ]
            = (1/f(t)) ∫_t^∞ [F(x)/F(t)]^{n−k} [R(x)/R(t)]^{k−1} f(x) dx.

Let y = R(x)/R(t); then dy = −[f(x)/R(t)] dx:

    1/hs(t) = (1/h(t)) ∫_0^1 [(1 − y R(t))/F(t)]^{n−k} y^{k−1} dy.    (7.59)

Since [1 − y R(t)]/F(t) is decreasing in t and h(t) is assumed to be IFR, we conclude that hs(t) is increasing in t based on equation (7.59). This indicates that if all
components have IFR, the k-out-of-n:G structure preserves this IFR property of the components. If all components have constant failure rates, the k-out-of-n:G system would have IFR as long as k ≠ n and a constant failure rate when k = n. It is generally impossible to find more specific expressions of the performance measures of the k-out-of-n:G system. However, when the components follow the exponential distribution, some explicit results can be derived. When all components follow the exponential lifetime distribution with CDF F(t) = 1 − e^{−λt}, the expressions of system reliability and unreliability are

    Rs(t) = Σ_{i=k}^{n} (n choose i) (e^{−λt})^i (1 − e^{−λt})^{n−i},    (7.60)
    Fs(t) = Σ_{i=0}^{k−1} (n choose i) (e^{−λt})^i (1 − e^{−λt})^{n−i},    (7.61)

respectively. The MTTF of the system can be derived as follows. Based on equation (7.7), we have, for k ≥ 2,

    Rs(t; k, n) = R(t; k − 1, n) − (n choose k−1) e^{−λt(k−1)} (1 − e^{−λt})^{n−k+1}.    (7.62)

Integrating both sides of this equation results in the following recursive equation:

    MTTF(k, n) = MTTF(k − 1, n) − (n choose k−1) ∫_0^∞ e^{−λt(k−1)} (1 − e^{−λt})^{n−k+1} dt
               = MTTF(k − 1, n) − (1/λ)(n choose k−1) Σ_{j=0}^{n−k+1} (n−k+1 choose j) (−1)^j/(k − 1 + j)
               = MTTF(k − 1, n) − 1/(λ(k − 1)).    (7.63)

The following equation is used in the above derivations:

    Σ_{j=0}^{N} (N choose j) (−1)^j/(a + j) = N!(a − 1)!/(N + a)!    for a ≥ 1.    (7.64)

MTTF(1, n) represents the MTTF of a parallel system, which is (1/λ) Σ_{j=1}^{n} (1/j). Using this boundary condition and applying equation (7.63) recursively, we find

    MTTF(k, n) = (1/λ) Σ_{j=k}^{n} (1/j).    (7.65)

Substituting k = n in equation (7.65) provides the MTTF of a series system, 1/(nλ), as is expected.
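Equation (7.65) is easy to check numerically against equation (7.58) using the reliability function (7.60); a sketch with standard-library tools only (names and the integration grid are ours):

```python
import math

def mttf_kofn_exp(k, n, lam):
    # Equation (7.65): MTTF of a k-out-of-n:G system of i.i.d.
    # exponential(lam) components.
    return sum(1.0 / j for j in range(k, n + 1)) / lam

def mttf_numeric(k, n, lam, dt=1e-3, t_max=60.0):
    # Cross-check via equation (7.58): trapezoid-rule integral of Rs(t)
    # from equation (7.60); the grid is a pragmatic choice of ours.
    def rs(t):
        r = math.exp(-lam * t)
        return sum(math.comb(n, i) * r**i * (1.0 - r)**(n - i)
                   for i in range(k, n + 1))
    steps = int(t_max / dt)
    total = sum(rs(i * dt) for i in range(steps + 1))
    return dt * (total - 0.5 * (rs(0.0) + rs(t_max)))
```

For a 2-out-of-3:G system with λ = 1, both give MTTF = 1/2 + 1/3 = 5/6.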
Using equation (7.59), we can express the system failure rate as

    hs(t) = λ / ∫_0^1 y^{k−1} [ (1 − y e^{−λt}) / (1 − e^{−λt}) ]^{n−k} dy.    (7.66)
No closed-form expression for hs(t) can be obtained even in the case when all components have exponential lifetime distributions.

Exercises

1. Verify equation (7.57).
2. Verify equation (7.63).

7.3.2 Systems with Nonidentical Components

It is generally difficult to write an expression for k-out-of-n system reliability when the components do not have identical lifetime distributions. It is possible to derive the desired expressions for simple cases. For components with exponential lifetime distributions such that component i has a constant failure rate λi (1 ≤ i ≤ n), we have the following expressions of system reliability and MTTF for a 2-out-of-3:G system:

    Rs(t; 2, 3) = e^{−(λ1+λ2)t} + e^{−(λ1+λ3)t} + e^{−(λ2+λ3)t} − 2e^{−(λ1+λ2+λ3)t},
    MTTF(2, 3) = 1/(λ1 + λ2) + 1/(λ1 + λ3) + 1/(λ2 + λ3) − 2/(λ1 + λ2 + λ3).
7.3.3 Systems with Load-Sharing Components Following Exponential Lifetime Distributions

Consider a k-out-of-n:G system with i.i.d. components each following the exponential lifetime distribution. When the system is put into operation at time zero, all components are working and they are equally sharing the constant load that the system is supposed to carry. In this case, the failure rate of every component is denoted by λ0. When the system experiences the first failure, the remaining n − 1 working components must carry the same load on the system. As a result, the failure rate of each working component becomes λ1, which is usually higher than λ0. When i components are failed, the failure rate of each of the n − i working components is represented by λi (0 ≤ i ≤ n − k). The system is failed when more than n − k components are failed. For such a system with no repair provisions, Scheuer [218] provides an analysis of the system's performance measures.

Notation

• λi: failure rate of each surviving component when i components have failed (0 ≤ i ≤ n − k). Assume λ0 ≤ λ1 ≤ · · · ≤ λn−k due to practical considerations.
• Ti: time to the ith failure (T0 ≡ 0), i = 1, 2, . . . , n − k + 1
• Xi: time between the (i − 1)th failure and the ith failure, Xi = Ti − Ti−1, i = 1, 2, . . . , n − k + 1
• αi: failure rate of the system when there are i failed components, αi = (n − i + 1)λi−1, i = 1, 2, . . . , n − k + 1
Since all components are i.i.d. following the exponential distributions, the interarrival times of failures are independent random variables and Xi follows the exponential distribution with parameter αi for 1 ≤ i ≤ n − k + 1. The lifetime of the system is equal to the (n − k + 1)st failure time, that is, Ts = Tn−k+1 = X1 + X2 + · · · + Xn−k+1. The MTTF of the system is then

    MTTFs = Σ_{i=1}^{n−k+1} 1/αi = Σ_{i=1}^{n−k+1} 1/[(n − i + 1)λi−1].

The distribution of Ts is the distribution of a sum of n − k + 1 independent random variables, each following the exponential distribution with possibly different parameters. To find the distribution of Ts and the reliability function of the system, we need to distinguish the following three cases.

Case I: α1 = α2 = · · · = αn−k+1 ≡ α. This case arises when the load of the system is equally shared by surviving components. If the failure rate of each surviving component is directly proportional to the load it carries, we can write λi as

    λi = cd/(n − i),    i = 0, 1, 2, . . . , n − k,

where d is the load on the system and c is a constant. Using this equation, we can verify the following:

    αi = (n − i + 1)λi−1 = cd ≡ α,    i = 1, 2, . . . , n − k + 1.

Thus, under case I, Xi's for i = 1, 2, . . . , n − k + 1 are i.i.d. random variables following the same exponential distribution with parameter α. As a result, Ts, a sum of these n − k + 1 i.i.d. random variables, follows the gamma distribution with scale parameter α and shape parameter n − k + 1. The pdf of this gamma distribution is

    f(t) = α(αt)^{n−k} e^{−αt}/(n − k)!.

The reliability function of the system is then

    Rs(t) = Σ_{j=0}^{n−k} [(αt)^j/j!] e^{−αt}.    (7.67)
Case II: α_1, α_2, . . . , α_{n−k+1} Take Distinct Values  In this case, the lifetime of the system is a sum of n − k + 1 independent random variables, each with a distinct exponential distribution parameter. The pdf of the system lifetime is a convolution of the pdf's of these n − k + 1 exponential random variables. With the technique of Laplace transforms, the pdf of the system's lifetime is found to be

f_s(t) = \left( \prod_{i=1}^{n−k+1} α_i \right) \sum_{i=1}^{n−k+1} \frac{e^{−α_i t}}{\prod_{j=1, j≠i}^{n−k+1} (α_j − α_i)}.

From f_s(t), we find the reliability function of the system:

R_s(t) = \sum_{i=1}^{n−k+1} A_i e^{−α_i t},   (7.68)

where

A_i = \prod_{j=1, j≠i}^{n−k+1} \frac{α_j}{α_j − α_i},   i = 1, 2, . . . , n − k + 1.   (7.69)
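A short sketch of case II (illustrative rates only, and our own function names): the coefficients (7.69) must sum to 1, since R_s(0) = 1, and integrating (7.68) numerically must reproduce MTTF_s = Σ 1/α_i:

```python
import math

def coeffs(alphas):
    """Equation (7.69): partial-fraction coefficients A_i (rates must be distinct)."""
    A = []
    for i, ai in enumerate(alphas):
        prod = 1.0
        for j, aj in enumerate(alphas):
            if j != i:
                prod *= aj / (aj - ai)
        A.append(prod)
    return A

alphas = [3.0, 2.0, 1.25]          # distinct alpha_i (illustrative values)
A = coeffs(alphas)

def reliability(t):
    """Equation (7.68): R_s(t) for the hypoexponential system lifetime."""
    return sum(Ai * math.exp(-a * t) for Ai, a in zip(A, alphas))

print(sum(A))                      # equals 1, so R_s(0) = 1
mttf_exact = sum(1.0 / a for a in alphas)   # MTTF_s = sum of interarrival means
h, T = 0.001, 60.0
mttf_num = sum(reliability(h * (i + 0.5)) for i in range(int(T / h))) * h
print(mttf_exact, mttf_num)
```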
Case III: α_1, α_2, . . . , α_{n−k+1} Are neither Identical nor Distinct  Specifically, assume that these α_i's take a (1 < a < n) distinct values, β_1, β_2, . . . , β_a. With possibly some renumbering of these α_i values, assume

α_1 = α_2 = · · · = α_{r_1} ≡ β_1,
α_{r_1+1} = α_{r_1+2} = · · · = α_{r_1+r_2} ≡ β_2,
. . .
α_{r_1+r_2+···+r_{a−1}+1} = · · · = α_{r_1+r_2+···+r_a} ≡ β_a,

where r_1 + r_2 + · · · + r_a = n − k + 1 and 1 ≤ r_i < n for i = 1, 2, . . . , a.
Under case III, the interarrival times of failures are divided into a (a > 1) groups. Group j has r_j identical interarrival times following the exponential distribution with the same parameter. The interarrival times in different groups follow exponential distributions with different parameters. If we define the lifetime of each group as the sum of the interarrival times within the group, such group lifetimes then follow the gamma distribution. The lifetime of group j, denoted by V_j, for j = 1, 2, . . . , a follows the gamma distribution with scale parameter β_j and shape parameter r_j. In addition, these group lifetimes are independent. As a result, we can write the lifetime of the system, T_s, as a sum of the lifetimes of the groups, each following a different gamma distribution: T_s = V_1 + V_2 + · · · + V_a. The reliability function of the system is given by

R_s(t) = B \sum_{j=1}^{a} \sum_{\ell=1}^{r_j} \frac{Φ_{j\ell}(−β_j)}{(\ell−1)! \, β_j^{r_j−\ell+1}} \sum_{i=0}^{r_j−\ell} \frac{(β_j t)^i}{i!} e^{−β_j t},   (7.70)

where

B = \prod_{j=1}^{a} β_j^{r_j},   (7.71)

Φ_{j\ell}(t) = \frac{d^{\ell−1}}{dt^{\ell−1}} \prod_{i=1, i≠j}^{a} (β_i + t)^{−r_i}.   (7.72)
These equations can be derived as follows. Assume that V_j has the gamma distribution with scale parameter β_j and shape parameter r_j (a positive integer); its pdf can be written as

f_j(t) = \frac{β_j (β_j t)^{r_j−1}}{(r_j − 1)!} e^{−β_j t}.

The Laplace transform of f_j(t) is

L_j(s) = \left( \frac{β_j}{β_j + s} \right)^{r_j}.

The pdf of T_s is a convolution of the individual pdf's of the lifetimes of these a groups. The Laplace transform of a convolution of functions is equal to the product of the Laplace transforms of these individual functions. As a result, the Laplace transform of the pdf of T_s is

L_s(s) = \prod_{j=1}^{a} \left( \frac{β_j}{β_j + s} \right)^{r_j}.   (7.73)

The inverse Laplace transform of equation (7.73) gives the pdf of T_s [73]:

f_s(t) = B \sum_{j=1}^{a} \sum_{\ell=1}^{r_j} \frac{Φ_{j\ell}(−β_j)}{(\ell−1)! (r_j−\ell)!} t^{r_j−\ell} e^{−β_j t}.   (7.74)

From this pdf of T_s, we can find R_s(t) as given in equation (7.70).

Exercises

1. Derive equation (7.68).
2. Derive the system reliability function under case III.
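Equation (7.73) can be spot-checked by Monte Carlo (a sketch, not from the book; the group parameters β = (1, 2), r = (2, 1), the sample size, and the seed are all arbitrary choices): the empirical mean of e^{−sT_s} should match the product form:

```python
import random, math

def laplace_exact(s, betas, rs):
    """Equation (7.73): Laplace transform of the system-lifetime pdf."""
    out = 1.0
    for b, r in zip(betas, rs):
        out *= (b / (b + s)) ** r
    return out

random.seed(42)
betas, rs = (1.0, 2.0), (2, 1)      # two groups: Gamma(shape 2, rate 1) + Gamma(shape 1, rate 2)
s, N = 0.5, 20000
acc = 0.0
for _ in range(N):
    # gammavariate takes (shape, scale); scale = 1/rate
    ts = sum(random.gammavariate(r, 1.0 / b) for b, r in zip(betas, rs))
    acc += math.exp(-s * ts)
print(acc / N, laplace_exact(s, betas, rs))
```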
7.3.4 Systems with Load-Sharing Components Following Arbitrary Lifetime Distributions

Liu [145] provides an analysis of the k-out-of-n:G system with i.i.d. components whose lifetime distributions are not necessarily exponential. Repair of failed components is not allowed. Surviving components equally share the constant load of the system. The lifetime distribution of a component under a constant load can be represented by the accelerated failure time model (AFTM), also called the accelerated life model. The parametric form of the AFTM for each component is assumed to be known. The AFTM specifies that the effect of load on the lifetime of a component is multiplicative in time. The reliability function of a component under the AFTM can be expressed as

R(t; z) = R_0(t ψ(z)),   (7.75)

where z is a vector representing the loads on the component, ψ(z) is an acceleration factor, and R_0(·) is the reliability function of an arbitrary statistical distribution. For more discussion of the AFTM, readers are referred to Nelson [175]. When there is only one type of load, z, commonly used forms of ψ(z) include ψ(z) = e^{αz} and ψ(z) = z^α.

For example, if R_0(·) is of the Weibull distribution, that is, R_0(t) = e^{−(t/η)^β}, and ψ(z) = z^α, we can write the load-dependent reliability function of the component as

R(t; z) = \exp\left[ −\left( \frac{t z^α}{η} \right)^β \right].   (7.76)

When R_0(·) is Weibull, the AFTM is equivalent to the proportional hazards model (PHM) [59], wherein the load acts multiplicatively on the failure rate. When R_0(·) is not Weibull, the AFTM is not equivalent to the PHM. In the following, we illustrate the reliability analysis of a k-out-of-n:G system with i.i.d. load-sharing components whose lifetimes can be modeled with the AFTM as given in equation (7.75).

Notation

• R(t; z): reliability function of each component when the total load on the system is z
• z_{n−j}: total load to be shared by the n − j surviving components when j components are failed
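The Weibull/PHM equivalence noted above can be verified numerically; in this sketch the parameter values η, β, and α are illustrative, and the hazard rate is obtained by numerical differentiation of log R(t; z):

```python
import math

def weibull_aft_reliability(t, z, eta=2.0, beta=1.7, a=0.8):
    """Equation (7.76): Weibull baseline with acceleration factor psi(z) = z**a."""
    return math.exp(-((t * z ** a) / eta) ** beta)

def hazard(t, z, h=1e-6):
    # numerical hazard rate: -d/dt log R(t; z)
    r0 = weibull_aft_reliability(t, z)
    r1 = weibull_aft_reliability(t + h, z)
    return (math.log(r0) - math.log(r1)) / h

# Under the PHM the hazard ratio between two load levels is independent of t;
# for the Weibull AFTM it equals (z1/z2)**(a*beta).
ratios = [hazard(t, 3.0) / hazard(t, 1.5) for t in (0.5, 1.0, 2.0, 4.0)]
print(ratios)
```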
When k = n, we have a series system. All components have to work for the system to work. Since components are independent,

R(t; n, n) = \prod_{i=1}^{n} R(t; z_n) = [R(t; z_n)]^n.
When k = n − 1, for the system to survive beyond t, either all components survive beyond t or one component fails at time x (0 < x < t) and all other components survive the remaining time duration t − x:

R(t; n−1, n) = R(t; n, n) + n \int_0^t f(x; z_n) \left[ R(t − x + \hat{x}; z_{n−1}) \right]^{n−1} dx,   (7.77)

where \hat{x} = x ψ(z_n)/ψ(z_{n−1}). When k = n − 2, we have

R(t; n−2, n) = R(t; n−1, n) + \frac{n!}{(n−2)!} \int_0^t \int_0^{x_1} f(x; z_n) f(x_1 − x + \hat{x}; z_{n−1}) \left[ R(t − x_1 + \hat{x}_1; z_{n−2}) \right]^{n−2} dx \, dx_1,   (7.78)

where \hat{x} = x ψ(z_n)/ψ(z_{n−1}) and \hat{x}_1 = (x_1 − x + \hat{x}) ψ(z_{n−1})/ψ(z_{n−2}). When k = n − 3,

R(t; n−3, n) = R(t; n−2, n) + \frac{n!}{(n−3)!} \int_0^t \int_0^{x_2} \int_0^{x_1} f(x; z_n) f(x_1 − x + \hat{x}; z_{n−1}) f(x_2 − x_1 + \hat{x}_1; z_{n−2}) \left[ R(t − x_2 + \hat{x}_2; z_{n−3}) \right]^{n−3} dx \, dx_1 \, dx_2,   (7.79)

where \hat{x} = x ψ(z_n)/ψ(z_{n−1}), \hat{x}_1 = (x_1 − x + \hat{x}) ψ(z_{n−1})/ψ(z_{n−2}), and \hat{x}_2 = (x_2 − x_1 + \hat{x}_1) ψ(z_{n−2})/ψ(z_{n−3}). Generally, the following equation can be used for evaluation of R(t; j, n) for 1 ≤ j < n:

R(t; j, n) = R(t; j+1, n) + \frac{n!}{j!} \int_0^t \int_0^{x_{n−j−1}} \int_0^{x_{n−j−2}} \cdots \int_0^{x_1} f(x; z_n) f(x_1 − x + \hat{x}; z_{n−1}) f(x_2 − x_1 + \hat{x}_1; z_{n−2}) × \cdots × f(x_{n−j−2} − x_{n−j−3} + \hat{x}_{n−j−3}; z_{n−(n−j−2)}) f(x_{n−j−1} − x_{n−j−2} + \hat{x}_{n−j−2}; z_{n−(n−j−1)}) \left[ R(t − x_{n−j−1} + \hat{x}_{n−j−1}; z_{n−(n−j)}) \right]^j dx \, dx_1 \, dx_2 \cdots dx_{n−j−1},   (7.80)

where \hat{x}_i = (x_i − x_{i−1} + \hat{x}_{i−1}) ψ(z_{n−i})/ψ(z_{n−(i+1)}) for i = 1, 2, . . . , n − j − 1, with x_0 ≡ x and \hat{x}_0 ≡ \hat{x}. The procedure outlined above is enumerative in nature. More efficient methods for handling arbitrary load-dependent component lifetime distributions are needed.
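As a sanity check on (7.77) (a sketch, not the book's procedure): with exponential component lifetimes and a constant acceleration factor, so that x̂ = x, the recursion must reduce to the familiar i.i.d. result nR^{n−1} − (n−1)R^n for an (n−1)-out-of-n:G system. The rate and time below are arbitrary:

```python
import math

lam, n, t = 0.7, 4, 1.5
R = lambda u: math.exp(-lam * u)        # component reliability
f = lambda u: lam * math.exp(-lam * u)  # component pdf

# Equation (7.77) with psi constant: x_hat = x, so R(t - x + x_hat) = R(t).
steps = 20000
h = t / steps
integral = sum(f(h * (i + 0.5)) for i in range(steps)) * h   # = 1 - R(t)
r_rec = R(t) ** n + n * integral * R(t) ** (n - 1)

# Direct binomial formula for an (n-1)-out-of-n system with i.i.d. components
r_bin = n * R(t) ** (n - 1) - (n - 1) * R(t) ** n
print(r_rec, r_bin)
```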
7.3.5 Systems with Standby Components

As we mentioned before, the k-out-of-n system structure has built-in redundancy: the system requires only k components to work for the system to work. In deriving the equations for k-out-of-n system performance evaluation so far in this chapter, we have treated the extra n − k components as active redundant components; in other words, they are in hot standby mode. In this section, we analyze the k-out-of-n system with cold and warm standby components.

Cold Standby with i.i.d. Components and Perfect Switching  In Chapter 4, we discussed standby systems with n components. When the components are i.i.d. following the exponential lifetime distribution with parameter λ, the lifetime of the system follows the gamma distribution with scale parameter λ and shape parameter n. In the following, we show that the lifetime of a k-out-of-n:G system with i.i.d. cold standby components can also be described by a gamma distribution.

For a k-out-of-n:G system with standby components, k components are put into operation initially and n − k components are in standby. Whenever one of the active components fails, one of the standby components is switched into operation. No repair provisions are allowed. Sensing and switching are assumed to be perfect. The system is failed when n − k + 1 component failures have been experienced.

The k active components can be viewed as a series subsystem since all of them are required to work for the k-out-of-n:G system to work. As was explained in Chapter 4, the failure rate of a series system is equal to the sum of the failure rates of the components when all components have constant failure rates. If all components in the k-out-of-n:G system are i.i.d. with a constant failure rate λ, then the series subsystem with k active components has a failure rate of kλ.
Whenever one of the components in this series subsystem fails, it is replaced by a standby component and a new series subsystem is formed. Because of the memoryless property of the exponential distribution, each series subsystem follows the exponential lifetime distribution with parameter kλ. The system is failed when the k-component series subsystem including the last standby component is failed. Thus, we have the following expression of system lifetime:

T_s = T_1 + T_2 + · · · + T_{n−k+1},   (7.81)

where T_i represents the lifetime of the ith k-component series subsystem. Even though these k-component series subsystems have components in common, their lifetimes T_1, T_2, . . . , T_{n−k+1} are i.i.d. random variables because of the memoryless property of the exponential distribution. The sum of i.i.d. random variables with the exponential distribution follows the gamma distribution. Thus, T_s follows the gamma distribution with scale parameter kλ and shape parameter n − k + 1:

f_s(t) = kλ e^{−kλt} \frac{(kλt)^{n−k}}{(n−k)!},   t ≥ 0,   (7.82)

R_s(t) = e^{−kλt} \sum_{j=0}^{n−k} \frac{(kλt)^j}{j!},   (7.83)

MTTF_s = \frac{n−k+1}{kλ}.   (7.84)
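Equations (7.83) and (7.84) are easy to confirm against each other numerically (illustrative parameters; the integration grid is arbitrary):

```python
import math

def cold_standby_reliability(t, n, k, lam):
    """Equation (7.83): Erlang survival with rate k*lam and shape n-k+1."""
    return math.exp(-k * lam * t) * sum((k * lam * t) ** j / math.factorial(j)
                                        for j in range(n - k + 1))

n, k, lam = 6, 4, 0.5
mttf_exact = (n - k + 1) / (k * lam)       # equation (7.84)
h, T = 0.001, 40.0
mttf_num = sum(cold_standby_reliability(h * (i + 0.5), n, k, lam)
               for i in range(int(T / h))) * h
print(mttf_exact, mttf_num)
```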
The derivations outlined above are based on the fact that the n − k + 1 k-component series subsystems have i.i.d. exponential lifetime distributions. This is satisfied only when each component follows the exponential distribution. When the lifetime distribution of each component is not exponential, we cannot use the results shown above.

Warm Standby System with i.i.d. Components and Perfect Switching  As mentioned in Chapter 4, warm standby systems are more complicated to analyze because both active and dormant components may fail. Assuming that all components are i.i.d. and the lifetime of each component follows the exponential distribution with parameter λ_a in the active state and parameter λ_d in the dormant state, She and Pecht [227] provide a closed-form expression for the system reliability function.

Notation

• λ_a: constant failure rate of an active component
• λ_d: constant failure rate of a dormant or standby component
• f_a(·), R_a(·): pdf and reliability function of an active component, respectively
• f_d(·), R_d(·): pdf and reliability function of a dormant component, respectively
The event that the system survives beyond time t may be expressed as the union of the following mutually exclusive events:

1. The k active components all survive beyond time t.
2. One of the k active components fails in interval (x, x + dx) for 0 < x < t, all n − k dormant components survive beyond time x, and the (n−1)-component subsystem with k active and n − k − 1 dormant components survives the remaining time period t − x.
3. One of the n − k dormant components fails in interval (x, x + dx) for 0 < x < t, all k active components survive beyond time x, and the (n−1)-component subsystem with k active and n − k − 1 dormant components survives the remaining time period t − x.

Based on this decomposition, we can express R(t; k, n) as

R(t; k, n) = e^{−kλ_a t} + \binom{k}{1} \int_0^t f_a(x) R_d(x)^{n−k} R(t − x; k, n−1) dx + \binom{n−k}{1} \int_0^t f_d(x) R_a(x)^k R(t − x; k, n−1) dx.   (7.85)
This equation can be applied recursively until we reach R(z; k, k) = e^{−kλ_a z}, which is the reliability function of a k-component series system. The closed-form expression for the system reliability is

R(t; k, n) = \frac{1}{(n−k)! \, λ_d^{n−k}} \sum_{i=0}^{n−k} (−1)^i \binom{n−k}{i} \left[ \prod_{j=0, j≠i}^{n−k} (kλ_a + jλ_d) \right] e^{−(kλ_a + iλ_d) t}.   (7.86)
When λ_d = λ_a = λ, equation (7.86) reduces to the system reliability function of a k-out-of-n:G system with active redundancy given in equation (7.60). When λ_d = 0, we get the reliability function of a k-out-of-n:G system with cold standby components, as given in equation (7.83).

Exercises

1. Derive the expression of the MTTF of the warm standby system.
2. Verify that equation (7.86) reduces to equation (7.60) when λ_d = λ_a = λ.
3. Verify that equation (7.85) reduces to equation (7.83) when λ_d = 0.
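The two reductions stated above can be checked numerically (a sketch related to Exercises 2 and 3; parameter values are illustrative). The script implements (7.86) directly and uses a small but nonzero λ_d to approximate the cold standby limit:

```python
from math import comb, factorial, exp

def warm_standby_rel(t, k, n, la, ld):
    """Equation (7.86); requires ld > 0 so that the exponents k*la + i*ld are distinct."""
    pref = 1.0 / (factorial(n - k) * ld ** (n - k))
    total = 0.0
    for i in range(n - k + 1):
        prod = 1.0
        for j in range(n - k + 1):
            if j != i:
                prod *= k * la + j * ld
        total += (-1) ** i * comb(n - k, i) * prod * exp(-(k * la + i * ld) * t)
    return pref * total

def active_rel(t, k, n, lam):
    # k-out-of-n:G with active (hot) redundancy, i.i.d. exponential components
    p = exp(-lam * t)
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def cold_rel(t, k, n, lam):
    # equation (7.83)
    return exp(-k * lam * t) * sum((k * lam * t) ** j / factorial(j) for j in range(n - k + 1))

t, k, n, la = 1.3, 2, 4, 0.9
print(warm_standby_rel(t, k, n, la, la), active_rel(t, k, n, la))     # lambda_d = lambda_a
print(warm_standby_rel(t, k, n, la, 1e-4), cold_rel(t, k, n, la))     # lambda_d near 0
```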
7.4 REPAIRABLE k-OUT-OF-n SYSTEMS

We have discussed the k-out-of-n:G systems with active redundant components, with standby components, or with load-sharing components. In this section, we develop a general model for the analysis of such systems when they are repairable. After such a model is developed, we analyze various system performance measures under different assumptions.

When a k-out-of-n:G system is put into operation, all n components are in good condition. As the system is used, components fail one after another. The system is failed when the number of working components drops below k, that is, when the number of failed components has reached n − k + 1. If resources are allocated to repair failed components, we should be able to keep the number of failed components below n − k + 1 for a much longer time. This way, we expect to prolong the system life cycle. Whenever the number of failed components at any instant of time is higher than n − k, the system is failed and its life cycle is complete.

Many situations exist in which more than one failed component can be repaired simultaneously (in parallel). This can be achieved when there exists more than one repairman or repair facility. As the number of repair facilities is increased, we expect a better chance of extending the system operating time until its first failure.

In the following section we describe a general repairable k-out-of-n:G system model with multiple repair facilities. The components may be in active redundancy, standby, or load sharing. Such a model will allow us to evaluate such performance
measures of the system as mean time to failure, steady-state availability, and mean time between failures.

7.4.1 General Repairable System Model

Here are the model descriptions and assumptions:

1. The system is a k-out-of-n:G structure with possibly cold standby and/or load-sharing components.
2. The failure of each component is self-revealing.
3. All active components are i.i.d., following exponential lifetime distributions. However, the parameter of the lifetime distribution of each component may change depending on the load applied on the component.
4. There are r identical repair facilities available (1 ≤ r ≤ n − k + 1). Only one repair facility may be assigned to the repair of a failed component. The time needed by any repair facility to repair any failed component is i.i.d. with the exponential distribution.
5. Whenever a component fails, repair immediately commences if a repair facility is available; if not, the failed component must wait for the first available repair facility. Components are repaired on a first-come, first-served basis.
6. The system is considered failed as soon as the number of components in the failed state has reached n − k + 1.
7. While the system is down, no further units can fail.
8. The state of the system is defined to be the number of failed components in the system that are either waiting for or are receiving repair.
9. The system state is decreased by 1 whenever a failed component becomes operational and increased by 1 whenever a working component becomes failed.
10. The probability that two or more components are restored to the working condition or become failed in a small time interval is negligible.

Notation

• i: number of failed components in the system, i = 0, 1, . . . , n − k + 1
• t: time
• λ_i: failure rate of the system when there are i failed components, 0 ≤ i ≤ n − k
• µ_i: repair rate of the system when there are i failed components, 1 ≤ i ≤ n − k + 1
• P_i(t): probability that there are i failed components in the system at time t, 0 ≤ i ≤ n − k + 1
• P_i: steady-state probability, P_i = lim_{t→∞} P_i(t), 0 ≤ i ≤ n − k + 1
• P′_i(t): first derivative of P_i(t), 0 ≤ i ≤ n − k + 1
• L_i(s): Laplace transform of P_i(t), 0 ≤ i ≤ n − k + 1
• A_s(t): point availability of the system at time t
• A_s: steady-state availability of the system, A_s = lim_{t→∞} A_s(t)

FIGURE 7.3 General transition diagram for the repairable k-out-of-n:G system. [States 0, 1, 2, . . . , n − k, n − k + 1 form a chain; the transition rate from state i to state i + 1 is λ_i, and the transition rate from state i + 1 to state i is µ_{i+1}.]
Based on the model descriptions, the system state transition diagram is given in Figure 7.3. The numbers in the circles in Figure 7.3 indicate the system states. The system state n − k + 1 indicates system failure. To evaluate P_i(t + Δt), we note that at time t + Δt the system can be in state i only if one of the following disjoint events occurs:

1. At time t the system is in state i and during (t, t + Δt) no change in system state occurs.
2. At time t the system is in state i − 1 and a transition to state i occurs during (t, t + Δt).
3. At time t the system is in state i + 1 and a transition to state i occurs during (t, t + Δt).
4. During (t, t + Δt), the system state changes by two or more.

Since Δt is very small, the probability of the last event is o(Δt), as assumed. As a result, we have

P_i(t + Δt) = P_i(t)(1 − λ_i Δt)(1 − µ_i Δt) + P_{i−1}(t) λ_{i−1} Δt (1 − µ_{i−1} Δt) + P_{i+1}(t) µ_{i+1} Δt (1 − λ_{i+1} Δt) + o(Δt)
            = P_i(t) − P_i(t)(λ_i + µ_i) Δt + P_{i−1}(t) λ_{i−1} Δt + P_{i+1}(t) µ_{i+1} Δt + o(Δt).   (7.87)

Rearranging the terms in equation (7.87) and letting Δt → 0, we have

P′_i(t) = −(λ_i + µ_i) P_i(t) + λ_{i−1} P_{i−1}(t) + µ_{i+1} P_{i+1}(t)   for i = 0, 1, . . . , n − k + 1,   (7.88)
where P_i(t) ≡ 0 for i < 0 or i > n − k + 1. We shall assume the initial conditions P_i(0) = 0 if i ≠ 0 and P_0(0) = 1; that is, all components are assumed to be initially in the working state. Considering these initial conditions and assumptions, we can rewrite equation (7.88) as

P′_0(t) = −λ_0 P_0(t) + µ_1 P_1(t),   (7.89)

P′_i(t) = −(λ_i + µ_i) P_i(t) + λ_{i−1} P_{i−1}(t) + µ_{i+1} P_{i+1}(t)   for 1 ≤ i ≤ n − k,   (7.90)

P′_{n−k+1}(t) = −µ_{n−k+1} P_{n−k+1}(t) + λ_{n−k} P_{n−k}(t).   (7.91)

One of these equations can be written as a linear combination of the other n − k + 1 equations because the system must be in one of the n − k + 2 states at any instant of time, that is,

P_0(t) + P_1(t) + · · · + P_{n−k+1}(t) = 1   for any t ≥ 0.   (7.92)

Thus, equation (7.91) should be replaced by equation (7.92). The set of differential equations to be solved are equations (7.89), (7.90), and (7.92). Solving this set of differential equations results in the probability distribution of the system in various states as a function of time. Once this distribution is found, we can evaluate system performance measures such as mean time to failure, mean time between failures, and steady-state availability of the system.

The Laplace transform is an effective method for solving systems of differential equations. Taking Laplace transforms of equations (7.89), (7.90), and (7.92) yields the following linear equations in terms of L_i(s) for i = 0, 1, . . . , n − k + 1:

(s + λ_0) L_0(s) − µ_1 L_1(s) = 1,   (7.93)

(s + λ_i + µ_i) L_i(s) − λ_{i−1} L_{i−1}(s) − µ_{i+1} L_{i+1}(s) = 0   for 1 ≤ i ≤ n − k,   (7.94)

s (L_0(s) + L_1(s) + · · · + L_{n−k+1}(s)) = 1.   (7.95)
Using matrix notation, we can rewrite these equations as

DX = B,   (7.96)

where

X = (L_0(s), L_1(s), . . . , L_{n−k}(s), L_{n−k+1}(s))^T,   B = (1, 0, . . . , 0, 1)^T,

both of dimension (n − k + 2) × 1, and D is the (n − k + 2) × (n − k + 2) matrix

D = \begin{pmatrix}
s + λ_0 & −µ_1 & 0 & 0 & \cdots & 0 & 0 & 0 \\
−λ_0 & s + λ_1 + µ_1 & −µ_2 & 0 & \cdots & 0 & 0 & 0 \\
0 & −λ_1 & s + λ_2 + µ_2 & −µ_3 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & −λ_{n−k−1} & s + λ_{n−k} + µ_{n−k} & −µ_{n−k+1} \\
s & s & s & s & \cdots & s & s & s
\end{pmatrix}.
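The differential equations (7.89)-(7.91) can also be integrated numerically. The following sketch (not from the book; forward Euler with illustrative rates, for the availability model in which µ_{n−k+1} = rµ) confirms that the state probabilities remain normalized and approach a steady state:

```python
def simulate(n, k, r, lam, mu, t_end, dt=1e-4):
    """Forward-Euler integration of the birth-death equations (7.89)-(7.91)."""
    m = n - k + 1                      # index of the system-failure state
    lams = [(n - i) * lam for i in range(m)]                 # lambda_i, 0 <= i <= n-k
    mus = [0.0] + [min(i, r) * mu for i in range(1, m + 1)]  # mu_i with r repair facilities
    P = [1.0] + [0.0] * m              # P_0(0) = 1
    for _ in range(int(t_end / dt)):
        dP = [0.0] * (m + 1)
        for i in range(m + 1):
            out = (lams[i] if i < m else 0.0) + mus[i]
            dP[i] -= out * P[i]
            if i > 0:
                dP[i] += lams[i - 1] * P[i - 1]
            if i < m:
                dP[i] += mus[i + 1] * P[i + 1]
        P = [p + dt * d for p, d in zip(P, dP)]
    return P

P = simulate(n=3, k=2, r=1, lam=1.0, mu=2.0, t_end=8.0)
print(sum(P))       # remains (numerically) equal to 1
print(P)            # approaches the stationary distribution (0.25, 0.375, 0.375)
```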
7.4.2 Systems with Active Redundant Components

The general repair model is used for evaluation of system performance measures.

Reliability Function and Mean Time to Failure  Even when the system under consideration is a repairable system, we are still interested in finding the MTTF of the system. This is different from the nonrepairable system case that was analyzed earlier. In this case, as components fail, they also get repaired. If repairs are timely enough, the system may experience more than n − k cumulative component failures without experiencing a system failure.

Since we are interested in finding the MTTF of the system, we need to assume that state n − k + 1 is an absorbing state. As soon as the number of components in the failed state at any instant of time reaches n − k + 1, the system is considered failed. As a result, we have to assume µ_{n−k+1} = 0. When the system is in state i (0 ≤ i ≤ n − k), there are i failed components and n − i active working components in the system, and the failure rate of the system is λ_i = (n − i)λ. If the number of failed components is less than or equal to the total number of repair facilities, all failed components are being repaired, and thus the repair rate of the system is µ_i = iµ (1 ≤ i ≤ r). However, if i > r, µ_i will be a constant equal to rµ, as all repair facilities are being used and some failed components are waiting for repair. The following summarizes these conditions:

λ_i = (n − i)λ,   0 ≤ i ≤ n − k,

µ_i = \begin{cases} iµ & \text{for } 0 ≤ i ≤ r, \\ rµ & \text{for } r < i ≤ n − k, \\ 0 & \text{for } i = n − k + 1. \end{cases}

In the system of differential equations, P_{n−k+1}(t) is the probability that the system is in the failed state at time t. Thus, the reliability function of the system is R_s(t) = 1 − P_{n−k+1}(t). We can use the Laplace transform to find P_{n−k+1}(t). Because µ_{n−k+1} = 0, equation (7.91) can be written as

P′_{n−k+1}(t) = λ_{n−k} P_{n−k}(t) = kλ P_{n−k}(t),   (7.97)

P_{n−k+1}(t) = kλ \int_0^t P_{n−k}(x) dx,   (7.98)

assuming that P_{n−k+1}(0) = 0. Note that P_{n−k+1}(t) and P′_{n−k+1}(t) actually represent the CDF and the pdf, respectively, of the system lifetime. Also, because µ_{n−k+1} = 0, P_{n−k+1}(t) disappears from equation (7.90). As a result, equations (7.89) and (7.90) include n − k + 1 equations and n − k + 1 variables, P_0(t), P_1(t), . . . , P_{n−k}(t). After the Laplace transform, the system of linear equations (7.96) includes n − k + 1 equations and n − k + 1 variables. The last entries in vectors X and B and the last row and the last column of matrix D are removed.
The method of determinants may be used to solve equation (7.96) for L_{n−k}(s):

L_{n−k}(s) = \frac{|D′|}{|D|},   (7.99)

where D′ is the matrix obtained from D by replacing the (n − k + 1)st column (the last column) of D by vector B. The determinant of matrix D′ is

|D′| = \prod_{i=0}^{n−k−1} λ_i = \frac{n! λ^{n−k}}{k!}.   (7.100)
Since |D| is a polynomial in s with degree n − k + 1 and leading coefficient 1, we can write |D| = \prod_{i=1}^{n−k+1} (s − s_i), where each s_i is a (distinct) root of the polynomial. Therefore,

\frac{1}{|D|} = \prod_{i=1}^{n−k+1} (s − s_i)^{−1} = \sum_{j=1}^{n−k+1} \left[ \prod_{i=1, i≠j}^{n−k+1} (s_j − s_i) \right]^{−1} \frac{1}{s − s_j},   (7.101)

L_{n−k}(s) = \frac{|D′|}{|D|} = \frac{n! λ^{n−k}}{k!} \sum_{j=1}^{n−k+1} \left[ \prod_{i=1, i≠j}^{n−k+1} (s_j − s_i) \right]^{−1} \frac{1}{s − s_j}.   (7.102)
An inverse Laplace transform of equation (7.102) yields

P_{n−k}(t) = \frac{n! λ^{n−k}}{k!} \sum_{j=1}^{n−k+1} \left[ \prod_{i=1, i≠j}^{n−k+1} (s_j − s_i) \right]^{−1} e^{s_j t},   (7.103)

R_s(t) = 1 − \frac{n! λ^{n−k+1}}{(k−1)!} \sum_{j=1}^{n−k+1} \left[ \prod_{i=1, i≠j}^{n−k+1} (s_j − s_i) \right]^{−1} \int_0^t e^{s_j x} dx
       = 1 − \frac{n! λ^{n−k+1}}{(k−1)!} \sum_{j=1}^{n−k+1} \left[ s_j \prod_{i=1, i≠j}^{n−k+1} (s_j − s_i) \right]^{−1} (e^{s_j t} − 1)
       = 1 − \sum_{j=1}^{n−k+1} C_j (e^{s_j t} − 1),   (7.104)

where

C_j = \left[ \prod_{i=1, i≠j}^{n−k+1} s_i \right] \left[ \prod_{i=1, i≠j}^{n−k+1} (s_j − s_i) \right]^{−1}.
Derivation of equation (7.104) uses the fact that [n!/(k−1)!] λ^{n−k+1} = \prod_{i=1}^{n−k+1} s_i, which is the constant term in the polynomial representing the determinant of the matrix D. It can be shown that s_i < 0 for all i; thus R_s(t) → 0 as t → +∞.

Now that an expression of the system reliability function is known, we can derive the MTTF of the system based on its definition:

MTTF_s = \int_0^∞ R_s(t) dt = \sum_{i=0}^{n−k} \int_0^∞ P_i(t) dt.   (7.105)

Since L_i(0) = \int_0^∞ P_i(t) dt, MTTF_s can also be written as

MTTF_s = \sum_{i=0}^{n−k} L_i(0).   (7.106)
If we let D′_i denote the matrix obtained from D by replacing the (i + 1)th column by the vector B, then

L_i(0) = \frac{|D′_i|_{s=0}}{|D|_{s=0}}.   (7.107)

Since

|D|_{s=0} = \frac{n! λ^{n−k+1}}{(k−1)!},   (7.108)

we have

MTTF_s = \frac{(k−1)!}{n! λ^{n−k+1}} \sum_{i=0}^{n−k} |D′_i|_{s=0}.   (7.109)

Provided that the s_j's are known, another way for evaluation of MTTF_s is to use the fact that P′_{n−k+1}(x) is actually the pdf of the system lifetime. With equation (7.97), an equivalent form for R_s(t) is

R_s(t) = \int_t^∞ P′_{n−k+1}(x) dx = \int_t^∞ kλ P_{n−k}(x) dx.   (7.110)

This implies that

MTTF_s = \int_0^∞ \int_t^∞ kλ P_{n−k}(x) dx \, dt.   (7.111)

Exercises

1. Derive R_s(t) and MTTF_s when k = 2, n = 3, and r = 1.
2. Derive R_s(t) and MTTF_s when k = 3, n = 5, and r = 2.
273
3. Verify equation (7.101). 4. Verify equation (7.104). Steady-State Availability As the k-out-of-n:G system is used, the number of failed components in the system changes. When it reaches n−k +1, the system is failed and all repair facilities are utilized to repair failed components. As soon as the number of failed components goes down below n −k +1, the system starts working again. Thus, the system state changes between up and down over time. The probability that the system is in the working state at time t is called the point availability of the system. The system point availability is given by As (t) = 1 − Pn−k+1 (t).
(7.112)
To evaluate the availability of the system, we have to treat state n −k +1 as a transient state too, that is, µn−k+1 = r µ. The following summarizes the system parameters: 0 ≤ i ≤ n − k, λi = (n − i)λ, iµ if 0 ≤ i ≤ r , µi = rµ if r < i ≤ n − k + 1. To evaluate the point availability As (t) of the system, the general repair system model can be used. With the Laplace transform technique, it suffices to find Ln−k+1 (s). Using the method of determinants, we observe that Ln−k+1 (s) =
|D | , |D|
(7.113)
where D is the matrix obtained from D upon replacing the (n − k + 2)nd column (the last column) by vector B. Since |D | =
n!λn−k+1 , (k − 1)!
(7.114)
we have Ln−k+1 (s) =
n!λn−k+1 . (k − 1)!|D|
(7.115)
The steady-state availability of the system is given by As = lim As (t) = 1 − lim Pn−k+1 (t) t→∞
t→∞
sn!λn−k+1 . s→0 (k − 1)!|D|
= 1 − lim sLn−k+1 (s) = 1 − lim s→0
(7.116)
The procedure outlined above is necessary if one is interested in the availability of the system as a function of time t. However, the mathematical derivation is very
274
THE k-OUT-OF-n SYSTEM MODEL
tedious. If one is only interested in the steady-state availability of the system, there is no need to derive the point availability first. The differential equations of the system given in equations (7.89), (7.90), and (7.92) can be used directly. First, we try to find the steady-state (or time-independent) distribution of the system in different states. This solution is provided by defining Pi = lim Pi (t)
for i = 0, 1, . . . , n − k + 1,
t→∞
(7.117)
provided that the limits exist. Taking the limit of both sides of equations (7.89), (7.90), and (7.92) as t → ∞ and noting that limt→∞ Pi (t) = 0, we obtain −λ0 P0 + µ1 P1 = 0, −(λi + µi )Pi + λi−1 Pi−1 + µi+1 Pi+1 = 0
(7.118) for i = 1, . . . , n − k + 1, (7.119)
P0 + P1 + · · · + Pn−k+1 = 1.
(7.120)
A simple induction argument shows that λ0 · · · λi−1 P0 µ1 · · · µi
Pi =
for i = 1, . . . , n − k + 1.
(7.121)
Applying equation (7.120), we have
P0 = 1 +
n−k+1 i=1
λ0 · · · λi−1 Pi = µ1 · · · µi
λ0 · · · λi−1 µ1 · · · µi
1+
−1
n−k+1 i=1
,
(7.122)
λ0 · · · λi−1 µ1 · · · µi
−1 for i = 1, . . . , n − k + 1, (7.123)
As = 1 − Pn−k+1
λ0 · · · λn−k =1− µ1 · · · µn−k+1
1+
n−k+1 i=1
λ0 · · · λn−k µ1 · · · µn−k+1
−1 . (7.124)
Mean Time between Failures The mean time between failures (MTBF) is defined to be the expected length of operating time of the system between successive failures. It does not include the time that the system spends in the failed state. The mean time to repair (MTTR) indicates the average length of time that the system stays in the failed state. It is often necessary to calculate MTBF quickly in order to make timely design decisions. Although a general formula is known, it is not easily remembered nor
275
REPAIRABLE k-OUT-OF-n SYSTEMS
derived. Angus [13] presents a simple way of obtaining an expression of MTBF. With this method, the MTBF expression is easily reproduced by remembering a few simple concepts. In the following, we assume that there are n − k + 1 repair facilities. This means that no failed components need to wait for repair. The system transition parameters are summarized as follows: λi = (n − i)λ,
0 ≤ i ≤ n − k,
µi = iµ,
1 ≤ i ≤ n − k + 1.
The MTBF of the system is the average (successful operating) time between visits to state n − k + 1, the system down state. It is the average time for the system to go from a working state (with possibly some failed components) to the failed state. This should be distinguished from the mean time to the first failure (MTTF) of the system, which is defined as the average time for the system to go from the working state with zero failed components to the failure state. In the following, we illustrate the derivation of MTBF of a k-out-of-n:G system. Let N (t) indicate the number of failed components in the system at time t. Because of the Markov nature of the process {N (t); t ≥ 0}, once the process arrives at the state n − k + 1, the sequence of times between successive visits to state n − k + 1 forms an i.i.d. sequence of random variables. The mean of each of these random variables is MTBF + MTBR. The portion of this average that represents successful operation time is MTBF. It follows from the renewal theory (in particular, the analysis of alternating renewal processes) that As = lim Pr(system is working at time t) = lim Pr(N (t) ≤ n − k) t→∞
=
t→∞
MTBFs . MTBFs + MTBRs
(7.125)
Solving for MTBFs gives MTBFs =
As × MTBRs . 1 − As
(7.126)
Each component has a MTBF of 1/λ and a MTBR of 1/µ. The steady-state availability of each component A is A=
µ . λ+µ
When the system is down, there are n −k +1 units undergoing repair, and because of the Markov assumptions, MTBRs =
1 . µ(n − k + 1)
(7.127)
276
THE k-OUT-OF-n SYSTEM MODEL
Because there is always a repair facility available for a failed component and the components are s-independent, the limiting probability (t → ∞) of finding exactly j (k − 1 ≤ j ≤ n) components working at time t is given by the truncated binomial distribution (truncated because the Markov process is not allowed to visit states n − k + 2, n − k + 3, . . . , n): Pj = = As = MTBFs = =
Pr(exactly j components are available) Pr(At least k − 1 components are available) n j n− j j A (1 − A) n for j = k − 1, k, k + 1, . . . , n, n i n−i i=k−1 i A (1 − A) n n i A (1 − A)n−i Pk + Pk+1 + · · · + Pn = n i=k i n i , n−i i=k−1 i A (1 − A) n n i n−i As MTBRs i=k i A (1 − A) n = 1 − As µ(n − k + 1) k−1 Ak−1 (1 − A)n−k+1 n−k n j j=0 j (λ/µ) . n kλ k (λ/µ)n−k
(7.128)
(7.129)
(7.130)
(7.131)
This formula is easily recalled by remembering the following basic concepts: 1. As = MTBFs /(MTBFs + MTBRs ). 2. MTBR = 1/[µ(n −k +1)] since n −k +1 components are under simultaneous repair when the system is down. 3. The number of components working as t → ∞ follows the truncated binomial distribution. 4. MTBFs = As × MTBRs /(1 − As ). Exercise 1. Verify equation (7.131).
7.4.3 Systems with Load-Sharing Components When the working components equally share the load of the system, the failure rate of each component depends on the load that it has to carry. The load that is allocated to each component depends on the number of failed components that exist in the system. Shao and Lamberson [226] provide an analysis of a repairable k-out-of-n:G system with load-sharing components considering imperfect switching. The sensing and switching mechanism is responsible for detection of component failures and the redistribution of the load of the system equally among surviving components. System performance measures such as reliability and availability are analyzed. Several
REPAIRABLE k-OUT-OF-n SYSTEMS
277
errors exist in this paper that are corrected by Akhtar [6]. Newton [176] provides an alternative argument for evaluation of the MTTF and MTBF of such systems. In this section, we consider a repairable k-out-of-n:G system with load-sharing components. For simplicity of analysis, the sensing and switching mechanism is assumed to be perfect. Service is needed whenever the number of failed components in the system changes (when another component fails or when a failed component is repaired) to redistribute the load of the system. When the sensing and switching mechanism is imperfect, we say that the system has imperfect fault coverage. This will be discussed in a later section of this chapter. Other assumptions are as given in the general model for a repairable k-out-of-n:G system described in Section 7.4.1. We provide expressions of system reliability, availability, MTTF, and MTBF of such systems.

Assumptions

1. The failure rates of all working components are the same. They are dependent on the number of working components in the system.
2. A repaired component is as good as new and immediately shares the load of the system.

Notation

• λi: failure rate of each component when there are i failed components, i = 0, 1, . . . , n − k. Generally, we have λ0 ≤ λ1 ≤ · · · ≤ λn−k.
The only difference between the load-sharing system model in this section and the repairable system model with active redundant components discussed in Section 7.4.2 is that the component failure rate is no longer a constant. The failure rate of the system with i failed components, αi, can be written as

    αi = (n − i)λi,    i = 0, 1, . . . , n − k.    (7.132)
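With the transition rates (7.132), the model is a simple birth–death chain, so the steady-state availability can be checked numerically. The sketch below is illustrative only: it additionally assumes a single repair facility with constant repair rate µ (an assumption of ours, not part of the model above), and the function name is ours.

```python
from math import prod

def steady_state_availability(n, k, lambdas, mu):
    """Steady-state availability of a load-sharing k-out-of-n:G system.

    lambdas[i] is the per-component failure rate when i components have
    failed (i = 0, ..., n - k); a single repair facility with constant
    rate mu is assumed.  State i counts failed components; the system is
    down in state n - k + 1.
    """
    # System failure (transition) rates, equation (7.132)
    alphas = [(n - i) * lambdas[i] for i in range(n - k + 1)]
    # Birth-death steady state: P_i is proportional to prod_{j<i} alpha_j / mu
    weights = [prod(alphas[:i]) / mu**i for i in range(n - k + 2)]
    total = sum(weights)
    probs = [w / total for w in weights]
    return sum(probs[: n - k + 1])   # up states: 0, ..., n - k
```

With equal λi this reduces to the active-redundancy case of Section 7.4.2; letting λi increase with i (heavier load on the survivors) can only lower the availability.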
The same techniques as used in Section 7.4.2 can be used to derive the required system performance measures.

Exercise

1. Find expressions of Rs(t), MTTFs, MTBFs, and As.

7.4.4 Systems with Both Active Redundant and Cold Standby Components

Morrison and Munshi [170] provide an analysis of a k-out-of-n:G system with additional standby components. The system consists of n active components, where at least k of them have to work for the system to work. In addition, there are m spare components available. All components are i.i.d. There are r repair facilities. Repairs are perfect. The lifetime of an active component is exponentially distributed. Repair
time also follows the exponential distribution. Detection of active component failure and switching of a standby component to the active state are perfect and instant. We will use i to represent the number of failed components in the system and to indicate the state of the system. Both cold and hot standby are analyzed by Morrison and Munshi. We will not discuss the hot standby case here, as it is exactly the same as if all of the components were active, which has been discussed in previous sections.

Notation

• n: number of active components used
• m: number of standby components or spares
• γ: λ/µ
Since the spare components are in cold standby, we have the following failure rates and repair rates at different system states:

    λi = nλ            for 0 ≤ i ≤ m,
    λi = (n + m − i)λ  for m < i ≤ n + m,    (7.133)
    λi = 0             for i > n + m,

    µi = iµ            for i ≤ r,
    µi = rµ            for i > r.             (7.134)

With these system state transition parameters and following the same procedure as outlined in Section 7.4.1, we can develop a set of differential equations for state probabilities. Solving these differential equations, we can obtain expressions of system state probabilities. Letting time go to infinity, we find the following steady-state probabilities:

    Pi = (n^i/i!) γ^i P0                              for 0 ≤ i ≤ min{r, m},
    Pi = [n^i/(r^{i−r} r!)] γ^i P0                    for r + 1 ≤ i ≤ m,
    Pi = [n!/((n + m − i)! i!)] n^m γ^i P0            for m + 1 ≤ i ≤ r,        (7.135)
    Pi = [n!/((n + m − i)! r^{i−r} r!)] n^m γ^i P0    for max{r, m} + 1 ≤ i ≤ n + m.

If all spares are in hot standby, the following is provided for verification purposes:

    Pi = [(n + m)!/((n + m − i)! i!)] γ^i P0          for 0 ≤ i ≤ r,
    Pi = [(n + m)!/((n + m − i)! r^{i−r} r!)] γ^i P0  for i > r.
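The steady-state probabilities can also be cross-checked numerically, using only the rates (7.133) and (7.134) and the birth–death balance relation Pi = Pi−1 λi−1/µi. A minimal sketch (function name and parameter values are ours):

```python
def steady_state_probs(n, m, r, lam, mu):
    """Steady-state probabilities P_i for a system with n active components,
    m cold-standby spares, and r repair facilities.

    Rates follow equations (7.133)-(7.134); P_i is built up from the
    birth-death balance relation P_i = P_{i-1} * lambda_{i-1} / mu_i
    and then normalized.
    """
    def fail_rate(i):            # equation (7.133)
        if i <= m:
            return n * lam
        if i <= n + m:
            return (n + m - i) * lam
        return 0.0

    def repair_rate(i):          # equation (7.134)
        return min(i, r) * mu

    weights = [1.0]
    for i in range(1, n + m + 1):
        weights.append(weights[-1] * fail_rate(i - 1) / repair_rate(i))
    total = sum(weights)
    return [w / total for w in weights]
```

For indices i ≤ min{r, m} the result should agree with the first case of the closed form, Pi = (n^i/i!)γ^i P0.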
Exercise

1. Verify equation (7.135).

7.5 WEIGHTED k-OUT-OF-n:G SYSTEMS

Wu and Chen [247] propose a variation of the k-out-of-n:G system, called the weighted k-out-of-n:G model. In a weighted k-out-of-n:G system, component i carries a weight of wi, wi > 0, for i = 1, 2, . . . , n. The total weight of all components is w, w = Σ_{i=1}^{n} wi. The system works if and only if the total weight of working components is at least k, a prespecified value. Since k is a weight, it may be larger than n because they have different measuring units. Such a weighted k-out-of-n:G system is equivalent to a weighted (w − k + 1)-out-of-n:F system wherein the system fails if and only if the total weight of failed components is at least w − k + 1. With this definition, the k-out-of-n:G system is a special case of the weighted k-out-of-n:G system wherein each component has a weight of 1. A recursive equation is provided by Wu and Chen [247]. In the following, R(i, j) represents the probability that a system with j components can output a total weight of at least i. Then, R(k, n) is the reliability of the weighted k-out-of-n:G system. The following recursive equation can be used for reliability evaluation of such systems:

    R(i, j) = pj R(i − wj, j − 1) + qj R(i, j − 1),    (7.136)

which requires the following boundary conditions:

    R(i, j) = 1 for i ≤ 0, j ≥ 0,    (7.137)
    R(i, 0) = 0 for i > 0.           (7.138)
It should be noted that wi (1 ≤ i ≤ n) may not be an integer. When wi = 1 for all 1 ≤ i ≤ n, we have the usual k-out-of-n:G system. The computational complexity of equation (7.136) is O(k(n − k + 1)) when wi = 1 for all i. However, when wi > 1 for all 1 ≤ i ≤ n, the number of terms to be computed may be much less than k(n − k + 1), as illustrated in the following example.

Example 7.4 Consider a weighted 5-out-of-3:G system. It has three components with weights 2, 6, and 4. The system works if and only if the total weight of working components is at least 5. This is a very simple example. We can easily solve the problem without using equation (7.136). The following are the minimal paths of the system: Component 2 works with a total output of 6. Components 1 and 3 work with a total output of 6. Thus, we can find the system reliability as

    Rs = Pr(x2 ∪ x1x3) = Pr(x2) + Pr(x̄2x1x3) = p2 + q2p1p3.
If we apply equation (7.136), here are the terms to be calculated:

    R(1, 1) = p1 R(−1, 0) + q1 R(1, 0) = p1,
    R(5, 1) = p1 R(3, 0) + q1 R(5, 0) = 0,
    R(1, 2) = p2 R(−5, 1) + q2 R(1, 1) = p2 + p1q2,
    R(5, 2) = p2 R(−1, 1) + q2 R(5, 1) = p2,
    R(5, 3) = p3 R(1, 2) + q3 R(5, 2) = p3(p2 + p1q2) + q3 p2 = p2 + q2 p1 p3.

The total number of terms calculated is only five, much less than k(n + 1) = 20.
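The recursion (7.136) with boundary conditions (7.137) and (7.138) is easy to implement with memoization; the sketch below (our naming) reproduces Example 7.4.

```python
from functools import lru_cache

def weighted_reliability(k, weights, p):
    """Reliability of a weighted k-out-of-n:G system via recursion (7.136).

    weights[i-1] is w_i and p[i-1] is the reliability of component i.
    Boundary conditions: R(i, j) = 1 for i <= 0, and R(i, 0) = 0 for i > 0.
    """
    @lru_cache(maxsize=None)
    def R(i, j):
        if i <= 0:                       # demanded weight already met, (7.137)
            return 1.0
        if j == 0:                       # no components left, (7.138)
            return 0.0
        pj = p[j - 1]
        return pj * R(i - weights[j - 1], j - 1) + (1 - pj) * R(i, j - 1)

    return R(k, len(weights))
```

For the example, `weighted_reliability(5, [2, 6, 4], p)` returns p2 + q2 p1 p3.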
8 DESIGN OF k-OUT-OF-n SYSTEMS
With the results discussed in the previous chapter on system reliability evaluation of k-out-of-n systems, in this chapter we address the issue of optimal system design of these systems. We first examine the measures of component reliability importance in a k-out-of-n system and then consider the selection of the optimal k and/or n values. Additional factors to be considered in optimal system design include imperfect fault coverage, common-cause failures, and dual failure modes. Other issues in optimal system design are briefly mentioned at the end of this chapter.
8.1 PROPERTIES OF k-OUT-OF-n SYSTEMS

8.1.1 Component Reliability Importance

The B-importance as defined in equation (6.2) is often used to measure the reliability importance of a component. For a k-out-of-n:G system with independent components, a system reliability expression can be obtained by letting j = n and i = k in equation (7.26):

    R(k, n) = pn R(k − 1, n − 1) + qn R(k, n − 1).    (8.1)
To find the reliability importance of component n, simply take the first derivative of equation (8.1) with respect to pn:

    In = ∂R(k, n)/∂pn = R(k − 1, n − 1) − R(k, n − 1)
       = Pr(exactly k − 1 out of the other n − 1 components work).    (8.2)
In words, equation (8.2) indicates that the reliability importance of component n is the probability that the other n − 1 components contain exactly k − 1 working components. Because of the structure of a k-out-of-n:G system, the reliability importance of any single component is the probability that the other n − 1 components contain exactly k − 1 working components. Applying this result, we find immediately an explicit expression for the reliability importance of any component when all components are i.i.d.:

    Ii = C(n − 1, k − 1) p^{k−1} q^{n−k} for i = 1, 2, . . . , n,    (8.3)

where C(n − 1, k − 1) denotes the binomial coefficient. To understand better the meaning of the component reliability importance expression, we may think of a k-out-of-n:G system with a constant k and a variable n. The importance of adding one more component, the (n + 1)th component, is the probability that there are exactly k − 1 components working in the original n-component system. If in the n-component system exactly k − 1 components have relatively high reliabilities, then the (n + 1)th component in consideration is relatively more important than otherwise. If there are already k or more very good components in the n-component system, the addition of the (n + 1)th component is not that important. The importance of a component is the probability that adding this component will form the first working minimal path for the system. To calculate the reliability importance of component n, we simply apply equation (8.2). If the complexity for system reliability evaluation is O(k(n − k + 1)), the computational complexity for evaluation of the component reliability importance is also O(k(n − k + 1)). To find the reliability importance of component i for i ≠ n, the easiest way is to relabel this component as component n and then use equation (8.2). This can be easily implemented, as the only step is to swap the reliabilities of component i and component n and then calculate the reliability importance of component n.

Exercises

1. What is the structural importance of each component in a k-out-of-n:G system with (a) identical components and (b) nonidentical components?
2. What are the expressions of structural and Birnbaum importance for the k-out-of-n:F systems?

8.1.2 Effects of Redundancy in k-out-of-n Systems

In this section we examine the shape of the reliability of the k-out-of-n system with i.i.d. components as a function of p. As will be shown, the system reliability as a function of p exhibits the "S curve."

Notation

•
R(k, n, p): reliability of a k-out-of-n:G system with i.i.d. component reliability p
FIGURE 8.1 The k-out-of-n:G system reliability as a function of k and p: (a) R(1, n, p); (b) R(n, n, p); (c) R(k, n, p), with the break-even point b(k, n) marked in (c).
Two special cases of the k-out-of-n system are the parallel system (1-out-of-n:G or n-out-of-n:F) and the series system (n-out-of-n:G or 1-out-of-n:F). In a parallel system with i.i.d. components, the system reliability is always larger than the component reliability if n > 1 and 0 < p < 1 (see Figure 8.1a). In a series system with n > 1 and 0 < p < 1, the system reliability is always smaller than the component reliability (see Figure 8.1b). For a fixed k such that 1 < k < n, the k-out-of-n system reliability is less than p when p is small and greater than p when p is large. There exists a break-even point, b(k, n), such that R(k, n) = p at p = b(k, n) (see Figure 8.1c). We say that the system reliability as a function of p follows the S curve [115]. This shows that a general k-out-of-n system behaves within the two extremes of the series and the parallel systems. A series system does not have any redundancy in it while a parallel system is a 100% redundant system. A general k-out-of-n system with 1 < k < n has a certain degree of redundancy built in it. Birnbaum et al. [31] and Rushdi and Al-Hindi [210] provide tables listing the break-even points at which the k-out-of-n:G systems turn from being more like series systems (namely, system reliability is less than component reliability) to being more like parallel systems (namely, system reliability is higher than component reliability) for different values of k and n. Table 8.1 lists such break-even points for k-out-of-n:G systems with 1 ≤ k ≤ n ≤ 15. For example, consider the row with n = 7 in Table 8.1. For a 2-out-of-7:G system with i.i.d. components, the system is more like a series system if p < 0.058 and more like a parallel system when p > 0.058. For a 6-out-of-7:G system with i.i.d. components, the system is more like a series system when p < 0.942 and more like a parallel system when p > 0.942. 
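The break-even point b(k, n) is the unique root of R(k, n, p) − p = 0 in (0, 1) when 1 < k < n, so it can be located by bisection. A sketch (our naming) that reproduces the n = 7 entries quoted above:

```python
from math import comb

def rel_iid(k, n, p):
    """R(k, n, p): reliability of a k-out-of-n:G system, i.i.d. components."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def break_even(k, n):
    """b(k, n): root of R(k, n, p) - p in (0, 1), for 1 < k < n.
    R - p is negative near p = 0 and positive near p = 1, so a plain
    bisection suffices."""
    lo, hi = 1e-9, 1.0 - 1e-9
    for _ in range(100):
        mid = (lo + hi) / 2
        if rel_iid(k, n, mid) - mid < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

The duality of Theorem 8.1 below, b(k, n) = 1 − b(n − k + 1, n), provides an additional numerical check.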
Because of the S-shaped curve, a k-out-of-n system may deteriorate from being more like a parallel system to being more like a series system as the reliability of each i.i.d. component decreases in time. In other words, system reliability is higher than component reliability when the system is new. As the system and the components are used, system reliability decreases faster than component reliability. Eventually, system reliability becomes lower than component reliability. In the construction of a table listing the break-even points for k-out-of-n systems, Rushdi and Al-Hindi [210] provide the following theorem.
TABLE 8.1 The Break-Even Points on the S-curve for k-out-of-n:G Systems

 n \ k    1      2      3      4      5      6      7      8      9     10     11     12     13     14     15
  1     N/A
  2    0.000  1.000
  3    0.000  0.500  1.000
  4    0.000  0.232  0.768  1.000
  5    0.000  0.131  0.500  0.869  1.000
  6    0.000  0.084  0.347  0.653  0.916  1.000
  7    0.000  0.058  0.256  0.500  0.744  0.942  1.000
  8    0.000  0.042  0.197  0.396  0.604  0.803  0.958  1.000
  9    0.000  0.032  0.158  0.322  0.500  0.678  0.842  0.968  1.000
 10    0.000  0.025  0.129  0.268  0.421  0.579  0.732  0.871  0.975  1.000
 11    0.000  0.021  0.108  0.228  0.361  0.500  0.639  0.772  0.892  0.979  1.000
 12    0.000  0.017  0.093  0.197  0.314  0.437  0.563  0.686  0.803  0.907  0.983  1.000
 13    0.000  0.014  0.080  0.172  0.276  0.387  0.500  0.613  0.724  0.828  0.920  0.986  1.000
 14    0.000  0.012  0.070  0.152  0.246  0.345  0.448  0.552  0.655  0.754  0.848  0.930  0.988  1.000
 15    0.000  0.010  0.062  0.136  0.220  0.311  0.405  0.500  0.595  0.689  0.780  0.864  0.938  0.990  1.000
Theorem 8.1 (Rushdi and Al-Hindi [210])

    b(k, n) = 1 − b(n − k + 1, n).    (8.4)
Proof The (n − k + 1)-out-of-n:G system is equivalent to a k-out-of-n:F system, and a k-out-of-n:F system is the dual of a k-out-of-n:G system. According to the discussions in Section 7.2.2, we can write the following for the k-out-of-n F and G systems:

    RF(k, n, 1 − p) = 1 − RG(k, n, p),        (8.5)
    RG(n − k + 1, n, 1 − p) = RF(k, n, 1 − p),    (8.6)

where the subscripts F and G indicate the F and the G systems, respectively. From these two equations, we have

    RG(n − k + 1, n, 1 − p) = 1 − RG(k, n, p).    (8.7)

If b(k, n) is the solution in p of the equation RG(k, n, p) − p = 0, then substituting p = b(k, n) into equation (8.7) and using RG(k, n, b(k, n)) = b(k, n) leads to

    RG(n − k + 1, n, 1 − b(k, n)) − [1 − b(k, n)] = 0,

which means that 1 − b(k, n) is the solution in p of the equation

    RG(n − k + 1, n, p) − p = 0.

In other words, we have proved b(n − k + 1, n) = 1 − b(k, n).

Theorem 8.1 can be used to eliminate some calculations when one constructs a table similar to Table 8.1. Take the row with n = 7 in Table 8.1 as an example. The entries with k > n/2 can be found from those entries with k < n/2. For example, 0.942 = 1 − 0.058 and 0.744 = 1 − 0.256 for n = 7 in the table.

Exercises

1. Prove that, for odd n, b((n + 1)/2, n) = 0.5.
2. Verify the entries for n = 7 in Table 8.1.
3. Construct a table similar to Table 8.1 for the k-out-of-n:F systems.
8.2 OPTIMAL DESIGN OF k-OUT-OF-n SYSTEMS

There are many design issues related to k-out-of-n:G systems. The simplest ones include the determination of system size n, the simultaneous determination of both k
and n, and the determination of optimal replacement time. In this section, we discuss models and solutions for these optimization problems.

8.2.1 Optimal System Size n

In a k-out-of-n:G system, at least k components need to work for the system to work. For a fixed k value, the larger the system size n, the higher the reliability of the system. The difference between n and k represents the degree of redundancy built into the k-out-of-n:G system. However, as n increases, there is a diminishing benefit from each additional component. In addition, the cost of the system increases as n increases.

Model by Pham [189] For a k-out-of-n:G system with i.i.d. components, Pham [189] proposes a model for determination of the optimal system size n and provides optimal solutions. The model assumptions are as follows:

1. All components are i.i.d. with reliability p (unreliability q), 0 < p < 1.
2. The cost of each component is c.
3. The cost of system failure is d.
4. A constant k is given.
5. The objective is to find n that minimizes the expected total cost, E(Tn).
With equation (7.2), the objective function to be minimized is

    E(Tn) = cn + d(1 − R(k, n)) = cn + d[1 − Σ_{i=k}^{n} C(n, i) p^i q^{n−i}],    (8.8)

where C(n, i) denotes the binomial coefficient.
There is only one decision variable, n, in this objective function. It may take only integer values in the range [k, ∞). We will examine the trend of E(Tn) as a function of n by examining the increment in E(Tn) when n increases by one unit. Define

    ΔE(Tn) ≡ E(Tn+1) − E(Tn),        (8.9)
    ΔRn(k, n) ≡ R(k, n + 1) − R(k, n).    (8.10)

Then, we have

    ΔE(Tn) = c − d ΔRn(k, n).        (8.11)
Apparently, the trend of E(Tn) is heavily influenced by the trend of R(k, n). Based on equation (7.4), we have

    ΔRn(k, n) = R(k, n + 1) − R(k, n) = C(n, k − 1) p^k q^{n−k+1} = p (p/q)^{k−1} C(n, k − 1) q^n.    (8.12)
This quantity, ΔRn(k, n), is itself a function of n. The increment in ΔRn(k, n) can again be examined by allowing n to increase by one unit to n + 1:

    Δ(ΔRn(k, n)) = ΔRn(k, n + 1) − ΔRn(k, n) = C(n, k − 1) p^k q^{n−k+1} [(n + 1)q/(n − k + 2) − 1] ≥ 0

if and only if n ≤ n0, where

    n0 ≡ ⌊(k − 1)/p − 1⌋.    (8.13)

In words, ΔRn(k, n) is an increasing function of n for k ≤ n ≤ n0 and a decreasing function of n for n > n0. To state the optimal solution, n∗, that minimizes the objective function given in equation (8.8), further define

    na ≡ inf{ n ∈ [n0, ∞) : ΔRn(k, n) < c/d }.    (8.14)

In words, na is the smallest integer n such that n ≥ n0 and ΔRn(k, n) < c/d. The optimal integer solution, n∗, that minimizes the expected total system cost, E(Tn), is given in Theorem 8.2.

Theorem 8.2 (Pham [189]) Fix p, k, d, and c with ΔRn(k, n), n0, and na as defined in equations (8.12), (8.13), and (8.14), respectively. The optimal value n∗ such that the expected total cost of a k-out-of-n:G system is minimized is as follows:

1. If ΔRn(k, n0) < c/d, then n∗ = k.
2. If ΔRn(k, n0) ≥ c/d and ΔRn(k, k) ≥ c/d, then n∗ = na.
3. If ΔRn(k, n0) ≥ c/d and ΔRn(k, k) < c/d, then n∗ = k if E(Tk) ≤ E(Tna), and n∗ = na if E(Tk) > E(Tna).

For a proof of the theorem, readers are referred to Pham [189]. The same results in a different form are presented in Pham [188]. One possible case of the relationship among k, n0, and na is shown in Figure 8.2. Suich and Patterson [233] consider the design of the k-out-of-n:G structure assuming that it is a subsystem in a larger system. They then provide an expression of the expected total cost caused by the failure of the subsystem. The reliability of the whole system excluding this k-out-of-n:G subsystem is factored in, as the authors argue that the cost caused by failures of other than the k-out-of-n:G subsystem should not be considered in the design of the k-out-of-n:G subsystem. In our discussions
FIGURE 8.2 Shape of function ΔRn(k, n) versus n, with the level c/d and the points k, n0, and na marked; ΔRn(k, k) is the value at n = k.
throughout this chapter, we simply consider the optimal design of a k-out-of-n system no matter whether it is used as a subsystem or not. Readers may refer to Suich and Patterson [233] for other discussions.

Model by Nakagawa [172] Nakagawa [172] assumes that the total cost of system failure is equal to nc1 + c2, which depends on system size. This cost includes replacement of failed components and inspection of nonfailed components. The mean cost rate is to be minimized in the determination of system size n.

Notation

• λ: failure rate of each component
• c1: acquisition cost of a component
• c2: additional cost of the system, which is replaced at failure
The mean cost rate is equal to the total cost of system failure divided by the MTTF of the system:

    C(n) = (nc1 + c2) / [(1/λ) Σ_{i=k}^{n} (1/i)].    (8.15)
Function C(n) is discrete. To find the optimal number n∗ that minimizes C(n) for a given k value, we attempt to find the n value such that

    C(n + 1) ≥ C(n),    (8.16)

which is equivalent to

    L(n) ≥ c2/c1,    (8.17)
where L(n) is defined as

    L(n) ≡ (n + 1) Σ_{i=k}^{n} (1/i) − n for n ≥ k.

This newly defined function, L(n), is a monotonically increasing function over the definition domain of n because of the following observations:

    L(n + 1) − L(n) = Σ_{i=k}^{n+1} (1/i) > 0,

    L(n) ≥ (n + 1 − k)/k → ∞ as n → ∞.
As a result, there is a unique smallest n value that satisfies the condition in (8.17). Consequently, there is a unique smallest n value that satisfies the condition in (8.16). The optimal cost rate must be in the following range:

    λc1 n∗ < C(n∗) ≤ λc1(n∗ + 1).    (8.18)
This can be easily verified by observing the following:

    C(n)/(λc1) = (n + c2/c1) / Σ_{i=k}^{n} (1/i) = (n + c2/c1) / {[n + L(n)]/(n + 1)} = (n + 1)(n + c2/c1)/(n + L(n)).    (8.19)

Substituting n∗ into equation (8.19) and noting that condition (8.17) is satisfied, we have

    C(n∗)/(λc1) ≤ n∗ + 1.

The other half of the inequality given in (8.18) can be verified by applying equation (8.19) at n∗ − 1 and using L(n∗ − 1) < c2/c1:

    C(n∗ − 1)/(λc1) = n∗ (n∗ − 1 + c2/c1)/(n∗ − 1 + L(n∗ − 1)) > n∗.

This model by Nakagawa [172] has been extended to include two different types of failures by Sheu and Kuo [230]. Whenever a component fails, there is a probability 1 − a (0 < a < 1) that it is a type A failure and a probability a that it is a type B failure. A type A failure is minimally repaired instantaneously with a cost c, while a type B failure is left alone. The system is failed when the total number of failed components (due to type B failures) reaches n − k + 1. When the system fails, the total replacement cost is nc1 + c2. Under these model assumptions, the mean cost rate of the system is
    C(n) = (expected cost in a renewal cycle)/MTTF = [nc1 + c2 + c(1 − a)(n − k + 1)/a] / [(1/λa) Σ_{i=k}^{n} (1/i)].
Define the function

    L(n) = [c1 + c(1 − a)/a] [(n + 1) Σ_{i=k}^{n} (1/i) − n] + c(1 − a)(k − 1)/a for n ≥ k.

Function L(n) can be verified to be a strictly increasing function to infinity. To find the value of n such that C(n + 1) ≥ C(n) and C(n) < C(n − 1) is equivalent to finding the value of n such that L(n) ≥ c2 and L(n − 1) < c2. The optimal value n∗ is equal to either k, when L(k) > c2, or the smallest n value that makes L(n) ≥ c2.

Exercises

1. Find the optimal system size n given the following data: k = 2, 3; c = 15, 25, 35; d = 150, 250, 500; p = 0.75, 0.85, 0.95.
2. Under what conditions would n∗ = k? In other words, when would a series system structure be optimal?
3. How would the optimal solution n∗ change as a function of k, c, d, c/d, and p?
4. What general conclusions can be made for the k-out-of-n:F systems based on the results on the k-out-of-n:G systems?
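Exercise 1 can be checked by direct search: since ΔRn(k, n) eventually decreases in n, a bounded scan of E(Tn) in equation (8.8) finds n∗ without the case analysis of Theorem 8.2. A sketch with one parameter instance taken from the exercise data (the search bound n_max is an arbitrary choice of ours):

```python
from math import comb

def expected_cost(n, k, c, d, p):
    """E(T_n) = c n + d (1 - R(k, n)), equation (8.8), i.i.d. components."""
    rel = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))
    return c * n + d * (1 - rel)

def optimal_n(k, c, d, p, n_max=200):
    """Brute-force search for the n in [k, n_max] minimizing E(T_n)."""
    return min(range(k, n_max + 1),
               key=lambda n: expected_cost(n, k, c, d, p))
```

For k = 2, c = 15, d = 500, p = 0.75 the scan gives n∗ = 5; the result can also be confirmed against the cases of Theorem 8.2.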
8.2.2 Simultaneous Determination of n and k

When n is given, one may need to determine the optimal value of k. Pham [188] provides some results for this optimization problem. However, more often one needs to determine the optimal values of n and k simultaneously. This problem is covered in this section. Suich and Patterson [233] present several models for simultaneous determination of k and n in the optimal design of a k-out-of-n:G system. As we discussed in the previous section, the value of n represents the degree of redundancy that is built into the k-out-of-n:G system. The value of k represents the minimum number of components that have to work for the system to be considered working. Thus, the selection of k actually represents the selection of the type of components to be used in the system. In other words, it represents the selection of the capability of the components. For example, in the design of a power plant, for a given required system output in terms of megawatts (MW) of electricity to be generated, we may choose to use a smaller number of large units each with a high electricity output or a larger number of small units each with a small electricity output. In the operation of an existing power plant where all units have already been installed, we may be able to run a smaller number of units each at full power or run a larger number of units each at half power to generate the required total output. For a given required total output, it is generally more expensive to have a larger number of small units than to have a smaller number of large units. For example, buying one generating unit with a rated electricity output of 100 MW may cost $2 million. Buying two identical generating units each with a rated electricity output
of 50 MW may cost a total of $2.5 million. In this example, a two-unit subsystem costs more than a one-unit subsystem even though they have the same total output capability. Suich and Patterson [233] use c3 g(k) to represent the cost of a k-unit subsystem with a specified total output, where c3 is the cost of a one-unit subsystem capable of the same total output. The cost of each unit in the k-unit subsystem is then c3 g(k)/k. The form of g(k) will reflect how the selection of the k value affects the cost of each unit in the system, and g(k) generally increases with k. If g(k) = 1 for all possible k values, then a subsystem consisting of a single large unit costs the same as a subsystem consisting of k smaller units. The model assumptions are summarized below:

1. All components are i.i.d. with reliability p (unreliability q), 0 < p < 1. This means that changes of k and/or n will not affect the reliability of each component.
2. The cost of a component is dependent on k. We will use c3 g(k)/k to represent the cost of each component, where c3 is the cost of a component that is capable of providing the full output required from the whole system.
3. The cost of system failure is d.
4. The objective is to find n and k that minimize the total expected cost, E(T(k, n)).

The objective function to be minimized is

    E(T(k, n)) = n c3 g(k)/k + d(1 − R(k, n)),    (8.20)
where R(k, n) is given in equation (7.2). For this model, no analytical solutions are available. A BASIC program was used by Suich and Patterson [233] to solve for the optimal k and n values. The following example is based on Suich and Patterson [234].

Example 8.1 Consider the problem of building a space electrical power system. The cost due to system failure is d = 216. The cost of acquiring a single component of full required power is c3 = 1. A rough rule of thumb says that the cost of a smaller power unit in a k-unit subsystem is proportional to the electrical output raised to the power of 0.7; that is, each unit costs c3(1/k)^0.7, which corresponds to g(k) = k^0.3 in equation (8.20). Using equation (8.20) and a simple computer program, we can easily find the optimal n and k values that minimize the expected total cost. Figure 8.3 shows the expected total cost as a function of i.i.d. component reliability p for different n and k combinations. From this figure, we can see that when p > 0.95, n = 2 and k = 1; that is, a parallel system with two components is the best system structure. When 0.90 < p < 0.95, a 2-out-of-4:G system is the best system structure. Other sensitivity analyses on some cost parameters may also be performed.
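Example 8.1 can be reproduced with a small scan over (n, k). The sketch below takes the per-unit cost to be c3(1/k)^0.7, as stated in the example, so the hardware term is n c3 k^−0.7; the search range n ≤ 10 and the function names are our assumptions.

```python
from math import comb

def expected_total_cost(n, k, p, c3=1.0, d=216.0):
    """E(T(k, n)) for Example 8.1: n units each costing c3 * k**-0.7,
    plus the expected failure cost d * (1 - R(k, n))."""
    rel = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))
    return n * c3 * k**-0.7 + d * (1 - rel)

def best_design(p, n_max=10):
    """Scan all (n, k) with 1 <= k <= n <= n_max for the cheapest design."""
    return min(((n, k) for n in range(1, n_max + 1) for k in range(1, n + 1)),
               key=lambda nk: expected_total_cost(nk[0], nk[1], p))
```

The scan matches the figure's reading: a 1-out-of-2:G (parallel) design wins for high p, and a 2-out-of-4:G design wins for 0.90 < p < 0.95.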
FIGURE 8.3 Expected total cost as a function of p for different n and k combinations: (n, k) = (2, 1), (4, 2), (5, 3), (7, 4), (8, 5).
Suich and Patterson [233] also propose models for determination of n and k values to minimize the expected system loss function when component reliability deteriorates with time. They are not discussed here. A special case of the model discussed above is when one is to design a system with an odd number of components such that the system works if and only if at least half of the components work. In this case, we are to find the optimal value of k such that the k-out-of-(2k − 1):G system has the lowest expected total cost. Pham [188] studies this problem by assuming that the cost of each component is independent of the k value. Let c be the cost of each component and d be the cost of system failure. The following is the objective function to be minimized:

    E(T(k)) = c(2k − 1) + d(1 − R(k, 2k − 1)).    (8.21)

The optimal solution to this model is given in Theorem 8.3.

Theorem 8.3 (Pham [188]) For fixed p, c, and d, there exists an optimal value of k, k∗, that minimizes E(T(k)):

    k∗ = inf{ k ≥ 1 : w(k) < c/[d(0.5 − q)] } if q < 0.5,
    k∗ = 1 otherwise,    (8.22)

where

    w(k) = C(2k − 1, k − 1)(pq)^k.
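Theorem 8.3 is easy to check against a direct scan of E(T(k)) in equation (8.21); a sketch with illustrative parameter values of our choosing:

```python
from math import comb

def expected_cost_k(k, c, d, p):
    """E(T(k)) = c(2k - 1) + d(1 - R(k, 2k - 1)), equation (8.21)."""
    n = 2 * k - 1
    rel = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))
    return c * n + d * (1 - rel)

def optimal_k(c, d, p, k_max=50):
    """k* from Theorem 8.3: smallest k with w(k) < c / (d (0.5 - q))."""
    q = 1 - p
    if q >= 0.5:
        return 1
    bound = c / (d * (0.5 - q))
    for k in range(1, k_max + 1):
        if comb(2 * k - 1, k - 1) * (p * q)**k < bound:   # w(k) < bound
            return k
    return k_max
```

For p = 0.9, c = 1, d = 100, both the theorem and the brute-force scan give k∗ = 2, i.e., a 2-out-of-3:G structure.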
Exercise

1. Consider similar design problems for a k-out-of-n:F system. What general conclusions can be made?

8.2.3 Optimal Replacement Time

Nakagawa [172] considers the problem of determining the optimal replacement time of a k-out-of-n:G system. The model assumptions are as follows:

• All components are i.i.d., following the exponential lifetime distribution with parameter λ.
• The system is replaced at time T or at failure, whichever occurs first.
• The cost of replacing a nonfailed system is nc1, while the cost of replacing a failed system is nc1 + c2.
The mean time to replacement of the system has an expression similar to the one given in equation (8.56). Thus, we can express the mean cost rate of the system as

    C(T) = [nc1 + c2(1 − Rs(T))] / ∫_0^T Rs(t) dt,    (8.23)

where

    Rs(t) = Σ_{i=k}^{n} C(n, i) e^{−iλt} (1 − e^{−λt})^{n−i}.    (8.24)
Because a k-out-of-n:G system with i.i.d. exponential components has an increasing failure rate (see Section 7.1.1), there exists a finite optimal preventive replacement time. Using usual calculus techniques, if there exists an interior solution (0 < T < ∞) to minimize C(T), it would be given by the equation

    g(T) = nc1/c2,    (8.25)
where

    g(T) ≡ { n C(n − 1, k − 1) e^{−kλT} (1 − e^{−λT})^{n−k} Σ_{i=k}^{n} [(−1)^{i−k}/i] C(i − 1, k − 1) C(n, i) (1 − e^{−iλT}) }
           / { Σ_{i=k}^{n} C(n, i) e^{−iλT} (1 − e^{−λT})^{n−i} }
           − Σ_{i=0}^{k−1} C(n, i) e^{−iλT} (1 − e^{−λT})^{n−i}.    (8.26)
Function g(T) has the following properties:

1. g(0) = 0,
2. g(∞) = lim_{T→∞} g(T) = k Σ_{i=k}^{n} (1/i) − 1, and
3. g′(T) ≥ 0.
n 1 nc1 . (8.27) − = i kc2 i=k+1 If > 0, there exists a finite and unique T ∗ that satisfies equation (8.25) and minimizes C(T ). If ≤ 0, then T ∗ → ∞, namely, we should make no replacement of the system before failure. If k = n, the system is a series system that has an exponential distribution. Thus, no preventive replacement will be made either. This work by Nakagawa [172] has been extended by Sheu and Kuo [230] to include two types of failure, as in the case discussed in Section 8.2.1. Similar notation is used. In this case the objective function is k−1 n− j n e−aλT j 1 − e−aλT g(T ) = nc1 + c2 j j=0 j−1 n−i n− c(1 − a) n−k n −aλT i 1 − e −aλT + e i a j=0 i=0 −1 j−1 n 1 n−i 1 n . × e−aλT i 1 − e−aλT aλ j i j=k i=0 Standard calculus techniques are needed to find T ∗ that minimizes g(T ). Sheu and Kuo [229] also extend this model to the case when minimal repair costs are time dependent and stochastic.
8.3 FAULT COVERAGE

The k-out-of-n systems have redundancy built in them. Redundancy is one of the methods used for increasing system reliability. Generally speaking, the higher the level of redundancy, the higher the system reliability. Active redundancy and/or standby redundancy may be used by designers depending on applications. As we have mentioned in the analysis of standby systems, not only do the redundant components have to work when called upon, the sensing and switching mechanisms have to work as well. If a system cannot detect, locate, and recover from faults and errors that have occurred within some of the components, it will fail, even though there are still good redundant or standby components in the system. Thus, when one is using a k-out-of-n structure, the issue of fault coverage has to be addressed. The term fault coverage has been used in dependability analysis of fault-tolerant systems. Fault coverage is defined to be the probability for the system to recover given that faults occur [70]. This error handling may be achieved through error masking by use of active redundancy, instruction retry by asking the system to perform the same task again, or system reconfiguration through disconnection of the failed components and connection of a standby component on-line. Much of the reported research on fault coverage refers to computer software systems. For recent work in this area, readers are referred to Dugan and Trivedi [70], Reibman and Zaretsky [200], Doyle et al. [66], and Amari et al. [10]. In the case of imperfect fault coverage, the fault coverage probability is directly linked to intrinsic component failure probability. If a component fails and such a failure is covered, the system may not fail as long as there is adequate redundancy in the system. However, if a component fails and such a failure is not covered, the system fails even though there is adequate redundancy in the system. Component failure and imperfect fault coverage have to occur simultaneously for the system to fail.

8.3.1 Deterministic Analysis

Ignoring the time factor, Amari et al. [10] conduct a deterministic analysis of a k-out-of-n:G system considering fault coverage. The system considered consists of n i.i.d. components, each having a reliability of p. Individual component failures are independent.
If the failure of a component occurs and this failure is not covered by the system, system failure occurs immediately even if adequate redundancy exists in the system. Even when every component failure is covered by the system, the system fails if fewer than k components work. The following additional notation is used.

Notation

• q: 1 − p, unreliability of each component
• pc: Pr(system recovers | a component fails)
• qc: 1 − pc
• w: 1 − q·qc
• B: covered failure probability of each component, B = q·pc
• C: uncovered failure probability of each component, C = q − B = q·qc
• P: conditional reliability of a component given that no uncovered failure has occurred to it, P = p/(p + B)
• Q: 1 − P
• Pu(n): ∏_{i=1}^{n} (1 − C) = (1 − C)^n, Pr(no uncovered failures in the system with n components)
• Rc(n): conditional reliability of the system with n components given that there are no uncovered failures
• Rs(n): reliability of the system with n components considering imperfect failure coverage
Based on the defined notation, we have

p + B + C = 1,  (8.28)
P = p/(p + B) = p/(1 − C) = p/w,  (8.29)
Q = 1 − P = q·pc/w,  (8.30)
Pu(n) = w^n.  (8.31)
The reliability of any system considering imperfect fault coverage can be evaluated with the conditional probability formula

Rs(n) = Pr(no uncovered component failures) × Pr(system works | no uncovered component failures) = Pu(n)·Rc(n).  (8.32)

A proof of equation (8.32) is given by Amari et al. [9]. In this equation, Rc(n) can be evaluated with the same formula used for system reliability evaluation when imperfect fault coverage does not exist, with one exception: the conditional component reliability P and conditional component unreliability Q have to be used. With this equation, we can easily find the reliability of a k-out-of-n:G system with imperfect fault coverage:

Rs(n) = w^n ∑_{i=k}^{n} \binom{n}{i} P^i Q^{n−i} = ∑_{i=k}^{n} \binom{n}{i} p^i (pc·q)^{n−i}.  (8.33)

For a k-out-of-n:G system with perfect fault coverage, system reliability is a monotonically increasing function of system size n. However, when imperfect fault coverage is present, the reliability of a k-out-of-n:G system does not always increase when n is increased. Figure 8.4 shows the relationship between system reliability and system size when imperfect fault coverage is considered. This shows that there is an optimization issue to be addressed in determining the system size that maximizes system reliability.

Lemma 8.1 Define
f_c(n) ≡ ΔRc(n) = Rc(n + 1) − Rc(n) = \binom{n}{k−1} (P/Q)^k Q^{n+1},  (8.34)
[Figure 8.4: Reliability Rs(n) of a k-out-of-n:G system with imperfect fault coverage does not always increase as system size n increases (k = 2, p = 0.75, pc = 0.8).]
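The curve in Figure 8.4 can be reproduced directly from equation (8.33). The following sketch is illustrative (the code is ours; it uses the figure's stated parameters k = 2, p = 0.75, pc = 0.8) and locates the reliability-maximizing system size:

```python
from math import comb

def rs(n, k, p, pc):
    """Equation (8.33): Rs(n) = sum_{i=k}^{n} C(n,i) * p**i * (pc*q)**(n-i)."""
    q = 1.0 - p
    return sum(comb(n, i) * p**i * (pc * q)**(n - i) for i in range(k, n + 1))

# Parameters of Figure 8.4: k = 2, p = 0.75, pc = 0.8.
k, p, pc = 2, 0.75, 0.8
values = {n: rs(n, k, p, pc) for n in range(k, 15)}
n_star = max(values, key=values.get)
print(n_star, round(values[n_star], 6))   # -> 4 0.788906
```

Because an uncovered failure (probability C = q·qc per component) can defeat the whole system, each added component both adds redundancy and adds a new single-point failure opportunity, which is why the curve eventually turns down.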
P0 ≡ (k − 1)/(k + 1),  (8.35)
n0 ≡ ⌊(k − 1)/P⌋ − 1.  (8.36)
If P ∈ (0, P0], f_c(n) attains its maximum at n = n0 + 1. If P ∈ (P0, 1), f_c(n) attains its maximum at n = k.

Proof To find a maximum point of f_c(n), we need to examine the trend of f_c(n), that is, Δf_c(n):

Δf_c(n) = f_c(n + 1) − f_c(n) = \binom{n+1}{k−1} (P/Q)^k Q^{n+2} − \binom{n}{k−1} (P/Q)^k Q^{n+1}
        = \binom{n}{k−1} (P/Q)^k Q^{n+1} [ (n + 1)Q/(n − k + 2) − 1 ].

As a result, Δf_c(n) ≥ 0 if and only if

(n + 1)Q ≥ n − k + 2,  or  n ≤ ⌊(k − 1)/P⌋ − 1 ≡ n0.

Similarly, Δf_c(n) ≤ 0 if and only if n ≥ (k − 1)/P − 1. Recall that we have n ≥ k too. If n0 < k, or equivalently, P > P0, we have Δf_c(n) ≤ 0 for all n ≥ k. As a result, f_c(n) attains its maximum at n = k. If 0 < P ≤ P0, we have n0 ≥ k and
f_c(n) increases in n for k ≤ n ≤ n0 and decreases in n for n > n0. We also have the following:

• If (k − 1)/P is an integer, then f_c(n0 − 1) < f_c(n0) = f_c(n0 + 1) > f_c(n0 + 2).
• If (k − 1)/P is not an integer, then f_c(n0) < f_c(n0 + 1) > f_c(n0 + 2).
In either case, f_c(n) attains its maximum at n0 + 1 provided n0 ≥ k.

Now we are ready to state how to determine the optimal value of n that maximizes system reliability.

Theorem 8.4 (Amari et al. [10]) Define

n1 ≡ inf{ n ∈ [n0, ∞) : f_c(n)/Rc(n) < (1 − w)/w },  (8.37)
n2 ≡ inf{ n ∈ [k, n0] : f_c(n)/Rc(n) < (1 − w)/w },  (8.38)
n3 ≡ inf{ n ∈ [k, ∞) : f_c(n)/Rc(n) < (1 − w)/w }.  (8.39)
For fixed p, pc, and k, the optimal system size n* such that system reliability Rs(n) is maximized can be determined with the following procedure. If 0 < P < P0, then

n* = n1 if f_c(n0)/Rc(n0) ≥ (1 − w)/w,
n* = n2 otherwise.

If P0 < P < 1, then n* = n3.

Amari et al. [10] also provide some results for the k-out-of-(2k − 1):G system. Since n = 2k − 1, determination of the optimal system size is equivalent to determination of the optimal k value. In this case, we will write various reliability measures as functions of k instead of n, that is, Rs(k) ≡ Rs(2k − 1), Rc(k) ≡ Rc(2k − 1), Pu(k) ≡ Pu(2k − 1), and f_c(k) ≡ f_c(2k − 1). Substituting n = 2k − 1 into equation (8.33), we have the following system reliability expression:

Rs(k) = w^{2k−1} ∑_{i=k}^{2k−1} \binom{2k−1}{i} P^i Q^{2k−1−i}.  (8.40)
Using a similar approach, we can examine the trend of f_c(k):

f_c(k) = Rc(k + 1) − Rc(k) = \binom{2k−1}{k} P^k Q^k (2P − 1).  (8.41)
From such examinations, the following conclusions can be made:

• If P = 0.5, f_c(k) = 0 and Rc(k) is a constant.
• If 0 < P < 0.5, f_c(k) is an increasing function of k and Rc(k) is a decreasing function of k.
• If 0.5 < P < 1, f_c(k) is a decreasing function of k and Rc(k) is an increasing function of k.
• If 0 < p ≤ 0.5, Rs(k) attains its maximum at k = 1.
• If 0.5 < p < 1, Rs(k) attains its maximum at k = inf{ k : f_c(k)/Rc(k) < (1 − w²)/w² }.
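For the k-out-of-(2k − 1):G case, the optimal k can also be found by brute force from equation (8.40) (equivalently, from equation (8.33) with n = 2k − 1). The sketch below is illustrative; the values p = 0.9 and pc = 0.9 are our assumptions, chosen so that P > 0.5 and an interior maximum exists per the last conclusion above:

```python
from math import comb

def rs_majority(k, p, pc):
    """Reliability of a k-out-of-(2k-1):G system with imperfect fault
    coverage, from equation (8.33) with n = 2k - 1."""
    n, q = 2 * k - 1, 1.0 - p
    return sum(comb(n, i) * p**i * (pc * q)**(n - i) for i in range(k, n + 1))

p, pc = 0.9, 0.9                      # illustrative values (assumed)
P = p / (1 - (1 - p) * (1 - pc))      # conditional component reliability P = p/w
assert P > 0.5                        # so Rs(k) peaks at a finite k

k_star = max(range(1, 30), key=lambda k: rs_majority(k, p, pc))
print(k_star)   # -> 2
```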
Exercises

1. Verify the expression of f_c(k) in equation (8.41). Hint: Use a recursive equation for system reliability.
2. Verify the results for optimal design of the k-out-of-(2k − 1):G system.
3. Prove Theorem 8.4.

8.3.2 Stochastic Analysis

Sometimes we are interested in system reliability or availability as a function of time, and a stochastic analysis needs to be performed. In this section, we focus on the analysis of the stochastic performance of a k-out-of-n:G system with imperfect fault coverage.

Some comments on the differences between fault coverage and common-cause failures need to be made here. Common-cause failures are due to events other than individual component failures. For example, one case of common-cause failure is a lightning strike on electronic equipment. The intrinsic functions of the electronic equipment under the specified conditions may be performing perfectly; however, due to the external lightning event, several units of such equipment may fail simultaneously. Thus, we often treat intrinsic component failures and common-cause failures as independent events.

Considering a k-out-of-n:G system with imperfect fault coverage, Akhtar [7] uses Markov techniques to analyze system reliability, availability, and MTTF. The results from the work of Akhtar are discussed in this section. Akhtar assumes that a system continues to deteriorate even after it has failed, as long as the failure is not caused by a fault coverage failure. However, we assume that the system does not deteriorate once it is in the failed state.
[Figure 8.5: Transition diagram of a k-out-of-n:G system with imperfect fault coverage. Working states 0, 1, . . . , n − k form a chain: state i moves to state i + 1 at rate pc·λi and back at repair rate µi+1; each working state i also moves to the absorbing failure state n − k + 1 at rate (1 − pc)λi, and state n − k moves to it at rate λn−k.]
Component lifetimes and repair times are assumed to be exponentially distributed. The failure rate and repair rate of each component are λ and µ, respectively. A component failure is covered with probability pc and not covered with probability 1 − pc. Repairs are perfect. When a component failure is not covered or the number of working components is less than k, the system is failed. The number of repair facilities may be 1 or unlimited. State i with 0 ≤ i ≤ n − k indicates that there are i failed components in the system and the system is working. State n − k + 1 is used to indicate system failure due to imperfect fault coverage or more than n − k failed components. State n − k + 1 is an absorbing state. The system state transition diagram under these assumptions is shown in Figure 8.5, in which λi and µi are the system failure rate and repair rate of state i, respectively.

Based on the system state transition diagram in Figure 8.5, we obtain the following differential equations:

P0′(t) = −λ0 P0(t) + µ1 P1(t),  (8.42)
Pi′(t) = −(λi + µi)Pi(t) + pc λi−1 Pi−1(t) + µi+1 Pi+1(t)  for i = 1, 2, . . . , n − k − 1,
Pn−k′(t) = −(λn−k + µn−k)Pn−k(t) + pc λn−k−1 Pn−k−1(t),
Pn−k+1′(t) = λn−k Pn−k(t) + ∑_{i=0}^{n−k−1} (1 − pc)λi Pi(t),

where, for i = 0, 1, 2, . . . , n − k,

λi = (n − i)λ,  (8.43)
µi = µ for a single repair facility, µi = iµ for unlimited repair facilities, i = 1, 2, . . . , n − k,  (8.44)

and

P0(t) + P1(t) + P2(t) + · · · + Pn−k+1(t) = 1.  (8.45)
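A time-domain solution of equations (8.42)-(8.45) can be sketched with a simple forward-Euler integration. The Python below is illustrative (the function name, step count, and parameter values are our assumptions, not from the text); the µi definition in (8.44) switches between the single-repair-facility and unlimited-repair cases:

```python
def reliability_markov(n, k, lam, mu, pc, t_end, steps=100000, single_repair=True):
    """Forward-Euler integration of the Markov ODEs (8.42)-(8.45).
    States 0..n-k count covered component failures; state n-k+1 absorbs
    both uncovered failures and depletion below k working components."""
    m = n - k + 1                                    # absorbing state index
    P = [0.0] * (m + 1)
    P[0] = 1.0
    dt = t_end / steps
    lam_i = [(n - i) * lam for i in range(m)]        # eq. (8.43)
    mu_i = [0.0] + [mu if single_repair else i * mu for i in range(1, m)]  # eq. (8.44)
    for _ in range(steps):
        dP = [0.0] * (m + 1)
        for i in range(m):
            dP[i] -= (lam_i[i] + mu_i[i]) * P[i]     # outflow from state i
            if i >= 1:
                dP[i - 1] += mu_i[i] * P[i]          # repair: i -> i - 1
            if i + 1 < m:
                dP[i + 1] += pc * lam_i[i] * P[i]    # covered failure
                dP[m] += (1 - pc) * lam_i[i] * P[i]  # uncovered failure
            else:
                dP[m] += lam_i[i] * P[i]             # any failure in state n - k
        P = [P[j] + dt * dP[j] for j in range(m + 1)]
    return sum(P[:m])                                # Rs(t_end) = 1 - P_{n-k+1}

# Example: 2-out-of-4 system (illustrative rates).
print(reliability_markov(n=4, k=2, lam=0.5, mu=1.0, pc=0.8, t_end=1.0))
```

With µ = 0 the routine reduces to the nonrepairable Case I below, which provides a closed-form check.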
Case I: When No Repairs Are Allowed at All In this case, we are assuming µ = 0. The system is completely nonrepairable. The standard approach for solving the set of differential equations shown above may be used to find the system reliability function. However, an alternative approach can be used to derive the state distribution of the system as follows. Each component has an exponential lifetime distribution with parameter λ. For the system to be in state i (0 ≤ i ≤ n − k) at time t, exactly n − i components must survive beyond time t, exactly i components must have failed by time t, and all of these i failures must be covered by the system. The probability for each component to survive to time t is e^{−λt}. Thus, we have

Pi(t) = \binom{n}{i} pc^i (e^{−λt})^{n−i} (1 − e^{−λt})^i,  i = 0, 1, . . . , n − k,  (8.46)
Pn−k+1(t) = Qs(t) = 1 − ∑_{i=0}^{n−k} Pi(t),  (8.47)
Rs(t) = ∑_{i=0}^{n−k} Pi(t).  (8.48)
These equations can also be obtained directly from equation (8.33) by letting p = e−λt and q = 1 − e−λt . The availability function is the same as the reliability function. The steady-state availability of the system is zero. Using equations (8.48) and (7.63), we have MTTFs =
∞ 0
Rs (t) dt =
pi 1 n−k c . λ i=0 n − i
(8.49)
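Equation (8.49) is easy to check numerically: the closed-form state probabilities (8.46) give the reliability function (8.48), and integrating it recovers the MTTF. The sketch below is illustrative, with assumed parameters n = 5, k = 3, λ = 0.4, pc = 0.9:

```python
from math import comb, exp

def state_prob(i, t, n, lam, pc):
    """Equation (8.46): i covered failures by time t in a nonrepairable system."""
    return comb(n, i) * pc**i * exp(-lam * t)**(n - i) * (1 - exp(-lam * t))**i

def reliability(t, n, k, lam, pc):
    """Equation (8.48): sum of the working-state probabilities."""
    return sum(state_prob(i, t, n, lam, pc) for i in range(n - k + 1))

def mttf(n, k, lam, pc):
    """Equation (8.49): MTTF = (1/lam) * sum_{i=0}^{n-k} pc**i / (n - i)."""
    return sum(pc**i / (n - i) for i in range(n - k + 1)) / lam

n, k, lam, pc = 5, 3, 0.4, 0.9
h = 0.001
numeric = sum(h * 0.5 * (reliability(j * h, n, k, lam, pc)
                         + reliability((j + 1) * h, n, k, lam, pc))
              for j in range(60000))               # trapezoidal rule on [0, 60]
print(abs(numeric - mttf(n, k, lam, pc)) < 1e-4)   # -> True
```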
Case II: When Repairs Are Allowed While a System Is Not Failed In this case, it is assumed that repairs are allowed as long as the system is not in a failure state. Whenever the system fails, due to either more than n − k failed components or imperfect fault coverage, it reaches an absorbing state. Standard approaches should be followed to solve the set of differential equations.

Other Studies The fault coverage factor discussed so far is used only for modeling the coverage of the failure of a component. It refers to the capability of the system to reconfigure itself to mask or recover from the failure of an individual component. A system that is capable of reconfiguring itself may experience other impacts in addition to covering the failures of components. Shao and Lamberson [225] describe a fault detection, isolation, and system reconfiguration unit as a built-in test (BIT) unit. The BIT is treated as a separate unit. It may fail to cover the failure of one of the n regular components. It may also fail by itself. In another study by Shao and Lamberson [226], a sensing and switching mechanism is responsible for detecting the failure of a component and the completion of repair of a failed component. When a component fails, the sensing and switching mechanism is expected to detect the failure, switch this component off-line, and redistribute the load of the system among all working components. When a failed component is repaired, the sensing and switching mechanism is expected to detect it, switch this component on-line, and redistribute the load of the system among all working components. The service of the sensing and switching mechanism is thus required both when a component fails and when a failed component is repaired. The function of such a sensing and switching mechanism is more than fault coverage.

In their studies, Shao and Lamberson [225, 226] consider working states i (0 ≤ i ≤ n − k) wherein i components are failed and all these failures are covered; failure state n − k + 1 wherein n − k + 1 components are failed and all these failures are covered; failure states jB (0 ≤ j ≤ n − k) wherein the BIT unit is failed and j regular components are failed; and failure states jF (1 ≤ j ≤ n − k + 1) wherein the BIT is unable to cover the failure of one of the regular components and j regular components are failed. The same procedure, with the Laplace transform technique for solving differential equations, can be followed to find the probability distribution of the system state. However, because there are so many different states, it is difficult to obtain analytical solutions.

Exercises

1. Verify equation (8.49).
2. Solve the set of differential equations. Find expressions of the system availability function and MTTF.
8.4 COMMON-CAUSE FAILURES

Common-cause failures, also called common-mode failures, critical human errors, or shocks, describe single events that cause multiple component failures. These events may originate outside the components, for example, storms, floods, lightning, seismic activity, maintenance errors, other human intervention errors, and sudden changes in environment such as temperature, pressure, humidity, and dust. They may also be caused by failures of some of the components; for example, the leak of a high-pressure water tank may cause failures of motors and other electrical components [240]. Two types of common-cause failures have been studied in the literature. A lethal common-cause failure results in the failure of all of the components or of the system, while a nonlethal common-cause failure results in the failure of several components. Common-cause failures may dramatically reduce the benefits of redundancy in system reliability improvement, as they may cause simultaneous failures of multiple components. Researchers have studied the effects of common-cause failures on power transmission lines [65], nuclear power plants [135], and general systems [116].

A k-out-of-n:G system fails whenever at least n − k + 1 components are failed. In addition to components failing one by one, there may be other factors that contribute to system failure. Common-cause failure and critical human error are among these factors. Recent studies on k-out-of-n systems with common-cause failures include Chung [57], Jung [113], Chari [52], and Dhillon and Anude [64].

Most studies assume that the occurrence of a common-cause event will make the entire system fail. In other words, common-cause failures are lethal. It is also often assumed that common-cause failures and individual component failures are independent of each other. Under these assumptions, modeling lethal common-cause failures without considering the time factor is very straightforward. The system reliability is equal to the probability that common-cause failures do not occur multiplied by the reliability of the system when there are no common-cause failures. In a nonrepairable system subject to common-cause failures, the system reliability function can also be easily expressed. For example, consider a k-out-of-n:G system with i.i.d. components whose lifetimes are exponentially distributed with parameter λ and whose lethal common-cause failure arrival time is also exponentially distributed with parameter λc. Then the system reliability function can be expressed as

Rs(t) = e^{−λc t} ∑_{i=k}^{n} \binom{n}{i} (e^{−λt})^i (1 − e^{−λt})^{n−i}.  (8.50)
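Since lethal common-cause failures are independent of the intrinsic component failures, equation (8.50) is just the ordinary k-out-of-n:G reliability multiplied by the common-cause survival probability e^{−λc t}. A small illustrative sketch (parameter values are assumed):

```python
from math import comb, exp

def rs_lethal_cc(t, n, k, lam, lam_c):
    """Equation (8.50): k-out-of-n:G reliability degraded by a lethal
    common-cause failure process with rate lam_c."""
    p = exp(-lam * t)
    koon = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))
    return exp(-lam_c * t) * koon

# With lam_c = 0 the formula reduces to the ordinary 2-out-of-3 reliability.
base = rs_lethal_cc(1.0, 3, 2, 0.2, 0.0)
degraded = rs_lethal_cc(1.0, 3, 2, 0.2, 0.05)
print(round(base, 4), round(degraded, 4))   # the common cause strictly lowers Rs
```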
Jung [113] considers k-out-of-n:G systems with nonidentical components subject to nonlethal common-cause failures. A nonlethal common-cause failure results in a subset of components failing simultaneously. Attempts have been made to provide expressions for the MTTF and MTTR of the system. However, because the components are nonidentical, no closed-form formulas have been derived.

8.4.1 Repairable System with Lethal Common-Cause Failures

Chung [57] analyzes a k-out-of-n:F system considering multiple critical errors. Whenever one of these critical errors occurs, the system fails. The system state may be 0, 1, . . . , k − 1, k, k + 1, . . . , k + M, where states {0, 1, . . . , k − 1} are the working states, state k is a system failure state indicating that there are at least k failed components, and state k + i indicates system failure due to critical error number i, where i = 1, 2, . . . , M. There are r repair facilities. The lifetimes and repair times of components are all exponentially distributed. Standard Markov chain techniques and Laplace transforms are used for derivation of the system reliability function and steady-state system availability.

Dhillon and Anude provide common-cause failure analyses of a nonrepairable k-out-of-n:G system [64] and a repairable k-out-of-n:G system [63], each with an additional warm standby component. The standby component is switched into operation as soon as one of the n original active components fails, provided that the sensing and switching mechanism is working when needed. Whenever a common-cause event occurs, the system fails. The system may also fail when the total number of working active components is less than k. Standard Markov techniques and Laplace transforms are used in the analysis of system reliability.

Adachi and Kodama [4] consider a k-out-of-n:G system with i.i.d. components subject to lethal common-cause failures. Exponential distributions are used. There is
only one repair facility. Three repair policies are considered: (1) failed components are repaired one by one; (2) when the system fails due to a common-cause failure, k units are repaired in a batch; and (3) when the system fails due to a common-cause failure, n units are repaired in a batch. System reliability and availability functions and steady-state availability are analyzed under each repair policy.

In the following, we describe the model of a standard k-out-of-n:G system subject to lethal common-cause failures. The system has n i.i.d. components. It is failed whenever the total number of working components is less than k or a lethal common-cause failure event has occurred. Whenever the system is down, due to common-cause failures or not, repair of the system will restore it to "as good as new" condition; that is, the system will start operation again with zero failed components in it. The failures of components and the occurrence of common-cause failures are independent. Lifetimes and repair times of components are assumed to be exponentially distributed. We will use the word "intrinsic" to indicate failures of components due to non-common-cause failures.

Notation

• i: system state; when 0 ≤ i ≤ n − k + 1, it represents the number of failed components in the system; when i = n − k + 2, it represents that the system is failed due to a common-cause failure. There are two failure states {n − k + 1, n − k + 2} and n − k + 1 working states {0, 1, 2, . . . , n − k}.
• λi: usual system failure rate when it is in state i, 0 ≤ i ≤ n − k
• λ′i: common-cause system failure rate when it is in state i, 0 ≤ i ≤ n − k
• µi: usual repair rate of the system when it is in state i, 1 ≤ i ≤ n − k
• µ′i: repair rate of the system when it is down, due to common-cause failures or not, n − k + 1 ≤ i ≤ n − k + 2
[Figure 8.6: Transition diagram of a k-out-of-n:G system with common-cause failures. Working states 0, 1, . . . , n − k form a chain with transitions from state i to i + 1 at rate λi and repairs back at rate µi+1; each working state i also moves to failure state n − k + 2 at common-cause rate λ′i, state n − k moves to failure state n − k + 1 at rate λn−k, and repairs from the two failure states occur at rates µ′n−k+1 and µ′n−k+2.]

Figure 8.6 depicts the system state transition diagram. From this diagram, differential equations can be derived. The Laplace transform may be used to solve these
differential equations. The system reliability function, steady-state system availability, MTTF, MTBF, and other system performance measures may then be evaluated. These are left as exercises for the reader. For more complicated models and analyses, the reader is referred to the references cited earlier in this section.

Exercise

1. Develop expressions for the system reliability function, availability function, MTTF, MTBF, and steady-state availability based on Figure 8.6.

8.4.2 System Design Considering Lethal Common-Cause Failures

Consider a k-out-of-n:G system with lethal common-cause failures. The system may fail due to individual component failures or common-cause failures. Whenever the system fails, it is replaced by a new one. Suppose components are i.i.d., having an exponential lifetime distribution with parameter λ. The interarrival time of common-cause failures follows the exponential distribution with parameter λc. Bai et al. [17] consider determination of n to minimize the mean cost rate. Two models are proposed. In model 1, no planned replacements or inspections are considered. In model 2, periodic inspections are performed; at inspection time points, all failed components are detected and replaced.

Model 1: Determination of System Size n to Minimize Mean Cost Rate

Notation
• c1: purchase cost of each component
• c2: additional cost caused by system failure
• c′2: c2/c1
• λ: intrinsic failure rate of each component
• λc: arrival rate of common-cause failures
• δ: λc/λ
• C1(n): mean cost rate of the system
• Sn: mean time to failure of the system
The system reliability function is

Rn(t) = e^{−λc t} ∑_{i=k}^{n} \binom{n}{i} (e^{−λt})^i (1 − e^{−λt})^{n−i}.

The system MTTF is

Sn = ∫_0^∞ Rn(t) dt = n!/[λ Γ(n + δ + 1)] ∑_{i=k}^{n} Γ(i + δ)/i!.  (8.51)
The system MTTF as a function of system size n has the following properties:

1. Sn increases in n and lim_{n→∞} Sn = 1/λc;
2. Sn+1 = 1/[λ(n + δ + 1)] + [(n + 1)/(n + δ + 1)]Sn; and
3. Sn+1 − Sn decreases in n and lim_{n→∞}(Sn+1 − Sn) = 0.

The mean cost rate of the system can be expressed as

C1(n) = c1(n + c′2)/Sn.
At this moment, we do not know whether C1(n) is a unimodal function. However, we do know that it is finite at n = k and that it goes to positive infinity as n → ∞. Thus, a necessary condition for n to be a local minimum of C1(n) is

C1(n + 1) − C1(n) ≥ 0 and C1(n) − C1(n − 1) ≤ 0.  (8.52)

Now we will examine the change in the mean cost rate when the system size goes from n to n + 1:

ΔC1(n) ≡ C1(n + 1) − C1(n) = [c1(n + 1) + c1 c′2]/Sn+1 − [c1 n + c1 c′2]/Sn
       = [c1 Sn − (c1 n + c1 c′2)(Sn+1 − Sn)]/(Sn+1 Sn).  (8.53)
Now define

L(n) ≡ Sn/(Sn+1 − Sn) − n.

Then, according to equation (8.53), the necessary conditions in equation (8.52) are equivalent to the following:

L(n) ≥ c′2 and L(n − 1) < c′2.

Based on the properties of Sn, we can conclude that L(n) is an increasing function of n [a simple examination of L(n + 1) − L(n) will confirm this]. In addition, L(n) goes to positive infinity as n → ∞. As a result, we can conclude that L(n) crosses c′2 at most once. This means that the necessary condition is also a sufficient condition if a solution exists. If L(k) > c′2, the optimal system size is k. If L(k) < c′2, the optimal system size is n*, the smallest n value that makes L(n) ≥ c′2.
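The model 1 search can be sketched as follows. The Python below is illustrative (the function names and parameter values are our assumptions); log-gamma is used to evaluate equation (8.51) stably, and the stopping rule is the L(n) crossing described above:

```python
from math import lgamma, exp

def mttf(n, k, lam, delta):
    """System MTTF, equation (8.51), via log-gamma for numerical stability."""
    log_front = lgamma(n + 1) - lgamma(n + delta + 1)
    return sum(exp(log_front + lgamma(i + delta) - lgamma(i + 1))
               for i in range(k, n + 1)) / lam

def L(n, k, lam, delta):
    """L(n) = Sn / (S_{n+1} - Sn) - n; increasing in n."""
    s_n, s_next = mttf(n, k, lam, delta), mttf(n + 1, k, lam, delta)
    return s_n / (s_next - s_n) - n

def optimal_size(k, lam, delta, c2_norm, n_max=500):
    """Smallest n >= k with L(n) >= c2' (c2' = c2/c1); covers the L(k) > c2' case too."""
    for n in range(k, n_max):
        if L(n, k, lam, delta) >= c2_norm:
            return n
    raise ValueError("no crossing found below n_max")

# Property 2 of Sn holds numerically (illustrative parameters):
n, k, lam, delta = 6, 3, 0.5, 0.2
lhs = mttf(n + 1, k, lam, delta)
rhs = 1 / (lam * (n + delta + 1)) + (n + 1) / (n + delta + 1) * mttf(n, k, lam, delta)
assert abs(lhs - rhs) < 1e-9

print(optimal_size(k=3, lam=0.5, delta=0.2, c2_norm=10.0))
```

As a sanity check, for n = k = 1 equation (8.51) reduces to S1 = 1/(λ + λc), the MTTF of a single component racing against the common-cause process.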
Model 2: Determination of System Size n and Inspection Interval T to Minimize Mean Cost Rate

Additional Notation

• c3: inspection cost of the system (c3 < c2)
• c′3: c3/c1
• c′: c′2 − c′3
• L2(n, T): average cost during a renewal cycle
• S2(n, T): average length of the renewal cycle
• C2(n, T): mean cost rate of the system
In this model, we consider periodic inspections at time interval T. Whenever the system fails, due to common-cause or non-common-cause failures, all components are replaced. Whenever an inspection is conducted after a working time of T, all failed components are detected and replaced. We need to determine the optimal system size n and the optimal inspection interval T. The length of the renewal period is the minimum of T and the system life.

The cost during a system renewal cycle includes three possibilities: (1) system failure because the number of working components becomes less than k within T, (2) system failure due to common-cause failures before time T, and (3) the cost of inspection and component replacement at time T. As a result, the average cost in a renewal cycle can be expressed as

L2(n, T) = (c1 n + c2) e^{−λc T} ∑_{i=0}^{k−1} \binom{n}{i} (e^{−λT})^i (1 − e^{−λT})^{n−i}
         + (c1 n + c2)(1 − e^{−λc T})
         + e^{−λc T} ∑_{i=k}^{n} [c1(n − i) + c3] \binom{n}{i} (e^{−λT})^i (1 − e^{−λT})^{n−i}.

Using equation (8.51), the average length of a renewal cycle is

S2(n, T) = ∫_0^T Rn(t) dt = n!/[λ Γ(n + δ + 1)] ∑_{i=k}^{n} [Γ(i + δ)/i!] I_φ(i + δ, n − i + 1),

where I_φ(a, b) is the incomplete beta function. The mean cost rate under model 2 is

C2(n, T) = L2(n, T)/S2(n, T).
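A sketch of the model 2 cost-rate computation follows. It is illustrative only: S2(n, T) is obtained here by numerical integration of Rn(t) rather than through the incomplete beta function, and all parameter values below are our assumptions:

```python
from math import comb, exp

def rn(t, n, k, lam, lam_c):
    """System reliability with lethal common-cause failures, cf. equation (8.50)."""
    p = exp(-lam * t)
    return exp(-lam_c * t) * sum(comb(n, i) * p**i * (1 - p)**(n - i)
                                 for i in range(k, n + 1))

def cost_rate(n, T, k, lam, lam_c, c1, c2, c3, steps=500):
    """Model 2 mean cost rate C2(n, T) = L2(n, T) / S2(n, T)."""
    pT = exp(-lam * T)
    binom = lambda i: comb(n, i) * pT**i * (1 - pT)**(n - i)
    depletion = exp(-lam_c * T) * sum(binom(i) for i in range(k))   # < k survive at T
    common = 1 - exp(-lam_c * T)                                    # common cause by T
    inspection = exp(-lam_c * T) * sum((c1 * (n - i) + c3) * binom(i)
                                       for i in range(k, n + 1))
    L2 = (c1 * n + c2) * (depletion + common) + inspection
    h = T / steps                                                   # trapezoidal S2
    S2 = sum(h * 0.5 * (rn(j * h, n, k, lam, lam_c) + rn((j + 1) * h, n, k, lam, lam_c))
             for j in range(steps))
    return L2 / S2

# Coarse scan over T for a fixed n (the bisection on p described below would refine this):
n, k, lam, lam_c, c1, c2, c3 = 5, 3, 0.5, 0.05, 1.0, 20.0, 0.5
T_star = min((0.1 * j for j in range(1, 60)),
             key=lambda T: cost_rate(n, T, k, lam, lam_c, c1, c2, c3))
print(round(T_star, 1))
```

Very short inspection intervals are penalized because the inspection cost c3 is paid over a vanishing cycle length, while very long intervals forfeit the benefit of replacing failed components before the system is depleted.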
Let p = 1 − e−λT and q = 1 − p. The following procedure is proposed by Bai et al. [17] for finding the optimal values of p and n. The optimal value of T can then be found from the optimal value of p.
1. For given k, δ, and c′2, the optimal value n∗1 of n minimizing C1(n) is determined using model 1.
2. For each n from k to n∗1, the optimal value p*(n) of p minimizing C2(n, T) is determined numerically, for example, by the bisection method, and its mean cost rate C2(n, p*(n)) is calculated.
3. Finally, the pair (n*, p*(n*)) that has the minimal mean cost rate among the n∗1 − k + 1 pairs (n, p*(n)) is selected.

8.4.3 Optimal Replacement Policy with Lethal Common-Cause Failures

Sheu and Liou [231] consider the determination of the optimal replacement time for a k-out-of-n:F system subject to shocks. The only possible causes of system failure are shocks; intrinsic component failure mechanisms are not considered. The system has a k-out-of-n:F structure with i.i.d. components. Shocks may hit the system. When a shock hits the system, j components fail simultaneously with probability pj for j = 0, 1, 2, . . . , n. The system is failed if a shock causes k or more components to fail simultaneously. Whenever the system is failed, it is replaced by a new one (this is called an unplanned replacement). If fewer than k components are failed due to a shock, minimal repairs are performed immediately on the system to restore the functions of all failed components. The system is replaced by a new one if it survives a duration of T (this is called a planned replacement). Whenever the system is replaced, planned or unplanned, the time is reset to zero. The times of minimal repairs and planned and unplanned replacements are negligible. The objective of the analysis is to determine the optimal planned replacement interval T to minimize the long-run expected cost per unit of time, or mean cost rate.

Notation

• cp: cost of a planned replacement of the system
• cu: cost of an unplanned replacement of the system
• cj(t): cost of minimal repair when a shock causes exactly j components to fail at time t, for 1 ≤ j < k
• λ(t): arrival rate of shocks at time t
• T: length of the planned system replacement interval
• N(t): number of shocks experienced in time interval (0, t]
• Lj(t): number of shocks experienced in time interval (0, t] that each cause exactly j components to fail, for j = 0, 1, 2, . . . , k − 1
• L(t): number of shocks experienced in time interval (0, t] that each cause at least k components to fail
• pj(t): probability that a shock at time t causes exactly j components to fail, for j = 0, 1, . . . , n, with ∑_{j=0}^{n} pj(t) = 1
It is further assumed that {N(t), t ≥ 0} forms a nonhomogeneous Poisson process with intensity λ(t). Then it can be shown that {Lj(t), t ≥ 0} for j = 0, 1, . . . , k − 1 and {L(t), t ≥ 0} are independent nonhomogeneous Poisson processes with intensities pj(t)λ(t) for j = 0, 1, . . . , k − 1 and ∑_{i=k}^{n} pi(t)λ(t), respectively (see, e.g., Savits [217]). This is similar to the classical decomposition of a Poisson process for constant pj values.

Let Y1 = inf{t ≥ 0 : L(t) = 1}. Here, Y1 represents the first time point at which the system is failed if no planned replacements are used (i.e., T → ∞). In this case, the CDF of the system lifetime is given by

Fk(y) = 1 − Pr(Y1 > y) = 1 − Pr(L(y) = 0) = 1 − exp( −∫_0^y ∑_{i=k}^{n} pi(x) λ(x) dx ).  (8.54)
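Equation (8.54) is straightforward to evaluate once λ(t) and pj(t) are specified. The sketch below is illustrative; the constant shock rate and the failure-size distribution are assumptions, not from the text:

```python
from math import exp

def lifetime_cdf(y, n, k, shock_rate, p_fail, steps=1000):
    """Equation (8.54): Fk(y) = 1 - exp(-int_0^y sum_{i=k}^{n} p_i(x) lam(x) dx)."""
    def lethal_intensity(x):
        # intensity of shocks that fail at least k of the n components
        return shock_rate(x) * sum(p_fail(x, j) for j in range(k, n + 1))
    h = y / steps
    integral = sum(h * 0.5 * (lethal_intensity(j * h) + lethal_intensity((j + 1) * h))
                   for j in range(steps))
    return 1.0 - exp(-integral)

n, k = 4, 2
rate = lambda t: 0.3                                  # constant shock arrival rate
probs = {0: 0.5, 1: 0.3, 2: 0.15, 3: 0.04, 4: 0.01}   # P(shock fails j components)
p_fail = lambda t, j: probs[j]

# With constant rates the lethal-shock process has intensity 0.3 * 0.2 = 0.06,
# so the system lifetime is exponential: Fk(y) = 1 - exp(-0.06 * y).
print(round(lifetime_cdf(5.0, n, k, rate, p_fail), 6))   # -> 0.259182
```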
Because we have a specified planned replacement time T, the first time point at which the system is replaced is represented by Y1* = min{Y1, T}, where Y1* is also called the length of the first system replacement cycle. Let Yi* denote the length of the ith system replacement cycle for i = 1, 2, . . . . Then {Y1*, Y2*, . . .} forms a renewal process. If we use Ci* to represent the cost of the system during the ith replacement cycle, then {Yi*, Ci*} for i = 1, 2, . . . constitutes a renewal reward process. The term Lj(Y1*) indicates the number of shocks experienced by the system that cause exactly j components to fail, for j = 0, 1, . . . , k − 1, before the system is replaced (planned or unplanned) for the first time. Note that Y1, and hence Y1*, is independent of {Lj(t), t ≥ 0} for j = 0, 1, 2, . . . , k − 1. If we use D(t) to denote the expected cost of operating the system over the time interval [0, t], then the long-run expected cost per unit of time as a function of our decision variable T, denoted by C(T), can be expressed as [205]

C(T) = lim_{t→∞} D(t)/t = E(C1*)/E(Y1*).  (8.55)
To find an explicit expression of C(T) as a function of T, we need to find E(C1*) and E(Y1*). We find E(Y1*) with the following equation:

E(Y1*) = ∫_0^T x dFk(x) + T[1 − Fk(T)] = ∫_0^T [1 − Fk(x)] dx.  (8.56)
To find an expression of E(C1*), we need to find the expected number of minimal repairs experienced by the system during a replacement cycle. Recall that {Lj(t), t ≥ 0} for j = 0, 1, . . . , k − 1 are nonhomogeneous Poisson processes with intensities pj(t)λ(t), respectively. Let Aj(t) represent the total cost incurred in interval (0, t] due to shocks each causing j components to fail, for j ≤ k − 1. Then we have

E(Aj(t)) = ∫_0^t cj(x) pj(x) λ(x) dx.

Let A(t) represent the total cost in interval (0, t] due to all shocks each causing at most k − 1 components to fail. Then we have

E(A(t)) = ∑_{j=0}^{k−1} E(Aj(t)) = ∑_{j=0}^{k−1} ∫_0^t cj(x) pj(x) λ(x) dx.
In addition to minimal repair costs, there are possible planned and unplanned replacement costs. If Y1 > T, there is the planned replacement cost; otherwise, there is the unplanned replacement cost. In addition, when Y1 < T, we also need to integrate over all possible unplanned replacement times in the interval (0, T) to find the expected minimal repair cost:

E(C1*) = Pr(Y1 > T)[cp + E(A(T))] + Pr(Y1 < T)cu + ∫_0^T E(A(y)) dFk(y)
       = cp(1 − Fk(T)) + E(A(T))[1 − Fk(T)] + cu Fk(T) + Fk(T)E(A(T)) − ∫_0^T Fk(y) dE(A(y))
       = cp(1 − Fk(T)) + cu Fk(T) + ∑_{j=0}^{k−1} ∫_0^T [1 − Fk(y)] cj(y) pj(y) λ(y) dy.

As a result, we have the following objective function:

C(T) = { cp(1 − Fk(T)) + cu Fk(T) + ∑_{j=0}^{k−1} ∫_0^T [1 − Fk(y)] cj(y) pj(y) λ(y) dy } / ∫_0^T [1 − Fk(x)] dx.  (8.57)

The objective function C(T) in equation (8.57) can be minimized with a suitable optimization technique. If λ(y), cj(y), and pj(y) are all continuous, we can take the first-order derivative of C(T) with respect to T. Under this continuity assumption, Sheu and Liou [231] prove that there exists a unique optimal solution under certain conditions. Special cases of k-out-of-n:F systems are studied as examples.

Exercise

1. Derive equation (8.57).

8.4.4 Nonlethal Common-Cause Failures

Nonlethal common-cause failures are more difficult to model, as their occurrence does not necessarily cause the system to fail right away. The occurrence of a common-cause failure may cause a specific subset of components to fail, or a random number of components to fail when the components are i.i.d. We need to consider
both the frequency of occurrence of nonlethal common-cause failures and their impact on the system each time they occur. Chari [52] attempts to consider both lethal and nonlethal common-cause failures in determining the number of i.i.d. components in the system needed to minimize the system mean cost rate. The number of components that fail due to a nonlethal common cause is assumed to follow the binomial distribution with parameters n and p, where p is the intrinsic reliability of each i.i.d. component at the time the nonlethal common cause occurs. However, nonlethal common-cause failures are not correctly considered in the derivation of the system reliability function. More research is needed on how to take nonlethal common-cause failures into consideration in the system reliability analysis of k-out-of-n systems.
8.5 DUAL FAILURE MODES

There are two well-known examples of components or systems subject to dual failure modes. One is the structure of a relay circuit. When a relay circuit is energized and thus required to close, it may fail to do so due to the presence of dust or other insulating media. When it is deenergized and thus required to open, it may fail to do so because the contacts are stuck together due to overheating. The other example is a home security system. It may fail to detect a break-in due to mechanical or electrical circuit failures. It may also create a false alarm due to the presence of a pet. Dual failure modes are also referred to as competing failure modes. A review of research on systems with dual failure modes was provided by Lesanovsky in 1993 [139]. It is important that we clarify the terminology to be used to describe dual failure modes. When we say that a component fails to open, it means that the component is stuck in the closed mode. Thus, a component that fails to open is said to be failed closed. The term failed closed is also referred to as failed short, as in a short circuit. When we say that a component fails to close, it means that the component is stuck in the open mode. Thus, a component that fails to close is said to be failed open. This may be confusing, as different terms are used by different authors. For example, in some papers qo is used to indicate the probability that a component fails to open and qc is used to indicate the probability that a component fails to close. In other papers, the meanings of these two symbols are completely reversed; namely, qo is used to indicate the probability that a component is failed open and qc is used to indicate the probability that a component is failed closed. Other terms used to describe dual failure modes include failure to operate and failure to idle. Caution should be taken when comparing equations published in different articles on the topic of dual failure modes.
As we have discussed before, some systems may experience two mutually exclusive failure modes, namely, failure to open and failure to close. A component may fail to open on demand or fail to close on demand. Because of these component failures, the system may fail to operate properly. In a k-out-of-n system with i.i.d. components subject to dual failure modes, the system may fail when too many components fail to open or when too many components fail to close. The numbers of components
that have to fail to open and fail to close for the system to fail are often different. For convenience of reference, we provide the definition of k-out-of-n:G systems with dual failure modes in the next paragraph. A k-out-of-n:G system subject to dual failure modes is defined to be closed (working) if at least k components are closed (working) and defined to be open (working) if at least n − k + 1 components are open (working). Otherwise the system is defined to be failed. Thus, the system fails to close (stuck open, or failed in the open mode) if at least n − k + 1 components fail to close, and it fails to open (stuck closed, or failed in the closed mode) if at least k components fail to open. Based on the dual relationship between a k-out-of-n:G and a k-out-of-n:F system, we can similarly define k-out-of-n:F systems with dual failure modes. In the following discussions, we focus on the k-out-of-n:G systems with dual failure modes and assume that all components are i.i.d. unless stated otherwise.

Notation
• qo: probability of a component's failure to open, that is, of its being failed short or failed closed
• qc: probability of a component's failure to close, that is, of its being failed open
• po = 1 − qo
• pc = 1 − qc
• r = 1 − qo − qc: probability that a component works properly
• s = qo/(qc + qo): probability that a failure is a failure to open
• Ro: probability that the system does not fail to open
• Qo = 1 − Ro
• Rc: probability that the system does not fail to close
• Qc = 1 − Rc
• Rs: reliability of the system
System reliability can be expressed as a function of the component failure probabilities qo and qc. An alternative is to express it as a function of the component working probability r and the probability s that a component failure is a failure to open. Both qo and qc, and hence po and pc, can be expressed as functions of r and s:
\begin{align*}
q_o &= s(1-r), & p_o &= 1 - s(1-r), \\
q_c &= (1-s)(1-r), & p_c &= 1 - (1-s)(1-r) = s(1-r) + r.
\end{align*}

The probability that the system does not fail to open is equal to the probability that at least n − k + 1 components do not fail to open or, equivalently, that at most k − 1 components fail to open:
\[
R_o = \sum_{i=n-k+1}^{n} \binom{n}{i} p_o^i q_o^{n-i} = \sum_{i=0}^{k-1} \binom{n}{i} p_o^{n-i} q_o^{i} = 1 - \sum_{i=k}^{n} \binom{n}{i} p_o^{n-i} q_o^{i}. \tag{8.58}
\]
The probability that the system does not fail to close is equal to the probability that at least k components do not fail to close:
\[
R_c = \sum_{i=k}^{n} \binom{n}{i} p_c^i q_c^{n-i}. \tag{8.59}
\]
Based on Barlow et al. [21], the reliability of any system subject to mutually exclusive dual failure modes is equal to the probability that it does not fail in one mode minus the probability that it fails in the other mode:
\[
R_s = R_o - (1 - R_c) = R_o + R_c - 1 = \sum_{i=k}^{n} \binom{n}{i} p_c^i q_c^{n-i} - \sum_{i=k}^{n} \binom{n}{i} p_o^{n-i} q_o^{i}. \tag{8.60}
\]
If we define the function
\[
V_{k,n}(x) = \sum_{i=k}^{n} \binom{n}{i} x^i (1-x)^{n-i}, \tag{8.61}
\]
then equation (8.60) can be written as
\[
R_s = V_{k,n}(p_c) - V_{k,n}(q_o) = V_{k,n}(1 - q_c) - V_{k,n}(q_o) = V_{k,n}(s(1-r) + r) - V_{k,n}(s(1-r)). \tag{8.62}
\]
To indicate that system reliability depends on k, n, s, and r, we will use R_{k,n}(s, r) to represent system reliability:
\[
R_{k,n}(s, r) \equiv R_s = V_{k,n}(s(1-r) + r) - V_{k,n}(s(1-r)). \tag{8.63}
\]
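Equations (8.61)–(8.63) translate directly into code. The short sketch below is our own illustration (the function names Vkn and Rkn are ours, not the book's):

```python
from math import comb

def Vkn(k, n, x):
    """V_{k,n}(x) = sum_{i=k}^{n} C(n,i) x^i (1-x)^(n-i), equation (8.61)."""
    return sum(comb(n, i) * x**i * (1 - x)**(n - i) for i in range(k, n + 1))

def Rkn(k, n, s, r):
    """Reliability of a k-out-of-n:G system with dual failure modes,
    equation (8.63): R = V_{k,n}(s(1-r)+r) - V_{k,n}(s(1-r))."""
    return Vkn(k, n, s * (1 - r) + r) - Vkn(k, n, s * (1 - r))

# For r = 0.7 and s = 0.5 (failures split evenly between the two modes),
# a 2-out-of-3:G system has reliability 0.8785.
print(Rkn(2, 3, 0.5, 0.7))
```

Note the sanity check hidden in the formula: when s = 0 all failures are failures to close, so Rkn reduces to the ordinary k-out-of-n:G reliability V_{k,n}(r).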
8.5.1 Optimal k or n Value to Maximize System Reliability

Each component may fail to open or fail to close. Thus, the system may fail to open or fail to close. If the value of k increases, Ro increases by equation (8.58) and, at the same time, Rc decreases by equation (8.59). There should therefore be a k value that maximizes system reliability as expressed in equation (8.60). Ben-Dov [27] optimizes the k value to maximize the reliability of a k-out-of-n:G system subject to dual failure modes. When qo, qc, and n are fixed, the reliability of such a system depends only on the k value. Thus, we can write Rs in equation (8.60) as Rs(k).

Theorem 8.5 (Ben-Dov [27]) For fixed n, qo, and qc, the maximum value of Rs(k) is attained at
\[
k^* = \lfloor k_0 \rfloor + 1, \tag{8.64}
\]
where
\[
k_0 = \frac{n \ln(q_c/p_o)}{\ln[q_c q_o/(p_c p_o)]}. \tag{8.65}
\]
If k0 is an integer, both k0 and k0 + 1 maximize Rs(k). To prove this theorem, we examine ∆Rs(k) = Rs(k + 1) − Rs(k), namely, the change in Rs(k) when k goes to k + 1. This change is zero when k takes the value k0. Noting that pc > qo and po > qc based on the definitions of these parameters, we conclude that ∆Rs(k) > 0 when k < k0 and ∆Rs(k) < 0 when k > k0. Thus, k* = ⌊k0⌋ + 1 maximizes Rs(k).

Another interesting result attributed to Ben-Dov [27] is the observation that k0 expressed in equation (8.65) [and thus k* expressed in equation (8.64)] is an increasing function of qo and a decreasing function of qc. This result agrees with our intuition. If qo increases, each component is more likely to fail to open. Since at least n − k + 1 components have to open properly for the system to open properly, the probability that the system opens properly is increased if n − k + 1 is decreased or, equivalently, k is increased. As a result, k* increases with qo. If qc increases, each component is more likely to fail to close. Since at least k components have to close properly for the system to close properly, the probability that the system closes properly can be increased if k is reduced. As a result, k* decreases with qc. Since s is the probability that a failure is a failure to open (0 ≤ s ≤ 1), we can say that k* is an increasing function of s. This result is further illustrated in the following example.

Example 8.2 Consider k-out-of-3:G systems with i.i.d. components, where k = 1, 2, 3. Substituting these different k values in equation (8.62), we obtain the system reliability functions Rs(k). When k = 1, we have a parallel system; when k = 3, we have a series system:
\begin{align*}
R_s(1) &= (1 - q_c^3) + p_o^3 - 1 = [1 - s(1-r)]^3 - (1-s)^3(1-r)^3, \\
R_s(2) &= 3p_c^2 q_c + p_c^3 - 3p_o q_o^2 - q_o^3 \\
       &= [s(1-r) + r]^2[2(1-s)(1-r) + 1] - s^2(1-r)^2[3 - 2s(1-r)], \\
R_s(3) &= p_c^3 + (1 - q_o^3) - 1 = [s(1-r) + r]^3 - s^3(1-r)^3.
\end{align*}
Now let r = 0.7.
Then the Rs(k) functions for k = 1, 2, 3 are functions of s for 0 ≤ s ≤ 1, which are plotted in Figure 8.7. From this figure, Rs(1) is the largest when s is small (close to zero), Rs(3) is the largest when s is large (close to 1), and Rs(2) is the largest when s is in the middle (around 0.5). This means that as s increases, we prefer to have a k-out-of-n:G system with a larger k value. To be more specific, when 0 ≤ s < 0.172, a parallel system is the best structure. When 0.172 < s < 0.828, a 2-out-of-3:G structure is the best. When 0.828 < s ≤ 1, a series structure is the best. Each of the three system structures, namely, 1-out-of-3:G, 2-out-of-3:G, and 3-out-of-3:G, is optimal in a certain range of values of s.

FIGURE 8.7 Reliability of k-out-of-n:G systems with dual failure modes. [Plot of Rs(k), k = 1, 2, 3, versus s for r = 0.7; the curves cross at s ≈ 0.172 and s ≈ 0.828.]

The domain of s, [0, 1], is
divided into three regions such that one of the three system structures is optimal in a distinct region.

More general results than the ones described above are reported by Phillips [195]. The k-out-of-n:G systems with k ∈ {1, 2, . . . , n} are considered to be a group of systems. We are interested in determining which system, that is, which k value, will give the highest system reliability for different component unreliability values. Phillips [195] discovers that this optimal k value can be completely specified by the value of s for 0 ≤ s ≤ 1 for any given 0 < r < 1. In the following discussions, we assume that 0 < r < 1, that is, component reliability is neither 0 nor 1. Using equation (8.63), we can easily verify the following for 1 ≤ k ≤ n − 1:
\begin{align}
R_{k,n}(0, r) &> R_{k+1,n}(0, r), \tag{8.66} \\
R_{k,n}(1, r) &< R_{k+1,n}(1, r). \tag{8.67}
\end{align}
These two relations indicate that R_{1,n}(s, r) = max_{1≤k≤n} R_{k,n}(s, r) for s in an interval including 0 as its lower limit, and R_{n,n}(s, r) = max_{1≤k≤n} R_{k,n}(s, r) for s in an interval including 1 as its upper limit. In other words, the parallel system (1-out-of-n:G) is the optimal structure when s is close to 0 and the series system (n-out-of-n:G) is the optimal structure when s is close to 1. Let x_{k,n} be the value of s that satisfies the following equation:
\[
R_{k,n}(s, r) = R_{k+1,n}(s, r). \tag{8.68}
\]
Using equation (8.63), we can easily verify that such a root is unique. In addition, we have xk,n < xk+1,n .
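Because the root of equation (8.68) is unique, the crossover points x_{k,n} can be located by simple bisection. The sketch below (our own illustration, not from the book) recovers the values x_{1,3} ≈ 0.172 and x_{2,3} ≈ 0.828 quoted in Example 8.2 for n = 3 and r = 0.7:

```python
from math import comb

def Rkn(k, n, s, r):
    # Equation (8.63): R = V_{k,n}(s(1-r)+r) - V_{k,n}(s(1-r))
    V = lambda x: sum(comb(n, i) * x**i * (1 - x)**(n - i) for i in range(k, n + 1))
    return V(s * (1 - r) + r) - V(s * (1 - r))

def crossover(k, n, r, tol=1e-10):
    """Bisect for x_{k,n}: the s in (0, 1) where R_{k,n} = R_{k+1,n}."""
    g = lambda s: Rkn(k, n, s, r) - Rkn(k + 1, n, s, r)
    lo, hi = 1e-9, 1 - 1e-9       # g(lo) > 0 and g(hi) < 0 by (8.66)-(8.67)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    return (lo + hi) / 2

x13 = crossover(1, 3, 0.7)
x23 = crossover(2, 3, 0.7)
print(round(x13, 3), round(x23, 3))   # approximately 0.172 and 0.828
```

The two roots also illustrate the symmetry property x_{k,n} = 1 − x_{n−k,n} listed below.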
This means that the k-out-of-n:G system is the best structure when s is in the interval (x_{k−1,n}, x_{k,n}). To be more specific, the 1-out-of-n:G structure is the best when 0 ≤ s < x_{1,n}, the 2-out-of-n:G structure is the best when x_{1,n} < s < x_{2,n}, . . . , and the n-out-of-n:G structure is the best when x_{n−1,n} < s ≤ 1. In Example 8.2, where r = 0.7 and n = 3, we have x_{1,3} ≈ 0.172 and x_{2,3} ≈ 0.828. Some other properties of x_{k,n} are summarized as follows:

• x_{k,n} = 1 − x_{n−k,n} for k = 1, 2, . . . , n − 1.
• x_{k,2k} = 1/2 for k = 1, 2, . . . .
• R_{k,n}(s, r) attains its maximum value at x_{k−1,n−1} for 2 ≤ k ≤ n − 1.
• For specified s and r values, increasing n will always increase system reliability as long as the optimal k value is used.
• lim_{r→0} x_{k,n} = k/n. In words, if the reliability of each component is very low (close to zero), the k-out-of-n:G structures for k = 1, 2, . . . , n will equally share the interval (0, 1) of s; that is, x_{k+1,n} − x_{k,n} = 1/n for 0 ≤ k ≤ n − 1, where x_{0,n} ≡ 0.
• \[
\lim_{r \to 1} x_{k,n} = \begin{cases} 0 & \text{for } k < n/2, \\ 1 & \text{for } k > n/2, \\ \tfrac{1}{2} & \text{for } k = n/2. \end{cases}
\]
In words, as r gets closer to 1, the lengths of the intervals x_{k+1,n} − x_{k,n} become closer to 0 except in the following two cases:
FIGURE 8.8 System reliability of best structures as a function of s with given 1 ≤ n ≤ 6 and r = 0.5. (From [195], © 1980 IEEE.) [Plot of the reliability of the optimal structure versus the proportion of failures to open, s, with the crossover points x_{k,n} marked for n = 1, . . . , 6.]
• When n is odd and k = (n − 1)/2: In this case, the interval (x_{k,2k+1}, x_{k+1,2k+1}) converges to the interval (0, 1). In other words, when component reliability r is high, no matter how the component failure probability is distributed between failure to open and failure to close (0 < s < 1), the (k + 1)-out-of-(2k + 1):G system structure is always the best.
• When n is even and k = n/2 − 1 or k = n/2: In this case, x_{k,2k} − x_{k−1,2k} approaches 1/2 and x_{k+1,2k} − x_{k,2k} approaches 1/2. In other words, the k-out-of-2k:G system is optimal for 0 ≤ s < 0.5 and the (k + 1)-out-of-2k:G system is optimal for 0.5 < s < 1.
Figure 8.8 shows the system reliability of the best structures (or k values) for different s values (0 ≤ s ≤ 1) and n values (1 ≤ n ≤ 6) with r = 0.5. For fixed k, qo, and qc, one may also determine the optimal n value to maximize system reliability. The results are analogous to the ones that have been presented in terms of k; readers may refer to Pham [187] for such studies.

8.5.2 Optimal k or n Value to Maximize System Profit

In the previous section, we considered the maximization of system reliability through determination of an optimal k or n value. In this section, we consider a different objective function in optimal system design, namely, maximization of average profit. Such studies are reported by Sah and Stiglitz [212], Sah [211], and Pham and Pham [193]. The system under consideration is required to be in two desirable states, sometimes closed and sometimes open. The problem is that the system may fail to be in the desired state when needed. Thus, the system may open successfully, fail to open, close successfully, or fail to close. We will temporarily use α to indicate the probability that the system is desired to be closed; the probability that the system is desired to be open is then 1 − α. Let B1, B2, B3, and B4 be the profits generated by the system when it closes successfully, fails to close, opens successfully, and fails to open, respectively. Both B2 and B4 may be negative. The average profit can then be expressed as
\[
\text{Average profit} = \alpha[B_1 R_c + B_2(1 - R_c)] + (1 - \alpha)[B_3 R_o + B_4(1 - R_o)] = c(R_c - aQ_o) + b, \tag{8.69}
\]
where c = α(B1 − B2), a = (1 − α)(B3 − B4)/[α(B1 − B2)], and b = αB2 + (1 − α)B3. Since c > 0 and b is constant, maximizing the average profit is equivalent to maximizing Rc − aQo. As a result, the following form of objective function is used when one is interested in maximizing average system profit:
\[
U(k, n) = R_c - aQ_o, \tag{8.70}
\]
where a (≥ 0) can be interpreted as the loss in profit when the system goes from closing successfully to failing to close, relative to the gain in profit when the system
goes from failing to open to closing successfully. A mathematical form of a can be found in equation (8.69). When a = 1, equation (8.70) gives the expression of system reliability; in other words, when a = 1, maximizing system profit is equivalent to maximizing system reliability. We consider only two possible decision variables, k and/or n, which can take only positive integer values. The following results can be derived through examination of the change in the objective function when a decision variable is incremented by one unit; thus, no detailed proofs are provided. Note that the definitions of qo and pc in Pham and Pham [193] are different from those we use in this book.

Theorem 8.6 (Pham and Pham [193]) When k is fixed, the maximum value of U(k, n) is attained at
\[
n^* = \begin{cases} k & \text{if } n_0 < k, \\ \lfloor n_0 \rfloor + 1 & \text{if } n_0 \ge k, \end{cases} \tag{8.71}
\]
where
\[
n_0 = \frac{k \ln[p_o p_c/(q_o q_c)] - \ln a}{\ln(p_o/q_c)} - 1. \tag{8.72}
\]
If n0 ≥ k and n0 is an integer, both n0 and n0 + 1 maximize U(k, n).

Theorem 8.7 (Pham and Pham [193]) When n is fixed, the maximum value of U(k, n) is attained at
\[
k^* = \begin{cases} 0 & \text{if } k_0 < 0, \\ \lfloor k_0 \rfloor + 1 & \text{if } 0 \le k_0 < n, \\ n & \text{if } k_0 \ge n, \end{cases} \tag{8.73}
\]
where
\[
k_0 = \frac{n \ln(p_o/q_c) + \ln a}{\ln[p_c p_o/(q_c q_o)]}. \tag{8.74}
\]
If k0 is an integer, both k0 and k0 + 1 maximize U(k, n). We note that when a = 1, the problem and its solutions reduce to the ones covered in the previous section, namely, maximizing system reliability. When both k and n are treated as decision variables, there does not exist a unique optimal solution (n*, k*) [193]. For any selected n value, we can use Theorem 8.7 to find the best k value, k*, and k* increases in n. When n → ∞, we have k* → ∞, Rc → 1, Qo → 0, and U(k*, n) → 1. Sah [211] performs sensitivity analyses on the effects of parameter changes on the optimal k value and the resulting system profit.
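Theorem 8.7 is easy to spot-check numerically. The sketch below is our own illustration (the parameter values are arbitrary); note that the sign of ln a used for k0 here is derived directly from differencing U(k + 1) − U(k) under the definitions of this section, since sign conventions for a differ across sources:

```python
from math import comb, log, floor

def U(k, n, qo, qc, a):
    """Objective (8.70): U = Rc - a*Qo for a k-out-of-n:G system with
    dual failure modes and i.i.d. components (Rc from (8.59), Qo = 1 - Ro)."""
    po, pc = 1 - qo, 1 - qc
    Rc = sum(comb(n, i) * pc**i * qc**(n - i) for i in range(k, n + 1))
    Qo = sum(comb(n, i) * qo**i * po**(n - i) for i in range(k, n + 1))
    return Rc - a * Qo

def k_star(n, qo, qc, a):
    """Closed-form maximizer of U over k: floor(k0) + 1, clipped to [0, n].
    k0 solves a*qo^k*po^(n-k) = pc^k*qc^(n-k), i.e., U(k+1) = U(k)."""
    po, pc = 1 - qo, 1 - qc
    k0 = (n * log(po / qc) + log(a)) / log(pc * po / (qc * qo))
    return min(n, max(0, floor(k0) + 1))

n, qo, qc, a = 10, 0.05, 0.1, 2.0
brute = max(range(n + 1), key=lambda k: U(k, n, qo, qc, a))
print(brute, k_star(n, qo, qc, a))   # both give the same k
```

Brute-forcing U over all k ∈ {0, . . . , n} and comparing with the closed form is a convenient way to confirm the unimodality argument sketched above.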
Exercises
1. Prove Theorem 8.6.
2. Prove Theorem 8.7.
3. Consider the k-out-of-(2k − 1):G system.

8.5.3 Optimal k and n Values to Minimize System Cost

The objective functions used in the previous two sections are functions only of system working and failure probabilities. Noting that system size n affects not only system reliability but also system cost directly, Pham and Malon [192] introduce the following system cost function to be minimized:
\[
T(k, n) = c_o Q_o + c_c Q_c + dn, \tag{8.75}
\]
where co and cc are the costs incurred when the system fails to open and fails to close, respectively, and d is the cost of each i.i.d. component. Again, note that the definitions of qo and qc used in Pham and Malon [192] are different from those used in this book. As indicated by the objective function, there are two possible decision variables, n and k. The following results are given for determination of k given n, n given k, and n and k jointly.

Theorem 8.8 (Pham and Malon [192]) When n is specified, the minimum of T(k, n) is attained at
\[
k^* = \min\{n, \max\{1, \lfloor k_0 \rfloor + 1\}\}, \tag{8.76}
\]
where
\[
k_0 = \frac{n \ln(p_o/q_c) - \ln(c_c/c_o)}{\ln[p_c p_o/(q_c q_o)]}. \tag{8.77}
\]
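As with the profit model, Theorem 8.8 can be spot-checked by brute force. The sketch below is our own illustration with arbitrary cost values; it minimizes T(k, n) over k for a fixed n and compares the result with equations (8.76)–(8.77):

```python
from math import comb, log, floor

def T(k, n, qo, qc, co, cc, d):
    """Cost function (8.75): T = co*Qo + cc*Qc + d*n."""
    po, pc = 1 - qo, 1 - qc
    Qo = sum(comb(n, i) * qo**i * po**(n - i) for i in range(k, n + 1))
    Qc = 1 - sum(comb(n, i) * pc**i * qc**(n - i) for i in range(k, n + 1))
    return co * Qo + cc * Qc + d * n

def k_star(n, qo, qc, co, cc):
    """Equations (8.76)-(8.77); the term d*n is constant for fixed n."""
    po, pc = 1 - qo, 1 - qc
    k0 = (n * log(po / qc) - log(cc / co)) / log(pc * po / (qc * qo))
    return min(n, max(1, floor(k0) + 1))

n, qo, qc, co, cc, d = 12, 0.04, 0.08, 100.0, 40.0, 1.0
brute = min(range(1, n + 1), key=lambda k: T(k, n, qo, qc, co, cc, d))
print(brute, k_star(n, qo, qc, co, cc))
```

Since T(k + 1) − T(k) changes sign exactly once in k, the brute-force minimizer coincides with the closed-form k*.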
If k0 is a positive integer, then both k0 and k0 + 1 minimize T(k, n).

To find the n that minimizes T(k, n) with k fixed, we first need to define some additional notation:
\[
a = \frac{c_c}{c_o}, \qquad \gamma_1 = \frac{p_o}{q_c}, \qquad \gamma_2 = \frac{p_c}{q_o}, \qquad \delta = \frac{k-1}{n+1}, \qquad B = a\gamma_1^{k-1}\gamma_2^{k},
\]
\[
f(n) = \frac{(q_o - \delta)\gamma_1^{n}}{p_c - \delta}, \qquad h(n) = \binom{n}{k-1} p_c^{k} q_c^{n-k+1}\left(a - \frac{\gamma_1^{n-k+1}}{\gamma_2^{k}}\right),
\]
\[
n_0 = \left\lfloor \frac{\ln a + k \ln(\gamma_1\gamma_2)}{\ln \gamma_1} \right\rfloor, \qquad n_1 = \left\lceil \frac{k-1}{p_c} - 1 \right\rceil,
\]
\[
n_2 = f^{-1}(B) \ \text{with } k \le n_2 \le n_1, \qquad n_3 = \inf\left\{n \in [n_2, n_0] : h(n) < \frac{d}{c_o}\right\}.
\]

The following procedure can then be used to find the n* that minimizes T(k, n) given k; the first condition that is true is used:
• If n0 ≤ k, then n* = k.
• If h(n2) < d/co, then n* = k.
• If h(k) ≥ d/co, then n* = n3.
• If T(k, k) < T(k, n3), then n* = k.
• Otherwise, n* = n3.
If we are to find n* and k* simultaneously, the following theorem is provided.

Theorem 8.9 (Pham and Malon [192]) Here, T(k, n) is minimized at (k_{n*}, n*), where
\[
k_{n^*} = \alpha n^*, \qquad \alpha = \frac{\ln \gamma_1}{\ln(\gamma_1\gamma_2)}, \qquad \beta = 1 - \frac{\ln a}{\ln(\gamma_1\gamma_2)}, \qquad n^* \le \frac{(r^2/2\pi)(c_c/d)^2 + \beta}{\alpha(1-\alpha)}.
\]
For the k-out-of-(2k − 1):G system, there exists a unique k* that minimizes T(k, 2k − 1):
\[
k^* = \inf\left\{k : l(k) < \frac{2d}{c_c}\right\},
\]
where
\[
l(k) = 2\binom{2k-1}{k-1}\left[\frac{(p_o - 1/2)(q_o p_o)^k}{a} + (q_c p_c)^k\left(p_c - \frac{1}{2}\right)\right].
\]
Exercise
1. Under what conditions is the model covered in this section equivalent to the models covered in the previous two sections?

Repairable k-out-of-n:G systems with dual failure modes have been studied by some authors; for example, see Moustafa [171]. Components are i.i.d. with the rates of failure to open and failure to close denoted by λo and λc, respectively. The state of the system, (i, j), represents that i components fail to open and j components fail to close. The rates of failure to open and failure to close of the system in state (i, j) are indicated by λ_{i,j}^o and λ_{i,j}^c, respectively. The rates of repair of a failure to open and of a failure to close when the system is in state (i, j) are represented by µ_{i,j}^o and µ_{i,j}^c, respectively. The system state transition diagram is given in Figure 8.9. The total number of different system states is (n − k)(k − 1) + 1. The mathematical analysis through derivation of differential equations is very tedious for this model, and no closed-form solutions are available.

FIGURE 8.9 System state transition diagram for k-out-of-n:G system subject to dual failure modes. [State (i, j) moves to (i + 1, j) at rate λ_{i,j}^o, to (i, j + 1) at rate λ_{i,j}^c, back to (i − 1, j) at rate µ_{i,j}^o, and to (i, j − 1) at rate µ_{i,j}^c.]

More than two disjoint failure modes have also been studied. For example, Pham [191] considers k-out-of-n:G systems with three mutually exclusive failure modes. As the number of different failure modes increases, the analysis becomes more tedious.

Exercise
1. Analyze the reliability and availability of a 2-out-of-4:G system subject to dual failure modes.
8.6 OTHER ISSUES

8.6.1 Selective Replacement Optimization

Consider a system that is used for missions of equal length. At the end of each mission, an inspection is performed to determine the state of each component, working or failed. After the state of each component is determined, a decision is made on which failed components should be replaced by new ones before the system is used for the next mission. We need to balance the costs of replacing failed components with those of possible system failure during the next mission. It is assumed that inspection of the system does not cost anything. The objective is to find the optimal replacement policy that minimizes the long-run average cost per mission period.
This problem can be formulated as a dynamic programming model. However, solving such a model using standard dynamic programming techniques is too time consuming, and thus impractical, when the system size n is large. Flynn et al. [76] show that one can restrict attention to a class of policies for which the computations are not too demanding. The main idea in [76] is to restrict attention to critical component policies (CCPs). Under such a policy, a failed component is replaced if and only if it is in a so-called critical component set; failed components not in the critical component set stay failed at the beginning of the next mission. The authors also prove that there is always a CCP that minimizes the long-run average cost per mission period. Chung and Flynn [56] study this selective replacement optimization problem for the k-out-of-n:F system structure. A failed component is replaced if and only if it is in the critical component set. Computation of such a set is a binary nonlinear programming problem, which can be solved by searching through a set with O(n^{k−1}) points. This approach is practical for small k values. If all components in the system are i.i.d., the time complexity of the algorithm for finding the CCP is O(n^{2k−1}).

8.6.2 TMR and NMR Structures

A triple modular redundant (TMR) system is actually a 2-out-of-3:G or F system structure. A k-out-of-(2k − 1):G or F structure is called an n modular redundant (NMR) system. The TMR system is a special case of the NMR system. It is a simple majority voting system: the system is considered working when more than half of the components are working properly. Such structures have been widely used in the aviation and nuclear industries. In nuclear reactors, three identical and independent counters are often used to monitor the radioactivity of the air in the ventilation system.
Whenever at least two out of the three counters register a dangerous level of radioactivity, operators are alerted to the need for reactor shutdown [190]. In the design of the Boeing 777 airplane, TMR structures are used [252]. The microprocessors are considered to be the core of the primary flight computer, and three totally different microprocessors are used: the Intel 80486, Motorola 68040, and AMD 29050. At least two such processors are required to work properly for the processor subsystem to be considered working properly. The selection of these dissimilar microprocessors also led to dissimilar interface hardware circuitries. Chen [53] describes two implementations of the TMR structure in the design of a spacecraft control system. The control system has a length of rs in a two-dimensional free space. The position of the system is specified by the x–y coordinates of two weighted wheels at points r and s. The control system constantly adjusts its navigation direction by performing a point-to-point movement from point s to a target point p. Three hardware actuators are used, and at least two have to work for the control system to work properly. A base rotational actuator may rotate 360◦ around point s to change the direction of point r. A linear movement actuator moves along rs in the current direction of point s. Another rotational actuator is added around point r. Both a static software implementation and a dynamic software implementation
are described to control the three actuators such that they function as a 2-out-of-3:G system.

The NMR structures are widely used in critical systems because they use the simple majority rule; that is, at least half of the components have to work for the system to work. Because of the high cost of each of these components, the system size is often fairly small. For example, in the TMR system, n = 3, and quite often the n values in NMR systems are not much larger than 3. To further increase the reliability of NMR systems, designers sometimes use additional spare components: whenever one of the active components fails, a spare is switched into operation. The techniques used for reliability analysis and optimal design of NMR systems are the same as those used for k-out-of-n:G systems, as they are special cases of the k-out-of-n:G systems.

8.6.3 Installation Time of Repaired Components

Gupta and Sharma [90] consider the issue of steady-state availability evaluation when switching a component from off-line to on-line takes a random amount of time. Each component, then, may be in one of three possible states: on-line and working; off-line and failed (waiting for repair or being repaired); and off-line and in working condition (waiting for installation or being installed). The detection and removal of failed components are assumed to be instantaneous. The exponential distribution is used to model the working time, repair time, and installation time of each component. It is assumed that there are unlimited repair and installation facilities available. The maximum number of failed components allowed may be n; in this case, components may continue to fail even when the system is down (the number of good components is less than k).
The maximum number of failed components allowed in the system may also be n − k + 1; in this case, as soon as the number of good components is less than k, the system is down, and these good components will not fail before the system is brought to the working state again. Gupta and Sharma [91] extend their work [90] by assuming that additional spare units are available and that the numbers of repair and installation facilities are limited. The system has n active components and m additional spare units in standby; the numbers of repair and installation facilities are indicated by r and u, respectively. Both cases described above may be analyzed using the Markov chain technique. However, Gupta and Sharma [90, 91] propose a method to determine the transition diagram of the systems without calculating the state transition matrix. A computer-based algorithm is developed for evaluation of steady-state system availability.

8.6.4 Combinations of Factors

In this chapter, we have studied the k-out-of-n systems considering various factors such as cold and warm standby units, imperfect sensing and switching mechanisms, human errors and other common-cause failures, and imperfect fault coverage. Introduction of these factors often complicates the task of reliability analysis, as demonstrated in previous sections. It is obviously even more complicated to deal with all of these factors simultaneously. Some researchers have made attempts to do so; however, so far no simple and efficient frameworks have been developed for such analyses. For research on some of these issues, readers are referred to Sur [235].

8.6.5 Partial Ordering

The k-out-of-n:G system with i.i.d. components preserves some orderings in the components. Barlow and Proschan [22] show that convex and star ordering are preserved under the formation of a k-out-of-n:G system with i.i.d. components. Singh and Vijayasree [232] show that the likelihood ratio ordering, failure rate ordering, and stochastic ordering are also preserved under the formation of a k-out-of-n:G system with i.i.d. components. Counterexamples are given to show that the k-out-of-n:G system structure generally does not preserve ordering in variability, mean residual life, or harmonic-average mean residual life.
9 CONSECUTIVE-k-OUT-OF-n SYSTEMS
Consecutive-k-out-of-n system models have been proposed for system reliability evaluation and the design of integrated circuits, microwave relay stations in telecommunications, oil pipeline systems, vacuum systems in accelerators, computer ring networks (k loop), and spacecraft relay stations. Such systems are characterized by logical or physical connections among components in lines or circles. Suppose that n components are linearly (circularly) connected in such a way that the system malfunctions if and only if at least k consecutive components fail. This type of structure is called the linear (circular) consecutive-k-out-of-n:F system, denoted by linear (circular) Con/k/n:F in short. If the system works if and only if at least k consecutive components work, the system structure is called the linear (circular) consecutive-k-out-of-n:G system, or linear (circular) Con/k/n:G system in short. The consecutive-k-out-of-n systems (both linear and circular) include the series and the parallel systems as special cases. For example, when k = 1, the linear and circular consecutive-k-out-of-n:F system becomes the series system, and when k = n, the linear and circular consecutive-k-out-of-n:F system becomes the parallel system. Figure 9.1 illustrates a linear consecutive-3-out-of-6:F system. Whenever the number of consecutive failures is less than 3, the signal flow from the source to the sink is not interrupted and the system works. Figure 9.2 shows a circular consecutive2-out-of-8:F system. Whenever the number of consecutive failures reaches 2, the circular signal flow is interrupted and the system fails. For a circular system, we usually assume that the components are labeled from 1 to n clockwise. The reliability of the Con/k/n:F system was first studied by Kontoleon [118], but the name consecutive-k-out-of-n:F system originates from a paper by Chiang and Niu [54]. 
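The special cases mentioned above are easy to check by brute-force enumeration. The following sketch (our own illustration, not an algorithm from this chapter) evaluates a linear Con/k/n:F system with i.i.d. components by enumerating all 2^n component-state vectors, and confirms that k = 1 reduces to the series system and k = n to the parallel system:

```python
from itertools import product

def lin_con_knF_rel(k, n, p):
    """Reliability of a linear Con/k/n:F system with i.i.d. components of
    reliability p, by enumerating all 2^n state vectors (1 = working).
    The system fails iff some run of >= k consecutive components is failed."""
    rel = 0.0
    for state in product((0, 1), repeat=n):
        run = longest = 0
        for x in state:
            run = run + 1 if x == 0 else 0
            longest = max(longest, run)
        if longest < k:
            w = sum(state)
            rel += p**w * (1 - p)**(n - w)
    return rel

p, n = 0.9, 6
print(lin_con_knF_rel(1, n, p), p**n)            # k = 1: series system
print(lin_con_knF_rel(n, n, p), 1 - (1 - p)**n)  # k = n: parallel system
```

Enumeration is only feasible for small n, of course; the efficient recursive algorithms covered later in this chapter are the practical tools for large systems.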
To give the reader a clear picture of the Con/k/n:F and G systems, we list the following examples provided by authors such as Chiang and Niu
FIGURE 9.1 Linear consecutive-3-out-of-6:F system. [Six components, labeled 1–6, connected in a line from source to sink.]
FIGURE 9.2 Circular consecutive-2-out-of-8:F system. [Eight components, labeled 1–8 clockwise, connected in a ring.]
[54], Bollinger and Salvia [39], Chao and Lin [51], Hwang [102], Kuo et al. [134], and Zuo and Kuo [260]. Example 9.1 (Microwave Stations of a Telecom Network) A sequence of n microwave stations relay signals from place A to place B. Stations are equally spaced between places A and B. Each microwave station is able to transmit signals to a distance including k other microwave stations. It is evident that such a system fails if and only if at least k consecutive microwave stations fail. The reliabilities of the stations may be different because of differences in environmental conditions and operational procedures among the individual microwave stations and station failures are likely to be s-independent. Example 9.2 (Oil Pipeline System) Consider a pipeline system for transporting oil from point A to point B by n pump stations. Pump stations are equally spaced between points A and B. Assume that each pump station is able to transport oil to a distance including k(k > 1) other pump stations. If one pump station is down, the flow of oil is not interrupted because the following k − 1 stations can still carry the load. However, when at least k consecutive pump stations fail, the oil flow is interrupted and the system fails. In this case, it is most likely that the pump failure
probabilities are dependent, because the load on neighboring pumps increases when a pump fails.

Example 9.3 (Vacuum System in an Electron Accelerator) In the vacuum system of an electron accelerator, the core consists of a large number (500–800) of identical components (vacuum bulbs). The components are placed sequentially along a ring, and the vacuum system fails if at least a certain number of components that are adjacent to one another fail. To assure a minimum reliability requirement of the vacuum system for a specified period of time, a minimum warranty requirement needs to be worked out and forwarded to the manufacturer of the components. This is a good example of the circular Con/k/n:F system.

Example 9.4 (Photographing of a Nuclear Accelerator) In analysis of the acceleration activities that occur in a nuclear accelerator, high-speed cameras are used to take pictures of the activities. Because of the high speed of the activities and the high cost involved in implementing such an experiment, the photographing system must be very reliable and accurate. A set of n cameras is installed around the accelerator. The photographing system works properly if and only if at least k consecutive cameras work properly. The problems of interest include evaluation of the reliability of the photographing system and the optimal arrangement of cameras with different reliabilities. This is an example of the circular Con/k/n:G system.

In this chapter, we provide a thorough coverage of consecutive-k-out-of-n system structures. The topics covered include algorithms for system reliability evaluation, optimal system design, and system life distributions. Throughout this chapter, we use the following nomenclature for the consecutive-k-out-of-n systems:

• Lin/Con/k/n:F: linear consecutive-k-out-of-n:F system
• Lin/Con/k/n:G: linear consecutive-k-out-of-n:G system
• Cir/Con/k/n:F: circular consecutive-k-out-of-n:F system
• Cir/Con/k/n:G: circular consecutive-k-out-of-n:G system
We adopt the following assumptions unless specified otherwise:

1. The system and its components may only be in two possible states: working or failed.
2. The failures of components are independent.
3. The components in a circular system are labeled clockwise from 1 to n.

Notation

• n: number of components in a system
• k: minimum number of consecutive working (failed) components required for the system to work (fail)
• p: component reliability in a system with i.i.d. components
• q: component unreliability in a system with i.i.d. components, q = 1 − p
• p_i: reliability of component i, i = 1, 2, ..., n
• q_i: unreliability of component i, q_i = 1 − p_i, i = 1, 2, ..., n
• I_i: reliability importance of component i, i = 1, 2, ..., n
• R_L(k, n): reliability of a Lin/Con/k/n:F system, sometimes explicitly denoted by R_L(k, p_1, ..., p_n)
• Q_L(k, n): unreliability of a Lin/Con/k/n:F system, Q_L(k, n) = 1 − R_L(k, n)
• R_C(k, n): reliability of a Cir/Con/k/n:F system
• Q_C(k, n): unreliability of a Cir/Con/k/n:F system, Q_C(k, n) = 1 − R_C(k, n)
• R_i(k, n − 1): reliability of a Lin/Con/k/(n − 1):F subsystem consisting of components i + 1, ..., n, 1, ..., i − 1 for i = 1, ..., n; sometimes explicitly denoted by R_i(k, p_{i+1}, ..., p_n, p_1, ..., p_{i−1})
• R_L(k, j): reliability of a Lin/Con/k/j:F subsystem consisting of components n − j + 1, ..., n; sometimes explicitly denoted by R_L(k, p_{n−j+1}, ..., p_n)
• R_L(k, (i, j)): reliability of a Lin/Con/k/(j − i + 1):F subsystem consisting of components i, i + 1, ..., j
• Q_L(k, (i, j)): Q_L(k, (i, j)) = 1 − R_L(k, (i, j))
• π: permutation vector (π(1), π(2), ..., π(n))
• ⌊a⌋: greatest integer lower bound of a
• l_L(k, n): lower bound on reliability of the Lin/Con/k/n:F system
• u_L(k, n): upper bound on reliability of the Lin/Con/k/n:F system
• l_C(k, n): lower bound on reliability of the Cir/Con/k/n:F system
• u_C(k, n): upper bound on reliability of the Cir/Con/k/n:F system
9.1 SYSTEM RELIABILITY EVALUATION

9.1.1 Systems with i.i.d. Components

In this section, we consider both Lin and Cir/Con/k/n:F systems with i.i.d. components. Most researchers have used the combinatorial approach for reliability evaluation of such systems. With the combinatorial approach, one is able to find an explicit expression of system reliability as a function of the component reliability p. The disadvantage, however, is that one often has to evaluate the coefficient in front of the terms p^i q^j.

Linear Consecutive-k-out-of-n:F Systems  Chiang and Niu [54] developed a closed formula for the reliability of a Lin/Con/2/n:F system. Let j indicate the number of failed components in the system. If j > \lfloor (n+1)/2 \rfloor, there must be a pair of failed components that are adjacent to each other, so for the system to work, j must be less than or equal to \lfloor (n+1)/2 \rfloor. In that case, the system works if there is at least one working component between every two failed components. The number of ways to arrange n balls, including j black balls and n − j white balls, along a line such that no two black balls are adjacent to each other is equal to \binom{n-j+1}{j}. This result follows from the observation that the n − j white balls divide the line segment of the system into n − j + 1 cells, each of which may hold either zero or one black ball. Since there are j failed components, j of these n − j + 1 cells hold exactly one ball while the remaining cells hold none, and the number of ways to pick j cells out of n − j + 1 cells is \binom{n-j+1}{j}. As a result, the reliability of a Lin/Con/2/n:F system can be expressed as

    R_L(2, n) = \sum_{j=0}^{\lfloor (n+1)/2 \rfloor} \binom{n-j+1}{j} q^j p^{n-j}.    (9.1)
For a general value of k, that is, 2 ≤ k ≤ n − 1, the reliability of a Lin/Con/k/n:F system with i.i.d. components can be expressed in the form

    R_L(k, n) = \sum_{j=0}^{n} N(j, k, n) p^{n-j} q^j,    (9.2)
where N(j, k, n) represents the number of ways to arrange j failed components in a line such that no k or more failed components are consecutive. Bollinger and Salvia [39] interpret N(j, k, n) as the number of binary numbers of length n containing exactly j ones with at most k − 1 consecutive ones; in their interpretation, a binary 1 represents a failed component and a binary 0 a working component. Derman et al. [62] interpret N(j, k, n) as the number of ways to place j identical balls in n − j + 1 distinct urns subject to the restriction that at most k − 1 balls may be placed in any single urn. In this interpretation, the n − j working components divide the Lin/Con/k/n:F system into n − j + 1 segments: the first segment is to the left of the first working component, the last segment is to the right of the last working component, and between every two adjacent working components there is a different segment. Each of these n − j + 1 segments may contain from 0 to j failed components; however, for the system to work, the number of failed components in each segment must be less than k. If each of these n − j + 1 segments is considered a different urn, the problem is to find the number of ways to allocate j identical balls to these urns such that the number of balls in each urn is less than k. Researchers using the combinatorial approach have concentrated on the development of algorithms for evaluation of N(j, k, n). In terms of the notation introduced in equation (9.2), we can express N(j, k, n) when k = 2 as follows:

    N(j, 2, n) = \binom{n-j+1}{j} for 0 ≤ j ≤ \lfloor (n+1)/2 \rfloor, and N(j, 2, n) = 0 for \lfloor (n+1)/2 \rfloor < j ≤ n.    (9.3)
This expression of N(j, 2, n) has been reported by Chiang and Niu [54]. When k = 3, Derman et al. [62] provide the following closed expression of N(j, 3, n):

    N(j, 3, n) = \sum_{i=\max\{2j-n-1,\,0\}}^{\min\{\lfloor j/2 \rfloor,\, n-j+1\}} \binom{n-j+1}{i} \binom{n-j+1-i}{j-2i}.    (9.4)

This expression is a special case of a more general expression of M(j, m, r) developed by Derman et al. [62], as stated in the following lemma. Note that N(j, k, n) = M(j, k − 1, n − j + 1).

Lemma 9.1 (Derman et al. [62]) Let M(j, m, r) indicate the number of ways in which j identical balls can be placed in r distinct urns subject to the requirement that at most m balls may be placed in any one urn:

    M(j, 1, r) = \binom{r}{j} for 0 ≤ j ≤ r, and M(j, 1, r) = 0 for j > r,    (9.5)

    M(j, 2, r) = \sum_{i=\max\{j-r,\,0\}}^{\min\{\lfloor j/2 \rfloor,\, r\}} \binom{r}{i} \binom{r-i}{j-2i},    (9.6)

    M(j, m, r) = \sum_{i=0}^{r} \binom{r}{i} M(j - mi, m - 1, r - i), m ≥ 3.    (9.7)
Proof  When m = 1, each of the r urns holds either zero or one ball. Then M(j, 1, r) is equal to the number of ways to divide the r urns into two groups, one group with j urns and the other with r − j urns, where each of the j urns in the first group holds exactly one ball. The total number of ways to make such an allocation is \binom{r}{j}; if j > r, such an allocation is impossible. When m = 2, let i indicate the number of urns that are allocated exactly two balls (i = 0, 1, ..., r). If j > r, at least j − r urns must be allocated exactly two balls each because of the limitation m = 2. The number of ways to pick these i urns for allocation of exactly two balls each is \binom{r}{i}. The number of remaining urns is r − i, and the number of remaining balls to be allocated to them is j − 2i, with each remaining urn allocated at most one ball; the number of ways to allocate the remaining balls to the remaining urns is M(j − 2i, 1, r − i). Thus, for each i such that 0 ≤ i ≤ \lfloor j/2 \rfloor and 0 ≤ i ≤ r, the total number of ways to allocate the balls is \binom{r}{i} M(j − 2i, 1, r − i). As a result, we have

    M(j, 2, r) = \sum_{i=\max\{j-r,\,0\}}^{\min\{\lfloor j/2 \rfloor,\, r\}} \binom{r}{i} \binom{r-i}{j-2i}.

For m ≥ 3, let i indicate the number of urns that will be allocated exactly m balls. Similar arguments can be used to determine the number of ways to pick these i urns
and then allocate the remaining balls to the remaining urns with at most m − 1 balls in each of the remaining urns. The recursive formula given in equation (9.7) is obtained as a result.

Using Lemma 9.1, we have the following general recursive equation for N(j, k, n):

    N(j, k, n) = M(j, k-1, n-j+1) = \sum_{i=0}^{n-j+1} \binom{n-j+1}{i} M(j - i(k-1), k-2, n-j-i+1), k > 3.    (9.8)
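Lemma 9.1 translates directly into a short recursive routine. The sketch below is our own illustration (the names are ours, not from the text); it implements M(j, m, r) by conditioning on the number of urns that receive the maximum load, and recovers N(j, k, n) = M(j, k − 1, n − j + 1):

```python
from math import comb

def M(j, m, r):
    """Ways to place j identical balls into r distinct urns with at most
    m balls per urn (Lemma 9.1)."""
    if j < 0:
        return 0
    if j == 0:
        return 1                      # one way: every urn empty
    if m == 1:                        # equation (9.5): each urn holds 0 or 1 ball
        return comb(r, j) if j <= r else 0
    # equations (9.6)/(9.7): condition on the i urns holding exactly m balls
    return sum(comb(r, i) * M(j - m * i, m - 1, r - i) for i in range(r + 1))

def N(j, k, n):
    """Arrangements of j failures among n components with no k consecutive failures."""
    return M(j, k - 1, n - j + 1)
```

For example, this gives N(4, 3, 11) = 266 and N(9, 3, 11) = 0.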
Bollinger and Salvia [39] report another approach for evaluation of N(j, k, n):

    N(j, k, n) = 0 for j = n ≥ k;
    N(j, k, n) = \binom{n}{j} for 0 ≤ j ≤ k-1;
    N(j, k, n) = \sum_{i=1}^{k} N(j-i+1, k, n-i) for k ≤ j < n.    (9.9)

The first two cases of equation (9.9) are easily verified. For the general case with k ≤ j < n, let component i be the first working component counting from component 1. For the system to work, i has to be less than or equal to k, which is why the summation runs over 1 ≤ i ≤ k. If component i (1 ≤ i ≤ k) is the first working component, the first i − 1 components are failed, and the subsystem consisting of components i + 1, i + 2, ..., n must contain j − i + 1 failed components with no k consecutive failures among its n − i components. The number of ways to obtain such a subsystem is N(j − i + 1, k, n − i). In addition to equation (9.9), Bollinger [38] also provides an approach to derive N(j, k, n) directly through the construction of a table T_k whose rows are indexed by N = 0, 1, 2, ..., n + 1 and whose columns are indexed by J = 0, 1, 2, ..., n. If C_k(N, J) represents the entry of the table T_k at the intersection of row N and column J, then N(j, k, n) = C_k(n − j + 1, j) for 0 ≤ j ≤ n. Given n and k ≥ 2, form rows 0 through n + 1 and columns 0 through n of the table T_k as follows [38]:

Procedure for Generating the Table T_k
1. The row with N = 0 has a single 1 followed by zeros.
2. The row with N = 1 has k consecutive 1's followed by zeros.
3. Any entry in any row with N ≥ 2 is the sum of the entry just above it and that entry's k − 1 immediate left neighbors, with zeros making up any shortage near the left-hand side of the table.

The following observations are noted for the proposed procedure.
1. When k = 2, the table T_2 is simply the Pascal triangle and the entries are the usual binomial coefficients.
2. The tables T_k depend only on the value of k. They can be generated once and stored, so the entries of T_k can be considered known, just like the binomial coefficients.
3. The entries C_k(N, J) in any single row are symmetric. This feature simplifies the generation of the tables.

Example 9.5 Find the reliability of a Lin/Con/k/n:F system with n = 11, k = 3, and p = 0.9. Using the approach provided by Bollinger [38], we can generate Table 9.1 with any spreadsheet software. The entries used in the calculation of system reliability are highlighted in the table. Since N(j, k, n) = C_k(n − j + 1, j), we have the following from Table 9.1:

    N(0, 3, 11) = C_3(12, 0) = 1,      N(1, 3, 11) = C_3(11, 1) = 11,
    N(2, 3, 11) = C_3(10, 2) = 55,     N(3, 3, 11) = C_3(9, 3) = 156,
    N(4, 3, 11) = C_3(8, 4) = 266,     N(5, 3, 11) = C_3(7, 5) = 266,
    N(6, 3, 11) = C_3(6, 6) = 141,     N(7, 3, 11) = C_3(5, 7) = 30,
TABLE 9.1 Calculation of C_3(N, J) for the Lin/Con/3/11:F System

 N\J    0    1    2    3     4     5     6      7      8      9     10     11
  0     1    0    0    0     0     0     0      0      0      0      0      0
  1     1    1    1    0     0     0     0      0      0      0      0      0
  2     1    2    3    2     1     0     0      0      0      0      0      0
  3     1    3    6    7     6     3     1      0      0      0      0      0
  4     1    4   10   16    19    16    10      4      1      0      0      0
  5     1    5   15   30    45    51    45     30     15      5      1      0
  6     1    6   21   50    90   126   141    126     90     50     21      6
  7     1    7   28   77   161   266   357    393    357    266    161     77
  8     1    8   36  112   266   504   784   1016   1107   1016    784    504
  9     1    9   45  156   414   882  1554   2304   2907   3139   2907   2304
 10     1   10   55  210   615  1452  2850   4740   6765   8350   8953   8350
 11     1   11   66  275   880  2277  4917   9042  14355  19855  24068  25653
 12     1   12   78  352  1221  3432  8074  16236  28314  43252  58278  69576
    N(8, 3, 11) = C_3(4, 8) = 1,       N(9, 3, 11) = C_3(3, 9) = 0,
    N(10, 3, 11) = C_3(2, 10) = 0,     N(11, 3, 11) = C_3(1, 11) = 0.

Using equation (9.2), we find the reliability of the system to be

    R_L(3, 11) = \sum_{j=0}^{11} N(j, 3, 11) (0.9)^{11-j} (0.1)^j ≈ 0.9918.
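Bollinger's table procedure is easy to mechanize. The following sketch (ours, for illustration; it assumes k ≥ 2) builds T_k row by row exactly as described above and reproduces the reliability of Example 9.5:

```python
def make_table(k, n):
    """Build Bollinger's table T_k with rows N = 0..n+1 and columns J = 0..n."""
    rows = [[1] + [0] * n]                                 # row N = 0: a single 1, then zeros
    rows.append([1] * min(k, n + 1) + [0] * (n + 1 - k))   # row N = 1: k ones, then zeros
    for _ in range(2, n + 2):
        prev = rows[-1]
        # each entry is the entry just above it plus that entry's k-1 left neighbors
        rows.append([sum(prev[max(0, j - k + 1): j + 1]) for j in range(n + 1)])
    return rows

def lin_con_reliability(k, n, p):
    """R_L(k, n) via equation (9.2), reading N(j, k, n) = C_k(n-j+1, j) from T_k."""
    T = make_table(k, n)
    q = 1 - p
    return sum(T[n - j + 1][j] * p ** (n - j) * q ** j for j in range(n + 1))
```

Each new row needs only the previous row, so the table can also be generated with O(n) memory if desired.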
Equations (9.8) and (9.9) for calculation of N(j, k, n) are both recursive. A closed-form expression for N(j, k, n) is independently provided by Goulden [85], Hwang [101], and Lambiris and Papastavridis [137]. In their derivations, Lambiris and Papastavridis [137] and Goulden [85] used generating functions, while Hwang [101] applied equation (9.9) due to Bollinger [38]. In the following, we explain the derivation of the closed-form expression of N(j, k, n) using generating functions. As explained earlier, N(j, k, n) can be interpreted as the number of ways of allocating j identical balls into n − j + 1 distinct urns subject to the restriction that at most k − 1 balls are placed in any one urn. Define the generating function g(z):

    g(z) = (1 + z + \cdots + z^{k-1})^{n-j+1}.    (9.10)

The expression 1 + z + z^2 + \cdots + z^{k-1} represents the number of balls that may be allocated to each urn: z carries exponents ranging from 0 to k − 1, and z^i indicates that there are i balls in the urn, 0 ≤ i ≤ k − 1. The exponent n − j + 1 indicates that there are exactly n − j + 1 urns. After expansion, this generating function becomes a polynomial in z, and the coefficient of z^j represents the number of ways of having j balls in total over all urns with no more than k − 1 balls in each urn. The question is how to find the coefficient of z^j in g(z). Before we go ahead with the derivation, we introduce a few equations that will be used. From Feller [75], we have

    \binom{-n}{i} = (-1)^i \binom{n+i-1}{i}, n > 0, i ≥ 0.    (9.11)

The following equation can be easily verified with Taylor's expansion:

    (1+t)^{-n} = \sum_{i=0}^{\infty} \binom{-n}{i} t^i = \sum_{i=0}^{\infty} (-1)^i \binom{n+i-1}{i} t^i, n > 0, -1 < t < 1.    (9.12)
With equations (9.11) and (9.12), we can express g(z) as
    g(z) = (1 + z + \cdots + z^{k-1})^{n-j+1} = \left( \frac{1-z^k}{1-z} \right)^{n-j+1}
         = (1 - z^k)^{n-j+1} (1 - z)^{-(n-j+1)}
         = \left[ \sum_{i=0}^{n-j+1} (-1)^i \binom{n-j+1}{i} z^{ik} \right] \left[ \sum_{m=0}^{\infty} \binom{n-j+m}{m} z^m \right]
         = \sum_{m=0}^{\infty} \sum_{i=0}^{n-j+1} (-1)^i \binom{n-j+1}{i} \binom{n-j+m}{m} z^{ik+m}.

Setting j = ik + m and noting that \binom{n-ik}{j-ik} = \binom{n-ik}{n-j}, we get

    g(z) = \sum_{i=0}^{n-j+1} \sum_{j=ik}^{\infty} (-1)^i \binom{n-j+1}{i} \binom{n-ik}{n-j} z^j.    (9.13)

The coefficient of z^j, denoted by N(j, k, n), is then expressed as

    N(j, k, n) = \sum_{i=0}^{n-j+1} (-1)^i \binom{n-j+1}{i} \binom{n-ik}{n-j}.    (9.14)

Noting that i cannot be larger than \lfloor j/k \rfloor, we can rewrite equation (9.14) as

    N(j, k, n) = \sum_{i=0}^{\lfloor j/k \rfloor} (-1)^i \binom{n-j+1}{i} \binom{n-ik}{n-j}.    (9.15)
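As a quick consistency check, the closed form (9.15) can be transcribed in a few lines (our own sketch) and compared with the values tabulated in Example 9.5:

```python
from math import comb

def N_closed(j, k, n):
    """Closed-form N(j, k, n) of equation (9.15).
    math.comb(a, b) returns 0 when b > a, which matches the convention here."""
    return sum((-1) ** i * comb(n - j + 1, i) * comb(n - i * k, n - j)
               for i in range(j // k + 1))
```

For instance, N_closed(4, 3, 11) = 330 − 64 = 266, agreeing with the table-based value C_3(8, 4) = 266.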
In equation (9.2), the number of failed components j is allowed to range from 0 to n. Obviously, when j equals or is close to n, the system must be failed, which means that N(j, k, n) = 0 when j is close to n. In a study of sequential failures, Bollinger and Salvia [40] observe that there exists a maximum number of failed components that the system may experience before reaching the failure state; once the system has failed, no more component failures can be experienced by the system. Satam [216] provides the correct expression for this number. For efficient use of equation (9.2), we are interested in a different number: the maximum number of total component failures, denoted by M, that a Lin/Con/k/n:F system may experience without being failed. Once this number is found, it provides the upper bound on the value of j in the summation of equation (9.2). To find M, we concentrate on the best arrangement of the working and failed components, that is, the arrangement that allows the maximum number of failed components while the system still works; one more component failure on top of the M failures would guarantee system failure. If (n + 1)/k is an integer, the arrangement that allows the maximum number of total component failures without causing system failure is

    ×···× o ×···× o ×···× o ... o ×···×,

where each run of failed components (×) has length k − 1 and o indicates a working component. In this case, we have

    M = \frac{n+1}{k}(k-1) = n + 1 - \frac{n+1}{k}  if (n+1)/k is an integer.

If (n + 1)/k is not an integer, the arrangement that allows the maximum number of total component failures without causing system failure is

    ×···× o ×···× o ×···× o ... o ×···× o ×···×,

where every run but the last has length k − 1 and the last run has length e with 0 ≤ e ≤ k − 2. In this case, we have

    M = \left\lfloor \frac{n+1}{k} \right\rfloor (k-1) + e = \left\lfloor \frac{n+1}{k} \right\rfloor (k-1) + n - k \left\lfloor \frac{n+1}{k} \right\rfloor = n - \left\lfloor \frac{n+1}{k} \right\rfloor  if (n+1)/k is not an integer.

In summary, we have

    M = n + 1 - \frac{n+1}{k}  if n + 1 is a multiple of k,
    M = n - \left\lfloor \frac{n+1}{k} \right\rfloor  if n + 1 is not a multiple of k.    (9.16)

Equation (9.2) can now be written as

    R_L(k, n) = \sum_{j=0}^{M} N(j, k, n) p^{n-j} q^j,    (9.17)

where M is given in equation (9.16).
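Equations (9.15)–(9.17) combine into a compact evaluator. The sketch below is ours; the helper names are not from the text:

```python
from math import comb

def max_tolerable_failures(k, n):
    """M of equation (9.16): most failures a working Lin/Con/k/n:F system can hold."""
    if (n + 1) % k == 0:
        return n + 1 - (n + 1) // k
    return n - (n + 1) // k

def lin_rel_truncated(k, n, p):
    """R_L(k, n) via equation (9.17), truncating the sum of (9.2) at M."""
    q = 1 - p
    total = 0.0
    for j in range(max_tolerable_failures(k, n) + 1):
        N = sum((-1) ** i * comb(n - j + 1, i) * comb(n - i * k, n - j)
                for i in range(j // k + 1))            # equation (9.15)
        total += N * p ** (n - j) * q ** j
    return total
```

For n = 11 and k = 3, (n + 1)/k = 4 is an integer, so M = 12 − 4 = 8, consistent with N(9, 3, 11) = 0 in Example 9.5.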
where M is given in equation (9.16). In addition to the use of equation (9.2), closed-form expressions of the reliability of the Lin/Con/k/n:F system have been developed. We list without proof the following results provided by Goulden [85] and Hwang [101]: R L (k, n) =
n/(k+1)
(−1)i p i q ki
i=0
R L (k, n) =
(n+1)/(k+1) i=0
n − ki n − k(i + 1) − qk , i i
(−1)i pi−1 q ki
(9.18)
n − ki n − ki + 1 . (9.19) −q i i
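A direct transcription of (9.18) — our own sketch; the max(·, 0) guard only suppresses binomials with negative upper index, which contribute zero — reproduces the result of Example 9.5:

```python
from math import comb

def lin_reliability_closed(k, n, p):
    """R_L(k, n) via the closed form (9.18) for n >= k."""
    q = 1 - p
    return sum((-1) ** i * p ** i * q ** (k * i)
               * (comb(n - k * i, i) - q ** k * comb(max(n - k * (i + 1), 0), i))
               for i in range(n // (k + 1) + 1))
```

For k = 3, n = 11, p = 0.9 the three terms are 0.999, −0.0071955, and 0.00000809919, summing to 0.99181259919 ≈ 0.9918.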
When n ≤ 2k, the following closed-form expressions for the reliability of Lin/Con/k/n:F systems are available:

    R_L(k, n) = 1 for 0 ≤ n < k, and R_L(k, n) = 1 - q^k - (n-k) p q^k for k ≤ n ≤ 2k.    (9.20)

Circular Consecutive-k-out-of-n:F Systems  Now we consider a Cir/Con/k/n:F system with i.i.d. components, which is a variation of the Lin/Con/k/n:F system: the n components are placed on a circle so that the first and the nth components become adjacent to each other. Derman et al. [62] introduced the concept of the Cir/Con/k/n:F system and provided a recursive formula for the reliability evaluation of such a system with i.i.d. components:

    R_C(k, n) = p^2 \sum_{i=0}^{k-1} (i+1) q^i R_L(k, n-i-2),    (9.21)
where R_L(k, n) indicates the reliability of a Lin/Con/k/n:F system. This equation reduces a circular system reliability evaluation problem to a linear system reliability evaluation problem. For any 0 ≤ i ≤ k − 1, R_L(k, n − i − 2) may be evaluated with equation (9.17) or with other recursive equations to be introduced later. The following arguments can be used for verification of equation (9.21). Pick a point between any two components of the circular system. Let N and N′ indicate the number of failed components until the first working component is reached, counting clockwise and counterclockwise, respectively:

    Pr(N = i) = Pr(N′ = i) = p q^i, i = 0, 1, ..., n-1.

For i < n − 1, Pr(N + N′ = i) can be expressed as

    Pr(N + N′ = i) = \sum_{j=0}^{i} Pr(N = j) Pr(N′ = i-j) = \sum_{j=0}^{i} p q^j \cdot p q^{i-j} = (i+1) p^2 q^i, i = 0, 1, ..., n-2.
The reliability of the Cir/Con/k/n:F system is equal to the probability that there is a run of exactly i failures covering the selected point, bounded by two working components, and that the remaining n − i − 2 components form a working linear subsystem, where i may take values from 0 to k − 1. This verifies equation (9.21). A direct combinatorial approach is also available for the Cir/Con/k/n:F systems. Similar to equation (9.17), the following general equation can be used:

    R_C(k, n) = \sum_{j=0}^{M} N_c(j, k, n) p^{n-j} q^j,    (9.22)
where M is the maximum number of failed components that may exist in the system without causing the system to fail, and N_c(j, k, n) is the number of ways of arranging n components, including j failed ones, in a circle such that at most k − 1 failed ones are consecutive. Using the same approach as for the linear system, we can derive the value of M as follows. If n is a multiple of k, then

    M = \frac{n}{k}(k-1) = n - \frac{n}{k}.

If n is not a multiple of k, then

    M = \left\lfloor \frac{n}{k} \right\rfloor (k-1) + \left( n - 1 - k \left\lfloor \frac{n}{k} \right\rfloor \right) = n - 1 - \left\lfloor \frac{n}{k} \right\rfloor.

In summary, we have

    M = n - \frac{n}{k}  if n is a multiple of k,
    M = n - \left\lfloor \frac{n}{k} \right\rfloor - 1  if n is not a multiple of k.    (9.23)

The factor N_c(j, k, n) can be expressed in terms of N(j, k, n), which refers to linear systems, as given in the equation

    N_c(j, k, n) = \frac{n}{n-j} N(j, k, n-1), 0 ≤ j < n.    (9.24)
Goulden [85] provides a proof of this result using generating functions. Here we give the arguments used by Hwang and Yao [108] in their proof of this equation. Let C_n denote the set of working Cir/Con/k/n systems with n components including j failed ones (0 ≤ j < n); thus, fewer than k consecutive failures exist in each system. We are interested in the number of different working circular systems, denoted by |C_n| = N_c(j, k, n). Let L_{n-1} denote the set of linear systems with n − 1 components that can be generated by removing a working component from the circular systems in C_n. Each of the linear systems in L_{n-1} is automatically a working system with j failed components. Some of the systems in L_{n-1} may be the same; the number of different linear systems is denoted by N(j, k, n − 1). The same linear system can be produced by n different circular systems, because rotating the ring of n components of a circular system clockwise by one component produces a different circular system, and after n rotations we come back to the first circular system. Figure 9.3 illustrates five components arranged on a circle. Rotating the system in Figure 9.3a clockwise by one component generates the system in Figure 9.3b, which is changed to Figure 9.3c by one more rotation. Note that the component labels are not rotated in Figure 9.3. This shows that we have different circular systems. However, if working component 2 in Figure 9.3a, working component 3 in Figure 9.3b, and working component 4 in Figure 9.3c are removed, we reach the same linear arrangement of working and failed components, namely, FFSF, where F
FIGURE 9.3 Different circular systems generated from rotations of the circle.
represents a failed component and S a working component. As a result, the number of linear systems that can be generated from the different circular systems in C_n can be expressed as

    |L_{n-1}| = n N(j, k, n-1).    (9.25)

Another way to express the total number of linear systems that can be generated is to examine the number of ways there are to remove a working component from the circular systems. Since there are n − j working components in each circular system, the total number of linear systems that can be generated from different circular systems can also be expressed as

    |L_{n-1}| = (n-j) |C_n|.    (9.26)
Equating equations (9.25) and (9.26) verifies equation (9.24). In addition to using equation (9.22) for reliability evaluation of Cir/Con/k/n:F systems, the following equations due to Goulden [85] and Hwang [101] may also be used:

    R_C(k, n) = 1 - q^n + n \sum_{i=1}^{\lfloor n/(k+1) \rfloor} \frac{(-1)^i}{i} \binom{n-ik-1}{i-1} p^i q^{ki}, n ≥ k,    (9.27)

    R_C(k, n) = -q^n + \sum_{i=0}^{\lfloor n/(k+1) \rfloor} (-pq^k)^i \binom{n-ki}{i} + k \sum_{i=0}^{\lfloor n/(k+1) \rfloor - 1} (-pq^k)^{i+1} \binom{n-k(i+1)-1}{i}, n ≥ k.    (9.28)
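A transcription of the closed form (9.28) — our sketch, valid for n ≥ k — can be spot-checked against small cases that are easy to count by hand:

```python
from math import comb

def cir_closed(k, n, p):
    """R_C(k, n) via the closed form (9.28), for n >= k."""
    q = 1 - p
    t = n // (k + 1)
    r = -q ** n
    r += sum((-p * q ** k) ** i * comb(n - k * i, i) for i in range(t + 1))
    r += k * sum((-p * q ** k) ** (i + 1) * comb(n - k * (i + 1) - 1, i)
                 for i in range(t))
    return r
```

For Cir/Con/2/5:F with p = 0.5, exactly 11 of the 32 equally likely state vectors avoid two circularly adjacent failures, so R_C = 11/32 = 0.34375.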
The most efficient formula for reliability evaluation of Cir/Con/k/n:F systems with i.i.d. components is not derived with the combinatorial approach. Lambiris and Papastavridis [137] first reported the formula, and Du and Hwang [68] and Chang et al. [45] proved the result:

    R_C(k, n) = R_C(k, n-1) - p q^k R_C(k, n-k-1), n ≥ 2k+1.    (9.29)
Equation (9.29) for the circular system has a form similar to equation (9.38) for the linear system; consequently, equation (9.29) also has a computational complexity of O(n). However, this equation does not have a corresponding version when the components are not identical. Another formula, provided by Du and Hwang [68], is

    R_C(k, n) = \sum_{i=0}^{k-1} p q^i R_C(k, n-i-1), n ≥ 2k.    (9.30)
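The recursion (9.30) needs starting values for small n. In the sketch below (our own illustration), the starting values for k ≤ n < 2k are obtained by brute-force enumeration, after which equation (9.30) is iterated:

```python
from itertools import product

def cir_enum(k, n, p):
    """R_C(k, n) by enumerating all 2**n states (used only for small n)."""
    total = 0.0
    for s in product([0, 1], repeat=n):
        ext = list(s) + list(s[:k - 1])          # wrap-around to catch circular runs
        run, worst = 0, 0
        for x in ext:
            run = run + 1 if x == 0 else 0
            worst = max(worst, run)
        if worst < k:
            w = sum(s)
            total += p ** w * (1 - p) ** (n - w)
    return total

def cir_recursive(k, n, p):
    """R_C(k, n) via the recursion (9.30), which is valid for n >= 2k."""
    q = 1 - p
    if n < 2 * k:
        return cir_enum(k, n, p)
    R = {m: cir_enum(k, m, p) for m in range(k, 2 * k)}   # starting values
    for m in range(2 * k, n + 1):
        R[m] = sum(p * q ** i * R[m - i - 1] for i in range(k))
    return R[n]
```

For p = 0.5 the circular reliabilities are proportional to counts of cyclic binary strings with no k consecutive zeros; for k = 2 these counts are the Lucas numbers (7, 11, 18, 29, ... for n = 4, 5, 6, 7).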
When n ≤ 2k + 1, the following closed-form expressions for the reliability of Cir/Con/k/n:F systems are available from Zuo and Kuo [260]:

    R_C(k, n) = 1 for 0 ≤ n < k; R_C(k, n) = 1 - q^n for n = k; R_C(k, n) = 1 - q^n - n p q^k for k < n ≤ 2k+1.    (9.31)

For a Con/k/n:F system with i.i.d. components, the larger the value of n, the lower the system reliability, and the larger the value of k, the higher the system reliability. In the extreme cases of k = 1 and k = n, the system becomes the series and the parallel system, respectively. Thus, if n is large, the component reliability has to be very high for a Con/k/n:F system to be highly reliable.

9.1.2 Systems with Independent Components

Kontoleon [118] reported the first algorithm for reliability evaluation of a Lin/Con/k/n:F system. The system was assumed to have independent components. The algorithm generates all state combinations of the n components with at least k components failed; the combinations with at least k consecutive failures are identified, and the probabilities of the occurrences of the identified state combinations are added together to obtain the failure probability of the system. This algorithm involves an enumeration of system states and is therefore inefficient. Since then, various approaches have been used for reliability analysis of Con/k/n systems.

Event Decomposition Approach  In this section, we present equations that are developed through a decomposition of the event that the system works. Different decompositions result in different equations, and the efficiencies of these equations for system reliability evaluation will be compared.

Linear Systems  Chiang and Niu [54] consider system reliability evaluation of a Lin/Con/k/n:F system with i.i.d. components. However, this approach can also be applied when components are not necessarily identical. The components of the system are numbered consecutively from 1 through n. Let L indicate the index of the first failed component. For example, if component 1 works and component 2 is failed, then L = 2.
Let M indicate the index of the first working component after component L. Then, the system is failed if M − L ≥ k. For the system to work, we must
have M − L < k and the subsystem consisting of components M + 1, M + 2, ..., n must be working. The reliability of a Lin/Con/k/n:F system can then be expressed as

    R_L(k, n) = \sum_{l} \sum_{m} \Pr(\text{system works} \mid L = l, M = m) \Pr(L = l, M = m)
              = \sum_{l=1}^{n-k+1} \sum_{m=l+1}^{l+k-1} R_L(k, (m+1, n))\, p_m \prod_{i=1}^{l-1} p_i \prod_{j=l}^{m-1} q_j + \prod_{i=1}^{n-k+1} p_i,    (9.32)

where \prod_{i=a}^{b} \equiv 1 if b < a, R_L(k, n) = 0 for n < 0, and R_L(k, n) = 1 for 0 ≤ n < k. The last term in equation (9.32) represents the probability that the system works and L > n − k + 1. Equation (9.32) is recursive, and its computational complexity is O(kn^2): the last product term can be calculated in O(n) time; the first term involves two nested summations, a total of O(kn) summations for the calculation of R_L(k, n) = R_L(k, (1, n)); and we need to evaluate R_L(k, (i, n)) for i = 1, 2, ..., n. As a result, the computational complexity of equation (9.32) is O(kn^2) + O(n) = O(kn^2). When the components are i.i.d., we have

    R_L(k, n) = \sum_{r=1}^{n-k+1} \sum_{m=r+1}^{r+k-1} R_L(k, n-m)\, p^r q^{m-r} + p^{n-k+1}.    (9.33)
Using the event decomposition approach, Hwang [100] provides two equations for reliability evaluation of the Lin/Con/k/n:F system. Let E_i be the event that component i is the last working one. Since the E_i are disjoint events and exactly one of the events E_{n-k+1}, ..., E_n must occur for the system to function, we have

    R_L(k, n) = \sum_{i=n-k+1}^{n} \Pr(E_i) R_L(k, i-1) = \sum_{i=n-k+1}^{n} R_L(k, i-1)\, p_i \prod_{j=i+1}^{n} q_j,    (9.34)
with the boundary condition R_L(k, n) = 1 for n < k and \prod_{i=a}^{b} x_i \equiv 1 for a > b. The computational complexity of equation (9.34) is O(kn), based on the following. The terms p_n, p_{n-1} q_n, p_{n-2} q_{n-1} q_n, ..., p_{n-k+1} q_{n-k+2} \cdots q_n can be computed in O(k) time because each succeeding term can be obtained from the immediately preceding term using one division and two multiplications. Assuming R_L(k, i − 1) is known for n − k + 1 ≤ i ≤ n, R_L(k, n) can be found in O(k) time since it involves a summation over k terms. We need to find R_L(k, j) for 1 ≤ j ≤ n. As a result, the computational complexity of equation (9.34) is O(kn). Alternatively, let F_i be the event that the system first fails at component i. The index i indicates the smallest integer such that the k consecutive components {i−
k+1, i−k+2, ..., i−1, i} all fail. In particular, if i > k, then F_i implies that component i − k is working and the subsystem consisting of components 1 through i − k − 1 must be working. Since the F_i are disjoint events, and one of them must occur for the system to fail, we have

    Q_L(k, n) = \sum_{i=k}^{n} R_L(k, i-k-1)\, p_{i-k} \prod_{j=i-k+1}^{i} q_j
              = Q_L(k, n-1) + R_L(k, n-k-1)\, p_{n-k} \prod_{j=n-k+1}^{n} q_j,    (9.35)

with boundary conditions Q_L(k, n) = 0 for n < k and p_0 \equiv 1. In terms of system reliability, the following equation is directly available from equation (9.35):

    R_L(k, n) = R_L(k, n-1) - R_L(k, n-k-1)\, p_{n-k} \prod_{j=n-k+1}^{n} q_j.    (9.36)
Equation (9.36) was also independently developed by Shanthikumar [222]. The computational complexities of equations (9.35) and (9.36) are both O(n). The product term p_{n-k} \prod_{j=n-k+1}^{n} q_j, needed for the calculation of R_L(k, n), requires k operations; the similar terms for R_L(k, n−1), R_L(k, n−2), ... each require two divisions and two multiplications. If, in addition, R_L(k, n−1) and R_L(k, n−k−1) are known, the calculation of R_L(k, n) requires a constant number of operations. We need to evaluate R_L(k, j) for 1 ≤ j ≤ n. As a result, the computational complexity of equation (9.36) is O(n) + O(k), or simply O(n). Equation (9.36) is the most efficient simple recursive formula for the system reliability of Lin/Con/k/n:F systems. When the components are i.i.d., equations (9.35) and (9.36) become

    Q_L(k, n) = Q_L(k, n-1) + p q^k R_L(k, n-k-1),    (9.37)
    R_L(k, n) = R_L(k, n-1) - p q^k R_L(k, n-k-1).    (9.38)
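Equation (9.36) yields a simple O(n) dynamic program even with unequal component reliabilities. The sketch below is ours; note the convention p_0 = 1 from equation (9.35), which makes the m = k step come out right. It is checked against direct enumeration:

```python
from itertools import product

def lin_rel_recursive(k, p):
    """R_L(k, n) for independent components via equation (9.36); p = (p_1, ..., p_n)."""
    n = len(p)
    pp = [1.0] + list(p)                 # 1-based indexing; pp[0] plays the role of p_0 = 1
    R = {m: 1.0 for m in range(-1, k)}   # R_L(k, m) = 1 for m < k (m = -1 by convention)
    for m in range(k, n + 1):
        tail = 1.0
        for j in range(m - k + 1, m + 1):
            tail *= 1.0 - pp[j]          # q_{m-k+1} ... q_m
        R[m] = R[m - 1] - R[m - k - 1] * pp[m - k] * tail
    return R[n]

def lin_rel_enum(k, p):
    """Reference value by enumerating all component states."""
    total = 0.0
    for s in product([0, 1], repeat=len(p)):
        run, worst = 0, 0
        for x in s:
            run = run + 1 if x == 0 else 0
            worst = max(worst, run)
        if worst < k:
            pr = 1.0
            for x, pi in zip(s, p):
                pr *= pi if x else 1.0 - pi
            total += pr
    return total
```

The recursion touches each component a constant number of times once the running q-products are updated incrementally, in line with the O(n) complexity argument above.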
For n ≤ 2k, the following closed-form equations are provided by Zuo and Kuo [260]:

    R_L(k, n) = 1 for 0 ≤ n < k, and

    R_L(k, n) = 1 - \sum_{i=1}^{n-k+1} p_{i+k} \prod_{j=i}^{i+k-1} q_j, with p_{n+1} \equiv 1, for k ≤ n ≤ 2k.    (9.39)

When the components are i.i.d., the corresponding equations are

    R_L(k, n) = 1 for 0 ≤ n < k, and R_L(k, n) = 1 - q^k - (n-k) p q^k for k ≤ n ≤ 2k.    (9.40)
Circular Systems

The event decomposition approach has also been applied to Cir/Con/k/n:F systems. The reliability of a circular system can be expressed as a function of the reliabilities of linear subsystems or as a function of the reliabilities of circular subsystems. Equation (9.21) applied by Derman et al. [62] to circular systems with i.i.d. components is extended to systems with independent components by Hwang [100]:

    R_C(k, n) = \sum_{l=0}^{k-1} \sum_{m=0}^{l} p_{m+1} \prod_{i=1}^{m} q_i \; p_{n-(l-m)} \prod_{j=n-(l-m)+1}^{n} q_j \; R_L(k, (m+2, n-(l-m)-1))    (9.41)
              = \sum_{s-1+n-t<k} p_s \prod_{i=1}^{s-1} q_i \; p_t \prod_{j=t+1}^{n} q_j \; R_L(k, (s+1, t-1)).    (9.42)

Note that s is the first working component while t is the last working component in the sequence from 1 through n. Since s - 1 + n - t is the number of failed components between components t and s (clockwise), it must be less than k for the system to work. The complexity of these two equations is O(nk^2). Since s < k and n - t < k, there are at most k^2 distinct products p_s \prod_{i=1}^{s-1} q_i \; p_t \prod_{j=t+1}^{n} q_j. All these products can be computed in O(k^2) time. Corresponding to each such product, there is the problem of evaluating the reliability of a linear subsystem with at least n - k - 1 components, which takes O(n) time. As a result, equations (9.41) and (9.42) take O(k^2 n) time. When the components are i.i.d., equation (9.42) becomes

    R_C(k, n) = \sum_{s-1+n-t<k} p^2 q^{s-1+n-t} R_L(k, t-s-1).    (9.43)
Antonopoulou and Papastavridis [15] developed another recursive formula for a Cir/Con/k/n:F system with independent components:

    R_C(k, n) = p_n R_L(k, n-1) + q_n R_C(k, n-1)
                - \sum_{i=0}^{k-1} p_{i+1} \prod_{j=1}^{i} q_j \; p_{n-k+i} \prod_{j=n-k+i+1}^{n} q_j \; R_L(k, (i+2, n-k+i-1)).    (9.44)
To verify this equation, note that the event that the circular system with n components works can be decomposed into the following two events: (1) component n works and the linear subsystem with n − 1 components works and (2) component n fails and the circular subsystem with n − 1 components works except when exactly k consecutive components including component n fail and the other components form a working linear subsystem. The condition in the second event indicates
that the two components adjacent to the failure string of length k are working. Equation (9.44) reflects the relationship between the event of interest and the decomposed events. This equation involves the evaluation of the reliability of linear subsystems. Since it takes O(n) time to find R_L(k, n), the computations of R_L(k, n-1) and R_L(k, (i+2, n-k+i-1)) for i = 0, \ldots, k-1 will take O(kn) time. The computation of (q_1 \cdots q_i p_{i+1})(q_n q_{n-1} \cdots q_{n-k+i+1} p_{n-k+i}) for i = 0, 1, \ldots, k-1 takes O(kn) time. Thus, the computation of R_C(k, n) takes O(kn) plus the time needed for R_C(k, n-1). If we assume inductively that R_C(k, n-1) takes O(k(n-1)) time, it follows that R_C(k, n) takes O(kn) time [15]. When the components are i.i.d., equation (9.44) becomes

    R_C(k, n) = p R_L(k, n-1) + q R_C(k, n-1) - k p^2 q^k R_L(k, n-k-2).    (9.45)

Wu and Chen [246] decompose the event that the circular system is failed into the events that a few linear systems are failed.

Notation

• Sys-i: an (n+i)-component linear system (0 \le i \le k-1) consisting of components 1, 2, \ldots, i, i+1, \ldots, n, 1, 2, \ldots, i, wherein components 1, 2, \ldots, i are all failed
• Q_L(k, n+i): probability that Sys-i is failed
• Q'_L(k, n+i): probability that Sys-i with component n+i removed is failed

Using the sum-of-disjoint-products method, the failure probability of the original circular system can be expressed as

    Q_C(k, n) = Q_L(k, n) + \sum_{i=1}^{k-1} \prod_{j=1}^{i} q_j \left[ Q_L(k, n+i) - Q'_L(k, n+i) \right].    (9.46)
Since the complexity of Q_L(k, n) is O(n), the complexity of each Q_L(k, n+i) or Q'_L(k, n+i) is O(n+k) = O(n), for i = 0, 1, \ldots, k-1. The computational complexity of calculating q_1, q_1 q_2, \ldots, q_1 q_2 \cdots q_{k-1} is O(k). The complexity of equation (9.46) is then O(n) + O(kn) + O(k) = O(kn). Closed-form expressions of the reliability of the circular system are available for special n values [238, 260]:

    R_C(k, n) = 1,                                                                          0 \le n < k,
    R_C(k, n) = 1 - \prod_{i=1}^{n} q_i,                                                    n = k,
    R_C(k, n) = 1 - \sum_{i=1}^{n} p_{i+k} \prod_{j=i}^{i+k-1} q_j - \prod_{i=1}^{n} q_i,   k < n \le 2k+1,    (9.47)

where q_j \equiv q_{j-n} and p_j \equiv p_{j-n} for j > n.
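For a small circular system, the closed form (9.47) can be cross-checked by enumerating all 2^n component states. The sketch below is ours (names and test parameters are illustrative, not from the text):

```python
from itertools import product

def circ_closed_form(k, p):
    """R_C(k, n) from equation (9.47), valid for k < n <= 2k + 1.
    p[1..n] are component reliabilities; indices wrap around the circle."""
    n = len(p) - 1
    P = lambda j: p[(j - 1) % n + 1]            # wrapped p_j
    Q = lambda j: 1.0 - P(j)                    # wrapped q_j
    all_fail = 1.0
    for i in range(1, n + 1):
        all_fail *= Q(i)
    s = 0.0
    for i in range(1, n + 1):                   # run of k failures at i..i+k-1,
        term = P(i + k)                         # followed by a working component
        for j in range(i, i + k):
            term *= Q(j)
        s += term
    return 1.0 - s - all_fail

def circ_brute_force(k, p):
    """Exact R_C(k, n) by enumeration (exponential; for checking only)."""
    n = len(p) - 1
    rel = 0.0
    for state in product((0, 1), repeat=n):     # 1 = working, 0 = failed
        prob = 1.0
        for i in range(n):
            prob *= p[i + 1] if state[i] else 1.0 - p[i + 1]
        failed = any(all(state[(i + d) % n] == 0 for d in range(k))
                     for i in range(n))
        if not failed:
            rel += prob
    return rel
```

For k = 3, n = 7 (so that k < n \le 2k + 1 holds) and arbitrary independent reliabilities, the two functions agree to machine precision.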
MIS Approach

The MIS approach has been formally defined and illustrated in Section 5.6. It has been used for reliability evaluation of the k-out-of-n systems in Section 7.1.2. A Lin/Con/k/n:F system fails if and only if at least k consecutive components fail. Define the state space of the imbedded Markov chain to be S = {0, 1, \ldots, k}. State i indicates that the length of the latest consecutive failure string is i for 0 \le i \le k-1, and state k indicates that the length of the latest consecutive failure string is greater than or equal to k. Thus, state k is a failure state of the system and an absorbing state of the Markov chain. Using the notation in Definition 5.1, we have N = m = k. The transition probability matrix of the imbedded Markov chain is

    \Lambda_l = \begin{pmatrix}
    p_l & q_l &     &        &     \\
    p_l &     & q_l &        &     \\
    \vdots &  &     & \ddots &     \\
    p_l &     &     &        & q_l \\
        &     &     &        & 1
    \end{pmatrix}_{(k+1) \times (k+1)}.    (9.48)

Take row 2 of the transition matrix as an example. It can be interpreted as follows. If the (l-1)-component subsystem is in state 1 (this means that component l-1, the last component in the subsystem, is failed, component l-2 is working, and there do not exist any k consecutive failures before component l-2), the probability for the l-component subsystem to be in state 0 is p_l and the probability for the l-component subsystem to be in state 2 is q_l. State i of the subsystem when it is not failed is the number of consecutive failures at the end of the subsystem (0 \le i < k). It is equal to k when the subsystem is failed due to k or more consecutive failures at the end of the string of the components in the subsystem. Making use of Theorems 5.2 and 5.3, we obtain the following. The reliability R_L(k, n) of the Lin/Con/k/n:F system is given by R_L(k, n) = 1 - a_k(n), where a_k(n) can be computed through the following recurrence relations:

    a_0(l) = p_l (1 - a_k(l-1)),
    a_j(l) = q_l a_{j-1}(l-1),    j = 1, 2, \ldots, k-1,
    a_k(l) = q_l a_{k-1}(l-1) + a_k(l-1),

with initial conditions a_0(0) = 1 and a_j(0) = 0 for j \ne 0. To find a_k(n), we need to evaluate at most kn entries of a_j(l) for 0 \le j \le k and 0 \le l \le n, and each entry requires a constant number of operations. As a result, the computational complexity of evaluating R_L(k, n) is O(kn). Though this is not the most efficient algorithm, the approach provides good insight into the analysis of Con/k/n:F systems. This approach cannot be used for circular systems. If we start the Markov chain at component 1, we would have to come back to components 1 through k-1 after reaching component n. This would violate the memoryless property required in Markov chain analysis. Hwang and Wright [106] used the transfer matrix approach.
TABLE 9.2 Calculation of a_j(l) of Lin/Con/4/11:F System with MIS Approach

     l     j = 0       j = 1       j = 2       j = 3       j = 4
     0    1.000000    0.000000    0.000000    0.000000    0.000000
     1    0.700000    0.300000    0.000000    0.000000    0.000000
     2    0.720000    0.196000    0.084000    0.000000    0.000000
     3    0.740000    0.187200    0.050960    0.021840    0.000000
     4    0.760000    0.177600    0.044928    0.012230    0.005242
     5    0.775912    0.167200    0.039072    0.009884    0.007932
     6    0.793654    0.155182    0.033440    0.007814    0.009909
     7    0.811875    0.142858    0.027933    0.006019    0.011316
     8    0.830495    0.129900    0.022857    0.004469    0.012279
     9    0.849440    0.116269    0.018186    0.003200    0.012904
    10    0.868644    0.101933    0.013952    0.002182    0.013288
    11    0.888040    0.086868    0.010193    0.001395    0.013507
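The a_j(l) sweep is just as easy in code as in a spreadsheet. The following sketch (ours; any spreadsheet does the same job) regenerates the rows of Table 9.2 and returns R_L(k, n) = 1 - a_k(n):

```python
def mis_lin_con(k, p):
    """R_L(k, n) via the MIS recurrences.

    p[1..n] are component reliabilities.  a[j] carries a_j(l) while
    sweeping l = 1..n: states 0..k-1 track the trailing failure-run
    length, state k is the absorbing (system-failed) state.
    """
    n = len(p) - 1
    a = [1.0] + [0.0] * k                  # a_0(0) = 1, a_j(0) = 0 for j != 0
    for l in range(1, n + 1):
        pl, ql = p[l], 1.0 - p[l]
        new = [0.0] * (k + 1)
        new[0] = pl * (1.0 - a[k])         # a_0(l) = p_l (1 - a_k(l-1))
        for j in range(1, k):
            new[j] = ql * a[j - 1]         # a_j(l) = q_l a_{j-1}(l-1)
        new[k] = ql * a[k - 1] + a[k]      # a_k(l) = q_l a_{k-1}(l-1) + a_k(l-1)
        a = new
    return 1.0 - a[k]
```

With n = 11, k = 4, and p_i = 0.7 + 0.02(i - 1), each intermediate vector a matches the corresponding row of Table 9.2, and the function returns 1 - 0.013507 = 0.986493.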
With it, they developed an O(k^3 \ln(n/k)) algorithm for the Lin/Con/k/n:F system; however, it involves a large overhead for small systems and is not discussed here.

Example 9.6 Consider a Lin/Con/k/n:F system with n = 11, k = 4, and p_i = 0.7 + 0.02(i-1) for 1 \le i \le 11. The entries a_j(l) can be easily evaluated with the recursive equations given above using any spreadsheet software. Table 9.2 illustrates the calculations of these entries. The entry at the bottom under column j = 4 indicates a_4(11) = 0.013507. As a result, we have R_L(4, 11) = 1 - 0.013507 = 0.986493.

Approximations and Bounds

In many applications, exact system reliability is not necessary. Reasonably good bounds or approximations that can be easily computed are usually sufficient. This section reviews developments in the bounds for system reliability. Chiang and Niu [54] presented the first bounds for Con/k/n:F systems with i.i.d. components. These bounds can be easily extended to linear systems with independent components. Since the failure of any k consecutive components causes the failure of a Con/k/n:F system, any k consecutive components constitute a minimal cut set. Furthermore, k consecutive components are the only type of minimal cut set for a Con/k/n:F system. There are n-k+1 minimal cut sets in a Lin/Con/k/n:F system. If the system is working, there is at least one working component in every minimal cut set. From this argument, the lower bound for a Lin/Con/k/n:F system is developed and given below. The same argument applies to Cir/Con/k/n:F systems with the exception that there are n minimal cut sets in the circular system [62]:

    l_L(k, n) = \prod_{i=1}^{n-k+1} \left( 1 - \prod_{j=i}^{i+k-1} q_j \right)    if components are independent,
    l_L(k, n) = (1 - q^k)^{n-k+1}                                                 if components are i.i.d.,    (9.49)
    l_C(k, n) = \prod_{i=1}^{n} \left( 1 - \prod_{j=i}^{i+k-1} q_j \right)    if components are independent,
    l_C(k, n) = (1 - q^k)^n                                                   if components are i.i.d.,    (9.50)
where q_j \equiv q_{j-n} for j > n. These lower bounds may be improved by calculating the exact reliability of a subsystem of size i (k \le i \le n-1 for the linear case and k \le i \le n for the circular case):

    l_L(k, n) = R_L(k, i) \prod_{j=i-k+2}^{n-k+1} \left( 1 - \prod_{m=j}^{j+k-1} q_m \right),    (9.51)

    l_C(k, n) = R_L(k, i) \prod_{j=i-k+2}^{n} \left( 1 - \prod_{m=j}^{j+k-1} q_m \right),    (9.52)
where q_m \equiv q_{m-n} for m > n. The higher the i value, the better the lower bound and the higher the complexity. One may choose a suitable i value depending on the accuracy desired.

Zuo [257] constructs a k-fold series redundant system from a Lin/Con/k/n:F system in the development of a lower bound on the reliability of a linear system. A k-fold series redundant system is a parallel-series system in which k subsystems are connected in parallel and each subsystem has a number of components connected in series. In converting a Lin/Con/k/n:F system into a k-fold series redundant system, components i, k+i, 2k+i, \ldots, m_i k + i are included in subsystem i, with a total of m_i + 1 components (m_i \equiv \lfloor (n-i)/k \rfloor, i = 1, 2, \ldots, k). For example, subsystem 1 includes components 1, k+1, 2k+1, \ldots, m_1 k + 1, with a total of m_1 + 1 = \lfloor (n-1)/k \rfloor + 1 components. In fact, these k subsystems are some of the minimal paths of the Lin/Con/k/n:F system. If the Con/k/n:F system fails, all of these k subsystems will fail; thus the k-fold series redundant system will fail. The reverse is not necessarily true. For example, if components 1, k+2, k+3, \ldots, 2k are the only components that are failed, then the k-fold series redundant system is failed but the original Lin/Con/k/n:F system is working. Therefore, the system reliability of the k-fold series redundant system constructed this way provides a lower bound on the reliability of the Lin/Con/k/n:F system. The k-fold series redundant system constructed from a Lin/Con/k/n:F system is shown in Figure 9.4. As a result, we have

    l_L(k, n) = 1 - \prod_{i=1}^{k} \left( 1 - \prod_{j=0}^{m_i} p_{jk+i} \right)    if components are independent,
    l_L(k, n) = 1 - \prod_{i=1}^{k} \left( 1 - p^{m_i+1} \right)                     if components are i.i.d.,    (9.53)
FIGURE 9.4 The k-fold series redundant system constructed from a Lin/Con/k/n:F system: row i consists of components i, k+i, 2k+i, \ldots, m_i k + i in series, and the k rows are connected in parallel (m_i = \lfloor (n-i)/k \rfloor, i = 1, 2, \ldots, k).
where m_i = \lfloor (n-i)/k \rfloor. The lower bound developed here is better than that in Chiang and Niu [54] when the common component reliability is relatively low.

To obtain an upper bound, Chiang and Niu [54] partition the Lin/Con/k/n:F system into \lfloor n/k \rfloor + 1 independent subsystems, where each subsystem has k consecutive components except the last one, which has n - k \lfloor n/k \rfloor components. For circular systems, exactly the same subsystems are obtained. Since n - k \lfloor n/k \rfloor < k, the last subsystem does not fail. If the system works, all \lfloor n/k \rfloor + 1 subsystems must work. Hence, the upper bounds for the linear and circular Con/k/n:F systems are

    u_L(k, n) = u_C(k, n) = \prod_{i=1}^{\lfloor n/k \rfloor} \left( 1 - \prod_{j=(i-1)k+1}^{ik} q_j \right)    if components are independent,
    u_L(k, n) = u_C(k, n) = (1 - q^k)^{\lfloor n/k \rfloor}                                                     if components are i.i.d.    (9.54)

Derman et al. [62] provide the following equation for the upper bounds of the linear and circular systems:

    u(k, n) = 1 - \frac{E^2(N)}{E(N^2)},    (9.55)

where N is a random variable representing the number of minimal cut sets whose components all fail:

    E(N) = \sum_{i=1}^{n-k+1} \prod_{j=i}^{i+k-1} q_j    for a linear system,
    E(N) = \sum_{i=1}^{n} \prod_{j=i}^{i+k-1} q_j        for a circular system,    (9.56)

with q_j \equiv q_{j-n} if j > n.
The following equation is provided for evaluation of E(N^2):

    E(N^2) = E\left( \sum_{i} I_i + \sum_{i \ne j} I_i I_j \right) = \sum_{i} E(I_i) + \sum_{i \ne j} E(I_i I_j),    (9.57)

where I_i = 1 if all the components in E_i are failed and I_i = 0 otherwise, and E_i indicates the ith minimal cut in the system. As we know, there are n-k+1 minimal cuts in a linear system and n minimal cuts in a circular system. As a result, the indexes i and j in equation (9.57) may take values from 1 through n-k+1 for a linear system and from 1 through n for a circular system. For a circular system with i.i.d. components, the following explicit upper bound is provided by Derman et al. [62]:

    u_C(k, n) = 1 - \frac{n p q^k}{p + (n - 2k - 1) p q^k + 2q(1 - q^{k-1})}.    (9.58)
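For i.i.d. components, the bounds above collapse to one-line expressions. A sketch (ours; only the i.i.d. forms of (9.49), (9.50), and (9.54) are shown):

```python
def lower_bound_linear(k, n, p):
    """Minimal-cut lower bound (9.49), i.i.d. case: each of the n-k+1
    minimal cut sets must contain at least one working component."""
    return (1.0 - (1.0 - p) ** k) ** (n - k + 1)

def lower_bound_circular(k, n, p):
    """Minimal-cut lower bound (9.50), i.i.d. case (n minimal cuts)."""
    return (1.0 - (1.0 - p) ** k) ** n

def upper_bound(k, n, p):
    """Partition upper bound (9.54), i.i.d. case: all floor(n/k) disjoint
    blocks of k consecutive components must be working subsystems."""
    return (1.0 - (1.0 - p) ** k) ** (n // k)
```

For k = 4, n = 11, p = 0.9 these give approximately 0.999200 \le R_L \le 0.999800 and 0.998901 \le R_C \le 0.999800, a useful starting point for the exercise at the end of this section.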
Salvia [213] provides another set of bounds for the reliability of a Lin/Con/k/n:F system with i.i.d. components:

    l_L(k, n) = 1 - (n - k + 1) q^k,    (9.59)
    u_L(k, n) = 1 - (n - k + 1) p^{n-k} q^k.    (9.60)
These results are obvious from the following observations. Let A_i indicate that all components in minimal cut set E_i are failed:

    Q_L(k, n) = \Pr\left( \bigcup_{i=1}^{n-k+1} A_i \right) \le \sum_{i=1}^{n-k+1} \Pr(A_i) = (n - k + 1) q^k,

    Q_L(k, n) = \sum_{i=0}^{n-k} r_{i,k,n} p^{n-(k+i)} q^{k+i} \ge r_{0,k,n} p^{n-k} q^k = (n - k + 1) p^{n-k} q^k,

where r_{i,k,n} = \sum_{j=0}^{i} N(k+i, k+j; n) and N(k+i, k+j; n) is the number of configurations of n components having k+i total failures and k+j of these consecutive. The bounds in equations (9.59) and (9.60) are good only when q is small.

Under the assumption that q < k/(k+1) in the linear and circular Con/k/n:F systems with i.i.d. components, Papastavridis [180] provides the following upper and lower bounds for R_L(k, n) and R_C(k, n):

    b m^{n+1} - e < R_L(k, n) < a M^{n+1} + e,    (9.61)
    M^n - (k-1) q^n < R_C(k, n) < M^n + (k-1) q^n,    (9.62)

where

    m = 1 - \frac{p q^k}{(1 - q^k)^k},    (9.63)
    M = 1 - p q^k,    (9.64)
    a = \frac{m^k - q^k}{m^k - (k+1) p q^k},    (9.65)
    b = \frac{M^k - q^k}{M^k - (k+1) p q^k},    (9.66)
    e = \frac{2(k-1) q^{n+2}}{p[k + (k+1)q]}.    (9.67)
The following approximation was provided by Feller [75] and Griffith and Govindarajulu [88] for the reliability of a Lin/Con/k/n:F system with i.i.d. components when n is large:

    R_L(k, n) \approx \frac{1 - qx}{(k + 1 - kx) p x^{n+1}},    (9.68)

where x is the unique positive root, different from 1/q, of the following equation:

    p q^k s^{k+1} - s + 1 = 0.    (9.69)
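The root x and the accuracy of (9.68) are easy to examine numerically. The sketch below is ours (the bracketing step assumes the relevant root lies just above s = 1, which holds for small q); it compares the approximation against the exact recursive value:

```python
def feller_approx(k, n, p):
    """Approximation (9.68) for an i.i.d. Lin/Con/k/n:F system.

    x is the positive root of p*q^k*s^(k+1) - s + 1 = 0 other than 1/q
    (equation (9.69)), found by a scan for a sign change plus bisection."""
    q = 1.0 - p
    f = lambda s: p * q ** k * s ** (k + 1) - s + 1.0
    hi = 1.0
    while f(hi) > 0.0:          # f(1) = p*q^k > 0; walk right until f < 0
        hi += 0.01
    lo = hi - 0.01
    for _ in range(100):        # bisection
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0.0 else (lo, mid)
    x = 0.5 * (lo + hi)
    return (1.0 - q * x) / ((k + 1 - k * x) * p * x ** (n + 1))

def exact_iid(k, n, p):
    """Exact R_L(k, n), i.i.d., via recursion (9.38) with p_0 = 1."""
    q = 1.0 - p
    r = {m: 1.0 for m in range(-k - 1, k)}
    for j in range(k, n + 1):
        head = p if j - k >= 1 else 1.0
        r[j] = r[j - 1] - head * q ** k * r[j - k - 1]
    return r[n]
```

For k = 2, n = 20, p = 0.9 the exact value is about 0.83883, and the approximation agrees with it to well beyond six decimal places, illustrating how sharp (9.68) is even for moderate n.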
For the approximation of the reliability of a Cir/Con/k/n:F system with i.i.d. components having reliability p, Papastavridis [181], working with generating function techniques, proved that

    \left| R_C(k, n) - (1 - p q^k)^n \right| < (k-1) q^n.    (9.70)

Another interesting inequality, which is valid for the non-i.i.d. case as well, is the following:

    R_L(k, n) - \sum_{j=n-k+2}^{n} q_j \cdots q_n \, q_1 \cdots q_{j+k-n-1} \le R_C(k, n) \le R_L(k, n).    (9.71)
Chrysaphinou and Papastavridis [55] and Papastavridis and Koutras [177] applied this inequality to the circular model. They also showed that certain limit theorems are valid for the linear nonmaintained and maintained Con/k/n systems.

Exercise

1. Compare the upper and lower bounds for linear and circular Con/4/11:F systems with i.i.d. components.
9.2 OPTIMAL SYSTEM DESIGN

For Con/k/n:F systems, there also exists the problem of optimal system design. Such a problem includes the determination of the values of k, n, and p_i for 1 \le i \le n. A more interesting problem that has been extensively studied in the literature for Con/k/n:F systems is the optimal allocation of given reliability values to the positions or components of the system. A Con/k/n:F system is a coherent system. The reliability importances of the components exhibit an interesting pattern when the components are i.i.d. Both concepts of Birnbaum importance and relative criticality that are introduced in Chapter 6 have been used in the optimal allocation of reliabilities to the components of these systems. The concept of invariant optimal design has been very useful in reliability-based design of Con/k/n:F systems. Given the ranking of n reliabilities p_{[1]} \le p_{[2]} \le \cdots \le p_{[n]}, the optimal allocations of these reliabilities to the components of many Con/k/n:F systems are often fixed and independent of the actual values of the given reliabilities. The ranking of the reliability values is adequate for determination of the optimal allocation. Such optimal designs, which are solely determined by the ranking of the reliability values, are called invariant optimal designs. An optimal design that is dependent on the actual values of the reliabilities is called a variant optimal design. In this section, we explain the reported research results on component reliability importance, identification of invariant optimal designs, and algorithms for variant optimal designs.

9.2.1 B-Importances of Components

Using the reliability importance definition given by Birnbaum [30] (see Chapter 6), Papastavridis [182] provides the following formula for evaluation of the reliability importance of component i, denoted by I_L(i), in a Lin/Con/k/n:F system:

    I_L(i) = \frac{R_L(k, i-1) \bar{R}_L(k, n-i) - R_L(k, n)}{q_i},    (9.72)

where \bar{R}_L(k, j) denotes the reliability of a Lin/Con/k/n:F subsystem with components n-j+1, n-j+2, \ldots, n (the overbar distinguishes this tail subsystem from the leading subsystem of components 1 through i-1). Equation (9.72) can be easily verified following the definition of B-importance in equation (6.3). This is left to readers as an exercise. Griffith and Govindarajulu [88] provide the following expression of component reliability importance, denoted by I_C(i), for the Cir/Con/k/n:F system:

    I_C(i) = \frac{R_i(k, n-1) - R_C(k, n)}{q_i},    (9.73)

where R_i(k, n-1) is the reliability of a Lin/Con/k/n:F subsystem with components i+1, i+2, \ldots, n, 1, 2, \ldots, i-1. Equation (9.73) can be easily verified with equation (6.3). This is also left to readers as an exercise.
When the components are i.i.d., we have the following expressions of the component reliability importance for linear and circular Con/k/n:F systems:

    I_L(i) = \frac{R_L(k, i-1) R_L(k, n-i) - R_L(k, n)}{q},    (9.74)

    I_C(i) = \frac{R_L(k, n-1) - R_C(k, n)}{q}.    (9.75)
The B-importances of the components of Lin/Con/k/n:F systems with i.i.d. components play a very important role in determining the invariant optimal designs of such systems. However, for circular systems, they are not as useful because all of the components have the same B-importance when the components are i.i.d. Since the Lin/Con/k/n:F system becomes a series system when k = 1 and a parallel system when k = n, the B-importances of all i.i.d. components are the same in these cases. Considering systems with i.i.d. components, Papastavridis [182] suggests that the nearer to the center a component is located, the higher its B-importance. Kuo et al. [134] point out that this statement is true only in special cases. They also observe the following results on the comparison of the B-importances of the components of a Lin/Con/k/n:F system. Zakaria et al. [254] provide a counterintuitive aspect of component importance in a Lin/Con/k/n:F system.

Theorem 9.1 For a Lin/Con/k/n:F system with i.i.d. components, the following inequalities hold:

    I_L(i) = I_L(n - i + 1),    1 \le i \le \lfloor n/2 \rfloor,    (9.76)
    I_L(i) < I_L(i + 1),        1 \le i < \min\{k, n-k+1\},    (9.77)
    I_L(i) > I_L(i + 1),        \max\{k, n-k+1\} \le i < n,    (9.78)
    I_L(i) = I_L(j),            n < 2k, \; n-k+1 \le i < j \le k.    (9.79)

This theorem can be easily verified with equation (9.74). This is left to readers as an exercise. Theorem 9.1 can be stated in words as follows:

1. The B-importances of the components in a Lin/Con/k/n:F system with i.i.d. components are symmetrical with respect to the center of the system.
2. When n \ge 2k, the component reliability importance increases from component 1 to component k and decreases from component n-k+1 to component n. The pattern is not specified for components between k and n-k+1.
3. When n < 2k, the component reliability importance increases from component 1 to component n-k+1 and decreases from component k to component n.
4. When n < 2k, the component reliability importances of the 2k-n components in the middle of the linear system are constant.

These results are also presented in Figure 9.5.
FIGURE 9.5 Component B-importance patterns in a Lin/Con/k/n:F system. Top panel (n \ge 2k): I_L(i) rises from position 1 to position k and falls from position n-k+1 to position n. Bottom panel (n < 2k): I_L(i) rises from position 1 to position n-k+1, is constant through position k, and then falls to position n.
For a Lin/Con/k/n:F system with k = 2, Zuo and Kuo [260] completely identify the pattern of the B-importances, as stated in the following theorem.

Theorem 9.2 In a Lin/Con/2/n:F system with i.i.d. components and n > 4, the component reliability importance I_L(i) satisfies the following conditions:

    I_L(2i) > I_L(2i - 1)        for 2i \le (n+1)/2,    (9.80)
    I_L(2i) < I_L(2(i - 1))      for 2i \le (n+1)/2,    (9.81)
    I_L(2i + 1) > I_L(2i - 1)    for 2i + 1 \le (n+1)/2.    (9.82)
Before providing a proof for Theorem 9.2, we first introduce the following lemma for a Lin/Con/2/n:F system.

Lemma 9.2 The reliability of a Lin/Con/2/n:F system with i.i.d. components can be expressed as

    R_L(2, n) = R_L(2, i) R_L(2, n-i) - p^2 q^2 R_L(2, i-2) R_L(2, n-i-2)    for 2 \le i \le n-2.    (9.83)
Lemma 9.2 can be easily verified with the following arguments. A Lin/Con/2/n:F system can be divided into two subsystems, one with i components from 1 through
i and the other with n-i components from i+1 through n (2 \le i \le n-2). If the original system works, both of these subsystems will work. If both of these two subsystems work, the original system will work except when components i and i+1 are failed, components i-1 and i+2 are working, the first i-2 components form a working subsystem, and the last n-i-2 components form a working subsystem.
for i ≥ 1.
Define Ji = R(i − 1)R(n − i). Based on equation (9.74), we know that Ji has the same pattern as I L (i) because q and R L (k, n) are constant. We will consider Ji from now on. First, observe the inequality R(m) > p R(m − 1)
for m > 0,
0 < p < 1,
(9.84)
because R(m) = pm R(m − 1) + qm Pr(the m-component system work | the mth component fails) > pm R(m − 1). Another general result is R(m) < R(i)R(m − i)
for i = 1, 2, . . . , m − 1,
which is obvious according to equation (9.83). Using equations (9.83) and (9.38), we have J1 = R(0)R(n − 1) = R(n − 1), J2 = R(1)R(n − 2) = R(n − 2) = R(n − 1) + pq 2 R(n − 4), J3 = R(2)R(n − 3) = R(n − 1) + p2 q 2 R(n − 5), J4 = R(3)R(n − 4) = R(n − 1) + p2 q 2 R(1)R(n − 6) = R(n − 1) + p 2 q 2 R(n − 6).
(9.85)
354
CONSECUTIVE-k-OUT-OF-n SYSTEMS
By letting m = n − 4 in equation (9.84), we have R(n − 4) > p R(n − 5)
for n ≥ 5.
From the expressions of Ji for 1 ≤ i ≤ 4 given above, we have J1 < J3 < J4 < J2 . We now compare Jm+1 with Jm for m ≥ 4. Assuming m + 3 ≤ (n + 1)/2 and using equation (9.83), we have Jm+3 = R(m + 2)R(n − m − 3) = R(n − 1) + p 2 q 2 R(m)R(n − m − 5), Jm+2 = R(m + 1)R(n − m − 2) = R(n − 1) + p 2 q 2 R(m − 1)R(n − m − 4), Jm+1 = R(m)R(n − m − 1), Jm = R(m − 1)R(n − m). If Jm+1 > Jm , then we have the following: R(m)R(n − m − 1) > R(m − 1)R(n − m), R(m)R(n − 4 − m − 1) > R(m − 1)R(n − 4 − m)
(n is replaced by n − 4),
R(m)R(n − m − 5) > R(m − 1)R(n − m − 4), R(m + 2)R(n − m − 3) > R(m + 1)R(n − m − 2)
(m is replaced by m + 2),
Jm+3 > Jm+2 . If Jm+1 < Jm , then we have the following: R(m)R(n − m − 1) < R(m − 1)R(n − m), R(m)R(n − 4 − m − 1) < R(m − 1)R(n − 4 − m)
(n is replaced by n − 4),
R(m)R(n − m − 5) < R(m − 1)R(n − m − 4), R(m + 2)R(n − m − 3) < R(m + 1)R(n − m − 2)
(m is replaced by m + 2),
Jm+3 < Jm+2 . We now compare Jm+2 with Jm for m ≥ 4. Assuming m + 4 ≤ (n + 1)/2 and using equation (9.83), we have Jm+4 = R(m + 3)R(n − m − 4) = R(n − 1) + p 2 q 2 R(m + 1)R(n − m − 6), Jm+2 = R(m + 1)R(n − m − 2) = R(n − 1) + p 2 q 2 R(m − 1)R(n − m − 4), Jm = R(m − 1)R(n − m) = R(n − 1) + p2 q 2 R(m − 3)R(n − m − 2).
OPTIMAL SYSTEM DESIGN
355
If J_{m+2} > J_m, then we have

    R(m+1) R(n-m-2) > R(m-1) R(n-m),
    R(m+1) R(n-4-m-2) > R(m-1) R(n-4-m)    (n is replaced by n-4),
    R(m+1) R(n-m-6) > R(m-1) R(n-m-4),
    R(m+3) R(n-m-4) > R(m+1) R(n-m-2)    (m is replaced by m+2),
    J_{m+4} > J_{m+2}.

If J_{m+2} < J_m, then we have

    R(m+1) R(n-m-2) < R(m-1) R(n-m),
    R(m+1) R(n-4-m-2) < R(m-1) R(n-4-m)    (n is replaced by n-4),
    R(m+1) R(n-m-6) < R(m-1) R(n-m-4),
    R(m+3) R(n-m-4) < R(m+1) R(n-m-2)    (m is replaced by m+2),
    J_{m+4} < J_{m+2}.

This concludes the proof.

The pattern of component reliability importance in a Lin/Con/2/n:F system is illustrated in Figure 9.6. Component 2 has the highest importance while component 1 has the lowest importance. There are two trends in Figure 9.6. One is a gradual down trend from component 2, to component 4, to component 6, and finally to component l, where l is the largest even integer less than or equal to (n+1)/2. The other is a gradual up trend from component 1, to component 3, to component 5, and finally to component m, where m is the largest odd integer less than or equal to (n+1)/2. The pattern of the component reliability importance is thus completely specified for a Lin/Con/k/n:F system when k = 2. For a general Lin/Con/k/n:F system with k > 2, in addition to the results given in Theorem 9.1, the following results are provided by Zuo [257]. The proofs of these results are omitted here.

Theorem 9.3 The component reliability importance of a Lin/Con/k/n:F system with i.i.d. components and 2 < k < n/2 has the following pattern:

    I_L(1) \le I_L(k+1),        n \ge 2k+1,
    I_L(k) > I_L(2k),           n \ge 4k-1,
    I_L(ik) > I_L(ik+1),        k+1 \le ik+1 \le (n+1)/2.
FIGURE 9.6 B-importance pattern of a Lin/Con/2/n:F system with i.i.d. components: a gradual down trend over the even positions 2, 4, 6, \ldots and a gradual up trend over the odd positions 1, 3, 5, \ldots, both running up to (n+1)/2.
Theorem 9.3 states that component 1 is less important than component k+1 for n \ge 2k+1, component k is more important than component 2k for n \ge 4k-1, and component ik is always more important than component ik+1 for k+1 \le ik+1 \le (n+1)/2. Chang et al. [45] add that component 1 is the least important one among all components (so is component n, due to symmetry) and that component k+1 is less important than component k+2 for n > 2k. Even with all of these results, the pattern of component reliability importance for a Lin/Con/k/n:F system with k \ge 3 and i.i.d. components is still not completely identified.

As was noted, the structural importance indicates the importance of a component relative to its positioning in the system. Through the relationship to the Fibonacci sequence of order k, Lin et al. [142] obtain a closed-form solution of structure importance for each component of the Lin/Con/k/n:F system. They also obtain a complete ordering of the components with respect to their structure importance for some Lin/Con/k/n:F systems and a partial ordering for others.

9.2.2 Invariant Optimal Design

In this section, we consider the problem of assigning reliabilities to the components of linear and circular Con/k/n:F systems. Suppose that there are n distinct reliability values and they have been arranged in ascending order of their values, as indicated below:

    p_{[1]} < p_{[2]} < \cdots < p_{[n]}.

For ease of discussion, we also define p_{[0]} \equiv 0 and p_{[n+1]} \equiv 1. Whenever we are given a set of n specific reliability values, we can arrange them in the required order. There exists an optimal way of allocating these reliability values to the components
or positions of a Con/k/n:F system. An interesting phenomenon is that even when the exact values of the n reliabilities are unknown and only the ranking of their values is known, there still exists a unique optimal allocation of these reliabilities to the components of some Con/k/n:F systems. A Con/k/n:F system is said to admit an invariant optimal design if there exists an optimal arrangement depending only on the ordering of the p_i but not on their actual values. The existence of invariant optimal arrangements is not commonplace, and the identification of invariant arrangements is not trivial. We provide a coverage of available research results in this area.

When k = 1 and k = n, a Con/k/n:F system becomes a series and a parallel system, respectively. All arrangements of components in such systems yield the same system reliability; that is, every configuration is an invariant optimal configuration. The optimal design problem of a Con/k/n system was first studied by Derman et al. [62] under the condition that k = 2. For the Lin/Con/2/n:F system, the optimal design was conjectured by Derman et al. [62] and proved by Wei et al. (partially) [242], Malon [160], and Du and Hwang [67] and is given below:

    (1, n, 3, n-2, \ldots, n-3, 4, n-1, 2).

An interpretation of this invariant optimal design is to assign the least reliable component (with reliability p_{[1]}) to position 1, the next least reliable component (or p_{[2]}) to position n, the most reliable component (or p_{[n]}) to position 2, the next most reliable component (or p_{[n-1]}) to position n-1, and so on. Malon [161] studied the optimal design of the Lin/Con/k/n:F system for all possible k values and discovered that the Lin/Con/k/n:F system admits an invariant optimal design if and only if k \in \{1, 2, n-2, n-1, n\}. We assume that an arrangement of the components in a Lin/Con/k/n:F system and its mirror image (flip) are considered to be the same design. Hwang [100] extended the conjecture of Derman et al.
[62] to the Cir/Con/2/n:F system by conjecturing that the optimal design of a Cir/Con/2/n:F system is (n, 1, n-1, 3, n-3, \ldots, n-4, 4, n-2, 2, n), which was proved by Malon [160] and Du and Hwang [67] independently. Tong [238] discovered that the system reliability is not affected by permutations of the components (and their reliabilities) for a Cir/Con/k/n:F system when k = n-1 or k = n. Thus, any arrangement of the components for such systems is an optimal design. Hwang [100] notes that a Lin/Con/k/n:F system with n components can be considered to be a special case of the Cir/Con/k/(n+1):F system with p_{n+1} = 1. Thus, the invariant optimal design of a circular system reduces to the optimal design of a linear system when the best component in the circular system is considered to be perfect. Hwang [102] identifies all invariant optimal designs of the circular systems.

Theorem 9.4 A necessary condition for the optimal arrangement of Cir/Con/2/n:F and Cir/Con/(n-2)/n:F systems is

    (p_i - p_j)(p_{i-1} - p_{j+1}) < 0,    1 \le i, j \le n,    (9.86)
CONSECUTIVE-k-OUT-OF-n SYSTEMS
where p_i is the reliability of the component at position i (1 ≤ i ≤ n) and p_i ≡ p_{i−n} for i > n.

Theorem 9.5 The only arrangement for Cir/Con/2/n:F and Cir/Con/(n − 2)/n:F systems that satisfies the condition in Theorem 9.4 is (n, 1, n − 1, 3, n − 3, . . . , n − 4, 4, n − 2, 2, n).

Proof To satisfy the condition stated in Theorem 9.4, component 1 must be adjacent to components n and n − 1. Suppose not; then 1 is not adjacent to n − 1 but to some i (i < n − 1). Let j be the item following n − 1 in the sequence (1, i, . . . , n − 1, j). This sequence violates the condition (p[1] − p[j])(p[i] − p[n−1]) < 0. Similarly, we can show that n must be adjacent to 1 and 2, 2 must be adjacent to n and n − 2, and so on.

Corollary 9.1 The only arrangement for a Lin/Con/2/n:F system satisfying the condition stated in Theorem 9.4 is (1, n, 3, n − 2, . . . , n − 3, 4, n − 1, 2).

Proof From Theorem 9.5, the only configuration for a Cir/Con/2/(n + 1):F system satisfying the condition stated in Theorem 9.4 is (n + 1, 1, n, 3, n − 2, . . . , n − 3, 4, n − 1, 2, n + 1). The proof is immediate from the fact that this arrangement reduces to the arrangement given in this corollary if we let p[n+1] ≡ 1.

Theorem 9.6 (Malon [161]) The Lin/Con/k/n:F system admits an invariant optimal design if and only if k ∈ {1, 2, n − 2, n − 1, n}. The invariant optimal designs are

k        Invariant Optimal Design
1        (any arrangement)
2        (1, n, 3, n − 2, . . . , n − 3, 4, n − 1, 2)
n − 2    (1, 4, (any arrangement), 3, 2)
n − 1    (1, (any arrangement), 2)
n        (any arrangement)
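The k = 2 row of Theorem 9.6 is easy to confirm by brute force for a small system. The sketch below (with illustrative component reliabilities of our choosing, not from the text) enumerates every arrangement of six distinct reliabilities in a Lin/Con/2/6:F system and checks that the design (1, 6, 3, 4, 5, 2), the pattern (1, n, 3, n − 2, . . . , n − 3, 4, n − 1, 2) for n = 6, attains the maximum:

```python
from itertools import permutations

def rel_lin_con_f(p, k):
    # Exact reliability of a Lin/Con/k/n:F system: sum the probabilities of all
    # component-state vectors that contain no run of k consecutive failures.
    n = len(p)
    total = 0.0
    for mask in range(1 << n):
        prob, run, failed = 1.0, 0, False
        for i in range(n):
            if (mask >> i) & 1:        # bit set: component at position i works
                prob *= p[i]
                run = 0
            else:                      # bit clear: component at position i fails
                prob *= 1.0 - p[i]
                run += 1
                failed = failed or run >= k
        if not failed:
            total += prob
    return total

# p_sorted[r - 1] is the r-th smallest of six distinct (arbitrary) reliabilities.
p_sorted = [0.55, 0.60, 0.70, 0.80, 0.90, 0.95]

# Invariant optimal design (1, n, 3, n - 2, ..., n - 3, 4, n - 1, 2) with n = 6.
invariant = [p_sorted[r - 1] for r in (1, 6, 3, 4, 5, 2)]

best = max(rel_lin_con_f(list(arr), 2) for arr in permutations(p_sorted))
assert abs(best - rel_lin_con_f(invariant, 2)) < 1e-12
```

The flipped arrangement achieves the identical value, consistent with treating a design and its mirror image as the same design.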
We will only provide the arguments for 2 < k < n. When n ≥ 2k, there does not exist any invariant optimal design. This is proved if we can show that different component reliability values result in different optimal arrangements. Suppose p[1] = p[2] = · · · = p[k−1] = 0, p[k] = p[k+1] = · · · = p[n−1] = p, where 0 < p < 1, and p[n] = 1. Then the optimal arrangement is (0, 0, . . . , 0, 1, p, . . . , p). In other words, the worst k − 1 components are placed at one end of the system. Suppose, now, p[1] = p[2] = · · · = p[k−1] = p and p[k] = p[k+1] = · · · = p[n] = r, where 0 < p < r < 1. If we still place the worst k − 1 components at one end, we would have the following arrangement:

(p, . . . , p, r, . . . , r),

with k − 1 copies of p followed by n − k + 1 copies of r. However, the following arrangement would result in a higher system reliability:

(p, . . . , p, r, . . . , r, p),

with k − 2 leading copies of p, then n − k + 1 copies of r, and a single p at the end. Consequently, there does not exist an invariant optimal design. For the case n < 2k, if each of the middle 2k − n components works, the system works. It is as if these 2k − n components were connected in series with one another and then to the other 2(n − k) components. As a result, the best components must be allocated to these middle 2k − n positions. The remaining 2(n − k) components form a Lin/Con/(n − k)/2(n − k):F subsystem. If this subsystem has an invariant optimal design, then the original Lin/Con/k/n:F system has an invariant optimal design. Based on the earlier arguments in this paragraph, the Lin/Con/(n − k)/2(n − k):F subsystem does not have an invariant optimal design when n − k > 2. This shows that, for 2 < k < n, an invariant optimal design exists if and only if k = n − 2 or k = n − 1. When k = n − 1, the best n − 2 components are placed in the middle, and thus (1, (any arrangement), 2) is the invariant optimal design. When k = n − 2, the best n − 4 components are placed in the middle in any manner and the worst four components 1, 2, 3, and 4 are arranged at the two ends following the pattern of a Lin/Con/2/n:F system; that is, (1, 4, (any arrangement), 3, 2) is the invariant optimal design. In the preceding paragraph, the failure probabilities have, for the sake of simplicity, been allowed to take on the values 0 and 1. A familiar continuity argument shows that the same analysis rules out the existence of an invariant optimal configuration even when the failure probabilities are restricted to the range 0 < pi < 1. Hwang [102] identifies the invariant optimal designs of all Cir/Con/k/n:F systems. When k ∈ {1, n − 1, n}, any arrangement is optimal. When k ∈ {2, n − 2}, the optimal arrangement is stated in Theorem 9.5. In the following, we provide arguments for proving that an invariant optimal design does not exist for a Cir/Con/k/n:F system when 2 < k < n − 2. When 2 < k < n − 3, the Lin/Con/k/(n − 1):F system does not admit an invariant configuration.
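The comparison in the argument above can be reproduced numerically. The sketch below (with k = 3, n = 6 and arbitrary values 0 < p < r < 1 of our choosing) confirms that moving one of the worst components to the far end raises the reliability of a Lin/Con/3/6:F system:

```python
def rel_lin_con_f(p, k):
    # Exact reliability of a Lin/Con/k/n:F system by state enumeration.
    n = len(p)
    total = 0.0
    for mask in range(1 << n):
        prob, run, failed = 1.0, 0, False
        for i in range(n):
            if (mask >> i) & 1:
                prob *= p[i]
                run = 0
            else:
                prob *= 1.0 - p[i]
                run += 1
                failed = failed or run >= k
        if not failed:
            total += prob
    return total

k, p, r = 3, 0.2, 0.8          # 0 < p < r < 1; n = 2k = 6, so n - k + 1 = k + 1

# Worst k - 1 components grouped at one end: (p, p, r, r, r, r).
grouped = [p] * (k - 1) + [r] * (k + 1)
# One of the worst components moved to the other end: (p, r, r, r, r, p).
split = [p] * (k - 2) + [r] * (k + 1) + [p]

assert rel_lin_con_f(split, k) > rel_lin_con_f(grouped, k)
```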
As a result, the Cir/Con/k/n:F systems do not admit any invariant configurations when 2 < k < n − 3. Now consider the case k = n − 3. If p[n] = 1, then the rest of the n − 1 components should be ordered following the invariant optimal configuration of a Lin/Con/(n − 3)/(n − 1):F system stated in Theorem 9.6. Choose r and p such that 0 < r < p < 1, let p[1] = p[2] = r, and let p[3] = p[4] = · · · = p[n−1] = p. Then the optimal configuration is supposed to be

(1, r, p, . . . , p, r, 1),   (9.87)

with n − 3 copies of p in the middle (the leading 1 is repeated at the end to show the circular adjacency),
TABLE 9.3 Invariant Optimal Designs of Linear and Circular Con/k/n:F Systems

k              Linear System                                       Circular System
k = 1          (any arrangement)                                   (any arrangement)
k = 2          (1, n, 3, n − 2, . . . , n − 3, 4, n − 1, 2) [62]   (1, n − 1, 3, n − 3, . . . , n − 2, 2, n, 1) [100]
2 < k < n − 2  Does not exist [161]                                Does not exist [102]
k = n − 2      (1, 4, (any arrangement), 3, 2) [161]               (1, n − 1, 3, n − 3, . . . , n − 2, 2, n, 1) [102]
k = n − 1      (1, (any arrangement), 2) [161]                     (any arrangement) [102]
k = n          (any arrangement)                                   (any arrangement)
that is, the two least reliable components should be placed adjacent to the most reliable component. However, we find that the following choice of component reliabilities contradicts the above configuration. Choose p such that 0 < p < 1, let p[1] = p[2] = 0, and let p[3] = p[4] = · · · = p[n] = p. If the above configuration is optimal, we need to arrange the components in the order

C1 = (p, 0, p, . . . , p, 0, p),   (9.88)

with n − 3 copies of p between the two 0s (the leading p is repeated at the end to show the circular adjacency). With equation (9.44), the system reliability can be expressed as follows, based on the decomposition of a component with 0 reliability:

R_C1 = R_C(p, . . . , p, 0) − p^2 (1 − p)^{k−1} − (k − 2) p^2 (1 − p)^{k−2},   (9.89)

where R_C(p, . . . , p, 0) is the reliability of a circular system with n − 1 components of reliability p and one component of reliability 0. However, the configuration

C2 = (p, p, 0, p, . . . , p, 0, p),   (9.90)

with n − 4 copies of p between the two 0s, has system reliability (calculated in a similar way)

R_C2 = R_C(p, . . . , p, 0) − p^2 (1 − p)^{k−1} − (k − 3) p^2 (1 − p)^{k−2},   (9.91)
which is higher than R_C1. Thus, an invariant optimal design does not exist. The results on invariant optimal designs of linear and circular Con/k/n:F systems are tabulated in Table 9.3.

Exercise

1. Prove Theorem 9.4.
9.2.3 Variant Optimal Design

As tabulated in Table 9.3, neither linear nor circular Con/k/n:F systems have an invariant optimal design when 2 < k < n − 2. In these cases, the optimal arrangement of components depends on the values of the reliabilities of these components. Such an arrangement is called a variant optimal design to indicate that it is a function of the specific values of the component reliabilities. In this section, we provide results on the identification of variant optimal designs of linear and circular Con/k/n:F systems. Some of these results narrow down the candidates for optimal design, while others are heuristic algorithms for identifying "good" designs.

Theorem 9.7 Let π be a permutation of the elements in {1, 2, . . . , 2k}. A necessary condition for π* to be the optimal design of a Lin/Con/k/2k:F system with component reliabilities p_1 ≤ · · · ≤ p_{2k} is

π*_i < π*_j   and   π*_{k+i} > π*_{k+j}   for 1 ≤ i < j ≤ k,
where π*_i is the reliability of the component at position i in permutation π*.

Proof For any permutation π, let i and j be such that 1 ≤ i < j ≤ k and suppose that π_i > π_j. It is enough to show that the permutation π_ij obtained from π by interchanging the components at positions i and j is at least as good as the permutation π. Using Theorem 6.1, we can easily verify that position j is more critical than position i for 1 ≤ i < j ≤ k and for k ≤ j < i ≤ 2k. With Theorem 6.8, we know the optimal design must satisfy the condition given in Theorem 9.7.

The sample space of candidate configurations has size n!. By symmetry, however, the size for the linear system can be reduced to n!/2 and that for the circular system to (n − 1)!. The tremendous growth of the factorial prevents us from implementing full enumeration. As Malon [161] and Tong [237] demonstrated, the sample space for the optimal configuration can be drastically reduced via pairwise rearrangements. Let S be the set of permutations of 1, 2, . . . , n.

Definition 9.1 A permutation π ∈ S is said to be inadmissible if there exists a π′ ∈ S such that R(π(p)) ≤ R(π′(p)) holds for all p such that 0 < p < 1.
Let π be a permutation such that π_i < π_j. If position i is more critical than position j, then π is inadmissible because the interchange of the components at positions i and j would increase system reliability. As a consequence, π should be eliminated from further consideration because π_ij is a uniformly better permutation. A procedure for obtaining candidates for the optimal arrangement has the following steps:

1. Eliminate all inadmissible permutations via pairwise rearrangements.
2. Delete all but one of the permutations that are permutation equivalent.
3. Let S_0 ⊂ S denote the subset of permutations that have not yet been eliminated or deleted. Find a permutation π* in S_0 satisfying

R(π*(p)) = max_{π∈S_0} R(π(p))
either analytically or (when necessary) from numerical calculations when the p_i values are known. Then π* is an optimal permutation.

Based on their analysis of the pattern of component reliability importances in Con/k/n:F systems, Kuo et al. [134] developed the following guidelines for variant optimal design of a Con/k/n:F system:

1. Given a desired system reliability level with to-be-determined component reliabilities, allocate higher reliabilities to the positions with higher B-importances and lower reliabilities to the positions with lower B-importances.
2. Given the reliabilities of n components, their arrangement should follow the pattern of B-importances computed as if the components were i.i.d. In other words, the positions with higher B-importances should be allocated higher reliability values.
3. The conditions specified in Theorem 9.7 should be satisfied.
4. The best components should be assigned to the 2k − n middle positions, in any order, for a Lin/Con/k/n:F system with k < n < 2k.
As we know, if position i is more critical than position j, then I_L(i) > I_L(j) when p_i = p for all i. However, the reverse is not necessarily true. As a result, a design obtained following the above guidelines may not be optimal. In addition, the costs incurred in allocating components to different positions are not considered at all. In the case of Cir/Con/k/n:F systems, the B-importance is constant over all component positions when p_i = p for all i. As a result, the B-importance is not helpful for variant optimal design of circular systems.

Zuo and Kuo [260] present a heuristic method to find at least suboptimal designs of Lin/Con/k/n:F systems. The Birnbaum importance is used to indicate where to allocate reliabilities. A position with a higher B-importance should be assigned a component with a larger reliability; in other words, the reliability pattern should match the B-importance pattern. To test the goodness of the heuristic, they used exhaustive search to find the true optimal designs. Assume that there are n components with known reliability values. Initially each of the n positions is assigned a component (this assignment is called the initial design). Earlier results indicate that positions 1 through k should have increasing reliabilities and positions n − k + 1 through n should have decreasing reliabilities for n ≥ 2k. It is also clear that each position from k to n − k + 1 belongs to exactly k minimal cut sets. The initial design selected by Zuo and Kuo [260] is (1, 3, 5, . . . , 6, 4, 2). From computer simulations, this selection is also confirmed to be better than other choices.
Starting with the chosen initial design, the B-importance of each component is calculated. If a position, say i, holds a more reliable component but does not have a higher importance than another position, say j, the components at these two positions are candidates for an exchange. The procedure starts from the least reliable component. The importance of this component is compared with the importance of the next more reliable component. If the importance of the less reliable component is larger than that of the more reliable component, these two components are exchanged. If the system reliability is improved by this exchange, the exchange is kept. Otherwise, the exchange is discarded, and the next more reliable component is considered. The process continues until the B-importance pattern matches the reliability pattern or no interchange of any two components improves system reliability. Experiments show that this heuristic provides very good suboptimal designs.
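A minimal sketch of this pairwise-exchange idea follows; the function names and the small test case are ours, and the B-importance is obtained by brute-force enumeration rather than the closed-form expressions of this chapter:

```python
from itertools import permutations

def rel_lin_con_f(p, k):
    # Exact reliability of a Lin/Con/k/n:F system by state enumeration.
    n = len(p)
    total = 0.0
    for mask in range(1 << n):
        prob, run, failed = 1.0, 0, False
        for i in range(n):
            if (mask >> i) & 1:
                prob *= p[i]
                run = 0
            else:
                prob *= 1.0 - p[i]
                run += 1
                failed = failed or run >= k
        if not failed:
            total += prob
    return total

def birnbaum(p, k, i):
    # Numerical B-importance: R(p_i = 1) - R(p_i = 0).
    hi, lo = list(p), list(p)
    hi[i], lo[i] = 1.0, 0.0
    return rel_lin_con_f(hi, k) - rel_lin_con_f(lo, k)

def initial_design(p_sorted):
    # (1, 3, 5, ..., 6, 4, 2): odd ranks left to right, even ranks right to left.
    n = len(p_sorted)
    ranks = list(range(1, n + 1, 2)) + list(range(n - n % 2, 0, -2))
    return [p_sorted[r - 1] for r in ranks]

def exchange_heuristic(p_sorted, k):
    design = initial_design(p_sorted)
    n = len(design)
    improved = True
    while improved:
        improved = False
        for i in range(n):
            for j in range(n):
                # Position i holds the more reliable component but is less important.
                if design[i] > design[j] and birnbaum(design, k, i) < birnbaum(design, k, j):
                    cand = list(design)
                    cand[i], cand[j] = cand[j], cand[i]
                    if rel_lin_con_f(cand, k) > rel_lin_con_f(design, k):
                        design, improved = cand, True
    return design

p_sorted, k = [0.5, 0.6, 0.7, 0.8, 0.9], 3
heur = exchange_heuristic(p_sorted, k)
best = max(rel_lin_con_f(list(a), k) for a in permutations(p_sorted))
assert rel_lin_con_f(initial_design(p_sorted), k) <= rel_lin_con_f(heur, k) <= best + 1e-12
```

Because every kept exchange strictly improves the system reliability, the loop always terminates; the result is at least as good as the initial design, though not guaranteed to equal the exhaustive optimum.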
9.3 CONSECUTIVE-k-OUT-OF-n:G SYSTEMS

A Con/k/n:G system consists of an ordered sequence of n components such that the system works if and only if at least k consecutive components work. Depending on whether the components are arranged along a line or a circle, the system is a linear or a circular Con/k/n:G system, respectively. Tong [237] reports the first study on the Con/k/n:G system. The concept of a Con/k/n:G system is well explained by Kuo et al. [134], who state that the Con/k/n:G and F systems are mirror images of each other. Applying the definition of duality in equation (4.9), we can easily verify that the Con/k/n:F and G systems are the duals of each other. A minimal cut set in a Con/k/n:F system becomes a minimal path set in a Con/k/n:G system: every k consecutive components form a minimal cut set in a Con/k/n:F system, while every k consecutive components form a minimal path set in a Con/k/n:G system.

9.3.1 System Reliability Evaluation

Lemma 9.3 If the reliability of component i, denoted by p_i, in one type of Con/k/n system (say, the F system) is equal to the unreliability of component i, denoted by q_i, in the other type of Con/k/n system (that is, the G system) for i = 1, 2, . . . , n, given that both types of systems have the same values of n and k, then the reliability of one type of system is equal to the unreliability of the other type of system.

This lemma describes the following procedure for reliability evaluation of Con/k/n:G systems utilizing the equations and algorithms covered earlier in this chapter:

1. Given is a (linear or circular) Con/k/n:G system with k, n, and component reliability values denoted by p_i for 1 ≤ i ≤ n.
2. Calculate q_i = 1 − p_i for 1 ≤ i ≤ n.
3. Find the reliability of the corresponding (linear or circular) Con/k/n:F system with k, n, and component reliabilities q_i for 1 ≤ i ≤ n. Let R_F denote this system reliability.
4. The reliability of the original G system, denoted by R_G, is then equal to 1 − R_F.

Zuo [257] provides the following lemma stating the relationship between the reliability bounds of the Con/k/n F and G systems:

Lemma 9.4 If the upper and lower bound formulas for the reliability of a primal Con/k/n system are u(p) and l(p), respectively, then the upper and lower bounds for the reliability of the dual Con/k/n system are 1 − l(q) and 1 − u(q), respectively, where p + q = 1 and u and l indicate two specific function forms.

Using the duality relationship between the Con/k/n:F and Con/k/n:G systems described in Lemma 9.3, we can easily convert a few simple equations for reliability evaluation of a Con/k/n:F system into equations for Con/k/n:G systems. The following formulas are identified by Kuo et al. [134] and Zuo [260]. We will use subscripts LG and CG to indicate linear and circular G systems, respectively. When the components are independent, based on equations (9.35) and (9.44), we have the following:

R_LG(k, n) = R_LG(k, n − 1) + Q_LG(k, n − k − 1) q_{n−k} ∏_{j=n−k+1}^{n} p_j,   (9.92)

R_CG(k, n) = Σ_{i=0}^{k−1} q_{i+1} (∏_{j=1}^{i} p_j) q_{n−k+i} (∏_{j=n−k+i+1}^{n} p_j) Q_LG(k, (i + 2, n − k + i − 1))
             + q_n R_LG(k, n − 1) + p_n R_CG(k, n − 1).   (9.93)
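The four-step duality procedure of Lemma 9.3 can be exercised directly on a small linear system. In the sketch below (arbitrary component values of our choosing), the G-system reliability computed by direct enumeration agrees with 1 − R_F evaluated at the complemented reliabilities:

```python
def rel_lin_con_f(p, k):
    # Reliability of a Lin/Con/k/n:F system (fails iff k consecutive failures).
    n = len(p)
    total = 0.0
    for mask in range(1 << n):
        prob, run, failed = 1.0, 0, False
        for i in range(n):
            if (mask >> i) & 1:
                prob *= p[i]
                run = 0
            else:
                prob *= 1.0 - p[i]
                run += 1
                failed = failed or run >= k
        if not failed:
            total += prob
    return total

def rel_lin_con_g(p, k):
    # Reliability of a Lin/Con/k/n:G system (works iff k consecutive successes).
    n = len(p)
    total = 0.0
    for mask in range(1 << n):
        prob, run, works = 1.0, 0, False
        for i in range(n):
            if (mask >> i) & 1:
                prob *= p[i]
                run += 1
                works = works or run >= k
            else:
                prob *= 1.0 - p[i]
                run = 0
        if works:
            total += prob
    return total

p, k = [0.3, 0.5, 0.6, 0.7, 0.8], 2
q = [1.0 - x for x in p]

# Lemma 9.3: with q_i = 1 - p_i, R_G(p) = 1 - R_F(q) for the same k and n.
assert abs(rel_lin_con_g(p, k) - (1.0 - rel_lin_con_f(q, k))) < 1e-12
```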
When the components are i.i.d., we have the following based on Zuo and Kuo [260] and equation (9.29):

R_LG(k, n) = R_LG(k, n − 1) + q p^k Q_LG(k, n − k − 1),   (9.94)

R_CG(k, n) = { 0,                                           0 ≤ n < k,
             { p^k,                                         n = k,
             { p^n + n q p^k,                               k < n ≤ 2k + 1,    (9.95)
             { R_CG(k, n − 1) + q p^k Q_CG(k, n − k − 1),   n > 2k + 1.

Using Lemma 9.4, we can identify the following equations for bounds on the reliability of Con/k/n:G systems [134, 257]. When the components are independent, we have the following based on equations (9.54), (9.49), (9.50), and (9.53):

l_LG(k, n) = l_CG(k, n) = 1 − ∏_{i=1}^{⌊n/k⌋} (1 − ∏_{j=(i−1)k+1}^{ik} p_j),   (9.96)

u_LG(k, n) = 1 − ∏_{i=1}^{n−k+1} (1 − ∏_{j=i}^{i+k−1} p_j),   (9.97)

u_CG(k, n) = 1 − ∏_{i=1}^{n} (1 − ∏_{j=i}^{i+k−1} p_j),   (9.98)

u_LG(k, n) = ∏_{i=1}^{k} (1 − ∏_{j=0}^{m_i} q_{jk+i}),   (9.99)

where m_i = ⌊(n − i)/k⌋ and, in equation (9.98), p_j ≡ p_{j−n} for j > n. When the components are i.i.d., the corresponding bounds are

l_LG(k, n) = l_CG(k, n) = 1 − (1 − p^k)^{⌊n/k⌋},   (9.100)

u_LG(k, n) = 1 − (1 − p^k)^{n−k+1},   (9.101)

u_CG(k, n) = 1 − (1 − p^k)^n,   (9.102)

u_CG(k, n) = ∏_{i=1}^{k} (1 − q^{m_i+1}),   (9.103)
where m_i = ⌊(n − i)/k⌋. Unlike the Con/k/n:F systems, the reliability of a Con/k/n:G system increases as n increases and decreases as k increases. In the extreme cases of k = 1 and k = n, the Con/k/n:G system becomes a parallel system and a series system, respectively. The system reliability of a Con/k/n:G system can be made as high as desired by increasing n, even when the component reliabilities are not very high.

9.3.2 Component Reliability Importance

Using Birnbaum's definition of component reliability importance and the results for Lin/Con/k/n:F systems, Kuo et al. [134] developed B-importance formulas for Con/k/n:G systems. The Birnbaum measure of importance for a Lin/Con/k/n:G system is given by

I_LG(i) = [R_LG(k, n) − R_LG(k, i − 1) − Q_LG(k, i − 1) R′_LG(k, n − i)] / p_i,   (9.104)

where R′_LG(k, n − i) is the reliability of a Lin/Con/k/(n − i):G system consisting of components i + 1, i + 2, . . . , n.
Similar to how it functions in F systems, the B-importance for a G system increases from position 1 to position min{k, n − k + 1} and decreases from position max{k, n − k + 1} to position n. If k ≤ n < 2k, the B-importance stays constant between component n − k + 1 and component k. For the linear Con/k/n:G system with k = 2, Zuo and Kuo [260] prove that the B-importance pattern is the same as that of a Lin/Con/2/n:F system. Kuo et al. [134] provide the following equation for the B-importance of component i in a Cir/Con/k/n:G system:

I_i = [R_CG(k, n) − R_i(k, n − 1)] / p_i.   (9.105)
It is easy to see from equation (9.105) that the B-importance of a Cir/Con/k/n:G system with i.i.d. components is constant for i = 1, . . . , n. According to Zuo [257] and Hwang et al. [104], the B-importance of component i in a Lin/Con/k/n:G system with i.i.d. component reliability p is equal to the B-importance of component i in a Lin/Con/k/n:F system with i.i.d. component reliability 1 − p. Since the importance of a component depends on the reliabilities of the components in the system, the importances of component i in the F and G systems are usually different, except when the component reliability in the G system equals the component unreliability in the F system.

9.3.3 Invariant Optimal Design

For Con/k/n:G systems, there is also the issue of optimal arrangement of components with different reliabilities. In this section, we review the results on invariant optimal designs of Con/k/n:G systems when only the ranking of component reliabilities is known. Kuo et al. [134] identify the invariant optimal designs of Lin/Con/k/n:G systems with n ≤ 2k and state that all possible arrangements of a Cir/Con/k/n:G system with k = n − 1 result in the same system reliability. Zuo and Kuo [260] identify the optimal designs of the Cir/Con/k/n:G systems with n ≤ 2k + 1 and prove that invariant optimal designs do not exist for a linear system with n > 2k or for a circular system with n > 2k + 1. Similar to the Con/k/n:F systems, the optimal arrangement of components of a Cir/Con/k/n:G system with p[1] = 0 reduces to the optimal arrangement of a Lin/Con/k/(n − 1):G system. Thus, we will concentrate on the optimal design of circular systems in the following.

Theorem 9.8 A necessary condition for the optimal design of a Cir/Con/k/n:G system with k < n ≤ 2k + 1 is

(p_i − p_j)(p_{i−1} − p_{j+1}) > 0,   1 ≤ i, j ≤ n,   (9.106)

where p_i is the reliability of the component at position i and p_j ≡ p_{j−n} for j > n.
Theorem 9.9 For a Cir/Con/k/n:G system with k < n ≤ 2k + 1, the only configuration satisfying the necessary condition in Theorem 9.8 is

Cn = (1, 3, 5, 7, . . . , 8, 6, 4, 2, 1).   (9.107)

Thus, Cn is the invariant optimal design of a Cir/Con/k/n:G system with k < n ≤ 2k + 1.

Proof To satisfy the necessary condition specified in Theorem 9.8, component 1 has to be adjacent to components 3 and 2. If not, 1 is adjacent to some i with i > 3. Let j be the item following component 2 in the sequence 1, i, . . . , 2, j. This sequence violates the condition specified in Theorem 9.8, since p_1 < p_j and p_i > p_2. Similarly, we can show that 2 must be adjacent to 1 and 4, 3 must be adjacent to 1 and 5, and so on. In essence, Cn is the only configuration satisfying the necessary condition in Theorem 9.8. Thus, it is the invariant optimal design of a Cir/Con/k/n:G system with k < n ≤ 2k + 1.

Theorem 9.10 The optimal design of a Lin/Con/k/n:G system with k < n ≤ 2k is given by

Ln = (1, 3, 5, . . . , 2 min{k, n − k + 1} − 1, (any arrangement), 2 min{k, n − k + 1}, . . . , 6, 4, 2).

Proof When n < 2k, the middle 2k − n components appear in every minimal path set of the system. These components have to work for the system to work. Thus, the 2k − n most reliable components should be allocated to these positions. After these components have been allocated with the highest reliabilities, the remaining components form a Lin/Con/(n − k)/2(n − k):G system. Thus, we concentrate on the optimal design of a Lin/Con/k/2k:G system. The necessary condition for the optimal design of a Lin/Con/k/n:G system with n = 2k can be stated as follows based on Theorem 9.8:

(p_i − p_j)(p_{i−1} − p_{j+1}) > 0,   1 < i < j < n.   (9.108)
With arguments similar to those used in the proof of Theorem 9.9, we can prove that the only configuration satisfying condition (9.108) is as stated in Theorem 9.10.
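Theorem 9.9 can be checked by brute force for a small circular case. The sketch below (illustrative reliabilities of our choosing) enumerates all arrangements of a Cir/Con/3/6:G system, which satisfies k < n ≤ 2k + 1, and confirms that Cn, here (1, 3, 5, 6, 4, 2), attains the maximum reliability:

```python
from itertools import permutations

def rel_cir_con_g(p, k):
    # Reliability of a Cir/Con/k/n:G system: the system works iff some run of k
    # consecutive working components exists, with wrap-around allowed.
    n = len(p)
    total = 0.0
    for mask in range(1 << n):
        states = [(mask >> i) & 1 for i in range(n)]
        prob = 1.0
        for i, s in enumerate(states):
            prob *= p[i] if s else 1.0 - p[i]
        if all(states):
            ok = n >= k
        else:
            run, longest = 0, 0
            for s in states + states:   # doubling the circle handles wrap-around
                run = run + 1 if s else 0
                longest = max(longest, run)
            ok = longest >= k
        if ok:
            total += prob
    return total

# p_sorted[r - 1] is the r-th smallest of six distinct reliabilities.
p_sorted = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

# Cn = (1, 3, 5, 7, ..., 8, 6, 4, 2, 1) for n = 6 reads (1, 3, 5, 6, 4, 2).
cn = [p_sorted[r - 1] for r in (1, 3, 5, 6, 4, 2)]

best = max(rel_cir_con_g(list(arr), 3) for arr in permutations(p_sorted))
assert abs(best - rel_cir_con_g(cn, 3)) < 1e-12
```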
Theorem 9.11 There does not exist any invariant optimal configuration for a Lin/Con/k/n:G system when 2 ≤ k < n/2.

To prove Theorem 9.11, it suffices to show that different choices of component reliabilities lead to different optimal configurations for a Lin/Con/k/n:G system with 2 ≤ k < n/2. If the component reliability values p[1] = p[2] = · · · = p[n−k−1] = 0, p[n−k] = p[n−k+1] = p, and p[n−k+2] = p[n−k+3] = · · · = p[n] = 1 are given, where 0 < p < 1, then the optimal arrangement is

(0, . . . , 0, p, 1, . . . , 1, p, 0, . . . , 0),

with i zeros on the left, k − 1 ones in the middle, and n − i − k − 1 zeros on the right, where i = 0, 1, . . . , n − k − 1. This indicates that if an invariant optimal design exists, the best k + 1 components must be placed adjacent to one another. Now choose s and t such that 0 < s < t < 1. Let p[1] = p[2] = · · · = p[n−k−1] = s, p[n−k] = p[n−k+1] = t, and p[n−k+2] = p[n−k+3] = · · · = p[n] = 1. Then, it can be shown that the optimal arrangement is

(t, 1, . . . , 1, t, s, . . . , s),

with k − 1 ones and n − k − 1 trailing copies of s. This indicates that if an invariant optimal design exists, the best k + 1 components must be placed at one end of the linear system. Finally, to obtain a contradiction, choose numbers s and t such that 0 < s < t < 0.5 and let p[1] = p[2] = · · · = p[n−k−1] = s and p[n−k] = p[n−k+1] = · · · = p[n] = t. For this set of given component reliabilities, we can find that the following arrangement gives higher system reliability than the arrangement with the best k + 1 components placed at one end of the linear system:

(s, t, . . . , t, s, . . . , s),

with k + 1 copies of t and n − k − 2 trailing copies of s.
Considering all three sets of given component reliabilities, we conclude that an invariant optimal design does not exist for a Lin/Con/k/n:G system with 2 ≤ k < n/2. For a complete proof of this theorem, refer to Zuo and Kuo [260].

Theorem 9.12 There does not exist any invariant optimal configuration for a Cir/Con/k/n:G system when 2 ≤ k < (n − 1)/2.

Proof Choose p[1] = 0. Then, the Cir/Con/k/n:G system is equivalent to a Lin/Con/k/(n − 1):G system with available reliabilities p[2], p[3], . . . , p[n] and 2 ≤ k < (n − 1)/2. According to Theorem 9.11, such a Lin/Con/k/(n − 1):G system does not have an invariant optimal configuration. As a result, the Cir/Con/k/n:G system with p[1] = 0 does not have an invariant optimal configuration, and thus, neither does a general Cir/Con/k/n:G system with 2 ≤ k < (n − 1)/2.

With the theorems presented above, the results on invariant optimal designs of linear and circular Con/k/n:G systems are complete. They are summarized in Table 9.4.

TABLE 9.4 Invariant Optimal Designs of Con/k/n:G Systems

k                Linear System                                        Circular System
k = 1            (any arrangement)                                    (any arrangement)
2 ≤ k < n/2      Does not exist [260]                                 Does not exist [260]
n/2 ≤ k ≤ n − 2  (1, 3, 5, . . . , 2(n − k) − 1, (any arrangement),   (1, 3, 5, 7, . . . , n, . . . , 8, 6, 4, 2, 1) [260]
                 2(n − k), . . . , 6, 4, 2) [134]
k = n − 1        (1, 3, 5, . . . , 2(n − k) − 1, (any arrangement),   (any arrangement) [134]
                 2(n − k), . . . , 6, 4, 2) [134]
k = n            (any arrangement)                                    (any arrangement)

Exercise

1. Prove Theorem 9.8.

9.3.4 Variant Optimal Design

For systems without invariant optimal designs, we provide results that narrow down the choices of possible optimal designs. For a Lin/Con/k/n:G system with n > 2k > 2, invariant optimal designs do not exist. The minimal path sets of a Lin/Con/k/n:G system are (1, 2, . . . , k), (2, 3, . . . , k + 1), (3, 4, . . . , k + 2), . . . , (n − k + 1, . . . , n). Using Theorem 6.1, we see that, in the criticality ordering, 1 <_c 2 <_c 3 <_c · · · <_c k and n − k + 1 >_c n − k + 2 >_c · · · >_c n. As a result, the reliabilities of components 1 through k should be in ascending order while the reliabilities of components n − k + 1 through n should be in descending order to maximize system reliability. This result was summarized by Kuo et al. [134] as follows:

1. Components from positions 1 through min{k, n − k + 1} should be arranged in nondecreasing order of their reliabilities.
2. Components from positions max{k, n − k + 1} through n should be arranged in nonincreasing order of their reliabilities.

Zuo and Kuo [260] also applied the heuristic algorithm presented in Section 9.2.3 to the optimal design of Con/k/n:G systems. The heuristic works equally well for G systems. Zuo [257] provides a lemma stating the relationship between the designs of the Con/k/n F and G systems:

Lemma 9.5 The optimal design of the primal system is the worst design of its dual system and vice versa.

This lemma illustrates the relationship between the designs of Con/k/n F and G systems.
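Lemma 9.5 is a direct consequence of the duality in Lemma 9.3: since the G-system reliability evaluated at the complemented reliabilities equals 1 − R_F, the arrangement that maximizes the F-system reliability simultaneously minimizes the dual G-system's. A small enumeration check (illustrative values of our choosing):

```python
from itertools import permutations

def rel_lin_con_f(p, k):
    # F system: fails iff k consecutive components fail.
    n = len(p)
    total = 0.0
    for mask in range(1 << n):
        prob, run, failed = 1.0, 0, False
        for i in range(n):
            if (mask >> i) & 1:
                prob *= p[i]
                run = 0
            else:
                prob *= 1.0 - p[i]
                run += 1
                failed = failed or run >= k
        if not failed:
            total += prob
    return total

def rel_lin_con_g(p, k):
    # G system: works iff k consecutive components work.
    n = len(p)
    total = 0.0
    for mask in range(1 << n):
        prob, run, works = 1.0, 0, False
        for i in range(n):
            if (mask >> i) & 1:
                prob *= p[i]
                run += 1
                works = works or run >= k
            else:
                prob *= 1.0 - p[i]
                run = 0
        if works:
            total += prob
    return total

p, k = [0.3, 0.45, 0.6, 0.75, 0.9], 3

# Arrangement maximizing the F-system reliability.
best_f = max(permutations(p), key=lambda a: rel_lin_con_f(list(a), k))
# Its dual (component reliabilities 1 - p) should minimize the G-system reliability.
g_at_best_f = rel_lin_con_g([1.0 - x for x in best_f], k)
g_min = min(rel_lin_con_g([1.0 - x for x in a], k) for a in permutations(p))
assert abs(g_at_best_f - g_min) < 1e-9
```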
However, it may not be very useful in system design as no one is interested in the worst design of any system. 9.4 SYSTEM LIFETIME DISTRIBUTION So far in this chapter we have been concentrating on Con/k/n systems with constant component reliabilities. The results covered so far are useful when there is a fixed
mission period. In this section, we examine the performance of Con/k/n systems as a function of time. We are interested in the lifetime of the system as a function of the lifetimes of the components. As a result, we will deal with lifetime distributions of the components and the system.

Notation

• X_i(t): state of component i at time t; it is equal to 1 if the component is working at time t and 0 otherwise
• φ(t): state of the system at time t; it is equal to 1 if the system is working at time t and 0 otherwise
• R_s(t): system reliability function
• F_s(t): CDF of the lifetime of the system
• h_s(t): failure rate function of the system
• T_s: lifetime of the system, a random variable
• T_i: lifetime of component i, a random variable
• F_i(t): CDF of the lifetime of component i
• F(t): CDF of the lifetimes of i.i.d. components
• R_i(t): reliability function of component i
• R(t): reliability function of i.i.d. components
• h_i(t): failure rate function of component i
• h(t): failure rate function of i.i.d. components
9.4.1 Systems with i.i.d. Components

When the components are i.i.d., replacing p and q with R(t) and F(t), respectively, in equation (9.17), we obtain the following expression for the reliability function of a Lin/Con/k/n:F system:

R_s(t) = Σ_{j=0}^{M} N(j, k, n) R(t)^{n−j} F(t)^j,   (9.109)

where M is given in equation (9.16) and N(j, k, n) can be calculated with equation (9.15) for the Lin/Con/k/n:F systems. Writing F(t) = 1 − R(t) in equation (9.109), we get

R_s(t) = Σ_{j=0}^{M} N(j, k, n) Σ_{i=0}^{j} (−1)^i C(j, i) R(t)^{n−j+i},   (9.110)

where C(j, i) denotes the binomial coefficient. Note that R(t)^{n−j+i} is equal to the reliability of a series system of n − j + i i.i.d. components with common reliability function R(t).
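Equations (9.109) and (9.110) can be checked numerically for fixed t. In the sketch below, N(j, k, n), the number of state vectors with exactly j failed components and no k consecutive failures, is obtained by direct counting rather than from equation (9.15), which is not reproduced here:

```python
from math import comb

def N(j, k, n):
    # Count state vectors (1 = working) with exactly j failed components and
    # no run of k consecutive failures.
    count = 0
    for mask in range(1 << n):
        states = [(mask >> i) & 1 for i in range(n)]
        if states.count(0) != j:
            continue
        run, failed = 0, False
        for s in states:
            run = 0 if s else run + 1
            failed = failed or run >= k
        if not failed:
            count += 1
    return count

n, k, p = 7, 2, 0.9
q = 1.0 - p

# Direct enumeration of the Lin/Con/k/n:F reliability for i.i.d. components.
direct = sum(
    p ** sum(states) * q ** (n - sum(states))
    for mask in range(1 << n)
    for states in [[(mask >> i) & 1 for i in range(n)]]
    if all(any(states[i:i + k]) for i in range(n - k + 1))
)

eq_9_109 = sum(N(j, k, n) * p ** (n - j) * q ** j for j in range(n + 1))
eq_9_110 = sum(
    N(j, k, n) * sum((-1) ** i * comb(j, i) * p ** (n - j + i) for i in range(j + 1))
    for j in range(n + 1)
)
assert abs(direct - eq_9_109) < 1e-12 and abs(eq_9_109 - eq_9_110) < 1e-12
```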
Using equation (9.110), we have the following expression for the MTTF of a Lin/Con/k/n:F system with i.i.d. components:

MTTF = ∫_0^∞ R_s(t) dt = Σ_{j=0}^{M} N(j, k, n) Σ_{i=0}^{j} (−1)^i C(j, i) ∫_0^∞ R(t)^{n−j+i} dt
     = Σ_{j=0}^{M} N(j, k, n) Σ_{i=0}^{j} (−1)^i C(j, i) E(T_{1,n−j+i}),   (9.111)

where T_{1,n} denotes the first-order statistic based on a sample of size n from CDF F(t). We can find another expression for the MTTF of a Lin/Con/k/n:F system with i.i.d. components if equation (9.109) is used. Let G_{j,n}(t) denote the CDF of the jth-order statistic, T_{j,n}, for 1 ≤ j ≤ n. Then, we have

C(n, j) F(t)^j R(t)^{n−j} = G_{j,n}(t) − G_{j+1,n}(t),   (9.112)

∫_0^∞ F(t)^j R(t)^{n−j} dt = (1/C(n, j)) ∫_0^∞ [Ḡ_{j+1,n}(t) − Ḡ_{j,n}(t)] dt
                           = (1/C(n, j)) E(T_{j+1,n} − T_{j,n}),   (9.113)

MTTF = Σ_{j=0}^{M} (N(j, k, n)/C(n, j)) E(T_{j+1,n} − T_{j,n}),   (9.114)
where Ḡ_{j,n}(t) = 1 − G_{j,n}(t). Equation (9.112) can be interpreted as follows. The probability that there are exactly j failed components and n − j surviving components by time t is equal to the probability that the jth failure occurs before time t minus the probability that the (j + 1)th failure occurs before time t.

Theorem 9.13 The rth moment of the lifetime of a Con/k/n:F system can be expressed as

E(T_s^r) = r ∫_0^∞ t^{r−1} R_s(t) dt = Σ_{j=0}^{M} N_L(j, k, n) Σ_{i=0}^{j} (−1)^i C(j, i) E(T^r_{1,n−j+i}),   (9.115)

where E(T^r_{1,n−j+i}) = r ∫_0^∞ t^{r−1} R(t)^{n−j+i} dt.
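Equation (9.112), used in the derivation above, can be verified numerically by writing G_{j,n}(t) = Σ_{m=j}^{n} C(n, m) F(t)^m R(t)^{n−m}, the probability that at least j of n i.i.d. lifetimes have expired by time t; the value F = 0.35 below is an arbitrary evaluation point:

```python
from math import comb

def G(j, n, F):
    # CDF of the j-th order statistic: P(at least j of n i.i.d. lifetimes <= t).
    # For j = 0 the sum runs over all m and equals 1; for j = n + 1 it is empty (0).
    return sum(comb(n, m) * F ** m * (1 - F) ** (n - m) for m in range(j, n + 1))

n, F = 6, 0.35
R = 1.0 - F
for j in range(n + 1):
    # C(n, j) F^j R^(n-j) = G_{j,n}(t) - G_{j+1,n}(t)   [equation (9.112)]
    lhs = comb(n, j) * F ** j * R ** (n - j)
    rhs = G(j, n, F) - G(j + 1, n, F)
    assert abs(lhs - rhs) < 1e-12
```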
Example 9.7 Consider a Lin/Con/k/n:F system with i.i.d. components. The lifetime distribution of each component follows the Weibull distribution with pdf given in equation (2.102). Other functions and characteristics of the Weibull distribution are given in equations (2.103)–(2.107):

E(T_{1,n}) = ∫_0^∞ [R(t)]^n dt = ∫_0^∞ e^{−n(t/η)^β} dt
           = η n^{−1/β} (1/β) ∫_0^∞ y^{1/β−1} e^{−y} dy,   with y = n(t/η)^β,
           = η n^{−1/β} (1/β) Γ(1/β) = η n^{−1/β} Γ(1 + 1/β),
E(T_{1,n−j+i}) = η (n − j + i)^{−1/β} Γ(1 + 1/β).
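The closed form for E(T_{1,n}) can be checked against numerical integration; the parameter values below are arbitrary:

```python
from math import exp, gamma

eta, beta, n = 2.0, 1.5, 4

# E(T_{1,n}) = integral of exp(-n (t/eta)^beta) over [0, inf),
# evaluated here by the trapezoidal rule on a truncated range.
dt, T = 0.0005, 20.0
ts = [i * dt for i in range(int(T / dt) + 1)]
vals = [exp(-n * (t / eta) ** beta) for t in ts]
numeric = dt * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

closed_form = eta * n ** (-1.0 / beta) * gamma(1.0 + 1.0 / beta)
assert abs(numeric - closed_form) < 1e-3
```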
Using equations (9.111) and (9.115), we have

E(T_s) = Σ_{j=0}^{M} N(j, k, n) Σ_{i=0}^{j} (−1)^i C(j, i) η (n − j + i)^{−1/β} Γ(1 + 1/β)
       = η Γ(1 + 1/β) Σ_{j=0}^{M} N(j, k, n) Σ_{i=0}^{j} (−1)^i C(j, i) (n − j + i)^{−1/β},

E(T_s^r) = η^r Γ(1 + r/β) Σ_{j=0}^{M} N(j, k, n) Σ_{i=0}^{j} (−1)^i C(j, i) (n − j + i)^{−r/β}.

The system MTTF equals E(T_s), and the variance of the system lifetime can be obtained with Var(T_s) = E(T_s^2) − E(T_s)^2. For a Cir/Con/k/n:F system with i.i.d. components, the reliability function can be obtained from equation (9.21) in a similar fashion and expressed as

R_C(t; k, n) = Σ_{j=0}^{k−1} (j + 1) R_L(t; k, n − j − 2) Σ_{i=0}^{j} (−1)^i C(j, i) R(t)^{i+2},   (9.116)
where R_L(t; k, n − j − 2) is the reliability function of a Lin/Con/k/(n − j − 2):F system with i.i.d. components.

9.4.2 System with Exchangeable Dependent Components

So far in this chapter the components of a system have been assumed to be s-independent. In this section, we consider components that have s-exchangeable lifetimes. Exchangeable lifetimes include independent lifetimes as a special case. Thus, we say that components with exchangeable lifetimes are somewhat dependent on one another.
SYSTEM LIFETIME DISTRIBUTION
In some sense, s-exchangeable components are identical but not independent of one another. In other words, the s-exchangeable component lifetimes are dependent but exchangeable. Shanthikumar [223] provided the definition of exchangeable lifetimes and developed an equation for lifetime distribution evaluation of Lin/Con/k/n:F systems with exchangeable components. Papastavridis [183] developed the equation for the corresponding circular systems. Lau [138] studied s-exchangeable lifetimes for a wider range of systems including the k-out-of-n and linear and circular consecutive systems.

Definition 9.2 If the joint pdf of the component lifetimes is invariant under the permutation of the component indices, the component lifetimes are s-exchangeable.

Example 9.8 Consider a two-component system. The joint pdf of the lifetimes of the two components is given by

f(t_1, t_2) = c e^{−λt_1 − λt_2 − λ_0 min{t_1, t_2}}.

Based on the given joint pdf, we can see that f(t_1, t_2) = f(t_2, t_1). Thus, we say that components 1 and 2 are exchangeable. Note that the lifetimes of these two components are neither independent nor exponentially distributed. If they were i.i.d. with exponential lifetime distributions, we would have the joint pdf

f(t_1, t_2) = λ² e^{−λ(t_1 + t_2)} = f(t_2, t_1).

Consider a system with exchangeable component lifetimes. Based on Definition 9.2, for any permutation π = {π(1), π(2), . . . , π(n)} of {1, 2, . . . , n} and t > 0, we have

Pr(T_{π(1)} ≤ t, T_{π(2)} ≤ t, . . . , T_{π(n)} ≤ t) = Pr(T_1 ≤ t, T_2 ≤ t, . . . , T_n ≤ t).   (9.117)

In addition, for t > 0 and j = 0, 1, . . . , n, we have

Pr(X_{π(1)}(t) = 0, . . . , X_{π(j)}(t) = 0, X_{π(j+1)}(t) = 1, . . . , X_{π(n)}(t) = 1)
= Pr(T_{π(1)} ≤ t, . . . , T_{π(j)} ≤ t, T_{π(j+1)} > t, . . . , T_{π(n)} > t)
= Pr(T_1 ≤ t, T_2 ≤ t, . . . , T_j ≤ t, T_{j+1} > t, . . . , T_n > t).   (9.118)

Let N(t) be the number of failed components at time t.
Since there are C(n, j) different ways to have exactly j failed components at time t, we have, for t > 0 and j = 0, 1, . . . , n,

m_j(t) ≡ Pr(N(t) = j) = C(n, j) Pr(T_1 ≤ t, . . . , T_j ≤ t, T_{j+1} > t, . . . , T_n > t).   (9.119)
By conditioning on the value of N(t), we find the following expression of the system reliability function of a coherent system:

R_s(t) = Σ_{j=0}^{n} Pr(T_s > t | N(t) = j) Pr(N(t) = j) = Σ_{j=0}^{n} d(t; n, j) m_j(t),  t > 0,   (9.120)

where d(t; n, j) is the probability that the system is still working with j failed components. From equations (9.118) and (9.119), it follows that for any permutation π of {1, 2, . . . , n}, t > 0, and j = 0, 1, . . . , n,

Pr(X_{π(1)} = 0, . . . , X_{π(j)} = 0, X_{π(j+1)} = 1, . . . , X_{π(n)} = 1 | N(t) = j) = 1 / C(n, j).   (9.121)

As a result, d(t; n, j) is independent of the time parameter t. Hence, we write d(n, j) = d(t; n, j), t > 0, in general. The reliability of a general system with exchangeable component lifetimes can then be expressed as

R_s(t) = Σ_{j=0}^{n} d(n, j) m_j(t),   (9.122)

where m_j(t) is given in equation (9.119) and d(n, j) is defined as

d(n, j) = Pr(T_s > t | N(t) = j).   (9.123)
Example 9.9 Consider a Lin/Con/k/n:F system with i.i.d. components. Such a system is a special case of a system with exchangeable components. We will illustrate the equations derived in this section for reliability evaluation of such a system. With equations (9.119) and (9.123), we have

m_j(t) = C(n, j) F(t)^j R(t)^{n−j},  j = 0, 1, . . . , n,
d(n, j) = N(j, k, n) / C(n, j),  j = 0, 1, . . . , n.

Using equation (9.122), we get
R_s(t) = Σ_{j=0}^{n} [N(j, k, n) / C(n, j)] C(n, j) F(t)^j R(t)^{n−j} = Σ_{j=0}^{n} N(j, k, n) F(t)^j R(t)^{n−j},

which is almost identical to equation (9.109). The difference is in the upper limit of the summation. As we discussed before, N(j, k, n) = 0 for j > M.

Based on the foregoing derivations and illustrations for system lifetime distribution when component lifetimes are exchangeable, we can see that there are some similarities between systems with exchangeable components and systems with i.i.d. components. The difference is in the probability of a specific realization of exactly j failures. When the components are i.i.d., this probability is equal to F(t)^j R(t)^{n−j}. When the components are exchangeable, we do not have such a closed expression. However, we do know that this probability is constant for each specific realization; it is equal to Pr(T_1 ≤ t, T_2 ≤ t, . . . , T_j ≤ t, T_{j+1} > t, . . . , T_n > t). As a result, we can modify the equations for reliability evaluation of systems with i.i.d. components to obtain equations for reliability function evaluation of systems with exchangeable components. Using equation (9.17), we obtain the following expression for the reliability function of a Lin/Con/k/n:F system with exchangeable components:

R_L(t) = Σ_{j=0}^{M} N(j, k, n) Pr(T_1 ≤ t, T_2 ≤ t, . . . , T_j ≤ t, T_{j+1} > t, . . . , T_n > t),   (9.124)

where M is given in equation (9.16) and several expressions of N(j, k, n) are given in Section 9.1.1. Using equation (9.22), we obtain the following expression for the reliability function of the Cir/Con/k/n:F system with exchangeable components:

R_C(t) = Σ_{j=0}^{M} N_c(j, k, n) Pr(T_1 ≤ t, T_2 ≤ t, . . . , T_j ≤ t, T_{j+1} > t, . . . , T_n > t),   (9.125)

where M is given in equation (9.23) and N_c(j, k, n) is given in equation (9.24).

9.4.3 System with (k − 1)-Step Markov-Dependent Components

In addition to exchangeable dependent components, other dependence mechanisms have been proposed for Con/k/n systems. Papastavridis and Lambiris [185] assume that the reliability of component i depends only on the status of component i − 1. This kind of dependence is called Markov dependence or one-step Markov dependence. Ge and Wang [83] and Ksir and Boushaba [125] used the same assumption in their studies of Con/k/n systems. A more realistic scheme of component failure dependency is described by Fu [79]. Consider n pumps that are equally spaced for transporting crude oil from point A to point B. Each pump is able to transport oil a distance of k pumps. The system is failed if at least k consecutive pumps are failed. When all pumps are working, each pump only needs to raise the pressure and speed of the oil flow so that it will reach the next pump. However, when i consecutive pumps immediately preceding a specific pump are failed (i < k), this pump has to work much harder in order to push the oil to reach the next pump. Thus, the failure probability of each pump depends on the number of consecutive component failures immediately preceding it. This kind of dependency can only go back at most k − 1 steps; otherwise the system is already failed. In this section, we describe the algorithm for reliability evaluation of Con/k/n:F systems under (k − 1)-step Markov dependence. The one-step Markov dependence can be treated as a special case of what is to be discussed.

Assumptions
1. The components are i.i.d. when working under the same load condition.
2. The reliability of each component depends only on the number of consecutive component failures immediately preceding the component.
3. The system is failed if and only if at least k consecutive components are failed.

Notation
• p(m), q(m): working probability and failure probability of a component when there are m consecutive component failures immediately preceding this component, m = 0, 1, 2, . . . , k − 1
• R(m; k, n): probability that the n-component system works, components n − m + 1 through n are failed, and component n − m works, m = 0, 1, 2, . . . , k − 1. In such an n-component system, we say that the last failure string has length m.
• R(k, n): reliability of an n-component system
Based on the given assumptions and defined notation, we have

R(k, n) = Σ_{m=0}^{k−1} R(m; k, n).   (9.126)

To evaluate R(m; k, n), we need the following recursive equation:

R(m; k, n) = [Π_{j=0}^{m−1} q(j)] Σ_{i=0}^{k−1} p(i) R(i; k, n − m − 1),   (9.127)
with the following boundary conditions:

R(0; k, 0) = 1,   (9.128)
R(0; k, 1) = p(0),   (9.129)
R(i; k, i) = Π_{j=0}^{i−1} q(j),  i = 1, 2, . . . , k − 1,   (9.130)
R(m; k, i) = 0,  i < m.   (9.131)
To find the computational complexity of this algorithm, we note that we need to find R(m; k, i) for 0 ≤ m ≤ k − 1 and 1 ≤ i ≤ n. The total number of entries in R(m; k, n) is then O(kn). To find R(m; k, n) for each set of given m and n values requires O(k) operations with equation (9.127). Finding R(k, n) from R(m; k, n) with equation (9.126) requires O(k) operations. As a result, the algorithm has a complexity of O(k²n).

Example 9.10 Consider a Lin/Con/3/4:F system with two-step Markov dependence. Assume q(0) = 0.1, q(1) = 0.2, and q(2) = 0.3. Then, we have p(0) = 0.9, p(1) = 0.8, and p(2) = 0.7. Using the boundary conditions, we have

R(0; 3, 0) = 1,  R(1; 3, 0) = 0,  R(2; 3, 0) = 0,
R(0; 3, 1) = p(0) = 0.9000,  R(1; 3, 1) = q(0) = 0.1000,  R(2; 3, 1) = 0,
R(2; 3, 2) = q(0)q(1) = 0.0200.

Using the recursive equation, we have the following results:

R(0; 3, 2) = p(0)R(0; 3, 1) + p(1)R(1; 3, 1) + p(2)R(2; 3, 1) = 0.8900,
R(1; 3, 2) = q(0)p(0) = 0.0900,
R(0; 3, 3) = p(0)R(0; 3, 2) + p(1)R(1; 3, 2) + p(2)R(2; 3, 2) = 0.8870,
R(1; 3, 3) = q(0)[p(0)R(0; 3, 1) + p(1)R(1; 3, 1) + p(2)R(2; 3, 1)] = 0.0890,
R(2; 3, 3) = q(0)q(1)[p(0)R(0; 3, 0) + p(1)R(1; 3, 0) + p(2)R(2; 3, 0)] = 0.0180,
R(0; 3, 4) = p(0)R(0; 3, 3) + p(1)R(1; 3, 3) + p(2)R(2; 3, 3) = 0.8821,
R(1; 3, 4) = q(0)[p(0)R(0; 3, 2) + p(1)R(1; 3, 2) + p(2)R(2; 3, 2)] = 0.0887,
R(2; 3, 4) = q(0)q(1)[p(0)R(0; 3, 1) + p(1)R(1; 3, 1) + p(2)R(2; 3, 1)] = 0.0178,
R(3, 4) = R(0; 3, 4) + R(1; 3, 4) + R(2; 3, 4) = 0.9886.
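The recursion of equations (9.126)–(9.131) is only a few lines of code. The sketch below is ours (the memoization and names are assumptions, not the book's implementation); it reproduces Example 9.10.

```python
from functools import lru_cache
from math import prod

# Data from Example 9.10: a Lin/Con/3/4:F system with two-step Markov dependence.
k = 3
q = [0.1, 0.2, 0.3]        # q(m): failure probability after m consecutive failures
p = [1.0 - x for x in q]   # p(m) = 1 - q(m)

@lru_cache(maxsize=None)
def R(m, n):
    """R(m; k, n): the n-component system works and its last failure
    string has length exactly m."""
    if n < m:
        return 0.0                 # boundary condition (9.131)
    if n == m:
        return prod(q[:m])         # boundaries (9.128) and (9.130); prod([]) = 1
    # recursion (9.127); boundary (9.129) also falls out of this line
    return prod(q[:m]) * sum(p[i] * R(i, n - m - 1) for i in range(k))

def reliability(n):
    """R(k, n) of equation (9.126)."""
    return sum(R(m, n) for m in range(k))
```

reliability(4) reproduces R(3, 4) = 0.9886, and the memoized recursion realizes the O(k²n) complexity discussed in the text.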
9.4.4 Repairable Consecutive-k-out-of-n Systems

Several studies have been reported on repairable Con/k/n systems. Zhang and Wang [255] studied a Lin/Con/2/n:F system with a single repairman and the first-in, first-served repair policy. Zhang et al. [256] considered a Cir/Con/2/n:F system with a single repairman and a priority repair policy. Lam and Zhang [136] studied a Con/k/n:F system with one-step Markov dependence. They all assume that components are i.i.d., component lifetimes and repair times are exponentially distributed, and repairs are perfect. In this section, we use the Cir/Con/2/n:F system to illustrate the methodology used in the analyses of the repairable systems.

Assumptions
1. The system has a Cir/Con/2/n:F structure.
2. Components are i.i.d. following the exponential lifetime distribution with parameter λ.
3. The repair time for a failed component is exponentially distributed with parameter µ.
4. Repairs are perfect.
5. There is only one repairman.
6. All components are working at time zero.
7. A failed component whose repair would restore the functioning of the system has a higher priority for repair.
8. Once the system is failed, no more component failures may occur.

We explain the priority repair policy with an example. With the priority repair policy, components 3 and 4 have a higher priority than other failed components if the following string represents the states of the components in a Lin/Con/2/6:F system: 010010, where 0 indicates the failure state and 1 the working state. Components 3 and 4 form a minimal cut of the system and have caused the system to fail; they are thus called critical components. Let N(t) represent the state of the Cir/Con/2/n:F system at time t. Then, N(t) may take the following values:

N(t) =
  0           if at time t all components work and the system works,
  1           if at time t one component fails and the system works,
  2           if at time t two components fail and the system works,
  . . .
  ⌊n/2⌋       if at time t ⌊n/2⌋ components fail and the system works,
  −2          if at time t two components fail and the system fails,
  −3          if at time t three components fail and the system fails,
  . . .
  −⌊n/2⌋ − 1  if at time t ⌊n/2⌋ + 1 components fail and the system fails.
Assume {N(t), t ≥ 0} is a continuous-time homogeneous Markov chain with state space Ω = {−⌊n/2⌋ − 1, . . . , −2, 0, 1, 2, 3, . . . , ⌊n/2⌋}. Obviously, the set of working states is W = {0, 1, 2, . . . , ⌊n/2⌋} and the set of failed states is F = {−2, −3, . . . , −⌊n/2⌋ − 1}. We need to find the number of different component state vectors for a working system state, namely, state i for i ∈ W. Let M_i^L and M_i^C indicate this number in a linear and circular Con/2/n:F system, respectively. Based on the definitions of M_i^L and N(j, k, n) defined in equation (9.2), we have

M_i^L = N(i, 2, n) = C(n − i + 1, i),  i ∈ W.   (9.132)

Based on the definitions of M_i^C and N_c(j, k, n) given in equations (9.22) and (9.24), we have

M_i^C = N_c(i, 2, n) = [n / (n − i)] N(i, 2, n − 1) = [n / (n − i)] C(n − i, i),  i ∈ W.   (9.133)

Another expression of M_i^C is given by Zhang et al. [256] as follows:

M_i^C = C(n − i + 1, i) − C(n − i − 1, i − 2),  i ∈ W,   (9.134)

where C(n, i) ≡ 0 for n > 0 and i < 0. For any i ∈ W, j ∈ Ω, j ≠ i, and 1 ≤ l ≤ M_i, we have

P_{ij}(Δt) ≡ Pr(N(t + Δt) = j | N(t) = i)
= Σ_{l=1}^{M_i} Pr(the system is in case l of state i | N(t) = i) × Pr(N(t + Δt) = j | the system is in case l of state i at time t)
= (1/M_i) Σ_{l=1}^{M_i} q_{lj} Δt + o(Δt),   (9.135)

where we have used the following equations:

Pr(the system is in case l of state i | N(t) = i) = Pr(the system is in case l of state i at time t) / Pr(N(t) = i) = p^{n−i}(1 − p)^i / [M_i p^{n−i}(1 − p)^i] = 1/M_i,   (9.136)

Pr(N(t + Δt) = j | the system is in case l of state i at time t) = q_{lj} Δt + o(Δt).   (9.137)
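The counts M_i^C can be confirmed by direct enumeration for a small system: a working state of a Cir/Con/2/n:F system is exactly a placement of failed components with no two adjacent on the n-cycle. The sketch below (helper names are ours) checks equations (9.133) and (9.134) for n = 6.

```python
from itertools import combinations
from math import comb

def M_circ(i, n):
    """Number of component-state vectors with i failed components for
    which a Cir/Con/2/n:F system still works (no two failed components
    adjacent on the n-cycle)."""
    count = 0
    for failed in combinations(range(n), i):
        fs = set(failed)
        if all((j + 1) % n not in fs for j in fs):
            count += 1
    return count

n = 6
for i in range(n // 2 + 1):
    by_enum = M_circ(i, n)
    by_9_133 = n * comb(n - i, i) // (n - i)                                # eq. (9.133)
    by_9_134 = comb(n - i + 1, i) - (comb(n - i - 1, i - 2) if i >= 2 else 0)  # eq. (9.134)
    assert by_enum == by_9_133 == by_9_134
```

For n = 6 this gives M_0, M_1, M_2, M_3 = 1, 6, 9, 2, the values used for the Cir/Con/2/6:F system in Example 9.11 below.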
Equation (9.137) is due to the assumption that all components follow the i.i.d. exponential lifetime distribution. With the expression of P_{ij}(Δt) for i ≠ j in equation (9.135), we have the following expression for P_{ii}(Δt):

P_{ii}(Δt) = 1 − Σ_{j≠i} (1/M_i) Σ_{l=1}^{M_i} q_{lj} Δt + o(Δt).   (9.138)

Based on the expressions of P_{ij}(Δt) given in equations (9.135) and (9.138), we define the following transition rate matrix:

P = (p_{ij}),  i, j ∈ Ω,

where

p_{ij} = lim_{Δt→0} P_{ij}(Δt)/Δt,  i, j ∈ Ω, i ≠ j,   (9.139)

p_{ii} = lim_{Δt→0} [P_{ii}(Δt) − 1]/Δt,  i ∈ Ω.   (9.140)
The explicit expressions of P_{ij}(Δt) for i, j ∈ Ω for a Cir/Con/2/n:F system are given as

P_{0,1}(Δt) = nλ Δt + o(Δt),   (9.141)
P_{0,0}(Δt) = 1 − nλ Δt + o(Δt),   (9.142)
P_{0,j}(Δt) = o(Δt),  j ≠ 0, 1,   (9.143)
P_{1,0}(Δt) = µ Δt + o(Δt),   (9.144)
P_{1,2}(Δt) = (n − 3)λ Δt + o(Δt),   (9.145)
P_{1,−2}(Δt) = 2λ Δt + o(Δt),   (9.146)
P_{1,1}(Δt) = 1 − [(n − 1)λ + µ] Δt + o(Δt),   (9.147)
P_{1,j}(Δt) = o(Δt),  j ≠ 0, 1, 2, −2,   (9.148)
P_{i,i−1}(Δt) = µ Δt + o(Δt),  i = 2, 3, . . . , ⌊n/2⌋,   (9.149)
P_{i,i+1}(Δt) = [M_{i+1}/M_i](i + 1)λ Δt + o(Δt),  i = 2, 3, . . . , ⌊n/2⌋,   (9.150)
P_{i,−(i+1)}(Δt) = [(n − i) − (i + 1)M_{i+1}/M_i] λ Δt + o(Δt),  i = 2, 3, . . . , ⌊n/2⌋,   (9.151)
P_{i,i}(Δt) = 1 − [(n − i)λ + µ] Δt + o(Δt),  i = 2, 3, . . . , ⌊n/2⌋,   (9.152)
P_{i,j}(Δt) = o(Δt),  j ≠ i, i − 1, i + 1, −(i + 1),   (9.153)
P_{−i,i−1}(Δt) = µ Δt + o(Δt),  i = 2, 3, . . . , ⌊n/2⌋ + 1,   (9.154)
P_{−i,−i}(Δt) = 1 − µ Δt + o(Δt),  i = 2, 3, . . . , ⌊n/2⌋ + 1,   (9.155)
P_{−i,j}(Δt) = o(Δt),  j ≠ i − 1, −i.   (9.156)
In the following, we explain the derivation of equation (9.150). The other equations are relatively straightforward. Because all components in the system are identical, the failure rate of each component is λ. Every case of state i occurs with an equal probability, that is, 1/M_i. Based on the definition of the generalized transition probability and the model assumptions, we have

P_{i,i+1}(Δt) = [{total number of ways for transition from i to i + 1} / M_i] λ Δt + o(Δt).

We know that every case of state i + 1 is the result of a transition from state i. For each case of state i + 1, there are i + 1 possible cases of state i that may change into it, and there are M_{i+1} possible cases of state i + 1. Hence, the total number of ways for transition from i to i + 1 is equal to (i + 1)M_{i+1}. Equation (9.150) follows. We use the following example to illustrate the above results, from which many system reliability indexes can be derived.

Example 9.11 Consider a Cir/Con/2/6:F system with i.i.d. components. Each component has a constant failure rate of λ. There is only one repairman with a repair rate of µ. Critical components have higher priority for receiving repair. The system state space is Ω = {0, 1, 2, 3, −2, −3, −4}, where W = {0, 1, 2, 3} and F = {−2, −3, −4}. With rows and columns ordered 0, 1, 2, 3, −2, −3, −4, the transition rate matrix P has been identified to be

P =
⎡ −6λ   6λ          0          0           0    0         0  ⎤
⎢  µ   −(5λ + µ)    3λ         0           2λ   0         0  ⎥
⎢  0    µ          −(4λ + µ)  (2/3)λ       0   (10/3)λ    0  ⎥
⎢  0    0           µ        −(3λ + µ)     0    0         3λ ⎥
⎢  0    µ           0          0          −µ    0         0  ⎥
⎢  0    0           µ          0           0   −µ         0  ⎥
⎣  0    0           0          µ           0    0        −µ  ⎦   (9.157)
To find the probability distribution of the state of the system, we need to define the following:

P_j(t) = Pr(N(t) = j),  j ∈ Ω.

According to the Kolmogorov–Feller forward equation, we can obtain the Fokker–Planck equation as follows:

P′_j(t) = Σ_{l∈Ω} P_l(t) p_{lj},  j ∈ Ω,   (9.158)

with the initial conditions

P_j(0) = 1 for j = 0, and P_j(0) = 0 for j ∈ Ω, j ≠ 0.
Letting P(t) = (P_0(t), P_1(t), P_2(t), P_3(t), P_{−2}(t), P_{−3}(t), P_{−4}(t)) and using equation (9.157), we can rewrite equation (9.158) as

P′(t) = P(t)P,   (9.159)

with initial condition P(0). The Laplace transform technique may be used to solve this differential equation. The availability function of the system is given by A(t) = P_0(t) + P_1(t) + P_2(t) + P_3(t). The steady-state availability of the system is given by

A = (6λµ³ + 18λ²µ² + 12λ³µ + µ⁴) / (µ⁴ + 6λµ³ + 30λ²µ² + 72λ³µ + 36λ⁴).   (9.160)

The rate of occurrence of failure (ROF) of the system is defined to be the rate of transition from a working state to a failure state, which is given by

m_f(t) = Σ_{i∈W, j∈F} P_i(t) p_{ij} = 2λP_1(t) + (10/3)λP_2(t) + 3λP_3(t).

The steady-state ROF of the system is given by

m_f = lim_{t→∞} m_f(t) = (12λ²µ³ + 60λ³µ² + 36λ⁴µ) / (µ⁴ + 6λµ³ + 30λ²µ² + 72λ³µ + 36λ⁴).   (9.161)

When the system is in the steady state, according to equations (9.160) and (9.161), we have the following expression of the MTBF of the system:

MTBF = A / m_f = (µ³ + 6λµ² + 18λ²µ + 12λ³) / (12λ²µ² + 60λ³µ + 36λ⁴).   (9.162)
To find the mean time to the first failure (MTTFF) of the system, we consider a continuous-time homogeneous Markov process {Ñ(t), t ≥ 0} with F = {−2, −3, −4} as the set of absorbing states. Repairs are performed on failed components as long as the system is not failed yet. Obviously, the difference between the Markov processes {N(t), t ≥ 0} and {Ñ(t), t ≥ 0} is that the set of failed states in {N(t), t ≥ 0} becomes the set of absorbing states in {Ñ(t), t ≥ 0}. Now let

P̃_j(t) = Pr(Ñ(t) = j),  j ∈ Ω.

Then the system reliability is

R(t) = P̃_0(t) + P̃_1(t) + P̃_2(t) + P̃_3(t).
It is easy to derive the following system of differential equations:

P̃′_W(t) = P̃_W(t)B,   (9.163)

where P̃_W(t) = (P̃_0(t), P̃_1(t), P̃_2(t), P̃_3(t)) and P̃_W(0) = (1, 0, 0, 0). The matrix B is the 4 × 4 submatrix at the top left corner of the matrix P:

B =
⎡ −6λ   6λ          0          0         ⎤
⎢  µ   −(5λ + µ)    3λ         0         ⎥
⎢  0    µ          −(4λ + µ)  (2/3)λ     ⎥
⎣  0    0           µ        −(3λ + µ)   ⎦

The Laplace transform technique may again be used to solve the differential equations in equation (9.163). The system MTTFF is

MTTFF = (594λ³ + 272λ²µ + 43λµ² + 3µ³) / (1080λ⁴ + 408λ³µ + 36λ²µ²).   (9.164)
For detailed derivations throughout this example, readers are referred to Zhang et al. [256].
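The chain of Example 9.11 is small enough to check numerically. The sketch below is ours (state ordering, solver, and function names are assumptions): it builds P from equations (9.141)–(9.156) with n = 6 (M₂ = 9, M₃ = 2), then computes the steady-state availability and the MTTFF directly from the chain, giving an independent check on the steady-state and MTTFF expressions.

```python
def build_P(lam, mu):
    """Generator matrix of Example 9.11; states ordered 0,1,2,3,-2,-3,-4."""
    idx = {0: 0, 1: 1, 2: 2, 3: 3, -2: 4, -3: 5, -4: 6}
    P = [[0.0] * 7 for _ in range(7)]
    rates = [
        (0, 1, 6 * lam),
        (1, 0, mu), (1, 2, 3 * lam), (1, -2, 2 * lam),
        (2, 1, mu), (2, 3, (2.0 / 9.0) * 3 * lam),      # (M3/M2)(i+1)lam, eq. (9.150)
        (2, -3, (4.0 - 3 * 2.0 / 9.0) * lam),           # [(n-i)-(i+1)M3/M2]lam, eq. (9.151)
        (3, 2, mu), (3, -4, 3 * lam),
        (-2, 1, mu), (-3, 2, mu), (-4, 3, mu),
    ]
    for a, b, r in rates:
        P[idx[a]][idx[b]] = r
    for i in range(7):
        P[i][i] = -sum(P[i])                            # each row of a generator sums to 0
    return P

def solve(A, b):
    """Plain Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c] / M[c][c]
                for j in range(c, n + 1):
                    M[r][j] -= f * M[c][j]
    return [M[i][n] / M[i][i] for i in range(n)]

def availability(lam, mu):
    """Steady-state availability: solve pi P = 0 with sum(pi) = 1."""
    P = build_P(lam, mu)
    A = [[P[i][j] for i in range(7)] for j in range(6)] + [[1.0] * 7]
    pi = solve(A, [0.0] * 6 + [1.0])
    return sum(pi[:4])

def mttff(lam, mu):
    """MTTFF: solve B tau = -1 over the working states 0..3; MTTFF = tau[0]."""
    P = build_P(lam, mu)
    B = [row[:4] for row in P[:4]]
    return solve(B, [-1.0] * 4)[0]
```

With λ = µ = 1, mttff(1.0, 1.0) returns 912/1524 ≈ 0.5984, matching equation (9.164).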
9.5 SUMMARY

Reliability evaluation of Con/k/n systems may be performed with the event decomposition approach, the MIS approach, or the approximation approach. There exist closed-form solutions for systems with i.i.d. components. The most efficient formulas for reliability evaluation of Con/k/n systems with independent components reported so far are recursive. The linear and circular consecutive F and G systems are duals of each other. Lifetime distributions of the systems may be derived, even when the components are dependent. One of the least studied areas is maintainability of repairable systems. Several invariant optimal configurations for Con/k/n systems have been reported. For some k and n combinations, there does not exist an invariant optimal configuration. Some importance measures may be used for variant optimal design, such as the rearrangement inequality and B-importance. The Con/k/n system structures have been extended to more general applications, to be discussed in later chapters. Additional coverage of the Con/k/n and related systems is provided by Chang et al. [45].
10 MULTIDIMENSIONAL CONSECUTIVE-k-OUT-OF-n SYSTEMS
The linear and circular Con/k/n F and G systems have been extensively studied since the early 1980s as reviewed in previous chapters. A system is specified by the number of components denoted by n and the minimum number of consecutive failed (working) components that would cause system failure (working) denoted by k. The system fails (works) if and only if at least k consecutive components in the system fail (work). This type of system can be regarded as having a one-dimensional arrangement of components. Recently, there has been an endeavor to extend one-dimensional reliability models to two-, three-, or d-dimensional versions (d ≥ 2). Salvia and Lasher [214] are the first authors who extended one-dimensional Con/k/n systems to multidimensions. According to their definition, the two- or three-dimensional system is a square or cubic grid of side n (containing n² or n³ components). The system fails if and only if there is at least one square or cube of side k (2 ≤ k ≤ n − 1) that contains all failed components. Boehme et al. [36] define the two-dimensional Con/k/n:F system as a linear or circular connected-X-out-of-(m, n):F lattice system. The components in such a system are arranged into a rectangular pattern with m rows and n columns. The connected-X may represent (r, s), and, in this case, the system fails whenever there exists a grid of r rows and s columns that consists of all failed components (1 ≤ r ≤ m, 1 ≤ s ≤ n). The connected-X may also represent (r, s)-or-(s, r), and, in this case, the system fails whenever there exists a grid of r rows and s columns or s rows and r columns that consists of all failed components. The models described by Boehme et al. [36] are more general because m may not be equal to n and s may not be equal to r. Generally, we have 1 ≤ r, s ≤ m, n. The linear and circular (m, n)-lattice systems are depicted in Figures 10.1a and b. We will denote such systems as linear or circular (r, s)/(m, n):F or G systems.
FIGURE 10.1 Linear (a) and circular (b) (m, n)-lattice systems, and a cylindrical arrangement (c). (From [36], © 1992 IEEE.)
The following examples are given by Salvia and Lasher [214] and Boehme et al. [36] to illustrate where such multidimensional models may be used:

1. A group of connector pins for an electronic device includes some redundancy in its design. The connection is good unless a square of size 2 with four pins is defective.
2. The presence of a disease is diagnosed by reading an X-ray. Let p be the probability that an individual cell (or other small portion of the X-ray) is healthy. Unless diseased cells are aggregated into a sufficiently large pattern (say a k × k square), the radiologist might not detect their presence.
3. In medical diagnostics, it may be more appropriate to consider a three-dimensional grid in order to calculate the detection probability of patterns in a three-dimensional space.
4. A camera surveillance system has 16 cameras arranged into four rows and four columns to monitor a specified area. The distances between two adjacent cameras in the same row or the same column are equal to d. Each camera is able to monitor a disk of radius d. Such a system can be modeled as a linear (2,1)-or-(1,2)-out-of-(4,4):F lattice system.
5. A cylindrical object (e.g., a reactor) covered by a system of feelers for measuring temperature is shown in Figure 10.1c. Each of the m circles has n feelers. The system is failed whenever a (3,2)-grid of failed components exists. This system can be modeled as a circular (3,2)-out-of-(m, n):F lattice system.

In this chapter, we explore the relationship between multidimensional Con/k/n systems and one-dimensional Con/k/n systems. Issues such as reliability evaluation and optimal design for multidimensional Con/k/n systems are studied. Because of the increase in the number of dimensions, the algorithms for reliability evaluation of general multidimensional Con/k/n systems are much more complicated. We often have to rely on bounds to estimate system reliability.

Notation
• r, s, m, n: given system parameters
• (r, s)/(m, n):F: a connected (r, s)-out-of-(m, n):F lattice system
• R(k, n, p): reliability of a one-dimensional Con/k/n:F system with i.i.d. component reliability p
• R(k, n, p_1, p_2, . . . , p_n): reliability of a one-dimensional Con/k/n:F system with component reliabilities p_1, p_2, . . . , p_n
10.1 SYSTEM RELIABILITY EVALUATION

10.1.1 Special Multidimensional Systems

Consider a two-dimensional system with mn components evenly arranged into m rows and n columns (m, n ≥ 2). The system fails if and only if there exists a grid of size r × s such that all components in this grid are failed. In this section, we explore some special values of r and/or s that would make system reliability evaluation utilizing algorithms for one-dimensional systems possible.

When r = s = 1, linear and circular (1, 1)/(m, n):F systems become series systems because the failure of any component causes system failure. When r = m and s = n, linear and circular (m, n)/(m, n):F systems become parallel systems because the system fails only if all components are failed. Under these circumstances, we simply apply the results on series and parallel systems that have been covered earlier in this book.

When r = 1 and s > 1, each row of the linear (or circular) (1, s)/(m, n):F system can be treated as a one-dimensional linear (or circular) Con/s/n:F subsystem. There are m subsystems in the system. These m subsystems can be considered to be connected in series because the failure of any subsystem causes system failure. The reliability of the system is then a product of the reliabilities of the m subsystems. When s = 1 and r > 1, each column can be treated as a linear (or circular) Con/r/m:F subsystem. The reliability of the system is the product of the reliabilities of the n subsystems.

When r = m and 1 < s < n, the m components in each column can be considered to form a parallel subsystem. The reliability of each subsystem can then be calculated following the parallel system structure. Hwang and Shi [105] call such a system a redundant Con/k/n:F system. The n subsystems then form the corresponding linear and circular Con/s/n:F system structure. When s = n and 1 < r < m, each row has a parallel system structure. The rows form a linear or circular Con/r/m:F system structure.

Exercise

1. Compute the reliabilities of a linear (1, 4)/(2, 8):F system with i.i.d. component reliability 0.9 and a circular (2, 3)/(2, 6):F system with i.i.d. component reliability 0.95.
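The special-case reductions above are easy to validate against exhaustive enumeration. The sketch below is ours (brute force, feasible only for small systems): it checks the series reduction (r = s = 1), the parallel reduction (r = m, s = n), and the row decomposition for r = 1.

```python
from itertools import product

def rel_2d(r, s, m, n, p):
    """Exact reliability of a linear (r,s)/(m,n):F lattice system by
    enumerating all 2^(m*n) component states: the system works iff no
    r-by-s subgrid is completely failed."""
    tot = 0.0
    for st in product((0, 1), repeat=m * n):       # 1 = working, 0 = failed
        grid = [st[i * n:(i + 1) * n] for i in range(m)]
        failed = any(
            all(grid[i + a][j + b] == 0 for a in range(r) for b in range(s))
            for i in range(m - r + 1) for j in range(n - s + 1)
        )
        if not failed:
            w = sum(st)
            tot += p ** w * (1 - p) ** (m * n - w)
    return tot

def rel_1d(k, n, p):
    """Reliability of a Lin/Con/k/n:F system (a 1-row special case)."""
    return rel_2d(1, k, 1, n, p)
```

For example, the linear (1, 2)/(2, 4):F system factorizes into two independent Lin/Con/2/4:F rows in series, so its reliability equals rel_1d(2, 4, p) squared.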
10.1.2 General Two-Dimensional Systems

For a general two-dimensional Con/k/n system with 1 < r < m and 1 < s < n, there are no simple formulas for reliability evaluation. However, the SDP method discussed in Chapter 5 can be used to evaluate the reliability of such systems. In a linear two-dimensional system, each rectangle of r × s components forms a minimal cut in an F system and a minimal path in a G system. There are (m − r + 1)(n − s + 1) minimal cuts in such an F system and (m − r + 1)(n − s + 1) minimal paths in such a G system. There are (m − r + 1)n minimal cuts in a circular F system and (m − r + 1)n minimal paths in a circular G system. The SDP method operates by taking as data the minimal cuts (or paths) of the system and using each cut (or path) to generate a set of disjoint terms that are themselves disjoint with all the terms already obtained for other cuts (or paths).
Yamamoto and Miyakawa [251] report a recursive algorithm (called the YM algorithm) for reliability evaluation of a two-dimensional linear (r, s)/(m, n):F system. This algorithm is more efficient than the SDP method. In the following, we present the YM algorithm.

Additional Notation
• R_j: probability that there are no consecutive r failed components in the jth column of the system. It is the reliability of the Lin/Con/r/m:F subsystem consisting of the components forming column j.
• Ψ(v): a set of (v − r + 1)-dimensional binary vectors (r ≤ v ≤ m)
• δ: a (v − r + 1)-dimensional binary vector (r ≤ v ≤ m). The ith position in this vector has a value of 1 if the components (i, ·), (i + 1, ·), . . . , (i + r − 1, ·) in a certain column of the system are all failed and a value of 0 otherwise. A vector δ of dimension m − r + 1 represents the states of all the components in a column. We are only interested in vectors that have a value of 1 in them. However, not all vectors with a value of 1 in them can be such a δ vector. For example, suppose r = 2 and m = 4, so m − r + 1 = 3. The three-dimensional vector (1, 0, 1) cannot be a valid δ vector: it would represent the event that components 1 and 2 are failed, components 2 and 3 are not failed simultaneously, and components 3 and 4 are failed, which is impossible. An algorithm will be given to generate valid δ vectors.
• M(δ): index of the last 1 in the (v − r + 1)-dimensional binary vector δ (r ≤ v ≤ m). For example, if δ = (1, 0, 0, 1, 0, 0), then M(δ) = 4.
• F_j(δ): probability that the event represented by δ is realized for the components in column j
• h_i: an integer variable in the algorithm, 0 ≤ h_i ≤ s, for 1 ≤ i ≤ m − r + 1
• ℓ_i: an integer variable in the algorithm, 0 ≤ ℓ_i ≤ s, for 1 ≤ i ≤ m − r + 1
• R(j; h_1, . . . , h_{m−r+1}): reliability of the linear (r, s)/(m, j):F subsystem augmented with failure grids of sizes r × h_1, r × h_2, . . . , r × h_{m−r+1} (1 ≤ j ≤ n). Each of the failure grids is dependent on its neighbor grids. For example, h_i indicates that component (i, j + 1) is at the top-left corner of the failure grid of size r × h_i for 1 ≤ i ≤ m − r + 1.
• R(j): reliability of the linear (r, s)/(m, j):F subsystem composed of the first j columns (1 ≤ j ≤ n). Then R(n) is the reliability of the two-dimensional system; R(j) = R(j; h_i = 0, 1 ≤ i ≤ m − r + 1).
(r ) = {(1)},
(10.1)
389
SYSTEM RELIABILITY EVALUATION
(v + 1) = {(δ(v), 0) for all δ(v) ∈ (v)} ∪ {(0, . . . , 0, 1)} ∪ {(δ(v), 1) if M(δ(v)) = v − r + 1 or M(δ(v)) ≤ v − 2r + 1, where δ(v) ∈ (v)}
for r ≤ v ≤ m − 1.
(10.2)
Since each δ vector indicates the specific states of the components in a certain column, we need to find out the probability for δ to occur. Calculate F j (δ) for each δ ∈ (m) and j = 1, . . . , n. This can be evaluated directly based on the given δ vector and the reliabilities of the components. The following recursive equations are needed to calculate R(n; h i = 0, 1 ≤ i ≤ m − r + 1). R( j; h 1 , . . . , h m−r +1 ) = R j × R( j − 1; 0, 0, . . . , 0) R( j − 1; 1 , . . . , m−r +1 )F j (δ), +
(10.3)
δ∈ (m)
where R j is the reliability of the Lin/Con/r/m:F subsystem with all components in columns j and can be obtained with equation (9.36) and i =
hi + 1 0
if δi = 1, if δi = 0.
The boundary condition for equation (10.3) is
R( j; h 1 , . . . , h m−r +1 ) =
0
if
1
if 0 ≤ j < s −
max
1≤i≤m−r +1
h i = s, max
1≤i≤m−r +1
hi .
(10.4)
The reliability of the linear (r, s)/(m, j):F system is R( j) = R( j; h i = 0, 1 ≤ i ≤ m − r + 1)
for 1 ≤ j ≤ n.
(10.5)
The cardinality of (m) does not exceed 2m−r +1 . Because the reliability of a Con/r/m:F system can be computed in O(m) time (see Section 9.1.2), the computing time for each F j (δ) is O(m 2 ). The number of R( j; h 1 , . . . , h m−r +1 ) terms is (n + 1)(s + 1)m−r +1 . Therefore, the system reliability can be computed in O(nm 2 s m−r ) time by the YM algorithm. In the computer implementation, the YM algorithm requires at least the following memory space: • •
2x binary arrays of size m − r + 1 as the working area for the generation of
(m), where x < 2m−r +1 , and two real arrays of size s m−r +1 as the working area for the calculation of R(n; 0, 0, . . . , 0) using recursive relations.
MULTIDIMENSIONAL CONSECUTIVE-k-OUT-OF-n SYSTEMS
When all the components have an equal reliability p, it is possible to reduce the computing time and memory space for system reliability evaluation with the following theorem.

Theorem 10.1 Let p_{ij} = p_{m+1−i,j} for i = 1, 2, . . . , ⌊m/2⌋ and j = 1, 2, . . . , n. Then

R(j; h_1, . . . , h_{m−r+1}) = R(j; h_{m−r+1}, h_{m−r}, . . . , h_1).  (10.6)
Example 10.1 We calculate the reliability of the linear (2, 2)/(4, n):F system with p_{ij} = p for all i, j. Then r = s = 2 and m = 4.

1. From equation (10.2), we have

Ψ(2) = {(1)},
Ψ(3) = {(1, 0), (0, 1), (1, 1)},
Ψ(4) = {(1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)}.

2. We also obtain the following for j = 1, 2, . . . , n:

F_j(1, 0, 0) = F_j(0, 0, 1) = pq^2,
F_j(1, 1, 0) = F_j(0, 1, 1) = pq^3,
F_j(0, 1, 0) = p^2 q^2,
F_j(1, 1, 1) = q^4.

3. R_j = 1 − q^2 − 2pq^2 for j = 1, 2, . . . , n. The following recursive formula is given, based on equation (10.3):

R(j; h_1, h_2, h_3) = (1 − q^2 − 2pq^2) R(j − 1; 0, 0, 0) + Σ_{δ∈Ψ(4)} R(j − 1; ℓ_1, ℓ_2, ℓ_3) F_j(δ),

where ℓ_i = h_i + 1 if δ_i = 1 and ℓ_i = 0 if δ_i = 0. The boundary condition is

R(j; h_1, h_2, h_3) = 0 if max_{1≤i≤3} h_i = s,
R(j; h_1, h_2, h_3) = 1 if 0 ≤ j < s − max_{1≤i≤3} h_i.
TABLE 10.1 Reliability of Linear (2, 2)/(4, n):F System

p       n = 4     n = 10    n = 50
0.60    0.8250    0.5711    0.0492
0.65    0.8899    0.7107    0.1588
0.70    0.9370    0.8256    0.3553
0.75    0.9681    0.9086    0.5953
0.80    0.9864    0.9603    0.8026
0.85    0.9956    0.9869    0.9309
0.90    0.9991    0.9973    0.9857
0.95    0.9999    0.9998    0.9991
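The n = 4 column of Table 10.1 can be cross-checked independently of the YM algorithm by brute-force enumeration over all 2^{16} component states; the helper below is ours and is feasible only for small grids:

```python
from itertools import product

def brute_force_reliability(r, s, m, n, p):
    """Reliability of a linear (r,s)/(m,n):F system with i.i.d. component
    reliability p, by enumerating all 2^(m*n) states (small systems only).
    The system fails iff some r x s subgrid is completely failed."""
    q = 1.0 - p
    rel = 0.0
    for state in product((0, 1), repeat=m * n):      # 1 = working, 0 = failed
        grid = [state[i * n:(i + 1) * n] for i in range(m)]
        failed = any(
            all(grid[i + a][j + b] == 0 for a in range(r) for b in range(s))
            for i in range(m - r + 1) for j in range(n - s + 1)
        )
        if not failed:
            w = sum(state)
            rel += p ** w * q ** (m * n - w)
    return rel
```

For p = 0.90 the enumeration gives 0.9991 to four decimals, matching Table 10.1.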
From Theorem 10.1, we have the following for j = 0, 1, 2, . . . , n: R( j; 1, 0, 0) = R( j; 0, 0, 1),
R( j; 1, 1, 0) = R( j; 0, 1, 1).
Therefore, the reliability of the linear (2, 2)/(4, n):F system is R(n; 0, 0, 0) by equation (10.5). Table 10.1 lists the reliabilities of the linear (2, 2)/(4, n):F system with i.i.d. components. According to the authors, this method takes much less time than the SDP method.

Exercise

1. Use the SDP method to evaluate the reliability of a linear (2, 3)/(5, 6):F system with i.i.d. component reliability 0.94.

10.1.3 Bounds and Approximations

When the components are i.i.d., Yamamoto and Miyakawa [251] provide the following lower and upper bounds on the reliability of the system:

LB1 = R(r, m; 1 − q^s)^{n−s+1},  (10.7)
UB1 = R(r, m; 1 − q^s){1 − [1 − R(r, m; 1 − q^s)]R(r, m; 1 − q)}^{n−s},  (10.8)

where R(r, m; p) is the reliability of a Lin/Con/r/m:F system with i.i.d. component reliability p. LB1 is obtained from the observation that the probability of the intersection of n − s + 1 events is greater than or equal to the product of the probabilities of these individual events if these events are associated. Event i for 1 ≤ i ≤ n − s + 1 corresponds to the event that the linear (r, s)/(m, s):F subsystem consisting of columns i, i + 1, . . . , i + s − 1 is working.

Proof of Equation (10.8) To verify UB1, define the following:

• E_j: the subsystem consisting of the first j columns works, s ≤ j ≤ n
• E^{(j)}: the subsystem consisting of columns j − s + 1, j − s + 2, . . . , j works, s ≤ j ≤ n
• A^{(j)}: the jth column has fewer than r consecutive failed components, 1 ≤ j ≤ n − s

First we note the following:

Pr(Ē_{j+1} | E_j) ≥ Pr(Ē^{(j+1)}) Pr(A^{(j−s+1)}).  (10.9)
Then, we can express the reliability of the linear (r, s)/(m, n):F system, Rsys , as Rsys = Pr(E n ) = Pr(E s )
n−1 . j=s
Pr(E j+1 ) Pr(E j )
n−1 .4
5 1 − Pr(E j+1 | E j ) .
(10.10)
5 1 − Pr(E ( j+1) ) Pr(A( j−s+1) ) .
(10.11)
= Pr(E s )
j=s
Using the result in (10.9), we have Rsys ≤ Pr(E s )
n−1 .4 j=s
When the components are i.i.d., we have Rsys ≤ R(r, m; 1 − q s ){1 − [1 − R(r, m; 1 − q s )]R(r, m; 1 − q)}n−s ,
(10.12)
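Bounds (10.7) and (10.8) are inexpensive once a Lin/Con/k/n:F routine is available. A sketch using the i.i.d. form of recursion (9.36) (function names are ours):

```python
def con_rel(k, n, p):
    """Lin/Con/k/n:F reliability with i.i.d. component reliability p,
    via the i.i.d. form of recursion (9.36)."""
    q = 1.0 - p
    R = [1.0] * (max(n, k) + 1)          # R[j] = 1 for 0 <= j < k
    if n >= k:
        R[k] = 1.0 - q ** k
        for j in range(k + 1, n + 1):
            R[j] = R[j - 1] - p * q ** k * R[j - k - 1]
    return R[n]

def ym_bounds(r, s, m, n, p):
    """LB1 and UB1 of equations (10.7)-(10.8) for a linear (r,s)/(m,n):F
    system with i.i.d. component reliability p."""
    q = 1.0 - p
    col = con_rel(r, m, 1.0 - q ** s)                     # R(r, m; 1 - q^s)
    lb = col ** (n - s + 1)                               # equation (10.7)
    ub = col * (1.0 - (1.0 - col) * con_rel(r, m, p)) ** (n - s)  # (10.8)
    return lb, ub
```

For the linear (2, 2)/(4, 4):F system with p = 0.9 the two bounds agree to about four decimal places, bracketing the exact value tightly.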
which is the expression of UB1.

Salvia and Lasher [214] provide lower and upper bounds for two-dimensional systems with i.i.d. components assuming r = s ≡ k and m = n. An error in their upper bound equation was pointed out by Ksir [124]. In the following, we extend their bounds to cases when r may not be equal to s and m may not be equal to n. The system certainly works when at most r − 1 rows have s or more consecutive failures. Thus, a lower bound is equal to the probability that at most r − 1 rows have s or more consecutive failures, expressed as

LB2 = Σ_{i=0}^{r−1} (m choose i) [1 − R(s, n)]^i [R(s, n)]^{m−i},  (10.13)
where R(s, n) is the reliability of a subsystem consisting of n components in a row and having the Con/s/n:F structure. To find an upper bound on system reliability, the m rows can be divided into ⌊m/r⌋ + 1 groups. Group i consists of r consecutive rows starting from row (i − 1)r + 1 for 1 ≤ i ≤ ⌊m/r⌋ + 1. Each of these groups is treated as an (r, s)/(r, n):F system. The methods for reliability evaluation of an (r, s)/(r, n):F system have been discussed in the previous section. Since the last group has fewer than r rows, it must be working. Each of the groups works if the original system works. As a result, we have an upper bound on the reliability of the original system, expressed as

UB2 = [R((r, s)/(r, n))]^{⌊m/r⌋},  (10.14)

where R((r, s)/(r, n)) is the reliability of an (r, s)/(r, n):F system with i.i.d. components, which can be expressed as the reliability of a Lin/Con/s/n:F system with i.i.d. component reliability 1 − q^r. The approaches used by Yamamoto and Miyakawa [251] and Salvia and Lasher [214] in their derivations of the upper and lower bounds on system reliability rely on the division of the rows into subsystems. The columns can also be divided into subsystems. Malinowski and Preuss [154] use both approaches in finding the lower and upper bounds of two-dimensional system reliability. They also consider systems with independent components. When the components are i.i.d., another set of lower and upper bounds on reliability of a linear (r, s)/(m, n):F system is given by

LB3 = R(s, n; 1 − q^r)^{m−r+1},  (10.15)
UB3 = [R((r, s)/(m, s))]^{⌊n/s⌋}.  (10.16)
The lower bound of a linear (r, s)/(m, n):F system with i.i.d. components can be chosen to be the maximum of LB1, LB2, and LB3. The upper bound of a linear (r, s)/(m, n):F system with i.i.d. components can be chosen to be the minimum of UB1, UB2, and UB3. When the components are independent, the following lower and upper bounds for both linear and circular systems are provided by Malinowski and Preuss [154]:

LB4 = max{ ∏_{i=1}^{m−r+1} Pr(A_∗(i, r)), ∏_{j=1}^{n_∗} Pr(B_∗(j, s)) },  (10.17)

UB4 = min{ ∏_{i=1}^{⌊m/r⌋} Pr(A_∗((i − 1)r + 1, r)), ∏_{j=1}^{⌊n/s⌋} Pr(B_∗((j − 1)s + 1, s)) },  (10.18)

where

∗ ∈ {L, C},  (10.19)
n_L = n − s + 1,  (10.20)
n_C = n,  (10.21)
Pr(A_∗(w, r)) = R_∗(s; n, ρ_{w,1}, ρ_{w,2}, . . . , ρ_{w,n}),  (10.22)
ρ_{w,j} = 1 − ∏_{k=w}^{w+r−1} (1 − p_{k,j}),  (10.23)
Pr(B_L(w, s)) = R_L(r; m, σ_{1,w}, σ_{2,w}, . . . , σ_{m,w}),  (10.24)
σ_{i,w} = 1 − ∏_{l=w}^{w+s−1} (1 − p_{i,l}),  (10.25)
Pr(B_C(w, s)) = R_L(r; m, τ_{1,w}, τ_{2,w}, . . . , τ_{m,w}),  (10.26)
τ_{i,w} = 1 − ∏_{l=w}^{w⊕(s−1)} (1 − p_{i,l}),  (10.27)
j ⊕ l ≡ [(j + l − 1) mod n] + 1.  (10.28)
For the linear (k, k)/(n, n):F systems with i.i.d. components, the following upper bound is provided by Koutras et al. [121]:

UB5 = e^{−(n−k+1)^2 q^{k^2}} + (1 − e^{−(n−k+1)^2 q^{k^2}})[(2k − 1)^2 q^{k^2} + 4Q(k)],  (10.29)

where

Q(k) = Σ_{i=1}^{k} Σ_{j=1}^{k} q^{k^2 − ij} − 1.  (10.30)

A lower bound can be obtained using all minimal cuts directly:

LB5 = (1 − q^{k^2})^{(n−k+1)^2}.  (10.31)
A version of upper and lower bounds for systems with independent components is also provided by Koutras et al. [121]. Koutras et al. [123] provide another upper bound on the reliability of a linear (k, k)/(n, n):F system with independent components:

UB6 = R(k, n, {Q_{kj}}) ∏_{i=k+1}^{n} {1 − [1 − R(k, n, {Q_{ij}})] R(k, n, {q_{i−k,j}})},  (10.32)

where

Q_{ij} = ∏_{l=i−k+1}^{i} q_{lj},  j = 1, 2, . . . , n,  (10.33)
{Q_{ij}} = {Q_{i1}, Q_{i2}, . . . , Q_{in}},  i = 1, 2, . . . , n,  (10.34)
{q_{i−k,j}} = {q_{i−k,1}, q_{i−k,2}, . . . , q_{i−k,n}}.  (10.35)
Godbole et al. [84] provide another upper bound for the reliability of a linear (k, k)/(n, n):F system with i.i.d. components:
UB7 = (1 − q^{k^2})^{(n−k+1)^2} exp( 2(n − k + 1)^2 q^{k^2+k} [(k^2 − k − 1)q^{k−1} + 1] / (1 − q^{k^2}) ).  (10.36)

A version of the upper bound for independent components is also given. In the derivation of the upper bounds, they used Janson's exponential inequalities [111]. The authors also show that this upper bound is better than those provided by Fu and Koutras [81], Koutras et al. [121], and Barbour et al. [19].

Exercise

1. Compare the bounds on the system reliability of a linear and a circular (2, 3)/(5, 8):F system with i.i.d. component reliabilities p = 0.1, 0.2, . . . , 0.9, 0.95. Plot the bounds as a function of the i.i.d. component reliability.

10.2 SYSTEM LOGIC FUNCTIONS

Since it is difficult or impossible to obtain simple algorithms for reliability evaluation of multidimensional Con/k/n systems, some authors focus on derivation of the system's logic function or structure function. In this section, we examine the logic functions of multidimensional Con/k/n systems. First, we look at the one-dimensional Con/k/n:F system. In a Lin/Con/k/n:F system, every group of k consecutive components forms a minimal cut. The minimal cuts are {1, 2, . . . , k}, {2, 3, . . . , k + 1}, {3, 4, . . . , k + 2}, . . . , {n − k + 1, n − k + 2, . . . , n}.

Notation
• x_i: logic variable indicating that component i works
• x̄_i: logic variable indicating that component i fails
• φ_i: logic variable indicating that a Lin/Con/k/i:F system with components 1, 2, . . . , i works (1 ≤ i ≤ n)
• K_j: jth minimal cut, K_j = {j, j + 1, . . . , j + k − 1}, 1 ≤ j ≤ n − k + 1
The logic or structure function of a Lin/Con/k/n:F system can be expressed either in closed form or in recursive form as follows:

φ_n = ∏_{j=1}^{n−k+1} [1 − ∏_{i∈K_j} x̄_i],  (10.37)

φ_n = φ_{n−1} − φ_{n−k−1} x_{n−k} ∏_{i∈K_{n−k+1}} x̄_i.  (10.38)
Equation (10.38) corresponds to the recursive equation (9.36) for the reliability evaluation of the Lin/Con/k/n:F system. Since equation (9.36) is already available, the logic function given in equation (10.38) is not used very much. Now consider a two-dimensional linear (r, s)/(m, n):F system. The system has (m − r + 1)(n − s + 1) minimal cuts.

Notation

• K_{i,j}: minimal cut set that contains the rs components arranged in a rectangle of dimension r × s with component (i, j) as the top-left corner of this rectangle
• φ_{m,n}: logic function of a system
• φ^{[i]}_{s,n}: logic function of the ith rowwise linear connected-(r, s)-out-of-(r, n):F subsystem. This subsystem contains r rows of components starting from row number i, 1 ≤ i ≤ m − r + 1.
• φ^{(j)}_{r,m}: logic function of the jth columnwise linear connected-(r, s)-out-of-(m, s):F subsystem. This subsystem contains s columns of components starting from column number j, 1 ≤ j ≤ n − s + 1.

The logic function of a two-dimensional linear (r, s)/(m, n):F system can then be expressed in the form

φ_{m,n} = ∏_{i=1}^{m−r+1} φ^{[i]}_{s,n} = ∏_{j=1}^{n−s+1} φ^{(j)}_{r,m} = ∏_{j=1}^{n−s+1} ∏_{i=1}^{m−r+1} [1 − ∏_{(u,v)∈K_{i,j}} x̄_{u,v}].  (10.39)
Exercise 1. Find the logic or structure function of a linear (2, 3)/(5, 7):F system.
10.3 OPTIMAL SYSTEM DESIGN In this section, we examine the optimal designs of multidimensional consecutive systems. To demonstrate how the systems can be analyzed, we use the two-dimensional case as an example. Suppose that the components all serve the same function and are interchangeable in the system. Without loss of generality, we assume that 0 < p[1] ≤ p[2] ≤ · · · ≤ p[mn] < 1. We also assume that we only have knowledge of the ranks of the mn component reliabilities and not of their actual values. An optimal design is invariant if it depends only on the ordering of the reliabilities but not on their actual values. According to Hwang and Shi [105], the condition for the existence of invariant arrangement is very limited even in the case of redundant consecutive systems, let alone two-dimensional systems.
Theorem 10.2 (Hwang and Shi [105]) For r ≥ 2, the linear and circular (r, s)/(r, n):F systems are not invariant except when s = n − 1 and r = 2. The optimal arrangement of the linear and circular (2, n − 1)/(2, n):F system is given by

1  |                                                  |  2
4  |  any arrangement of the best 2n − 4 components   |  3
Zuo [258] studies invariant optimal design of special two-dimensional (r, s)/(m, n):F and G systems with m = r. A G system works if there exists a grid of size r × s in which all components work. The following results are presented.

Theorem 10.3 For a linear (r, s)/(r, n):F or G system, the necessary conditions for its optimal design are as follows:

1. p_{i1,j1} < p_{i2,j2} for i_1, i_2 = 1, 2, . . . , r and 1 ≤ j_1 < j_2 ≤ min{s, n − s + 1}; and p_{i1,j1} > p_{i2,j2} for i_1, i_2 = 1, 2, . . . , r and max{s, n − s + 1} ≤ j_1 < j_2 ≤ n.
2. If s < n < 2s, the r(2s − n) most reliable components should be placed in columns n − s + 1, n − s + 2, . . . , s in any order.

According to Theorem 10.3, an optimal design places the r(2s − n) most reliable components in columns n − s + 1 through s in any order for a linear (r, s)/(r, n):G system with s > n/2. The other relatively unreliable components should be allocated to columns 1, 2, . . . , n − s, s + 1, . . . , n. We can treat the remaining columns as a system with s′ = n − s and n′ = 2(n − s), which is actually a linear (r, s′)/(r, 2s′):G system. The optimal design of an (r, s′)/(r, 2s′):G system is part of the optimal design of an (r, s)/(r, n):G system with s > n/2, because the middle 2s − n columns of components in the latter system should be the most reliable ones regardless of arrangement. These arguments are similar to the ones used by Malon [161] for one-dimensional Con/k/n:F systems with k > n/2. Based on Theorem 10.3, Zuo [258] considers r = 2 and identifies invariant optimal designs of a linear (2, s)/(2, n):G system, which are given in the following theorem.

Theorem 10.4 An invariant optimal design of a linear (2, s)/(2, n):G system with s < n ≤ 2s is

1  5   9  · · ·  4(n − s) − 3  |                  |  4(n − s) − 1  · · ·  11  7  3
2  6  10  · · ·  4(n − s) − 2  | any arrangement  |  4(n − s)      · · ·  12  8  4
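Theorem 10.4 can be spot-checked for the smallest nontrivial case n = 3, s = 2, where the layout reduces to columns (1, 2), (5, 6), (3, 4) by rank. The sketch below (the reliability values and helper names are ours) verifies by exhaustive search that this layout is optimal:

```python
from itertools import permutations

def g_rel_2x3(cols):
    """Reliability of a linear (2, 2)/(2, 3):G system: it works iff columns
    {1,2} or columns {2,3} have all four of their components working.
    `cols` is a list of three (p_top, p_bottom) pairs."""
    c = [a * b for a, b in cols]          # both components of a column work
    return c[0] * c[1] + c[1] * c[2] - c[0] * c[1] * c[2]

# ranks 1..6 (rank 1 = least reliable); illustrative reliability values
p = [0.50, 0.55, 0.60, 0.65, 0.70, 0.75]

# Theorem 10.4 layout for n = 3, s = 2: columns (1,2), (5,6), (3,4) by rank
theorem = g_rel_2x3([(p[0], p[1]), (p[4], p[5]), (p[2], p[3])])

# exhaustive search over all assignments of the six components
best = max(g_rel_2x3([(q[0], q[1]), (q[2], q[3]), (q[4], q[5])])
           for q in permutations(p))
```

The enumeration confirms that the theorem's arrangement attains the maximum reliability for these values.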
Theorem 10.5 A linear (r, s)/(r, n):G system with 2 ≤ s < n/2 and r ≥ 2 is not invariant. Neither is a circular (r, s)/(r, n):G system with 2 ≤ s < (n − 1)/2 and r ≥ 2.
Theorem 10.6 For a circular (2, s)/(2, n):G system with s < n ≤ 2s + 1, an invariant optimal design is

1  3  7  11  15  · · ·  2n − 1  · · ·  17  13   9  5  (1)
2  4  8  12  16  · · ·  2n      · · ·  18  14  10  6  (2)

where the final 1/2 column repeats the first column to indicate the circular closure.
For linear (r, s)/(r, n):F systems with no invariant optimal designs, Zuo and Shen [263] develop a heuristic method for variant optimal design. The heuristic used is that a position with a higher B-importance should be assigned a component with larger reliability. Suppose that the available component reliabilities have been arranged in the order p_1 < p_2 < · · · < p_{nr}. First, at the initialization step, each of the nr positions of the system is assigned a reliability value of p_1, the lowest component reliability available. The components left for assignment then have reliabilities p_2, p_3, . . . , p_{nr}. Each of these available components will replace a component reliability value p_1 that is already in the system. At each iteration step, the B-importances of the positions with reliability value p_1 are calculated, and the position with the largest B-importance is assigned a component with the largest reliability value of those components available for assignment. Thus, after the first iteration, p_{nr} replaces one p_1 out of the system and the components available for further assignment are p_2, p_3, . . . , p_{nr−1}. The iteration step is repeated until all the nr components are placed into the system. This method is very simple to use. Other heuristic methods tested include the one used by Zuo and Kuo [260] for nonredundant Con/k/n:F system design and the one that assigns the next most reliable component available to the position with the largest B-importance in the iteration step. From comparisons with exact optimal solutions generated from an enumeration algorithm and the solutions generated from the heuristic method by Zuo and Kuo [260] for the regular Con/k/n:F systems, the heuristic method provides consistently good close-to-optimal solutions. Another optimal system design problem is considered by Hwang [105]. Such a design problem is called redundancy design. We will use a telecommunications system as an example to explain this problem.
Suppose that communication between point A and point B is required and the receiver at point B is perfect. The distance between these two points is, say, 24 km. Transmitters with a range of 4 km are to be used. We have the budget to purchase and install 24 i.i.d. transmitters. The following are possible designs of this transmission line. 1. The 24 transmitters are arranged from point A to point B evenly with the distance between two adjacent transmitters being 1 km. The system can be modeled as a Lin/Con/4/24:F system or a linear (1, 4)/(1, 24):F system. 2. Two transmitters are connected in parallel to form a relay station. These 12 relay stations are arranged from point A to point B evenly with the distance between two adjacent stations being 2 km. The system can be modeled as a linear (2, 2)/(2, 12):F system.
FIGURE 10.2 Possible designs of telecommunications systems: (a) Design 1; (b) Design 2; (c) Design 3.
3. Four transmitters are connected in parallel to form a relay station. These six relay stations are arranged from point A to point B evenly with the distance between two adjacent stations being 4 km. The system can be modeled as a linear (4, 1)/(4, 6):F system.

These three designs are illustrated in Figure 10.2. This problem requires the selection of the k value and the redundancy level at each station. Consider a Lin/Con/k/n:F system with a given set of component reliabilities, p_1, p_2, . . . , p_n. Instead of building such a system, we could build a two-dimensional linear (r, k/r)/(r, n/r):F system, where r divides both k and n, utilizing the same set of components. Hwang [105] proves that the higher the r value, the higher the system reliability. In other words, a linear (r, 1)/(r, n/r):F system is always the best.
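Hwang's monotonicity result can be illustrated numerically for the 24-transmitter example (i.i.d. transmitter reliability assumed; function names are ours): Design 1 is a Lin/Con/4/24:F system, Design 2 reduces to a Lin/Con/2/12:F system over two-transmitter stations, and Design 3 is a series of six four-transmitter parallel stations.

```python
def con_rel(k, n, p):
    """Lin/Con/k/n:F reliability, i.i.d. component reliability p
    (i.i.d. form of recursion (9.36))."""
    q = 1.0 - p
    R = [1.0] * (max(n, k) + 1)
    if n >= k:
        R[k] = 1.0 - q ** k
        for j in range(k + 1, n + 1):
            R[j] = R[j - 1] - p * q ** k * R[j - k - 1]
    return R[n]

def design_reliabilities(p):
    """Reliabilities of the three designs of Figure 10.2."""
    q = 1.0 - p
    d1 = con_rel(4, 24, p)                 # linear (1,4)/(1,24):F
    d2 = con_rel(2, 12, 1.0 - q ** 2)      # linear (2,2)/(2,12):F
    d3 = (1.0 - q ** 4) ** 6               # linear (4,1)/(4,6):F
    return d1, d2, d3

d1, d2, d3 = design_reliabilities(0.9)
```

For p = 0.9 this gives d1 < d2 < d3, consistent with the claim that a larger r value yields a more reliable system.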
Exercise 1. Compare all possible designs of a Lin/Con/4/24:F system with i.i.d. component reliability 0.9.
10.4 SUMMARY As seems obvious, multidimensional Con/k/n systems are much more complicated than one-dimensional systems. In this chapter, we have considered reliability evaluation and approximation of two-dimensional systems. It is clear that more efficient algorithms are needed for two- and higher dimensional systems. We also must address the optimal design issues of multidimensional systems.
11 OTHER k-OUT-OF-n AND CONSECUTIVE-k-OUT-OF-n MODELS
The k-out-of-n system structure and the Con/k/n system structure have been extensively covered in Chapters 7 and 9. In this chapter, we provide a thorough coverage of many variations and combinations of the k-out-of-n and the Con/k/n system models. We assume that each component and system may be in only two possible states (working or failed) and that the failures of the components are independent. Special system reliability models will be introduced one by one. The applications of each model will be described first, and then, algorithms for reliability evaluation and optimal system design will be introduced.

11.1 THE s-STAGE k-OUT-OF-n SYSTEMS

The k-out-of-n system models discussed in Chapter 7 can be considered to be one-stage k-out-of-n systems. This is because the system can be decomposed into a single hierarchical level of n components that has the k-out-of-n structure. In a two-stage k-out-of-n system, the system can be decomposed into two hierarchical levels and each level has a k-out-of-n structure. In other words, the system consists of n subsystems and has a k-out-of-n structure while subsystem i consists of n_i components and has a k_i-out-of-n_i structure for 1 ≤ i ≤ n. A general s-stage k-out-of-n system can be decomposed into s hierarchical levels and each level has a k-out-of-n structure. Based on the above definition of the s-stage k-out-of-n system, the parallel–series system structure given in Figure 4.16 can be considered a two-stage k-out-of-n system. The first stage has a 1-out-of-m:G structure while the second stage consists of 1-out-of-n_i:F structures for 1 ≤ i ≤ m. Similarly, the series–parallel system structure given in Figure 4.17 can be considered a two-stage k-out-of-n system. The first
stage has a 1-out-of-n:F structure while the second stage consists of 1-out-of-m_i:G structures for 1 ≤ i ≤ n. The issues of reliability evaluation and optimal system design of such special systems have been discussed in Chapters 4 and 6, respectively. Since each level consists of k-out-of-n structures, system reliability evaluation is straightforward for a general s-stage k-out-of-n system. An algorithm for the k-out-of-n system structure can be used at each level for reliability evaluation of each subsystem and the system. A more interesting issue for these systems is optimal system design. In a two-stage k-out-of-n system with m subsystems and n_i components in subsystem i (1 ≤ i ≤ m), the system reliability does not depend on how the components inside a subsystem are arranged. However, it does depend on how the components are assigned to each subsystem. Thus, the optimal assignment of a total of N components (N = n_1 + n_2 + · · · + n_m) is the optimal partition of the N components into m groups with n_i components in group i (1 ≤ i ≤ m). Derman et al. [60] consider the optimal design problem for two-stage k-out-of-n systems. The first stage has a k-out-of-n:G structure, while the second stage consists of series structures. There are n components of type j available, and j = 1, 2, . . . , J. A module is a subsystem with a series structure that requires exactly one component of every type. We need to divide the nJ components into n groups such that the probability that at least k of the n modules work properly is maximized. The solution to this problem, given by Derman et al. [60], is to use the "greedy principle." In other words, assign the best component of every type to a module, the best component remaining of every type to another module, and so on. Two-stage k-out-of-n systems were initially studied by Hwang [103]. Du and Hwang [69] extend the two-stage k-out-of-n system to the more general s-stage k-out-of-n system (s ≥ 2).
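The level-by-level evaluation just described is direct to code. A sketch (function names are ours), using the standard one-component-at-a-time recursion for k-out-of-n:G reliability at each level:

```python
def k_out_of_n_G(k, rels):
    """Reliability of a k-out-of-n:G system with independent components:
    probability that at least k of them work, built up one component at a
    time over the distribution of the number of working components."""
    dist = [1.0] + [0.0] * len(rels)       # dist[j] = P(j working so far)
    for p in rels:
        for j in range(len(dist) - 1, 0, -1):
            dist[j] = dist[j] * (1.0 - p) + dist[j - 1] * p
        dist[0] *= (1.0 - p)
    return sum(dist[k:])

def two_stage_rel(k, subsystems):
    """Two-stage k-out-of-n:G system: at least k subsystems must work,
    where subsystem i is k_i-out-of-n_i:G over its own components.
    `subsystems` is a list of (k_i, [component reliabilities])."""
    return k_out_of_n_G(k, [k_out_of_n_G(ki, ps) for ki, ps in subsystems])
```

For example, a 1-out-of-2:G system of two 2-out-of-2:G (series) pairs with component reliability 0.9 gives 1 − (1 − 0.81)^2 = 0.9639.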
For s ≥ 2, an s-stage k-out-of-n system is a k-out-of-n system whose ith component is itself an (s − 1)-stage k_i-out-of-n_i subsystem. The question of interest for these systems is how to allocate components with possibly different reliabilities to the positions of the systems such that system reliability is maximized.

Definition 11.1 (Hwang [103]) An ordered partition of a group of numbers is defined to be one in which, for any two subsets S_u and S_v, no number in S_u exceeds any number in S_v or vice versa.

The ordered allocation defined by Prasad et al. [197] is the same as the ordered partition in Hwang [103] and the consecutive partition in Chakravarty et al. [43, 44]. We should note that any ordered partition depends only on the ranks of reliabilities of the available components and not on the actual reliability values. For two given subsets S_u and S_v there are two ordered partitions if n_u < n_v, one assigning the larger numbers to S_u and the other to S_v. The former is called short ordered because it assigns the better components to the "shorter" set, while the latter is called long ordered. A system S is then called short ordered or long ordered depending on whether there exists a short-ordered partition or long-ordered partition that maximizes the system reliability. Hwang [103] then analyzes the properties of
two-stage k-out-of-n systems to determine what properties are required for a short- or a long-ordered partition to be optimal. For the optimal design problem of a two-stage k-out-of-n system, we need to find a partition S_1, S_2, . . . , S_n with given cardinalities n_1, n_2, . . . , n_n, respectively, that maximizes the reliability function or minimizes the unreliability function, expressed in the form

F(S) = Σ_{i=1}^{n} f(S_i).  (11.1)
If the function is not in this summation form, we may have to consider a logarithm transformation. The function f is said to be minimum ordered if there always exists an ordered partition Su∗ and Sv∗ on the set of elements Su ∪ Sv such that f (Su∗ )+ f (Sv∗ ) is the minimum, or f (Su∗ ) + f (Sv∗ ) ≤ f (Su ) + f (Sv ),
(11.2)
for all possible partitions Su and Sv . If the ≤ is changed to ≥ in equation (11.2), f is called maximum ordered. Assuming that ki is constant for all subsystems, Hwang [103] finds that if f is minimum ordered, then the short-ordered partition will always minimize the function F, and if f is maximum ordered, then the short-ordered partition will always maximize the function F. Based on these discoveries, the following results are reported for optimal design of special two-stage k-out-of-n systems. Theorem 11.1 (Hwang [103]) A two-stage k-out-of-n system wherein the second stage is a series structure is a short-ordered system. In other words, assigning the more reliable components to the smaller subsystem will always give higher system reliability. An ordered allocation obtained with this principle will maximize system reliability. This theorem agrees with the result by Malon [162] that the greedily assembled subsystems that have a series structure are always optimal. This theorem applied to the parallel–series system results in the same result as reported in Chapter 6. Theorem 11.2 (Hwang [103]) A two-stage k-out-of-n system wherein the first stage is a parallel structure and the second stage consists of k-out-of-ni :F structures is a short-ordered system. In other words, assigning more reliable components to the smaller subsystem will always give higher system reliability. Theorem 11.3 (Hwang [103]) A two-stage k-out-of-n system wherein the first stage is a parallel structure and the second stage consists of k-out-of-n i :G structures is a long-ordered system. In other words, assigning more reliable components to the larger subsystems will always give higher system reliability.
It should be noted that the optimal allocation principle is different when the second stage is a G system from that when the second stage is an F system, even though the first stage is a parallel structure in both cases. One may think that in these two cases the optimal allocation principle should be the same, since a k-out-of-n:F system is equivalent to an (n − k + 1)-out-of-n:G system. However, this is not the case because the n_i's may be different for different subsystems. Theorems 11.2 and 11.3 will give the same results if n_i is the same for all 1 ≤ i ≤ m. A series–parallel system can be considered a two-stage k-out-of-n system wherein the first stage is a 1-out-of-m:F system and the second stage consists of n_i-out-of-n_i:F structures for 1 ≤ i ≤ m. As pointed out by Derman et al. [61], El-Neweihi et al. [72], and Baxter and Harche [25], the optimal arrangement is not an ordered arrangement. The optimal arrangement depends on the values of the reliabilities of the components. The following heuristic is provided by Hwang [103] for the optimal design of two-stage k-out-of-n systems where invariant optimal designs do not exist.

Heuristic Algorithm Consider an l-out-of-m:G system wherein subsystem S_i is a k-out-of-n_i:G structure (1 ≤ i ≤ m). Without loss of generality, assume k ≤ n_1 ≤ n_2 ≤ · · · ≤ n_m. Note that k is the same for all subsystems. The available n component reliabilities have been ordered as p_1 ≤ p_2 ≤ · · · ≤ p_n (n = Σ_{i=1}^{m} n_i).

Step 1: Assign the k largest reliabilities to S_1, the next k largest to S_2, and so on, until k reliabilities are assigned to S_l.
Step 2: Among the remaining n − kl reliabilities, assign the n_1 − k largest to S_1, the next n_2 − k largest to S_2, and so on, until n_l − k reliabilities are assigned to S_l.
Step 3: Among the remaining n − Σ_{i=1}^{l} n_i reliabilities, assign the n_{l+1} largest to S_{l+1}, the next n_{l+2} largest to S_{l+2}, and so on, until S_m is assigned.
For an l-out-of-m:G system with m subsystems wherein subsystem S_i is a k-out-of-n_i:F structure (1 ≤ i ≤ m), the same heuristic algorithm may be applied except that k should be changed to n_i − k + 1 for assignment to S_i in step 1 and n_i − k changed to k − 1 for assignment to S_i in step 2. This heuristic algorithm provides optimal assignments for the cases covered in Theorems 11.1, 11.2, and 11.3. It also yields optimal assignments when there are lk perfect components for an l-out-of-m:G system wherein each subsystem is a k-out-of-n_i:G structure and when there are l(k − 1) failed components for an l-out-of-m:G system wherein each subsystem is a k-out-of-n_i:F structure. Du and Hwang [69] extend the optimal allocation results from a two-stage to an s-stage (s > 2) k-out-of-n system. An s-stage k-out-of-n system is defined as a k^{(s)}-out-of-n^{(s)} system consisting of n^{(s)} subsystems wherein subsystem i has an (s − 1)-stage k_i^{(s−1)}-out-of-n_i^{(s−1)} structure. In this definition, the superscript indicates the stage or the hierarchical level of the system. For example, here is a three-stage
k-out-of-n system. The highest level is a 2-out-of-3:G structure (k^{(3)} = 2 and n^{(3)} = 3), the next level is a 3-out-of-5:F structure (k_i^{(2)} = 3 and n_i^{(2)} = 5 for 1 ≤ i ≤ 3), and the lowest level is a 1-out-of-2:F structure (k_{ij}^{(1)} = 1 and n_{ij}^{(1)} = 2 for 1 ≤ i ≤ 3 and 1 ≤ j ≤ 2). If the lowest level subsystems of the s-stage k-out-of-n system have series structures, Du and Hwang [69] prove that the assignment of components to these lowest level series structures must be an ordered allocation in order to maximize system reliability. In the example system given in the previous paragraph, we would pair up the best remaining components and allocate them to the lowest level 1-out-of-2:F structures. How these lowest level systems should be arranged in stage 2 and stage 1 (the highest level) is a different question. It would depend on the structures of those stages.
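The three-step heuristic algorithm stated above for the l-out-of-m:G system can be sketched as follows (a sketch with our own names; `sizes` must be sorted ascending with k common to all subsystems):

```python
def hwang_heuristic(l, k, sizes, rels):
    """Three-step assignment heuristic for an l-out-of-m:G system whose
    subsystem S_i is k-out-of-n_i:G.  `sizes` = [n_1, ..., n_m] sorted
    ascending (all >= k), `rels` = available component reliabilities.
    Returns the m groups of reliabilities S_1..S_m."""
    m = len(sizes)
    pool = sorted(rels, reverse=True)            # largest reliability first
    groups = [[] for _ in range(m)]
    for i in range(l):                           # Step 1: k largest to S_1..S_l
        groups[i], pool = groups[i] + pool[:k], pool[k:]
    for i in range(l):                           # Step 2: top up S_1..S_l
        need = sizes[i] - k
        groups[i], pool = groups[i] + pool[:need], pool[need:]
    for i in range(l, m):                        # Step 3: fill S_{l+1}..S_m
        groups[i], pool = groups[i] + pool[:sizes[i]], pool[sizes[i]:]
    return groups
```

For instance, with l = 2, k = 2, subsystem sizes (2, 3, 3), and eight components, S_1 receives the two best components, S_2 the next four, and S_3 the three worst.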
11.2 REDUNDANT CONSECUTIVE-k-OUT-OF-n SYSTEMS The Con/k/n:F system with component level redundancy is called a redundant Con/k/n:F system [105]. The system has n components and component i is actually a subsystem with ri subcomponents connected in parallel (1 ≤ i ≤ n). Component i fails if and only if all of its ri subcomponents are failed, and the system fails if and only if at least k components are failed. If ri = r for 1 ≤ i ≤ n, such a redundant Con/k/n:F system is equivalent to a two-dimensional (r, k)/(r, n):F system, which has been studied in Chapter 10. The reliability evaluation of a redundant Con/k/n:F system is straightforward. We can treat each component as a parallel structure and express the unreliability of the component as a product of the unreliabilities of all its subcomponents. The reliability of the system can be obtained using the Con/k/n:F model once the reliability of each component is known. A more interesting problem for a redundant Con/k/n:F system is its invariant optimal design. When ri = r for 1 ≤ i ≤ n, Hwang and Shi [105] develop some results for identification of invariant optimal design of such systems. However, because of the assumption that ri = r for 1 ≤ i ≤ n, we are dealing with the equivalent of a two-dimensional (r, k)/(r, n):F system. The optimal design of such two-dimensional systems has been covered in Section 10.3.
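Reliability evaluation of a redundant Con/k/n:F system thus has two steps: collapse each parallel component, then run a (non-i.i.d.) Lin/Con/k/n:F recursion. A sketch (function names are ours), with the general form of recursion (9.36):

```python
from math import prod

def lin_con_F(k, p):
    """Lin/Con/k/n:F reliability for independent, possibly unequal component
    reliabilities p[0..n-1], via the recursion of equation (9.36)."""
    n = len(p)
    if n < k:
        return 1.0
    q = [1.0 - x for x in p]
    R = [1.0] * (n + 1)                        # R[j] = 1 for 0 <= j < k
    R[k] = 1.0 - prod(q[:k])
    for j in range(k + 1, n + 1):
        R[j] = R[j - 1] - p[j - k - 1] * prod(q[j - k:j]) * R[j - k - 1]
    return R[n]

def redundant_con_rel(k, groups):
    """Redundant Con/k/n:F system: component i is a parallel group of
    subcomponents with reliabilities groups[i]; the component fails iff all
    of them fail.  Collapse each group, then evaluate Con/k/n:F."""
    comp = [1.0 - prod(1.0 - r for r in g) for g in groups]
    return lin_con_F(k, comp)
```

For example, with k = 2 and three components, each a parallel pair of 0.7-reliability subcomponents, each component has reliability 1 − 0.3^2 = 0.91 and the system reliability is 1 − q^2 − pq^2 with p = 0.91.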
11.3 LINEAR AND CIRCULAR m-CONSECUTIVE-k-OUT-OF-n MODEL

If an n-component system fails if and only if there exist at least m nonoverlapping runs of exactly k consecutive component failures (1 ≤ m ≤ ⌊n/k⌋), then the system is called an m-consecutive-k-out-of-n system [87]. A run of sk consecutive component failures is considered to be s runs of exactly k consecutive component failures (s > 1).
OTHER k-OUT-OF-n AND CONSECUTIVE-k-OUT-OF-n MODELS
Notation

• R(n, m): reliability of an m-consecutive-k-out-of-n:F system
• Q(n, m): unreliability of an m-consecutive-k-out-of-n:F system, Q(n, m) + R(n, m) = 1
Papastavridis [184] provides recursive equations for reliability evaluation of such systems. He decomposes the event that the n-component system fails into the union of the event that the (n − 1)-component subsystem fails (event 1) and the event that the (n − 1)-component subsystem works and the n-component system fails (event 2). These two events are disjoint. The probability of event 1 is Q(n − 1, m) by definition. For event 2 to occur, the (n − 1)-component subsystem must have exactly m − 1 nonoverlapping runs of k consecutive component failures and the n-component system must have exactly m such runs. Event 2 can be decomposed into the union of the following disjoint events for 1 ≤ s ≤ m: the last sk components are all failed, component n − sk works, and the (n − sk − 1)-component subsystem has exactly m − s nonoverlapping runs of k consecutive component failures. Therefore, letting p_i be the reliability of component i and q_i = 1 − p_i, the following equation is obtained:

Q(n, m) = Q(n - 1, m) + \sum_{s=1}^{m} p_{n-sk} \left( \prod_{i=1}^{sk} q_{n-sk+i} \right) \left[ Q(n - sk - 1, m - s) - Q(n - sk - 1, m - s + 1) \right],   (11.3)
where n ≥ km + 1 and Q(0, 0) = 1. When the components are i.i.d. with p_1 = · · · = p_n = p and q = 1 − p, equation (11.3) can be simplified to

Q(n, m) = Q(n - 1, m) + \sum_{s=1}^{m} p q^{sk} \left[ Q(n - sk - 1, m - s) - Q(n - sk - 1, m - s + 1) \right].   (11.4)
The computational complexity of equations (11.3) and (11.4) is O(mn). To find a closed-form expression for system unreliability when components are i.i.d., Papastavridis [184] notes the following equation:

Q(n, m) = \sum_{s=m}^{s^*} \sum_{i=sk}^{n} A(i, n - i + 1, s) q^i p^{n-i},   (11.5)

where A(i, n − i + 1, s) indicates the number of ways that i like balls can be distributed in n − i + 1 unlike cells such that the total number of k-tuples of balls that appear in all cells is equal to s, and s^* = ⌊n/k⌋.
To find A(i, n − i + 1, s), first allocate the s k-tuples into the n − i + 1 cells. This can be done in \binom{s + n - i}{s} distinct ways [201]. There are i − sk balls remaining to be allocated. We need to allocate them into the same n − i + 1 cells such that no more than k − 1 balls are allocated to the same cell. This can be done in N(i − sk, k, n − sk) distinct ways [refer to equation (9.2) for the definition of N(j, k, n)]. As a result, the following closed-form expression of system unreliability is available:

Q(n, m) = \sum_{s=m}^{s^*} \sum_{i=sk}^{n} \binom{s + n - i}{s} N(i - sk, k, n - sk) q^i p^{n-i},   (11.6)
where N(i − sk, k, n − sk) may be calculated with equation (9.8), (9.9), or (9.15).

When the components are arranged in a circle, we have a circular m-consecutive-k-out-of-n system. Alevizos et al. [8] provide the following recursive algorithm for unreliability evaluation of the circular system:

Q_c(n, m) =
  0,   if n < km,
  q_1 q_2 · · · q_n,   if n = km,
  q_1 q_2 · · · q_n + \sum_{i=1}^{n} q_1 · · · q_{i-1} p_i q_{i+1} · · · q_n,   if n = km + 1,
  p_n Q(n - 1, m) + q_n Q_c(n - 1, m) + \sum_{s=1}^{m} \sum_{i=0}^{sk-1} (q_1 q_2 · · · q_i p_{i+1})(q_n q_{n-1} · · · q_{n-sk+i+1} p_{n-sk+i}) [Q_{i+2}(n - sk - 2, m - s) - Q_{i+2}(n - sk - 2, m - s + 1)],   if n ≥ km + 2,
(11.7)

where Q_c(n, m) is the failure probability of a circular m-consecutive-k-out-of-n:F system, Q(a, b) is the failure probability of a linear b-consecutive-k-out-of-a:F system, and Q_{i+2}(c, d) is the failure probability of a linear d-consecutive-k-out-of-c:F subsystem consisting of components i + 2, i + 3, . . . , n − sk + i − 1. Because the reliability or unreliability of a linear m-consecutive-k-out-of-n:F system can be calculated with equation (11.3), the computational complexity of equation (11.7) is O(nkm^3). When the components are i.i.d. and n ≥ km + 2, equation (11.7) becomes

Q_c(n, m) = p Q(n - 1, m) + q Q_c(n - 1, m) + \sum_{s=1}^{m} \sum_{i=0}^{sk-1} p^2 q^{sk} \left[ Q(n - sk - 2, m - s) - Q(n - sk - 2, m - s + 1) \right].   (11.8)
11.4 THE k-WITHIN-CONSECUTIVE-m-OUT-OF-n SYSTEMS

Another generalization of the Con/k/n system is the k-within-consecutive-m-out-of-n:F system, which consists of n linearly or cyclically ordered components {1, 2, . . . , n}. The system is failed if, among any consecutive m components, there
are at least k (k ≤ m) failed components. In the case of k = m, the system becomes a Con/k/n:F system. This model has applications in quality control and radar detection. In quality control of a manufacturing process, m items produced consecutively by the process are randomly selected for a quality check. Within this sample of m items, if at least k of them are defective, we conclude that the manufacturing process needs to be adjusted. If such a process produces n components in a certain period of time, we are interested in the probability that such a random quality check is able to detect a problem in the process. In radar detection, we are interested in the probability of the occurrence of k ones in an m-bit sliding window at a particular time step.

11.4.1 Systems with i.i.d. Components

The reliability of a linear or circular k-within-consecutive-m-out-of-n:F system with i.i.d. components can be expressed in the general form

R_a(k, m, n) = \sum_{j=0}^{r_a} N_a(j, k, m, n) q^j p^{n-j},   (11.9)

where a ∈ {L, C}, L (C) indicates a linear (circular) system, and N_a(j, k, m, n) is the number of ways of arranging a total of n components with j failed ones and n − j working ones in a line (when a = L) or a circle (when a = C) such that fewer than k components are failed among any consecutive m components. When k = 2, Naus [174] and Sfakianakis et al. [221] provide the following expressions for N_a(j, 2, m, n):

N_a(j, 2, m, n) =
  \binom{n - (j - 1)(m - 1)}{j},   if a = L,
  \frac{n}{n - j(m - 1)} \binom{n - j(m - 1)}{j},   if a = C.
(11.10)

Together with

r_a =
  ⌊(n + m - 1)/m⌋,   if a = L,
  ⌊n/m⌋,   if a = C,
(11.11)

we can use equation (11.9) to evaluate the reliabilities of linear and circular 2-within-consecutive-m-out-of-n:F systems.

Proof of Equation (11.10) (Naus [174]) Note that N_L(j, 2, m, n) is the number of ways of arranging a total of n components (j failed and n − j working) along a line such that no two failed components exist in any consecutive string of m components. Consider an arrangement of j 1's (failed components), each pair of 1's being
separated by exactly m − 1 0's (working components), as shown below:

1 \underbrace{0 · · · 0}_{m-1} 1 \underbrace{0 · · · 0}_{m-1} · · · \underbrace{0 · · · 0}_{m-1} 1.   (11.12)

This unique arrangement consists of j + (j − 1)(m − 1) components. Then, N_L(j, 2, m, n) is equal to the number of ways of allocating the remaining n − j − (j − 1)(m − 1) working components to the string in (11.12), which in turn is the number of ways of permuting these remaining n − j − (j − 1)(m − 1) working components and the j failed components that are in (11.12). As a result, we have

N_L(j, 2, m, n) = \frac{[n - (j - 1)(m - 1)]!}{j! \, [n - j - (j - 1)(m - 1)]!} = \binom{n - (j - 1)(m - 1)}{j}.

Note that N_C(j, 2, m, n) represents the number of ways of arranging a total of n components (including j failed and n − j working components) in a circle such that no two failed components exist in a consecutive string of m components. First, arbitrarily select a failed component (out of a total of j failed components) and arbitrarily assign it to one of the n possible positions. This component is represented by the first 1 in the following arrangement of j 1's (failed components) with each 1 followed by m − 1 0's (working components):

1 \underbrace{0 · · · 0}_{m-1} 1 \underbrace{0 · · · 0}_{m-1} · · · 1 \underbrace{0 · · · 0}_{m-1}.   (11.13)

This arrangement includes a total of mj components. The remaining n − mj working components should be merged into the arrangement in (11.13) such that no 0's (working components) go before the first 1 (the first failed component). The number of ways to do this is equal to the number of ways of permuting j − 1 1's and n − mj 0's, which is

\frac{(n - mj + j - 1)!}{(j - 1)! \, (n - mj)!} = \frac{j}{n - j(m - 1)} \binom{n - j(m - 1)}{j}.

Since there are n ways to fix the spot of the selected failed component and the j failed components are identical, using the same arguments as used in the proof of equation (9.24), we have

N_C(j, 2, m, n) = \frac{n}{j} \cdot \frac{j}{n - j(m - 1)} \binom{n - j(m - 1)}{j} = \frac{n}{n - j(m - 1)} \binom{n - j(m - 1)}{j}.

When k > 2, n = m + s, and s ≤ m, Sfakianakis et al. [221] provide the following recursive equation for system reliability evaluation of the linear k-within-
consecutive-m-out-of-n:F system:

R_L(k, m, n) = \sum_{i=0}^{k-1} \binom{m - s}{i} q^i p^{m-s-i} R_L(k - i, s, 2s),   (11.14)

where R_L(i, s, 2s) = 1 if i > s.

Proof of Equation (11.14) The system contains at most 2m components. The m − s components in the middle of the linear system are included in every possible string of m consecutive components. Let i indicate the number of failed components among these m − s components. Then we must have 0 ≤ i ≤ k − 1 for the system to work. Once the middle m − s components have been considered, the remaining components, namely, 1, 2, . . . , s, m + 1, m + 2, . . . , m + s, form a linear (k − i)-within-consecutive-s-out-of-2s:F system. Applying the law of total probability, we obtain equation (11.14).

For a more general k-within-consecutive-m-out-of-n:F system with i.i.d. components, Sfakianakis et al. [221] provide two sets of bounds for the system failure probability. One set is based on improved Bonferroni inequalities and the other is based on conditional probabilities. We list the bounds based on conditional probabilities because of their simplicity:
LB_a = 1 - R_L(k, m, u) [R_L(k, m, m + 3)]^{s-1},   (11.15)

UB_a = 1 - R_L(k, m, m + t_a - 1) [R_L(k, m, m + 3)]^{s_a},   (11.16)

where LB_a and UB_a are lower and upper bounds on the failure probability of system a, a ∈ {L, C}, and

s = ⌊n/(m + 3)⌋ + 1,
u = n mod (m + 3),
t_a = (n - m + 1) mod 4 if a = L, or n mod 4 if a = C,
s_a = ⌊(n - m + 1)/4⌋ if a = L, or ⌊n/4⌋ if a = C.
(11.17)–(11.21)
The lower bound expression in equation (11.15) can be verified with the following arguments. The linear or circular system with n components can be partitioned into s independent linear subsystems. One of these s subsystems has u = n mod (m + 3) components and all the others have m + 3 components each. If the system works, all
of these independent subsystems work, but not necessarily vice versa; hence the product of the subsystem reliabilities is at least the system reliability, and (11.15) is a lower bound on the failure probability. The upper bound expression in equation (11.16) can be verified with the following arguments. The system with n components can be divided into s_a dependent subsystems, each with m + 3 components: subsystem 1 has components 1, 2, . . . , m + 3, subsystem 2 has components 5, 6, . . . , m + 7, and so on, so that neighboring subsystems have at least one component in common. If all of these subsystems work, the system must be working. Thus, an upper bound on the failure probability is obtained.
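As a concrete illustration of the i.i.d. results for k = 2, equations (11.9)–(11.11) can be coded directly (a sketch; the function name is ours):

```python
from math import comb

def r_2_within(m, n, p, circular=False):
    """Reliability of a 2-within-consecutive-m-out-of-n:F system with
    i.i.d. components via equations (11.9)-(11.11)."""
    q = 1.0 - p
    if circular:
        r = n // m                                   # r_C in (11.11)
        def count(j):                                # N_C(j, 2, m, n)
            t = n - j * (m - 1)
            return n * comb(t, j) / t if t >= j else 0.0
    else:
        r = (n + m - 1) // m                         # r_L in (11.11)
        def count(j):                                # N_L(j, 2, m, n)
            t = n - (j - 1) * (m - 1)
            return float(comb(t, j)) if t >= j else 0.0
    return sum(count(j) * q ** j * p ** (n - j) for j in range(r + 1))
```

For m = 2 the model reduces to a Con/2/n:F system, which gives simple hand-checkable values: with p = 0.5, the linear system with n = 3 has reliability 0.625 and the circular system with n = 4 has reliability 0.4375.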
11.4.2 Systems with Independent Components

When the components are independent, Malinowski and Preuss [152, 155] provide recursive algorithms for the reliability evaluation of linear and circular k-within-consecutive-m-out-of-n:F systems. The following notation needs to be defined:

Notation

• x_i: binary number, i ∈ {1, 2, . . . , m}
• x̄_i = 1 − x_i
• X_j: random variable indicating the state of component j; it is 1 when component j works and 0 otherwise. Pr(X_j = 1) = p_j, Pr(X_j = 0) = q_j, p_j + q_j = 1, j ∈ {1, . . . , n}.
• E_{l,j}: event that the linear k-within-consecutive-m-out-of-(j − l + 1):F subsystem consisting of components {l, . . . , j} works, j − l + 1 ≥ m
• B_{k,m}: set of all m-element binary vectors (x_1, . . . , x_m) such that \sum_{i=1}^{m} x_i > m − k
• ⊕: two-argument operator defined for s, t ∈ {1, 2, . . . , n}: s ⊕ t ≡ [(s + t − 1) mod n] + 1
• v(s, t): number of elements in the set {s, s ⊕ 1, . . . , t}, s, t ∈ {1, 2, . . . , n}
• E_{s,t}: event that the linear k-within-consecutive-m-out-of-v(s, t):F subsystem consisting of components {s, s ⊕ 1, . . . , t} works
• R_L(k, m, n): reliability of the linear k-within-consecutive-m-out-of-n:F system
• R_C(k, m, n): reliability of the circular k-within-consecutive-m-out-of-n:F system
• L_{k,l}: set of all l-element binary vectors (x_1, x_2, . . . , x_l) such that x_1 + · · · + x_l ≥ l − k + 1 if l ≤ m, and x_j + · · · + x_{j+m−1} ≥ m − k + 1 for each j ∈ {1, 2, . . . , l − m + 1} if l > m
• E_{k,n}: set of all n-element binary vectors (x_1, x_2, . . . , x_n) such that x_{j⊕0} + · · · + x_{j⊕(m−1)} ≥ m − k + 1 for each j ∈ {1, 2, . . . , n}
The reliability of the linear k-within-consecutive-m-out-of-n:F system is given by

R_L(k, m, n) = Pr(E_{1,n}) = \sum_{(x_1, ..., x_m) \in B_{k,m}} Pr(E_{1,n} | X_{n-m+1} = x_1, . . . , X_n = x_m) \prod_{i=1}^{m} (p_{n-m+i} x_i + q_{n-m+i} x̄_i),   (11.22)
where the probabilities Pr(E_{1,n} | X_{n−m+1} = x_1, . . . , X_n = x_m) are computed with the following equations:

Pr(E_{1,m} | X_1 = x_1, . . . , X_m = x_m) = 0 if x_1 + · · · + x_m ≤ m − k, and 1 if x_1 + · · · + x_m > m − k,   (11.23)

and, for j = m, . . . , n − 1,

Pr(E_{1,j+1} | X_{j−m+2} = x_1, . . . , X_{j+1} = x_m) =
  0, if x_1 + · · · + x_m ≤ m − k;
  q_{j−m+1} Pr(E_{1,j} | X_{j−m+1} = 0, X_{j−m+2} = x_1, . . . , X_j = x_{m−1}) + p_{j−m+1} Pr(E_{1,j} | X_{j−m+1} = 1, X_{j−m+2} = x_1, . . . , X_j = x_{m−1}), if x_1 + · · · + x_m > m − k.
(11.24)

For the circular k-within-consecutive-m-out-of-n:F system, if n ≥ 2m − 1, we have
m−1 . i=1
×
( pi xi + qi x i )
m .
pn−m+i yi + qn−m+i y i
i=1
(y1 ,y2 ,... ,ym ,x 1 ,... ,xm−1 )∈Lk,2m−1
Pr(E 1,n−1 | X 1 = x1 , . . . , X m−1 = xm−1 , X n−m+1 = y1 , . . . , X n−1 = ym−1 ),
(11.25)
where the conditional probabilities Pr(E 1,n−1 | X 1 = x1 , . . . , X m−1 = x m−1 , X n−m+1 = y1 , . . . , X n−1 = ym−1 ) are computed recursively from the following equations: Pr(E 1,2m−2 | X 1 = x 1 , . . . , X m−1 = xm−1 , X m = y1 , . . . , X 2m−2 = ym−1 ) 0 if (x 1 , . . . , x m−1 , y1 , . . . , ym−1 ) ∈ Lk,2m−2 , (11.26) = 1 if (x1 , . . . , x m−1 , y1 , . . . , ym−1 ) ∈ Lk,2m−2 ,
413
THE k-WITHIN-CONSECUTIVE-m-OUT-OF-n SYSTEMS
and for j ∈ {2m − 2, . . . , n − 2}, Pr(E 1, j+1 | X 1 = x 1 , . . . , X m−1 = x m−1 , X j−m+3 = y1 , . . . , X j+1 = ym−1 ) 0 if x 1 + · · · + xm−1 ≤ m − k − 1 or y1 + · · · + ym−1 ≤ m − k − 1, p j−m+2 Pr(E 1, j | X 1 = x 1 , . . . , X m−1 = xm−1 , X j−m+2 = 1, X j−m+3 = y1 , . . . , X j = ym−2 ) if x 1 + · · · + xm−1 ≥ m − k and y1 + · · · + ym−1 = m − k, = p j−m+2 Pr(E 1, j | X 1 = x 1 , . . . , X m−1 = xm−1 , X j−m+2 = 1, (11.27) X = y , . . . , X = y ) j−m+3 1 j m−2 + q j−m+2 Pr(E 1, j | X 1 = x1 , . . . , X m−1 = xm−1 , X j−m+2 = 0, X j−m+3 = y1 , . . . , X j = ym−2 ) if x 1 + · · · + xm−1 ≥ m − k and y1 + · · · + ym−1 ≥ m − k + 1. If n < 2m − 1, RC (k, m, n) may be computed with the following equation: RC (k, m, n) =
n .
(x1 ,... ,xn )∈Lk,n
pjxj + qj x j .
(11.28)
j=1
The algorithms for both linear and circular k-within-consecutive-m-out-of-n:F system reliability evaluation are enumerative in nature. For example, the summation in equation (11.22) is over || Bk,m || terms, where || Bk,m || =
m . i
k−1 i=0
Each term of the summation requires m arithmetic operations. To find each conditional probability needed in equation (11.22), we need to perform three arithmetic operations with equation (11.24). The total number of such conditional probabilities to be evaluated is equal to (n − m + 1)|| Bk,m ||. Thus, the total number of arithmetic operations of the algorithm for the linear system is approximately equal to (3n − 2m + 3)|| Bk,m ||. Similarly, the computational complexity of the algorithm for the circular system may be found. This is left as an exercise. Another set of lower and upper bounds is provided by Papastavridis and Koutras [178]. First define the following notation. Notation • •
•
Si : event that component i fails and there are at least k − 1 failures among components i − m + 1, i − m + 2, . . . , i − 1 for i = m, m + 1, . . . , n Sn+i : event that component i fails and there are at least k − 1 failures among components n − m + i + 1, n − m + i + 2, . . . , n, 1, 2, . . . , i − 1 for i = 1, 2, . . . , m − 1 (for the circular system only) Am : event that the k-out-of-m:F system consisting of components 1, 2, . . . , m is good
414 •
• •
OTHER k-OUT-OF-n AND CONSECUTIVE-k-OUT-OF-n MODELS
Ci : event that, for r = min{i − m, m − k + 1}, there is no failure among components (i − m) − r + 1, (i − m) − r + 2, . . . , i − m for i = m + 1, m + 2, . . . , n G i : event that there are at most k − 1 failures among components i − m + 1, i − m + 2, . . . , i − 1 for i = m + 1, m + 2, . . . , n A: complement of any set A
The bounds for linear and circular k-within-consecutive-m-out-of-n:F systems are LB L = Pr(Am )
n . i=m+1
UB L = Pr(Am )
n .
i=m+1
LBC = Pr(Am )
Pr(S i ),
n+m−1 .
Pr(Ci ) 1− Pr(Si ) − qi Pr(G i ) , Pr(G i )
Pr(S i ),
(11.29)
(11.30)
(11.31)
i=m+1
UBC = UB L .
(11.32)
To use these equations, we need to evaluate Pr(Si ) and Pr(G i ) using the algorithms for the reliability evaluation of a k-out-of-n:F system, which is covered in Chapter 7. For the optimal assignment of independent components with possibly different reliabilities in a k-within-consecutive-m-out-of-n:F system, Papastavridis and Sfakianakis [179] provide a set of necessary conditions. Theorem 11.4 (Papastavridis and Sfakianakis [179]) In a linear k-withinconsecutive-m-out-of-n:F system, R(pπ ) ≥ R(pπi,i+1 ) whenever 1. for 1 ≤ i ≤ min{m − 1, n − m}, pπi ≤ pπi+1 , or 2. for max{m, n − m + 1} ≤ i ≤ n − 1, pπi ≥ pπi+1 , where π is a permutation, (π1 , π2 , . . . , πn ), of the integers 1 through n and πi, j is a permutation obtained from permutation π by interchanging the integers at positions i and j. Theorem 11.4 states that the first l components must be arranged in ascending order of their reliabilities and the last l components must be arranged in descending order of their reliabilities to maximize the reliability where l = min{m, n − m + 1}. However, invariant optimal arrangements are not identified because no conditions are specified for optimal arrangements of the other n − 2l components. When the components are i.i.d., a partial ranking of the Birnbaum importance is provided by Papastavridis and Sfakianakis [179], as stated in the following theorem.
THE k-WITHIN-CONSECUTIVE-m-OUT-OF-n SYSTEMS
415
Theorem 11.5 (Papastavridis and Sfakianakis [179]) Consider a linear kwithin-consecutive-m-out-of-n:F system with i.i.d. components and let Ii denote the Birnbaum importance of the component in position i. Then 1. for 1 ≤ i ≤ min{m − 1, n − m}, Ii ≤ Ii+1 , and 2. for max{m, n − m + 1} ≤ i ≤ n − 1, Ii ≥ Ii+1 . Koutras [120] illustrates the application of the MIS approach for reliability evaluation of a k-within-consecutive-m-out-of-n:F system with k = 2. Based on the framework of the MIS approach introduced in Chapter 5, we define the state space to be S = {s0 , s1 , . . . , sm+1 } (N = m + 1). The Markov chain {Yt : t ≥ 0} is defined as follows: 1. Yt = s0 if the last m components of the subsystem with components 1, 2, . . . , t are working; 2. Yt = si if component t − i + 1 is the only failed component among the last m components in the subsystem with components 1, 2, . . . , t (i = 1, 2, . . . , m); and 3. Yt = sm+1 if at least two components are failed among the last m components in the subsystem with components 1, 2, . . . , t. 6 6 The state space S can be partitioned into three groups, namely, S = S0 S1 S2 , where S0 = {s0 }, S1 = {s1 , s2 , . . . , sm }, and S2 = {sm+1 }. State S0 indicates that the last window of m components contains no failed components while state S2 indicates that the last window of m components contains at least k failed components. State set S1 contains all combinations of exactly one failed component in the last window of m components in the subsystem with t components. Among the states of the Markov chain, sm+1 is the absorbing state. The transition probability matrix of the Markov chain {Yt : t ≥ 0} can be expressed as
pt
. . . Λt = p
t
0
qt pt .. .
pt
..
. pt pt
qt 0
qt qt .. . . qt qt 0 1 (m+2)×(m+2)
(11.33)
Applying Theorem 5.2, we obtain the following recursive equations for reliability evaluation of the k-within-consecutive-m-out-of-n:F system with k = 2: a0 (t) = pt (a0 (t − 1) + am (t − 1)),
416
OTHER k-OUT-OF-n AND CONSECUTIVE-k-OUT-OF-n MODELS
a1 (t) = qt (a0 (t − 1) + am (t − 1)), a j (t) = pt a j−1 (t − 1), am+1 (t) = qt
m−1
j = 1, 2, . . . , m,
ai (t − 1) + am+1 (t − 1),
i=1
Rn = 1 − am+1 (n). Koutras [120] also states that the MIS approach can be easily adapted for reliability evaluation of the general k-within-consecutive-m-out-of-n:F system. In this case, k−1 m the number of states will be equal to j=0 j + 1 = N + 1 and the state space S 6 can be partitioned as S = mj=0 S j , where S j , for j = 1, 2, . . . , m − 1, describes the jth deterioration stage and contains all configurations with exactly j failures among the last m components of the subsystem with components 1, 2, . . . , t. State S0 corresponds to no failures among the last m components whereas Sm denotes the absorbing state. The transition probability matrix of the system can be derived in a similar fashion as for the special case k = 2. However, the matrix will be much more complicated. 11.4.3 The k-within-(r, s)/(m, n):F Systems For a k-within-(r, s)/(m, n):F lattice system with i.i.d. components, Malinowski and Preuss [151] and Preuss [199] provide lower and upper bounds for system reliability: UB =
m/r . n/s . i=1
LB =
(k, r, s),
(11.34)
j=1
m−r .+1 n−s+1 . i=1
(k, r, s),
(11.35)
j=1
where
(k, r, s) =
r s r s−l (1 − p)l . p l
k−1 l=0
(11.36)
A k-within-(r, r )/(n, n):F system consists of n 2 components arranged in a square grid. The system fails if and only if there exists a square grid of size r × r within which at least k components are failed. This model is a combination of the k-out-ofn:F and the two-dimensional Con/k/n:F system models. Assuming that the components are i.i.d., Makri and Psillakis [150] provide some lower and upper bounds on the reliability of the (r, r )/(n, n):F system. The improved Bonferroni inequalities are used in the derivations. We provide their results in this section without proof.
THE k-WITHIN-CONSECUTIVE-m-OUT-OF-n SYSTEMS
Theorem 11.6 (Makri and Psillakis [150]) 0 < p < 1, the following hold:
417
For 1 ≤ k ≤ r 2 , 1 < r ≤ n, and
2S2 , N2 2(S1 − S2 /t) UB = 1 − , 1+t @ ? 2S2 , t =1+ S1 LB = 1 − S1 +
(11.37) (11.38) (11.39)
N = n − r + 1,
(11.40)
S1 = h N 2 ,
(11.41)
S2 = S21 + S22 , r r −1 N −r +u S21 = 2 (N − r + w)g(u, w), 1 w=w1 u=1 w1 = max{1, r − N + 1}, N −r +1 M = N (N + 1) 2
ζ (u, v) =
(11.45)
+ ζ (1, N − r )(N − r )(N − r + 1)
r −1 u=1
1 0
N −u , 1
if u ≤ v, otherwise,
r 2 2 r q x pr −x , h= x x=k
(11.43) (11.44)
2
S22 = Mh ,
(11.42)
(11.46)
(11.47)
2
g(u, w) =
m2 m1 m1 m 2 m 3 m 3 x 2r 2 −uw−x , q p x 1 =t1 x 1 x 2 =t2 x 2 x 3 =t3 x 3
m 1 = uw, 2
(11.48)
(11.49) (11.50)
m 2 = r − uw,
(11.51)
m3 = m2,
(11.52)
t1 = max{0, k − m 2 },
(11.53)
t2 = max{0, k − x1 },
(11.54)
t3 = t2 ,
(11.55)
x = x1 + x2 + x3 .
(11.56)
418
OTHER k-OUT-OF-n AND CONSECUTIVE-k-OUT-OF-n MODELS
Because the reliability of any system must be in the range between 0 and 1, LB and UB in Theorem 11.6 are meaningful only if they are in this range. Lin and Zuo [140] provide recursive equations for evaluation of the exact reliability of linear k-within-(r, s)/(m, n):F lattice systems with independent components. Case I: r = m The linear k-within-(m, s)/(m, n):F lattice system consists of mn components that are arranged in m rows and n columns. This is a special case of the general linear k-within-(r, s)/(m, n):F lattice system in which we let r = m. The system fails whenever there is at least one cluster of size m × s such that the number of failed components within this cluster is at least k. The component in the ith row and jth column is called component (i, j). Notation • • • •
•
pi j , qi j : reliability and unreliability of component (i, j) xi j : binary variable, 1 if component (i, j) fails, 0 otherwise g j : s-dimensional nonnegative integer vector (g j−s+1 , . . . , g j ) A(m, j, s, k): event that the linear k-within-(m, s)/(m, j):F lattice subsystem works. This subsystem consists of the components in the first j columns of the original system. B(i, l, u): event that there are u failed components in components (1, l), . . . , (i, l) of the subsystem
• • • • •
"7 " j X (m, j, s, k, g j ): conditional event of A(m, j, s, k) " l= j−s+1 B(m, l, gl )
R(m, j, s, k): probability that event A(m, j, s, k) occurs S(m, j, s, k, g j ): probability that event X (m, j, s, k, g j ) occurs Q(i, l, u): probability that event B(i, l, u) occurs
mks : set of all s-dimensional nonnegative integer vectors such that all elements of each vector are between zero and m and the sum of all elements of each vector is less than k
When n = s, the linear k-within-(m, s)/(m, n):F lattice system becomes a k-outof-ms system and the recursive formula for calculating the reliability of this system is already available [208]. When n > s, we provide the following recursive equations for system reliability evaluation: R(m, n, s, k) =
gn ∈ mks
S(m, n, s, k, gn )
n .
Q(m, l, gl ).
(11.57)
l=n−s+1
The conditional probability S(m, n, s, k, gn ) in equation (11.57) can be computed using the following recursive equation:
419
THE k-WITHIN-CONSECUTIVE-m-OUT-OF-n SYSTEMS
S(m, j, s, k, g j ) =
m
S(m, j − 1, s, k, j−1 )
ε j −s =0
× Q(m, j − s, ε j−s )
if g j ∈ mks
(11.58)
for j = s + 1, . . . , n, where
j−1 = (ε j−s , g j−s+1 , . . . , g j−1 ). It is noted that the boundary conditions for the above recursive equations are 0 if g j ∈ / mks for j = s, s + 1, · · · , n, S(m, j, s, k, g j ) = 1 if gs ∈ mks . (11.59) The probability Q(m, l, gl ) in equation (11.57) and the probability Q(m, j −s, ε j−s ) in equation (11.58) can be calculated with the following recursive equation: Q(i, l, u) = pil Q(i − 1, l, u) + qil Q(i − 1, l, u − 1),
i = 1, 2, . . . , m,
1 ≤ l ≤ j, (11.60)
with boundary conditions 1 Q(i, l, u) = 0
if i = 0, u = 0, if i < u or u < 0.
(11.61)
Proof of Equations (11.57), (11.58), and (11.60) Event A(m, n, s, k) can be decomposed as
n 7 7 1 B(m, l, gl ) . A(m, n, s, k) A(m, n, s, k) = gn ∈ mks
l=n−s+1
The terms in the square brackets for all gn are mutually exclusive, and B(m, l, gl ), l = n − s + 1, . . . , n, are independent. Then from the above decomposition of A(m, n, s, k), it is very straightforward that equation (11.57) holds. If g j ∈ mks , we have the decomposition " " 7 j " X (m, j, s, k, g j ) = A(m, j, s, k) "" B(m, l, gl ) "l= j −s+1 " " j " 7 = A(m, j − 1, s, k) "" B(m, l, gl ) "l= j −s+1
420
OTHER k-OUT-OF-n AND CONSECUTIVE-k-OUT-OF-n MODELS
" " 7 " j −1 = A(m, j − 1, s, k) "" B(m, l, gl ) "l= j −s+1 =
m 1
A(m, j − 1, s, k)
7
ε j −s =0
" j −1 "" 7 B(m, l, gl ) B(m, j − s, ε j −s ) "" "l= j −s+1
for j = s + 1, . . . , n. The terms in the square brackets are mutually exclusive for all ε j−s . From the above decomposition, we immediately obtain equation (11.58). Equation (11.60) can be easily derived by the decomposition of event B(i, l, u): 1 7 7 B(i − 1, l, u) {xil = 1} B(i − 1, l, u − 1) , B(i, l, u) = {xil = 0} noting that the two events in the two square brackets are mutually exclusive. The complexity for computing Q(m, l, gl ) using recursive equation (11.60) is O(m). The cardinality of mks does not exceed (m + 1)s . Thus the computing time of R(m, n, s, k) using equations (11.57), (11.58), and (11.60) is
O (n − s)m(m + 1)s + m s (m + 1)s = O (n − s)m s+1 + m 2s when s is fixed. In other words, the computing time of R(m, n, s, k) is polynomial in m and linear in n. Example 11.1 Consider a linear k-within-(m, s)/(m, n):F lattice system with m = r = 3, k = 4, and the following component reliabilities: pi j = 1 − 0.0015[(i − 1)n + j]
for i = 1, 2, 3, and j = 1, 2, . . . , n.
We compute the system reliability R(3, n, s, 4) using equations (11.57), (11.58), and (11.60) for various n and s. Matlab [109] is used for the computation work. The reliability R(3, n, s, 4) and the computing time for various n and s are listed in Table 11.1.
TABLE 11.1 System Reliability and Computing Time for Linear 4-within-(3, s)/(3, n):F Lattice System s=2 n 5 10 20 50 100
s=3
R(3, n, s, 4) Time (s) 0.999999 0.999968 0.999033 0.922598 0.142103
0.21 0.41 0.83 2.09 4.19
n 5 10 20 50 100
s=5
R(3, n, s, 4) Time (s) 0.999994 0.999783 0.993500 0.651598 0.000661
0.39 0.80 1.67 4.12 8.36
n 5 10 20 50 100
R(3, n, s, 4) Time (s) 0.999977 0.998769 0.965114 0.200798 8 × 10−9
0.05 3.44 7.37 19.54 39.78
421
THE k-WITHIN-CONSECUTIVE-m-OUT-OF-n SYSTEMS
Case II: r < m Notation • • • • • •
• • • • •
• • • • • • •
pi j , qi j : reliability and unreliability of component (i, j) xi j : binary variable, 1 if component (i, j) fails, 0 otherwise xi j : r -dimensional binary vector (xi−r +1, j , . . . , xi, j ) gi j : (i − r + 1)-dimensional nonnegative integer vector (g1 j , . . . , gi−r +1, j ), 0 ≤ gi j ≤ r, i = r, r + 1, . . . , m, j = 1, 2, . . . , n Gm j : sequence of (m − r + 1)-dimensional nonnegative integer vectors (gm, j−s+1 , . . . , gm j ) A (m, j, r, s, k): event that the linear k-within-(r, s)/(m, j):F lattice subsystem works. The subsystem consists of the components in the first j columns of the original system. B (u, i, j): event that there are exactly u failures from component (i, j) to (i + r − 1, j) in the jth column C(xi j ): event that a specific xi j = (xi−r +1, j , . . . , xi j ) occurs 7i−r +1 D(i, j, r, gi j ): event of l=1 B (gl j , l, j) " E(i, j, r, gi j , xi j ): conditional event of D(i, j, r, gi j ) "C(xi j ) X (m, j, r, s, k, Gm j ): conditional event of "7 " j A (m, j, r, s, k) " l= j−s+1 D(m, l, r, gml )
R (m, j, r, s, k): probability that event A (m, j, r, s, k) occurs Q (i, j, r, gi j ): probability that event D(i, j, r, gi j ) occurs Q (i, j, r, gi j , xi j ): probability that event E(i, j, r, gi j , xi j ) occurs S (m, j, r, s, k, Gm j ): probability that event X (m, j, r, s, k, Gm j ) occurs r u : set of all r -dimensional binary vectors such that the sum of its components equals u mr : set of all (m − r + 1)-dimensional nonnegative integer vectors whose elements are all between zero and r
mr sk : set of all sequences of s (m − r + 1)-dimensional nonnegative integer vectors whose elements are all between zero and r and each element in the sum vector of the s vectors is between zero and k − 1
When n = s, it becomes the earlier case I by taking the transpose of the m × n component matrix of the original system. In this case, the system structure becomes a linear k-within-(r, n)/(m, n):F lattice system. When n > s, we have the equation R (m, n, r, s, k) =
Gmn ∈ mrsk
S (m, n, r, s, k, Gmn )
n .
Q (m, l, r, gml ).
l=n−s+1
(11.62)
422
OTHER k-OUT-OF-n AND CONSECUTIVE-k-OUT-OF-n MODELS
The conditional probability S (m, n, r, s, k, Gm j ) in equation (11.62) can be calculated using the recursive equation S (m, j, r, s, k, Gm j ) = S (m, j − 1, r, s, k, Gm, j−1 ) gm, j −s ∈mr
× Q (m, j − s, r, gm, j−s ),
(11.63)
if Gm j ∈ mr sk for j = s + 1, . . . , n with the boundary conditions 0 S (m, j, r, s, k, Gm j ) = 1
if Gm j ∈ / mr sk if Gms ∈ mr sk
for j = s, s + 1, . . . , n, for j = s. (11.64)
The probability Q (m, l, r, gml ) in equation (11.62) and the probability Q (m, j − s, r, gm, j−s ) in equation (11.63) can be computed with the equation
Q (m, j, r, gm j ) =
Q (m, j, r, gm j , xm j )
xm j ∈r,gm−r+1, j m .
×
[ pi j (1 − xi j ) + qi j xi j ]
(11.65)
i=m−r +1
for j = 1, 2, ..., n. The conditional probability Q(m, j, r, g_mj, x_mj) in equation (11.65) can be calculated with

$$Q(i, j, r, g_{ij}, x_{ij}) = \sum_{x_{i-r,j}=0}^{1} Q(i-1, j, r, g_{i-1,j}, x_{i-1,j}) \left[ p_{i-r,j}(1 - x_{i-r,j}) + q_{i-r,j} x_{i-r,j} \right] \tag{11.66}$$

if x_ij ∈ Ω_{r, g_{i-r+1,j}} for i = r + 1, ..., m, 1 ≤ j ≤ n, with the boundary conditions

$$Q(i, j, r, g_{ij}, x_{ij}) = \begin{cases} 0 & \text{if } x_{ij} \notin \Omega_{r,\, g_{i-r+1,j}} \text{ for } i = r, r+1, \ldots, m, \\ 1 & \text{if } x_{rj} \in \Omega_{r,\, g_{1j}} \text{ for } i = r. \end{cases} \tag{11.67}$$
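The three index sets just introduced (r-dimensional binary vectors with a fixed number of ones, bounded integer vectors, and constrained sequences of such vectors) are small for the system sizes considered here and can be enumerated directly. A sketch in Python (the function names are ours; the third set is built by brute-force filtering rather than by any clever construction):

```python
# Enumerate the index sets used in equations (11.62)-(11.67):
# r-dimensional binary vectors with u ones, bounded integer vectors,
# and sequences of s such vectors with every columnwise sum below k.
from itertools import combinations, product

def binary_vectors(r, u):
    """All r-dimensional binary vectors whose components sum to u."""
    return [tuple(1 if i in ones else 0 for i in range(r))
            for ones in combinations(range(r), u)]

def integer_vectors(m, r):
    """All (m - r + 1)-dimensional integer vectors with entries in 0..r."""
    return list(product(range(r + 1), repeat=m - r + 1))

def admissible_sequences(m, r, s, k):
    """Sequences of s vectors from integer_vectors(m, r) in which every
    element of the elementwise sum of the s vectors is at most k - 1."""
    return [seq for seq in product(integer_vectors(m, r), repeat=s)
            if all(sum(col) <= k - 1 for col in zip(*seq))]
```

For m = 3, r = 2, s = 2, k = 3 this yields 36 admissible sequences, comfortably below the (r + 1)^{s(m-r+1)} = 81 bound used in the complexity analysis below.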
Proof of Equations (11.62), (11.63), (11.65), and (11.66) We decompose event A(m, n, r, s, k) as

$$A(m, n, r, s, k) = \bigcup_{G_{mn} \in \Psi_{mrsk}} \left[ A(m, n, r, s, k) \cap \bigcap_{l=n-s+1}^{n} D(m, l, r, g_{ml}) \right].$$

Noting that the terms in the square brackets for all G_mn are mutually exclusive and that D(m, l, r, g_ml), l = n - s + 1, ..., n, are independent, we get formula (11.62).
If G_mn ∈ Ψ_mrsk, we have

$$\begin{aligned} X(m, j, r, s, k, G_{mj}) &= A(m, j, r, s, k) \,\Big|\, \bigcap_{l=j-s+1}^{j} D(m, l, r, g_{ml}) \\ &= A(m, j-1, r, s, k) \,\Big|\, \bigcap_{l=j-s+1}^{j} D(m, l, r, g_{ml}) \\ &= A(m, j-1, r, s, k) \,\Big|\, \bigcap_{l=j-s+1}^{j-1} D(m, l, r, g_{ml}) \\ &= \bigcup_{g_{m,j-s} \in \Phi_{mr}} \left[ A(m, j-1, r, s, k) \cap D(m, j-s, r, g_{m,j-s}) \right] \,\Big|\, \bigcap_{l=j-s+1}^{j-1} D(m, l, r, g_{ml}) \end{aligned}$$

for j = s + 1, ..., n. The events in the square brackets are mutually exclusive for all g_{m,j-s} given that event ∩_{l=j-s+1}^{j-1} D(m, l, r, g_ml) occurs. From this decomposition, we derive equation (11.63). Similarly, equations (11.65) and (11.66) can be derived by the following event decompositions, respectively:

$$D(m, j, r, g_{mj}) = \bigcup_{x_{mj} \in \Omega_{r,\, g_{m-r+1,j}}} \left[ D(m, j, r, g_{mj}) \cap C(x_{mj}) \right]$$

for j = 1, 2, ..., n, and

$$\begin{aligned} E(i, j, r, g_{ij}, x_{ij}) &= D(i, j, r, g_{ij}) \,\big|\, C(x_{ij}) \\ &= D(i-1, j, r, g_{i-1,j}) \,\big|\, C(x_{ij}) \\ &= D(i-1, j, r, g_{i-1,j}) \,\big|\, C'(x_{i-r+1}, \ldots, x_{i-1}) \\ &= \bigcup_{x_{i-r,j}=0}^{1} \left[ D(i-1, j, r, g_{i-1,j}) \cap C'(x_{i-r,j}) \right] \,\big|\, C'(x_{i-r+1}, \ldots, x_{i-1}) \end{aligned}$$

if x_ij ∈ Ω_{r, g_{i-r+1,j}} for i = r + 1, ..., m, 1 ≤ j ≤ n, where C'(x_{i-r+1}, ..., x_{i-1}) represents the event that a specific (x_{i-r+1}, ..., x_{i-1}) occurs and C'(x_{i-r,j}) represents the event that a specific x_{i-r,j} occurs.

The complexity for computing Q(m, j, r, g_mj) is O(2^r m). The cardinality of Ψ_mrsk does not exceed (r + 1)^{s(m-r+1)}. Hence the computing time of R(m, n, r, s, k)
TABLE 11.2 System Reliability and Computing Time for Linear 3-within-(2, s)/(3, n):F Lattice System

                 s = 2                               s = 3
   n    R(3, n, 2, s, 3)   Time (s)      n    R(3, n, 2, s, 3)   Time (s)
   5       0.999933          18.32       5       0.999789          41.20
  10       0.998896          49.49      10       0.996240         124.79
  20       0.983139         100.03      20       0.995192         296.33
  30       0.921616         159.16      30       0.774805         457.46
  50       0.568857         268.25      50       0.207783         788.97
using formulas (11.62), (11.63), (11.65), and (11.66) would be equal to

$$O\!\left( (r+1)^{s(m-r+1)} \left[ (n-s) 2^r m + (2^r m)^s \right] \right) = O\!\left( (r+1)^{s(m-r+1)} 2^r \left[ (n-s) m + 2^{r(s-1)} m^s \right] \right)$$

when r and s are fixed. That is, the complexity of the algorithm for computing the system reliability is exponential in m and linear in n.

Example 11.2 Consider a linear k-within-(r, s)/(m, n):F lattice system with m = 3, r = 2, k = 3 and the following component reliabilities:

$$p_{ij} = 1 - 0.0015[(i-1)n + j] \qquad \text{for } i = 1, 2, 3, \; j = 1, 2, \ldots, n.$$

We compute the system reliability R(3, n, 2, s, 3) using formulas (11.62), (11.63), (11.65), and (11.66) for various n and s. Matlab is used for the computation work. The reliability R(3, n, 2, s, 3) and the computing time for various n and s are listed in Table 11.2.
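For small grids the lattice-system reliability can also be obtained by brute-force enumeration of all 2^{mn} component states, which gives an independent check on the recursive formulas. A sketch (ours, not the book's Matlab implementation), using the component reliabilities of Example 11.2 with n = 5:

```python
# Brute-force reliability of a linear k-within-(r,s)/(m,n):F lattice
# system: the system fails iff some r x s window of the m x n grid
# contains at least k failed components. Exponential in mn.
from itertools import product

def lattice_reliability(p, r, s, k):
    m, n = len(p), len(p[0])
    rel = 0.0
    for state in product((0, 1), repeat=m * n):        # 1 = failed
        x = [state[i * n:(i + 1) * n] for i in range(m)]
        failed = any(
            sum(x[a][b] for a in range(i, i + r) for b in range(j, j + s)) >= k
            for i in range(m - r + 1) for j in range(n - s + 1))
        if not failed:
            prob = 1.0
            for i in range(m):
                for j in range(n):
                    prob *= (1 - p[i][j]) if x[i][j] else p[i][j]
            rel += prob
    return rel

# p_ij = 1 - 0.0015[(i-1)n + j] as in Example 11.2, with m = 3, n = 5
p = [[1 - 0.0015 * ((i - 1) * 5 + j) for j in range(1, 6)]
     for i in range(1, 4)]
print(lattice_reliability(p, 2, 2, 3))  # compare the n = 5, s = 2 entry of Table 11.2
print(lattice_reliability(p, 2, 3, 3))  # compare the n = 5, s = 3 entry
```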
11.5 SERIES CONSECUTIVE-k-OUT-OF-n SYSTEMS

Shen and Zuo [228] consider Con/k/n subsystems that are connected in series to form the system of interest. System reliability evaluation is simple when components are independent: the reliability of the system is equal to the product of the reliabilities of the subsystems. The more interesting issue for such systems is the optimal arrangement of components.

A series Con/2/n:F system can model a telecommunications system. Assume that we have a telecommunications system with 20 relay stations. Station 9 and station 10 are far enough apart that the only signal that can reach station 10 has to come from station 9. The other neighboring stations, however, are close enough together that each station can transmit signals to the following two stations. Then we can say that we have a Lin/Con/2/9:F subsystem
FIGURE 11.1 Camera monitoring system.
with stations 1–9 and a Lin/Con/2/11:F subsystem with stations 10–20 connected in series. Whenever at least two consecutive stations fail in a subsystem, the subsystem is failed and, as a result, the whole system is failed.

Consider the problem of monitoring the reactions in a reactor. To obtain three-dimensional snapshots of the reactions in a certain area, say A, in the reactor, four high-speed cameras are mounted at slightly different angles focusing on the area of interest (see Figure 11.1). The four cameras are labeled 1, 2, 3, and 4. At least two adjacent cameras have to work properly to obtain the required images of the reactions in the area. Thus, the four cameras form a Lin/Con/2/4:G system. For a second area of interest, point B in the figure, three cameras (labeled 5, 6, and 7) are available to monitor this area. If we also need at least two adjacent cameras to work in order to obtain an acceptable image of the reactions in area B, we have a Lin/Con/2/3:G system. If we need high-quality snapshots of both areas to enable us to analyze the reactions in the reactor, we can say that the Lin/Con/2/4:G subsystem with cameras 1, 2, 3, and 4 and the Lin/Con/2/3:G subsystem with cameras 5, 6, and 7 are connected in series. If we have seven cameras with different reliabilities available, how should we arrange these cameras to achieve the highest system reliability? This question can be addressed by the optimal design of series Con/k/n:G systems. The camera system described above may also be used in the analysis of fluid dynamics in pipeline systems and flame propagation in engines and fires.

With the above problems in mind, we will address the problem of invariant optimal design of series Con/k/n:G systems. Assuming that the reliabilities of available components are distinct in the range (0, 1), Shen and Zuo [228] provide some useful results for optimal arrangements of components in series Con/k/n systems.
If the component reliabilities are not necessarily distinct, some strict inequalities to be presented will become nonstrict and
consequently the optimal design to be presented is unique up to equivalent components (components with the same reliability). The reverse sequence of a system or subsystem configuration is equivalent to the original one since they have the same system reliability.

A series Con/k/n:G system is defined to be a system consisting of Lin/Con/k/n:G subsystems connected in series. It works if and only if all the subsystems work. As discussed in Chapter 9, a single Lin/Con/k/n:G system has an invariant optimal design when k < n ≤ 2k (the special case when n = k is omitted). For a series Con/k/n:G system to have an invariant optimal design, all its Con/k/n:G subsystems must have invariant optimal designs. Thus, the optimal design of a series Con/k/n system is obtained once the available components are optimally partitioned among the subsystems.

We will consider a system with a Lin/Con/k/n_1:G subsystem and a Lin/Con/k/n_2:G subsystem connected in series. For such a system to have an invariant optimal design, we must have n_1 ≤ 2k and n_2 ≤ 2k. As discussed in Chapter 9 for a Lin/Con/k/n:G system, when n < 2k, the middle 2k − n components should be allocated the highest reliabilities in any order. Thus, the middle 2k − n_1 components in the first subsystem and the middle 2k − n_2 components in the second subsystem should be allocated the highest reliabilities in any order. As a result, we can concentrate on the optimal allocation of the reliabilities to a system with two Lin/Con/k/2k:G subsystems connected in series.

Notation

- p_i or q_i: reliability of component i; also used to indicate component i
- ∩: used to indicate that two subsystems are connected in series
- X: arrangement of n component reliabilities p_1, p_2, ..., p_n; for example, X = (p_{i_1}, p_{i_2}, ..., p_{i_n}), where (i_1, i_2, ..., i_n) is a permutation of 1, 2, ..., n
- Y: arrangement of n component reliabilities q_1, q_2, ..., q_n; for example, Y = (q_{i_1}, q_{i_2}, ..., q_{i_n}), where (i_1, i_2, ..., i_n) is a permutation of 1, 2, ..., n
- X_G(k, n): Lin/Con/k/n:G system with arrangement X
- X_G(k_1, n_1) ∩ Y_G(k_2, n_2): system with X_G(k_1, n_1) and Y_G(k_2, n_2) connected in series
- R(X) or R(Y): system reliability of arrangement X or Y
Theorem 11.7 (Shen and Zuo [228]) Let X = (p_1, p_2, ..., p_{2k}) and Y = (q_1, q_2, ..., q_{2k}). The necessary conditions for the optimal design of system X_G(k, 2k) ∩ Y_G(k, 2k) are that both X and Y are optimal and that

$$p_i < q_i \quad \text{for } 1 \le i \le k, \tag{11.68}$$

$$p_i > q_i \quad \text{for } k+1 \le i \le 2k. \tag{11.69}$$
Theorem 11.7 can be proved using mathematical induction on k. One needs to prove that if conditions (11.68) and (11.69) are not satisfied, the reliability of the system can be improved. According to Theorem 11.7, in the optimal design of a system with two Con/k/2k:G subsystems connected in series, the component at position i in the first subsystem must be less reliable than the component at position i in the second subsystem for 1 ≤ i ≤ k and more reliable for k + 1 ≤ i ≤ 2k. Shen and Zuo [228] then argue that there is only one configuration satisfying the conditions in Theorem 11.7. Thus, Theorem 11.7 can be used to identify invariant optimal designs of two Lin/Con/k/2k:G subsystems connected in series.

Based on Theorem 11.7, we can use the following procedure to identify the invariant optimal partition of components for system X_G(k, 2k) ∩ Y_G(k, 2k):

1. Assign the best of the remaining components to X and the next best to Y.
2. Assign the best of the remaining components to Y and the next best to X.
3. Repeat steps 1 and 2 in sequence until all components are assigned.

Theorem 11.8 (Shen and Zuo [228]) The optimal partition of components for X_G(k, 2k) ∩ Y_G(k − 1, 2(k − 1)) is as follows:

1. Assign the best component and the worst component to X.
2. Assign the best of the remaining components to X and the next best to Y.
3. Assign the best of the remaining components to Y and the next best to X.
4. Repeat steps 2 and 3 in sequence until all components are assigned.
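The interleaving procedure following Theorem 11.7 is easy to mechanize. A sketch (the function name is ours); for sixteen components it reproduces the partition quoted later in Example 11.4:

```python
def partition_two_subsystems(reliabilities):
    """Interleaving partition for X_G(k,2k) in series with Y_G(k,2k),
    following the procedure based on Theorem 11.7: give the best
    remaining component to X and the next best to Y, then the best
    remaining to Y and the next best to X, and repeat."""
    remaining = sorted(reliabilities, reverse=True)
    X, Y = [], []
    x_first = True
    while remaining:
        best, next_best = remaining.pop(0), remaining.pop(0)
        if x_first:
            X.append(best)
            Y.append(next_best)
        else:
            Y.append(best)
            X.append(next_best)
        x_first = not x_first
    return X, Y
```

With p[1] < ... < p[16], subsystem X receives p[16], p[13], p[12], p[9], p[8], p[5], p[4], and p[1].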
Theorem 11.9 (Shen and Zuo [228]) Assume that we have a series system X_M that consists of m consecutive-k-out-of-2k:G subsystems:

$$X_M = X_G^1(k, 2k) \cap X_G^2(k, 2k) \cap \cdots \cap X_G^m(k, 2k).$$

If we have 2mk components with their reliabilities arranged in ascending order, p_[1] < p_[2] < ... < p_[2mk−1] < p_[2mk], then the optimal partition of components for X_M can be obtained as follows:

1. Set i = 1.
2. Assign p_[2(i−1)m+1], p_[2(i−1)m+2], ..., p_[2(i−1)m+m] to X^1, X^2, ..., X^m, respectively.
3. Assign p_[(2i−1)m+1], p_[(2i−1)m+2], ..., p_[2im] to X^m, X^{m−1}, ..., X^1, respectively.
4. Increment i and repeat steps 2 and 3 until all available components are assigned.

Theorem 11.10 (Shen and Zuo [228]) Assume that there is a series Con/k/n:G system with l Con/k/2k:G subsystems and m Con/(k − 1)/2(k − 1):G subsystems.
FIGURE 11.2 Network diagram representing camera system.
The optimal partition of components for the system can be obtained as follows:

1. Add m perfect components and m failed components to the set of components available.
2. Do the optimal design for the series Con/k/n:G system with (m + l) Con/k/2k:G subsystems following Theorem 11.9.

Example 11.3 For the camera system described earlier, we have a Lin/Con/2/4:G subsystem connected in series with a Lin/Con/2/3:G subsystem. The Lin/Con/2/3:G subsystem can be further decomposed into a series subsystem with camera 6 and a Lin/Con/1/2:G subsystem (i.e., a parallel subsystem) with cameras 5 and 7. The system of cameras can be represented by the network diagram in Figure 11.2. If we have seven cameras of different reliabilities available, that is, p_[1] < p_[2] < p_[3] < p_[4] < p_[5] < p_[6] < p_[7], the optimal design of the system should assign p_[7] to camera 6. According to Theorem 11.8, of the six cameras left, p_[6], p_[1], p_[5], and p_[2] should be assigned to the Lin/Con/2/4:G subsystem and p_[3] and p_[4] should be assigned to the Lin/Con/1/2:G subsystem. An optimal assignment of p_[6], p_[1], p_[5], and p_[2] in the Lin/Con/2/4:G subsystem is p_1 = p_[1], p_2 = p_[5], p_3 = p_[6], and p_4 = p_[2] according to Kuo et al. [134]. The reliabilities p_[3] and p_[4] can be arbitrarily assigned to p_5 and p_7. We will let p_5 = p_[4] and p_7 = p_[3]. Thus, an optimal design of the system of cameras is p_6 > p_3 > p_2 > p_5 > p_7 > p_4 > p_1, where p_i is the reliability of the camera labeled i in Figure 11.1 (1 ≤ i ≤ 7).

Example 11.4 Consider another system with two Lin/Con/4/8:G subsystems connected in series. Assume that there are 16 components with different reliabilities, that is, p_[1] < p_[2] < ... < p_[16]. Then, according to Theorems 11.1 and 11.2, an optimal partition of the 16 components into two groups is
p_[16], p_[13], p_[12], p_[9], p_[8], p_[5], p_[4], p_[1] for subsystem 1 and p_[15], p_[14], p_[11], p_[10], p_[7], p_[6], p_[3], p_[2] for subsystem 2. Using the optimal design results for Lin/Con/k/n:G systems presented in Chapter 9, we have the following optimal arrangement of components in the two subsystems: (p_[1], p_[5], p_[9], p_[13], p_[16], p_[12], p_[8], p_[4]) for subsystem 1 and (p_[2], p_[6], p_[10], p_[14], p_[15], p_[11], p_[7], p_[3]) for subsystem 2.

Chang and Hwang [46] provide additional results on the optimal design of series Con/k/n:G systems. In their series Con/k/n:G system model, the k value in each Con/k/n:G subsystem is allowed to be different. Consider a system with two subsystems connected in series. Subsystem 1 has a Lin/Con/k_1/n_1:G structure while subsystem 2 has a Lin/Con/k_2/n_2:G structure, where k_1 may not be equal to k_2. A general series Con/k/n:G system consists of m subsystems connected in series; subsystem i has a Lin/Con/k_i/n_i:G structure for i = 1, 2, ..., m. Since a Lin/Con/k/n:G system does not have an invariant optimal design when n > 2k (see Chapter 9), the series Con/k/n:G system has no invariant optimal design when n_i > 2k_i for at least one i, where 1 ≤ i ≤ m. They then focus on the issue of invariant optimal design for such series Con/k/n:G systems when k_i ≤ n_i ≤ 2k_i for all 1 ≤ i ≤ m.

Theorem 11.11 (Chang and Hwang [46]) A series Con/k/n:G system has invariant optimal designs if and only if |k_i − k_j| ≤ 1 and k_i ≤ n_i ≤ 2k_i for all i and j such that 1 ≤ i, j ≤ m.

Since the failure of any subsystem causes the series Con/k/n:G system to fail, the middle 2k_i − n_i components in every subsystem must work for the system to work. As a result, in optimal system design, these components [there are Σ_{i=1}^{m} (2k_i − n_i) of them] should be assigned the highest component reliabilities in any order.
The remaining components should then be assigned to the series system that consists of Lin/Con/k_i'/2k_i':G subsystems, where k_i' = n_i − k_i and 1 ≤ i ≤ m. As discussed earlier in this section, when k_i' is constant and m = 2, invariant optimal designs have been identified. When k_2' = k_1' + 1 and m = 2, Chang and Hwang [46] state that the best and the worst components should be assigned to the subsystem with 2k_2' components. However, the complete invariant optimal design is not identified in their paper.

11.6 COMBINED k-OUT-OF-n:F AND CONSECUTIVE-k_c-OUT-OF-n:F SYSTEM

Zuo et al. [262] propose a few system reliability models that are obtained by combining the k-out-of-n model and the Con/k/n model. These combination models are applied to the reliability evaluation and the remaining life estimation of hydrogen furnaces in a petrochemical company. In this section, we introduce the combined k-out-of-n:F and Con/k/n:F model. We will use k_c to indicate the minimum number of consecutive component failures that would cause system failure in the Con/k/n:F model and k to indicate the minimum number of component failures that would cause system failure in the k-out-of-n:F model. The combined k-out-of-n:F and Con/k_c/n:F system has n components arranged sequentially. It is failed whenever at least k components are failed or at least k_c consecutive components are failed.

Notation

- A(i, j, k_c): event that a combined j-out-of-i:F and Con/k_c/i:F subsystem fails; the subsystem consists of components 1, 2, ..., i, with i ≥ j ≥ 0 and i ≥ k_c
- Ā(i, j, k_c): complement of event A(i, j, k_c)
- B(i): event that the ith component fails
- B̄(i): complement of event B(i)
- Q(i, j, k_c): probability that event A(i, j, k_c) occurs

The proposed system model is the same as a k-out-of-n:F model if k_c ≥ k and the same as a Con/k_c/n:F model if k ≥ n. In the following, we derive a recursive formula for computing Q(n, k, k_c), that is, the failure probability of the system. Event A(i, j, k_c) can be decomposed into three disjoint subevents:

$$A(i, j, k_c) = \left[ A(i-1, j, k_c) \cap \bar{B}(i) \right] \cup \left[ A(i-1, j-1, k_c) \cap B(i) \right] \cup \left[ \bar{A}(i-k_c-1, j-k_c, k_c) \cap \bar{B}(i-k_c) \cap \bigcap_{l=i-k_c+1}^{i} B(l) \right]. \tag{11.70}$$
Based on the decomposition, we obtain the recursive equation

$$Q(i, j, k_c) = p_i Q(i-1, j, k_c) + q_i Q(i-1, j-1, k_c) + \left[ 1 - Q(i-k_c-1, j-k_c, k_c) \right] p_{i-k_c} \prod_{l=i-k_c+1}^{i} q_l \tag{11.71}$$

with the boundary conditions

$$Q(i, j, k_c) = \begin{cases} 0 & \text{if } i < \min\{j, k_c\}, \\ 1 & \text{if } j = 0, \end{cases} \tag{11.72}$$

$$p_0 \equiv 1. \tag{11.73}$$

The complexity for calculating Q(n, k, k_c) using equation (11.71) is O(nk).
Example 11.5 Consider a system with n = 8, k = 5, k_c = 3, and the following component reliability data:

   i     1       2       3       4       5       6       7       8
  p_i  0.8125  0.8250  0.8375  0.8500  0.8625  0.8750  0.8875  0.9000
  q_i  0.1875  0.1750  0.1625  0.1500  0.1375  0.1250  0.1125  0.1000
Applying equation (11.71) and its boundary conditions (and suppressing the constant third argument k_c = 3), we have the following results:

Q(1, 1) = q_1 = 0.1875
Q(2, 1) = p_2 Q(1, 1) + q_2 = 0.3297
Q(2, 2) = q_2 Q(1, 1) = 0.03281
Q(3, 1) = p_3 Q(2, 1) + q_3 = 0.4386
Q(3, 2) = p_3 Q(2, 2) + q_3 Q(2, 1) = 0.0811
Q(3, 3) = q_3 Q(2, 2) = 0.0053
Q(3, 4) = q_1 q_2 q_3 = 0.0053
Q(3, 5) = q_1 q_2 q_3 = 0.0053
Q(4, 1) = p_4 Q(3, 1) + q_4 = 0.5228
Q(4, 2) = p_4 Q(3, 2) + q_4 Q(3, 1) = 0.1347
Q(4, 3) = p_4 Q(3, 3) + q_4 Q(3, 2) = 0.01669
Q(4, 4) = p_4 Q(3, 4) + q_4 Q(3, 3) + p_1 q_2 q_3 q_4 = 0.0088
Q(4, 5) = p_4 Q(3, 5) + q_4 Q(3, 4) + p_1 q_2 q_3 q_4 = 0.0088
Q(5, 2) = p_5 Q(4, 2) + q_5 Q(4, 1) = 0.1881
Q(5, 3) = p_5 Q(4, 3) + q_5 Q(4, 2) = 0.0329
Q(5, 4) = p_5 Q(4, 4) + q_5 Q(4, 3) + [1 − Q(1, 1)] p_2 q_3 q_4 q_5 = 0.0121
Q(5, 5) = p_5 Q(4, 5) + q_5 Q(4, 4) + p_2 q_3 q_4 q_5 = 0.0116
Q(6, 3) = p_6 Q(5, 3) + q_6 Q(5, 2) = 0.0523
Q(6, 4) = p_6 Q(5, 4) + q_6 Q(5, 3) + [1 − Q(2, 1)] p_3 q_4 q_5 q_6 = 0.0162
Q(6, 5) = p_6 Q(5, 5) + q_6 Q(5, 4) + [1 − Q(2, 2)] p_3 q_4 q_5 q_6 = 0.0137
Q(7, 4) = p_7 Q(6, 4) + q_7 Q(6, 3) + [1 − Q(3, 1)] p_4 q_5 q_6 q_7 = 0.0212
Q(7, 5) = p_7 Q(6, 5) + q_7 Q(6, 4) + [1 − Q(3, 2)] p_4 q_5 q_6 q_7 = 0.0155
Q(8, 5) = p_8 Q(7, 5) + q_8 Q(7, 4) + [1 − Q(4, 2)] p_5 q_6 q_7 q_8 = 0.0171

As a result, we have Q_s = 0.0171 and R_s = 0.9829.
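Recursion (11.71) with boundary conditions (11.72) and (11.73) is straightforward to implement, and for a system as small as Example 11.5 it can be verified against exhaustive enumeration. A sketch (memoized; variable and function names are ours):

```python
from functools import lru_cache
from itertools import product

# Component data of Example 11.5 (n = 8, k = 5, kc = 3); index 0 holds
# the dummy p_0 = 1 required by boundary condition (11.73).
p = [1.0, 0.8125, 0.8250, 0.8375, 0.8500, 0.8625, 0.8750, 0.8875, 0.9000]
q = [1.0 - v for v in p]
kc = 3

@lru_cache(maxsize=None)
def Q(i, j):
    """Failure probability of the combined j-out-of-i:F and Con/kc/i:F
    subsystem, computed by recursion (11.71)."""
    if j <= 0:
        return 1.0                  # (11.72): threshold already reached
    if i < min(j, kc):
        return 0.0                  # (11.72): too few components to fail
    run = 0.0
    if i >= kc:                     # run of kc failures ending at i
        tail = 1.0
        for l in range(i - kc + 1, i + 1):
            tail *= q[l]
        run = (1.0 - Q(i - kc - 1, j - kc)) * p[i - kc] * tail
    return p[i] * Q(i - 1, j) + q[i] * Q(i - 1, j - 1) + run

def brute_force(n, k):
    """Exhaustive check: the system fails iff at least k components
    fail or at least kc consecutive components fail."""
    total = 0.0
    for x in product((0, 1), repeat=n):          # 1 = failed
        longest = max(len(run) for run in ''.join(map(str, x)).split('0'))
        if sum(x) >= k or longest >= kc:
            pr = 1.0
            for l in range(n):
                pr *= q[l + 1] if x[l] else p[l + 1]
            total += pr
    return total

print(Q(8, 5), brute_force(8, 5))  # both should be close to Q_s = 0.0171
```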
11.7 COMBINED k-OUT-OF-mn:F AND LINEAR (r, s)/(m, n):F SYSTEM

The combined k-out-of-mn:F and linear (r, s)/(m, n):F system has mn components arranged in m rows with n components in each row (m ≥ 1, n ≥ 1). The system is failed whenever at least one of the following situations occurs:

1. At least k components are failed in the system.
2. There exists at least one cluster (of size r × s) of components that are all failed, where 1 ≤ rs < k ≤ mn.

This system is a combination of the k-out-of-n:F system and the two-dimensional Con/k_c/n:F system proposed and studied by Zuo et al. [262]. Since the components in the system are arranged in a rectangular format, we will use (i, j) to indicate the position of the jth component in row i. Let p_ij and q_ij be the probabilities that the component in row i and column j is working and failed, respectively.

Case I: r = m

A simple algorithm for evaluating the unreliability of the combined k-out-of-mn:F and linear (r, s)/(m, n):F system with r = m is provided below.

Notation

- A(m, j, u, s): event that the combined u-out-of-mj:F and linear connected-(m, s)-out-of-(m, j):F subsystem fails; this subsystem consists of the components in the first j columns of the original system, 0 ≤ u ≤ k, 0 ≤ j ≤ n
- Ā(m, j, u, s): complement of event A(m, j, u, s)
- B_j(i, l): event that there are exactly l failures from component (1, j) to component (i, j) in the jth column
- Q(m, j, u, s): probability that event A(m, j, u, s) occurs
- Q_e(i, l, j): probability that event B_j(i, l) occurs
Event A(m, j, u, s) can be decomposed into three disjoint subevents:

$$\begin{aligned} A(m, j, u, s) = {} & \bigcup_{l=0}^{m-1} \left[ \bar{A}(m, j-s-1, u-ms-l, s) \cap B_{j-s}(m, l) \cap \bigcap_{i=j-s+1}^{j} B_i(m, m) \right] \\ & \cup \left[ A(m, j-1, u-m, s) \cap B_j(m, m) \right] \cup \bigcup_{l=0}^{m-1} \left[ A(m, j-1, u-l, s) \cap B_j(m, l) \right]. \end{aligned} \tag{11.74}$$

Based on the decomposition, we obtain the recursive equation
$$\begin{aligned} Q(m, j, u, s) = {} & \sum_{l=0}^{m-1} \left[ 1 - Q(m, j-s-1, u-ms-l, s) \right] Q_e(m, l, j-s) \prod_{t=j-s+1}^{j} \prod_{i=1}^{m} q_{it} \\ & + \sum_{l=0}^{m} Q(m, j-1, u-l, s) \, Q_e(m, l, j), \end{aligned} \tag{11.75}$$

where Q_e(m, l, j) can be calculated using the recursive equation

$$Q_e(i, l, j) = p_{ij} Q_e(i-1, l, j) + q_{ij} Q_e(i-1, l-1, j). \tag{11.76}$$

The following boundary conditions are for equations (11.75) and (11.76):

$$Q(m, j, u, s) = \begin{cases} 0 & \text{if } j < s, \; mj < u, \\ 1 & \text{if } u = 0, \end{cases} \tag{11.77}$$

$$Q_e(i, l, j) = \begin{cases} 1 & \text{if } i = 0, \; l = 0, \\ 0 & \text{if } i < l \text{ or } l < 0. \end{cases} \tag{11.78}$$
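Equation (11.76) is the standard recursion for the exact distribution of the number of failed components in a column. A sketch (the function name is ours):

```python
def column_failure_dist(q_col):
    """Q_e of recursion (11.76): returns a list d in which d[l] is the
    probability of exactly l failures in a column whose components
    have the unreliabilities q_col."""
    d = [1.0]                          # boundary (11.78): Q_e(0, 0, j) = 1
    for qi in q_col:
        pi = 1.0 - qi
        nd = [0.0] * (len(d) + 1)
        for l, v in enumerate(d):      # p_ij Q_e(i-1,l,j) + q_ij Q_e(i-1,l-1,j)
            nd[l] += pi * v
            nd[l + 1] += qi * v
        d = nd
    return d
```

For a column of three components with q = 0.1 each, this returns probabilities 0.729, 0.243, 0.027, and 0.001 for zero through three failures.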
The unreliability of the system is Q(m, n, k, s). The complexity for calculating Q(m, n, k, s) using equations (11.75) and (11.76) is O(nm^3). When r = m = 1, the model discussed in this section becomes the combined k-out-of-n:F and Con/s/n:F system model discussed in Section 11.6.

Example 11.6 Let m = r = 3, n = 5, k = 7, s = 2, and p_ij = 0.8 + 0.0125[(i − 1)n + j] for i = 1, 2, 3 and j = 1, 2, 3, 4, 5. Using equations (11.75) and (11.76) recursively, we find that Q(3, 5, 7, 2) = 0.0002 and R_s = 0.9998.

Case II: r < m

For the more general case when r < m, Zuo et al. [262] provide an algorithm for evaluating the reliability of the system by generalizing the idea of Yamamoto and Miyakawa [251] for the linear (r, s)/(m, n):F system.

Notation

- A(m, j, u, r, s): event that the combined u-out-of-mj:F and linear connected-(r, s)-out-of-(m, j):F subsystem fails; this subsystem consists of the components in the first j columns of the original system, 0 ≤ j ≤ n, 0 ≤ u ≤ k
- Ā(m, j, u, r, s): complement of event A(m, j, u, r, s)
- x_ij: binary variable, 1 if component (i, j) fails, 0 otherwise
- x_j = (x_{1j}, x_{2j}, ..., x_{mj}): binary vector
- g = (g_1, g_2, ..., g_{m−r+1}): (m − r + 1)-dimensional vector, where 0 ≤ g_i ≤ s
- E_j: event that a specific x_j occurs
- δ(x_j) = (δ_1(x_j), ..., δ_{m−r+1}(x_j)): (m − r + 1)-dimensional binary vector, where δ_i(x_j) = ∏_{l=i}^{i+r−1} x_{lj} for i = 1, ..., m − r + 1
- B(u, i, j): event that the rectangle of size r × u whose upper-right corner is (i, j) consists of all failed components, for u = 1, 2, ..., s; B(u, i, j) is a null event if j < u
- B̄(u, i, j): complement of event B(u, i, j)
- X(m, j, u, r, s; g): event Ā(m, j, u, r, s) ∩ ∩_{i=1}^{m−r+1} B̄(g_i, i, j)
- R(m, j, u, r, s): probability that event Ā(m, j, u, r, s) occurs
- R(m, j, u, r, s; g): probability that event X(m, j, u, r, s; g) occurs
The following event decomposition is obtained first:

$$X(m, j, u, r, s; g) = \bigcup_{\text{all } x_j} \left[ E_j \cap X\!\left(m, j-1, u - \sum_{i=1}^{m} x_{ij}, r, s; \varepsilon\right) \right],$$

where ε = (ε_1, ε_2, ..., ε_{m−r+1}) with

$$\varepsilon_i = \begin{cases} g_i - 1 & \text{if } \delta_i(x_j) = 1, \\ s & \text{otherwise.} \end{cases}$$

Thus, the following recursive formula can be obtained:

$$R(m, j, u, r, s; g) = \sum_{\text{all } x_j} R\!\left(m, j-1, u - \sum_{i=1}^{m} x_{ij}, r, s; \varepsilon\right) \prod_{i=1}^{m} (p_{ij})^{1-x_{ij}} (q_{ij})^{x_{ij}}. \tag{11.79}$$

Noting that R(m, j, u, r, s) = R(m, j, u, r, s; s·1), where 1 is an (m − r + 1)-dimensional vector whose components are all 1, we can use recursive equation (11.79) to evaluate the reliability R_s = R(m, n, k, r, s) of the combined k-out-of-mn:F and linear (r, s)/(m, n):F system. The complexity for calculation of R(m, n, k, r, s) is O(kn 2^m (s + 1)^{m−r}). The Yamamoto–Miyakawa algorithm can be considered a special case of this algorithm with k > mn.

Example 11.7 Let m = 3, n = 5, k = 7, r = 2, s = 2, and p_ij as given in Example 11.6. Then R(3, 5, 7, 2, 2) = 0.9987.
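As in Section 11.6, the small examples here can be cross-checked by enumerating all 2^{mn} component states directly. A sketch (ours) reproducing Examples 11.6 and 11.7 by brute force:

```python
from itertools import product

def combined_reliability(p, k, r, s):
    """Brute-force reliability of the combined k-out-of-mn:F and linear
    (r,s)/(m,n):F system: failure iff at least k components fail or
    some r x s all-failed cluster exists. Small grids only."""
    m, n = len(p), len(p[0])
    rel = 0.0
    for state in product((0, 1), repeat=m * n):        # 1 = failed
        x = [state[i * n:(i + 1) * n] for i in range(m)]
        cluster = any(all(x[a][b] for a in range(i, i + r)
                          for b in range(j, j + s))
                      for i in range(m - r + 1) for j in range(n - s + 1))
        if sum(state) < k and not cluster:
            pr = 1.0
            for i in range(m):
                for j in range(n):
                    pr *= (1 - p[i][j]) if x[i][j] else p[i][j]
            rel += pr
    return rel

# p_ij = 0.8 + 0.0125[(i-1)n + j] as in Example 11.6 (m = 3, n = 5)
p = [[0.8 + 0.0125 * ((i - 1) * 5 + j) for j in range(1, 6)]
     for i in range(1, 4)]
print(combined_reliability(p, 7, 3, 2))  # compare Example 11.6 (0.9998)
print(combined_reliability(p, 7, 2, 2))  # compare Example 11.7 (0.9987)
```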
11.8 COMBINED k-OUT-OF-mn:F, ONE-DIMENSIONAL Con/k_c/n:F, AND TWO-DIMENSIONAL LINEAR (r, s)/(m, n):F MODEL

An even more general model was proposed by Zuo et al. [262]. The system has mn components arranged in a rectangular format. The system is failed whenever at least one of the following scenarios occurs:

1. There exists at least one grid of size r × s that consists of all failed components.
2. At least one row has at least k_c consecutive components that are failed, where s < k_c ≤ n.
3. There are at least k total failures, where k > k_c and rs < k < n.

Some notation defined in Section 11.7 will also be used in this section.

Notation

- A(m, j, u, k_c, r, s): event that the combined u-out-of-mj:F, Con/k_c/j:F, and Lin/Con/(r, s)/(m, j):F subsystem fails; this subsystem consists of all components in the first j columns of the original system, 0 ≤ j ≤ n, 0 ≤ u ≤ k
- Ā(m, j, u, k_c, r, s): complement of event A(m, j, u, k_c, r, s)
- h = (h_1, h_2, ..., h_m): m-dimensional vector, where 0 ≤ h_i ≤ k_c
- C(v, i, j): event that all components (i, j − v + 1), (i, j − v + 2), ..., (i, j) fail, for v = 1, 2, ..., k_c; C(v, i, j) becomes a null event if j < v
- C̄(v, i, j): complement of event C(v, i, j)
- X(m, j, u, k_c, r, s; g; h): event Ā(m, j, u, k_c, r, s) ∩ ∩_{i=1}^{m−r+1} B̄(g_i, i, j) ∩ ∩_{i=1}^{m} C̄(h_i, i, j)
- R(m, j, u, k_c, r, s): probability that event Ā(m, j, u, k_c, r, s) occurs
- R(m, j, u, k_c, r, s; g; h): probability that event X(m, j, u, k_c, r, s; g; h) occurs
The following decomposition of the event X(m, j, u, k_c, r, s; g; h) is used:

$$X(m, j, u, k_c, r, s; g; h) = \bigcup_{\text{all } x_j} \left[ E_j \cap X\!\left(m, j-1, u - \sum_{i=1}^{m} x_{ij}, k_c, r, s; \varepsilon; \theta\right) \right],$$

where ε is as defined in the previous section and θ = (θ_1, θ_2, ..., θ_m) with

$$\theta_i = \begin{cases} h_i - 1 & \text{if } x_{ij} = 1, \\ k_c & \text{otherwise.} \end{cases}$$

The following recursive formula is then obtained:

$$R(m, j, u, k_c, r, s; g; h) = \sum_{\text{all } x_j} R\!\left(m, j-1, u - \sum_{i=1}^{m} x_{ij}, k_c, r, s; \varepsilon; \theta\right) \prod_{i=1}^{m} (p_{ij})^{1-x_{ij}} (q_{ij})^{x_{ij}}. \tag{11.80}$$

Let 1_t = (1, ..., 1) be a t-dimensional vector whose components are all 1. Noting that R(m, j, u, k_c, r, s) = R(m, j, u, k_c, r, s; s·1_{m−r+1}; k_c·1_m), we can use recursive equation (11.80) to evaluate the reliability R(m, n, k, k_c, r, s) of the combined k-out-of-mn:F, Con/k_c/n:F, and linear (r, s)/(m, n):F system. The complexity for calculation of R(m, n, k, k_c, r, s) is O(kn (2k_c)^m (s + 1)^{m−r}).

When k > mn, the model proposed in this section becomes the combined Con/k_c/n:F and linear (r, s)/(m, n):F model, which is discussed in Section 11.7. Equation (11.80) can then be used for the reliability evaluation of such a system with the third index in R(m, j, u, k_c, r, s; g; h) ignored.

Example 11.8 The reliability R(3, 5, 7, 3, 2, 2) of the combined k-out-of-mn:F, Con/k_c/n:F, and linear (r, s)/(m, n):F system whose components have reliabilities as given in Example 11.6 was calculated. The result was 0.9846.
11.9 APPLICATION OF COMBINED k-OUT-OF-n AND CONSECUTIVE-k-OUT-OF-n SYSTEMS

Zuo et al. [262] report an application of the combined k-out-of-n and Con/k/n models in remaining life estimation of hydrogen reformer furnaces. This section outlines that application.

A petrochemical company uses several methane reformer furnaces to produce hydrogen for hydrotreating. These furnaces have hundreds of tubes that are filled with catalyst. Methane and steam are passed through these tubes at high temperature and pressure, and hydrogen is produced. The retubing of a furnace, that is, the replacement of all the tubes, costs on the order of ten million dollars. The tubes in a furnace are arranged vertically, and the top views are shown in Figures 11.3 and 11.4. In furnace 1, the tubes are evenly divided into two cells. The two cells can be regarded as independent of each other because each has its own heat source and inlets for steam and methane. In furnace 2, all the tubes are in a single cell. The tubes shown in Figures 11.3 and 11.4 are numbered consecutively from 1 to n, where n is the total number of tubes in the furnace.

The tubes are designed to function at high temperature (around 940°C) and high pressure (around 382 psi) for a minimum design life of around 11 years. Under such
1 115
2 116
3 117
4 118
... ... ... ...
114 228
229 343
230 344
231 345
232 346
... ... ... ...
342 456
FIGURE 11.3
1 93 185 277
2 94 186 278
FIGURE 11.4
437
Tube arrangement in furnace 1.
3 95 187 279
4 96 188 280
... ... ... ...
... ... ... ...
92 184 276 368
Tube arrangement in furnace 2.
working conditions, the failure mechanisms of the tubes include oxidation/corrosion, creep degradation, and carbonization [112]. Whenever a tube is found failed, for example, ruptured due to creep damage, the flow of methane and steam into the tube is terminated. No catastrophic damage may be caused by the failure of a single tube. When too many tubes are failed, the furnace efficiency becomes too low and the furnace has to be retubed. Based on discussions with engineers at the company, the following failure scenarios for the furnaces are identified:

- Scenario 1: The furnace is failed whenever a certain percentage of the tubes in the furnace are failed. For example, the furnace is failed whenever at least 10% of the tubes (i.e., at least a total of 46 tubes) of the 456 tubes in furnace 1 are failed.
- Scenario 2: The furnace is failed if at least one row of tubes has at least a certain number of consecutive tubes that are all failed. For example, we may say that furnace 2 is failed when there are five or more consecutive failed tubes in a row.
- Scenario 3: The furnace is failed whenever there is at least one cluster of size r × s of failed tubes.
438
OTHER k-OUT-OF-n AND CONSECUTIVE-k-OUT-OF-n MODELS
3. Scenario 3: There exists at least one grid of failed components of size 2 × 2 in at least one of the two cells.
To evaluate the reliability of furnace 1 using the algorithm in Section 11.8, we can add a column 115 with two perfect components (whose reliabilities equal 1) to the right of column 114 in cell 1 (see Figure 11.3). Then, we can concatenate cell 2 to the right of column 115. The following system parameters and component reliabilities are used:

m = 2, n = 229, k = 55, kc = 4, r = 2, s = 2,
p1,115 = p2,115 = 1,
pi,j = 0.9000 for j ≠ 115, i = 1, 2, 1 ≤ j ≤ 229.
The reliability of furnace 1 was found to be 0.8669 with a CPU time of about 700 s on a Pentium II 300 PC. If each of the three scenarios is applied independently on furnace 1, the computed system reliabilities and computation times are as follows:

• Scenario 1: system reliability = 0.9150, CPU time = 2.65 s.
• Scenario 2: system reliability = 0.9608, CPU time = 0.04 s.
• Scenario 3: system reliability = 0.9779, CPU time = 0.30 s.
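As a rough cross-check of the combined criterion, the three failure scenarios for furnace 1 can be simulated directly. The sketch below is a Monte Carlo illustration only, not the exact algorithm of Section 11.8; the function names, trial count, and seed are our own assumptions, and the estimate should land near the reported 0.8669.

```python
import random

def furnace1_fails(grid, k=55, kc=4, r=2, s=2):
    """Combined failure criterion for furnace 1.
    grid[i][j] is True if tube (i, j) works; column index 114 holds the
    two perfect padding components, so a 2 x 2 cluster cannot span the cells."""
    # Scenario 1: at least k failed tubes in total
    if sum(not x for row in grid for x in row) >= k:
        return True
    # Scenario 2: at least kc consecutive failed tubes in some row
    for row in grid:
        run = 0
        for x in row:
            run = 0 if x else run + 1
            if run >= kc:
                return True
    # Scenario 3: an r x s all-failed cluster somewhere in the grid
    for j in range(len(grid[0]) - s + 1):
        if all(not grid[i][t] for i in range(r) for t in range(j, j + s)):
            return True
    return False

def simulate(n_trials=20000, p=0.9, seed=1):
    rng = random.Random(seed)
    ok = 0
    for _ in range(n_trials):
        grid = [[j == 114 or rng.random() < p for j in range(229)]
                for _ in range(2)]
        ok += not furnace1_fails(grid)
    return ok / n_trials
```

With the parameters listed above for furnace 1, a run of `simulate()` gives an estimate within sampling error of the exact combined reliability.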
For furnace 2, we use the following system parameters and component reliabilities to represent the combined system failure scenarios:

m = 4, n = 92, k = 45, kc = 3, r = 2, s = 2,
pi,j = 0.9000 for 1 ≤ i ≤ 4, 1 ≤ j ≤ 92.
The reliability of furnace 2 was found to be 0.6641 with a computing time of 4.3 h on a Pentium II 300 PC. If each of the scenarios is applied independently, the computed system reliabilities and the corresponding CPU times are as follows:

• Scenario 1: system reliability = 0.9069, CPU time = 1.62 s.
• Scenario 2: system reliability = 0.7222, CPU time = 0.02 s.
• Scenario 3: system reliability = 0.9735, CPU time = 2.48 s.
These numerical results show that the combined k-out-of-n and Con/k/n models and their reliability evaluation algorithms are useful for calculating the reliability of practical engineering systems. For large systems, the computation time of the combined model is long because of its complexity.

11.10 CONSECUTIVELY CONNECTED SYSTEMS

Shanthikumar [224] extended the Con/k/n:F system to the consecutively connected system (CCS). A system consisting of a source (0), n components {1, 2, . . . , n}, and
a sink (n + 1) is a CCS if the source is connected to components {1, 2, . . . , k0} and component j (1 ≤ j ≤ n) is connected to components {j + 1, j + 2, . . . , j + kj} by arcs. The source, sink, and arcs are perfect and the n components are failure prone. The system functions if there is a connection from the source to the sink through functioning components. The CCS allows the modeling of systems in which different components have different transmitting capabilities. An O(n²) algorithm was provided for the reliability evaluation of the CCS by Shanthikumar [224]. Hwang and Yao [107] extended Shanthikumar's algorithm to circular CCSs, resulting in an O(mn²) algorithm, where m is either 1 or the number of common components shared by two adjacent consecutive minimal cuts in the system, whichever is larger. There are at most n consecutive minimal cuts in a circular CCS. If the two adjacent consecutive minimal cuts with the smallest number of common components can be selected, that is, m is minimized, the algorithm by Hwang and Yao [107] attains its most efficient operation. Hwang and Yao [107] further generalized the CCS by assuming that the components are multistate, and they provided an O(Kn) algorithm for computing the reliability of the multistate CCS, where K (K < n) is the maximum number of states of the components. The equations in Hwang and Yao [107] can also be used for finding the reliability of a CCS with binary components. Multistate systems will be discussed in Chapter 12. Zuo [259] studies both linear and circular CCSs and provides an O(kn) algorithm for linear and an O(k³n) algorithm for circular system reliability evaluation. In the following, we provide the results reported in Zuo [259].

Notation

• ki: transmitting capability of component i, or the maximum number of immediately succeeding components that component i can reach directly
• k′i: receiving capability of component i, or the maximum number of immediately preceding components from which component i can directly receive a signal
• R(i, j): reliability of a linear CCS with source i, sink j, and components i + 1, i + 2, . . . , j − 1, given that component l has transmitting capability kl (0 ≤ i ≤ l ≤ j − 1 ≤ n)
• R′(i): reliability of a linear CCS with source 0, sink i, and components 1, 2, . . . , i − 1, given that component j has receiving capability k′j (1 ≤ j ≤ i)
• RC(1, n): reliability of a circular CCS with components 1, 2, . . . , n
In a CCS, the source and the sink may also be referred to as components 0 and n + 1, respectively. Components 0 and n + 1 are failure free while the other components are failure prone. Component j (0 ≤ j ≤ n) has a transmitting capability of k j . When component i works, it is capable of reaching the next ki components directly. When it is failed, it cannot reach even the immediately following component. As defined in Chapter 4, a component is irrelevant to the system structure if the state of the component does not affect the state of the system. Otherwise, the component is relevant. In a CCS, because each component may have a different transmitting
capability, there is a possibility that a component is irrelevant to the system structure. The following lemma states a necessary and sufficient condition for a component to be irrelevant in a CCS. It is quite intuitive.

Lemma 11.1 (Zuo [259]) In a CCS, component j is irrelevant to the structure of the system if and only if, for all relevant components i that can reach j directly, the following inequality is satisfied:

i + ki ≥ j + kj.   (11.81)

For a given CCS structure, Lemma 11.1 may be used to determine the relevancy of each component. According to Lemma 11.1, we need to know the relevancy of all the components that can reach component j directly in order to determine the relevancy of component j. Components 0 and n + 1 are relevant from the definition of CCS. When one is checking the relevancy of a component, say j, that can be reached by the source directly, it is only necessary to compare j + kj with 0 + k0. If j + kj > k0, component j is relevant; otherwise, it is irrelevant. It is not necessary to check all the components that can reach j directly because all the components from 1 to j − 1 receive a signal from the source directly. In the example in Figure 11.5, we can simultaneously detect that both components 1 and 2 are irrelevant because 1 + k1 = 3 < k0 = 4 and 2 + k2 = 4 = k0. This feature is used in reliability evaluation of a CCS in the following discussions. Though irrelevant components normally do not exist in real CCSs, irrelevancy is still an important concept for reliability evaluation of CCSs.

The reliability of a sub-CCS with source i (0 ≤ i ≤ n), components i + 1, . . . , n, and sink n + 1 is 1 if i can reach n + 1 directly. When source i cannot reach n + 1 directly, at least one of the ki immediately succeeding components, namely, i + 1, i + 2, . . . , i + ki, must work for the subsystem to work. If all these ki components are relevant to the subsystem structure, we have

R(i, n + 1) = pi+1 R(i + 1, n + 1) + qi+1 pi+2 R(i + 2, n + 1) + · · ·
            + qi+1 · · · q_{i+ki−1} p_{i+ki} R(i + ki, n + 1).   (11.82)

However, if any of these ki components is irrelevant to the subsystem structure, it makes no contribution to the sub-CCS reliability R(i, n + 1). If component j (j =
FIGURE 11.5 A CCS with irrelevant components (components 0 through 6).
i + 1, i + 2, . . . , i + ki) is irrelevant to the subsystem structure, pj and qj are temporarily set to 0 and 1, respectively, in equation (11.82) for reliability evaluation of the subsystem. The algorithm is as follows:

Algorithm R_Sys_Tr
R_Sys_Tr(ki, pj, qj)
  For i = n To 0 By −1 Do
    If i + ki ≥ n + 1 Then
      R(i, n + 1) = 1;
    Else
      R(i, n + 1) = 0;
      temp = 1;
      For j = i + 1 To i + ki By 1 Do
        If j + kj > i + ki Then
          R(i, n + 1) = R(i, n + 1) + temp ∗ pj ∗ R(j, n + 1);
          temp = temp ∗ qj;
        EndIf
      EndFor
    EndIf
  EndFor
  Return R(0, n + 1);
End

The system reliability R(0, n + 1) can be computed in O(kn) time, where k ≡ max0≤i≤n{ki} in this algorithm. In computing R(0, n + 1), all the subsystem reliabilities R(i, n + 1) for i = 1, 2, . . . , n are also provided.

The transmitting capability ki of component i has been defined as the number of immediately succeeding components that component i can reach directly. It is a function of component i's transmitting power and the distances of the succeeding components from it. The connection relationship of all the components in such a system is completely specified by {ki, i = 0, 1, . . . , n}. Similarly, we can define the receiving capability k′i of component i as the number of immediately preceding components from which component i can directly receive the signal. Under this definition, the receiving capability of a component is a function of its receiving sensitivity and the distances of the preceding components from it. The system structure of a CCS under the receiving capability definition is completely specified by {k′i, i = 1, 2, . . . , n + 1}. When a CCS structure is specified by the receiving capabilities of the components, we have a result similar to Lemma 11.1 for testing the relevancy of a component in the system.
In this case, component j is irrelevant to the structure of the system if and only if all relevant components i that can directly receive signals from component j can also directly receive signals from component j − k′j, that is, i − k′i ≤ j − k′j. Similarly, if component j is the sink with receiving capability k′j, component i is relevant if and only if i − k′i < j − k′j for i = j − 1, j − 2, . . . , j − k′j.
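Algorithm R_Sys_Tr translates directly into code. The sketch below is our own rendering (the function and variable names are ours); with ki = 2 for all components, the CCS reduces to a Lin/Con/2/n:F system, which provides a convenient check.

```python
def r_sys_tr(k, p, n):
    """Reliability of a linear CCS by Algorithm R_Sys_Tr.
    k[0..n]: transmitting capabilities (k[0] is the source's);
    p[1..n]: component reliabilities; the sink is component n + 1."""
    q = [None] + [1.0 - p[i] for i in range(1, n + 1)]
    R = [0.0] * (n + 1)
    for i in range(n, -1, -1):
        if i + k[i] >= n + 1:
            R[i] = 1.0                      # source i reaches the sink directly
        else:
            temp = 1.0
            for j in range(i + 1, i + k[i] + 1):
                # skipping an irrelevant j is equivalent to p_j = 0, q_j = 1
                if j + k[j] > i + k[i]:
                    R[i] += temp * p[j] * R[j]
                    temp *= q[j]
    return R[0]
```

For example, `r_sys_tr([2, 2, 2, 2], [None, 0.9, 0.9, 0.9], 3)` evaluates a Lin/Con/2/3:F system, while `r_sys_tr([1, 2, 1], [None, 0.9, 0.8], 2)` illustrates an irrelevant component (component 2 never affects the result).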
A Con/k/n:F system can be described in terms of both component transmitting capabilities and component receiving capabilities since we can say ki = k′i (= k) for all the components. A general CCS structure may be specified by either {ki, i = 0, . . . , n} or {k′i, i = 1, . . . , n + 1} but not necessarily by both. For example, the system in Figure 11.5 cannot be represented by {k′i, i = 1, 2, . . . , 6}. Neither can that in Shanthikumar [224] be represented by {k′i, i = 1, 2, . . . , 7}. When it is suitable to work with the receiving capabilities of the components, Algorithm R_Sys_Re given below can be used to compute the reliability of a CCS. The algorithm iterates n + 1 times to compute the system reliability R′(n + 1). Its complexity is O(k′n), where k′ ≡ max1≤i≤n+1{k′i}:
Algorithm R_Sys_Re
R_Sys_Re(k′i, pj, qj)
  For i = 1 To n + 1 By 1 Do
    If i ≤ k′i Then
      R′(i) = 1;
    Else
      R′(i) = 0;
      temp = 1;
      For j = i − 1 To i − k′i By −1 Do
        If j − k′j < i − k′i Then
          R′(i) = R′(i) + R′(j) ∗ pj ∗ temp;
          temp = temp ∗ qj;
        EndIf
      EndFor
    EndIf
  EndFor
  Return R′(n + 1);
End

For circular CCSs, Zuo [259] adopts definition 2 of a circular CCS given by Hwang and Yao [107]. A circular CCS with ki < n is a system that works if there exist two working components that can reach each other directly or indirectly, where ki and n are the transmitting capability of component i and the number of components in the system, respectively. Only the concept of transmitting capability is used for the circular systems here. A useful physical system, existing or to be designed, would not have irrelevant components because irrelevant components represent a waste of resources. Thus, we restrict our discussion in this section to circular CCSs that do not have irrelevant components.

Assume that a minimal cut set with m components starting with component 1 has been found, that is, {1, 2, . . . , m}. Therefore, at least one of these m components must work for the system to work. For each j (1 ≤ j ≤ m), define Cj as the set
of components, excluding components 1, 2, . . . , j − 1, that can reach component j directly. Assume that the elements in Cj have been ordered in decreasing order of component indices. The number of elements in Cj is denoted by lj. For example, if only components 1, 2, n, and n − 2 can reach component 3 directly, then C3 = {n, n − 2}. There are two elements (l3 = 2) in set C3 and they have been ordered in decreasing order of component indices (n > n − 2).

For a circular system to work, the event that at least one of the components in the cut set {1, 2, . . . , m} works must happen. This event can be decomposed into the following disjoint subevents:

• Component 1 works.
• Component 1 fails and component 2 works.
• Components 1 and 2 fail and component 3 works.
  . . .
• Component m is the only working component in the set (components 1, . . . , m − 1 all fail).

Given that the subevent "components 1, . . . , j − 1 fail and component j works" (1 ≤ j ≤ m) occurs, at least one of the components in set Cj must work for the system to work. Let Cj(i) represent the ith element in set Cj. One of the following disjoint subevents must occur to make sure that at least one of the components in set Cj works:

• Component Cj(1) works.
• Cj(1) fails and Cj(2) works.
  . . .
• Cj(lj) is the only working component in set Cj (components Cj(1), . . . , Cj(lj − 1) all fail).
If the two events "components 1, . . . , j − 1 fail and component j works" (1 ≤ j ≤ m) and "components Cj(1), . . . , Cj(i − 1) fail and Cj(i) works" (1 ≤ i ≤ lj) both occur, there must be a working path of components from working component j to working component Cj(i) for the system to work. That is, the linear sub-CCS including components j, j + 1, . . . , Cj(i) must work. As a result, we have the following equation for the reliability of a circular system:

RC(1, n) = Σ_{j=1}^{m} ( ∏_{x=0}^{j−1} qx ) pj Σ_{i=1}^{lj} ( ∏_{y=0}^{i−1} q_{Cj(y)} ) p_{Cj(i)} R(j, Cj(i))
         = Σ_{j=1}^{m} Σ_{i=1}^{lj} pj p_{Cj(i)} ( ∏_{x=0}^{j−1} qx ) ( ∏_{y=0}^{i−1} q_{Cj(y)} ) R(j, Cj(i)),   (11.83)

where q0 ≡ 1 and Cj(0) ≡ 0. The complexity of the above equations is O(k³n), where k ≡ max1≤i≤n{ki}. The following provides an example showing the reliability calculation of the circular CCS in Figure 11.6.
FIGURE 11.6 Example of a circular CCS (six components, 1 through 6, arranged in a circle).
Example 11.9 The system has six components {1, 2, 3, 4, 5, 6}. There are no irrelevant components in the system. The minimal cut set starting with component 1 is {1, 2}, so m = 2, C1 = {6, 5}, l1 = 2, C2 = {5}, l2 = 1. Applying equation (11.83), we have

RC(1, 6) = Σ_{j=1}^{2} Σ_{i=1}^{lj} pj p_{Cj(i)} ( ∏_{x=0}^{j−1} qx ) ( ∏_{y=0}^{i−1} q_{Cj(y)} ) R(j, Cj(i))
         = Σ_{i=1}^{2} p1 p_{C1(i)} ( ∏_{y=0}^{i−1} q_{C1(y)} ) R(1, C1(i)) + p2 p5 q1 R(2, 5)
         = p1 p6 R(1, 6) + p1 p5 q6 R(1, 5) + p2 p5 q1 R(2, 5).

According to Algorithm R_Sys_Tr,

R(4, 5) = 1,
R(3, 5) = R(2, 5) = p4,
R(1, 5) = p2 R(2, 5) + q2 p3 R(3, 5) = p2 p4 + q2 p3 p4,
R(5, 6) = R(4, 6) = 1,
R(3, 6) = R(2, 6) = p4,
R(1, 6) = p2 R(2, 6) + q2 p3 R(3, 6) = p2 p4 + q2 p3 p4.

As a result, the reliability of the circular system is

RC(1, 6) = p1 p4 (p2 + q2 p3)(p6 + q6 p5) + p2 p4 p5 q1.

Lin et al. [143] study the dual of CCSs, namely, the consecutively connected G systems. A consecutively connected G system has n ordered components arranged along a line or a circle. It works if and only if there exists a component i such that it and the subsequent ki − 1 components are all working. For a system with n
components, there are at most m distinct path sets, namely, S1 = {1, 2, . . . , k1}, S2 = {2, 3, . . . , 1 + k2}, . . . , Sm = {m, m + 1, . . . , m + km − 1}, where m ≤ n and m + km − 1 ≤ n. When ki = k for all i, we have the simple Con/k/n:G system structure. It is apparent that the consecutively connected G system is a mirror image of the consecutively connected F system that has been studied. The algorithms for consecutively connected F systems can be used for reliability evaluation of consecutively connected G systems. Lin et al. [143] provide an O(n) algorithm for linear and an O(n²) algorithm for circular consecutively connected G system reliability evaluation. In the following, we present their algorithms.

Notation

• Si: event that the ki consecutive components starting from component i all work
• Kn: vector Kn = (k1, k2, . . . , kn)
• gi: number of distinct path sets in a subsystem with components 1, 2, . . . , i
• Ra(n, Kn): reliability of a consecutively connected G system with vector Kn, where a = L for linear systems and a = C for circular systems

It is also assumed that for each i < j, we have i + ki ≤ j + kj; otherwise, ki could be reduced without changing the system. Algorithm N_Paths finds gi for i = 1, 2, . . . , n:

Algorithm N_Paths
N_Paths(ki)
  For i = 0 To n By 1 Do
    gi = 0;
  EndFor
  For i = 1 To n By 1 Do
    If i + ki − 1 ≤ n Then
      g_{i+ki−1} = i;
    EndIf
  EndFor
  For i = 1 To n By 1 Do
    If gi = 0 Then
      gi = gi−1;
    EndIf
  EndFor
  Return gi;
End

With the values of gi for i = 1, 2, . . . , n from Algorithm N_Paths, we can use the following equation for system reliability evaluation of a linear consecutively connected G system:
RL(n, Kn) = RL(n − 1, Kn−1) + Σ_{i=g_{n−1}+1}^{g_n} [1 − RL(i − 2, Ki−2)] q_{i−1} ∏_{j=i}^{i+ki−1} pj,   (11.84)

with the boundary condition RL(i, Ki) = 0 for i < k1 (and q0 ≡ 1). For a circular consecutively connected G system, the following equations are given for its system reliability evaluation:

RC(n, Kn) = qn RL(n − 1, K′n−1) + pn                      if kn = 1,
RC(n, Kn) = qn RL(n − 1, K′n−1) + pn RC(n − 1, K″n−1)     if 1 < kn ≤ n,   (11.85)

where K′n−1 = (k′1, . . . , k′n−1) and K″n−1 = (k″1, . . . , k″n−1) are defined by

k′i = ki   if i + ki − 1 < n, 1 ≤ i ≤ n − 1,
k′i = ∞    if i + ki − 1 ≥ n, 1 ≤ i ≤ n − 1,   (11.86)

k″1 = kn − 1,   (11.87)

k″i = ki        if i + ki − 1 < n, 2 ≤ i ≤ n − 1,
k″i = ki − 1    if i + ki − 1 ≥ n, 2 ≤ i ≤ n − 1.   (11.88)
Example 11.10 Let a = L, n = 5, k1 = 1, k2 = 3, k3 = 2, k4 = 2, and k5 = ∞. Applying Algorithm N_Paths, we obtain the following values of gi:

g0 = 0, g1 = 1, g2 = 1, g3 = 1, g4 = 3, g5 = 4.

Applying equation (11.84), we have

RL(i, Ki) = p1 for i = 1, 2, 3,
RL(4, K4) = p1 + q1 p3 p4,
RL(5, K5) = p1 + q1 p3 p4 + q1 q3 p4 p5.

Example 11.11 Let a = C, n = 6, k1 = 1, k2 = 3, k3 = 2, k4 = 2, k5 = 2, and k6 = 2. Applying equation (11.85), we have

RC(6, (1, 3, 2, 2, 2, 2)) = q6 RL(5, (1, 3, 2, 2, ∞)) + p6 RC(5, (1, 3, 2, 2, 1))
 = q6 RL(5, (1, 3, 2, 2, ∞)) + p6 [q5 RL(4, (1, 3, 2, ∞)) + p5]
 = q6 (p1 + q1 p3 p4 + q1 q3 p4 p5) + p6 [q5 (p1 + q1 p3 p4) + p5].
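Algorithm N_Paths and recursion (11.84) can be combined into a short routine. The sketch below is our own rendering; it assumes the conventions q0 = 1 and RL(i, Ki) = 0 for i < k1, and uses INF for a ki whose path set would run past component n (as k5 = ∞ does in Example 11.10).

```python
INF = float("inf")

def n_paths(k, n):
    """g[i] = number of distinct path sets in the subsystem 1..i."""
    g = [0] * (n + 1)
    for i in range(1, n + 1):
        if k[i] != INF and i + k[i] - 1 <= n:
            g[i + k[i] - 1] = i
    for i in range(1, n + 1):
        if g[i] == 0:
            g[i] = g[i - 1]
    return g

def rel_linear_g(k, p, n):
    """R_L(n, K_n) for a linear consecutively connected G system, eq. (11.84)."""
    g = n_paths(k, n)
    q = [1.0] + [1.0 - p[j] for j in range(1, n + 1)]  # q[0] = 1 by convention
    R = {i: 0.0 for i in range(-1, k[1])}              # boundary: R_L(i) = 0 for i < k1
    for m in range(k[1], n + 1):
        r = R[m - 1]
        for i in range(g[m - 1] + 1, g[m] + 1):        # new path sets ending at m
            prod = 1.0
            for j in range(i, i + k[i]):               # all of S_i must work
                prod *= p[j]
            r += (1.0 - R[i - 2]) * q[i - 1] * prod
        R[m] = r
    return R[n]
```

For the data of Example 11.10 with all pj = 0.9, this returns p1 + q1 p3 p4 + q1 q3 p4 p5 = 0.9891.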
11.11 WEIGHTED CONSECUTIVE-k-OUT-OF-n SYSTEMS

A weighted Con/k/n:F system consists of n components, where component i carries a positive integer weight of wi. The system is failed if and only if the total weight of the failed consecutive components is at least k. In this model, k no longer represents the minimum number of consecutive component failures. Instead, it represents the minimum total weight loss due to consecutive component failures that causes system failure. As a result, k may be greater than, less than, or equal to n.

11.11.1 Weighted Linear Consecutive-k-out-of-n:F Systems

Notation

• m: number of minimal cut sets
• Si: ith minimal cut, 1 ≤ i ≤ m
• B(i): starting component of Si, 1 ≤ i ≤ m
• E(i): ending component of Si, 1 ≤ i ≤ m
• T(i): total weight of the components in Si, 1 ≤ i ≤ m
• Q(i): probability that all components in Si fail, 1 ≤ i ≤ m
• qj: unreliability of component j, 1 ≤ j ≤ n
• wj: weight carried by component j, 1 ≤ j ≤ n
• RL(i, j): reliability of a weighted Lin/Con/k/n:F system with components i, i + 1, . . . , j
To find the reliability of a weighted Lin/Con/k/n:F system, Wu and Chen [248] suggest first finding the minimal cut sets of the system using Algorithm Find Mincuts: Algorithm Find Mincuts Find Mincuts(w j , q j ) m = 0; i = 1; T (i) = 0; Q(i) = 1; B(i) = 1; For j = 1 To n By 1 Do T (i) = T (i) + w j ; Q(i) = q j Q(i); If T (i) ≥ k Then m = m + 1; E(i) = j; While (T (i) − w B(i) ) ≥ k Do T (i) = T (i) − w B(i) ; Q(i) = Q(i)/q B(i) ; B(i) = B(i) + 1;
      EndWhile
      B(i + 1) = B(i) + 1;
      T(i + 1) = T(i) − w_{B(i)};
      Q(i + 1) = Q(i)/q_{B(i)};
      i = i + 1;
    EndIf
  EndFor
End

Algorithm Find_Mincuts starts from component 1, adding the weight of one additional component at a time. As soon as the total weight is at least k, we have found a cut set. Then the components in this set are removed one at a time, starting from the lowest-indexed component, until removing one more component would make the total weight of the components in the cut set go below k. A minimal cut set is found. The next cut includes all components in the current minimal cut except the first component. One more component is then added to this set until the total weight in the set is at least k, giving a new cut set. Again we need to make sure that this is a minimal cut by removing some components with lower indices. This process is repeated until we reach the last component in the system. The computational complexity of this algorithm is O(n) [248].

Example 11.12 Let k = 4, n = 7, and (w1, . . . , w7) = (1, 1, 1, 2, 3, 2, 1). Applying Algorithm Find_Mincuts, Table 11.3 is produced. From Table 11.3, we see that three minimal cut sets are identified: S1 = {2, 3, 4}, S2 = {4, 5}, and S3 = {5, 6}. Each minimal cut set consists of consecutive components.

After the minimal cuts are obtained, the unreliability of the weighted Lin/Con/k/n:F system can be evaluated as the probability of the union of these minimal cuts. A recursive equation is provided for this computation:

QL(1, E(m)) = Pr(S1 ∪ S2 ∪ · · · ∪ Sm)
            = QL(1, E(m − 1)) + Σ_{i=0}^{B(m)−B(m−1)−1} RL(1, B(m − 1) + i − 1) p_{B(m−1)+i} Q(m) ∏_{l=B(m−1)+i+1}^{B(m)−1} ql,   (11.89)

where

QL(1, i) = 0             for i = 0, 1, . . . , E(1) − 1,
QL(1, i) = QL(1, E(j))   for i = E(j), E(j) + 1, . . . , E(j + 1) − 1,
QL(1, i) = QL(1, E(m))   for i = E(m), E(m) + 1, . . . , n,   (11.90)
TABLE 11.3 Example of Weighted Lin/Con/4/7:F System

For Loop (j) | Components in Current Set (Weights) | Total Weight | While Loop (removals) | Minimal Cut Set
1 | 1 (1) | 1 | — | —
2 | 1, 2 (1, 1) | 2 | — | —
3 | 1, 2, 3 (1, 1, 1) | 3 | — | —
4 | 1, 2, 3, 4 (1, 1, 1, 2) | 5 | remove 1: {2, 3, 4}, weight 4 | S1 = {2, 3, 4}
5 | 3, 4, 5 (1, 2, 3) | 6 | remove 3: {4, 5}, weight 5 | S2 = {4, 5}
6 | 5, 6 (3, 2) | 5 | — | S3 = {5, 6}
7 | 6, 7 (2, 1) | 3 | — | —

In equation (11.89) and below, an empty product of unreliabilities is taken to be 1:

∏_{l=a}^{b} ql ≡ 1 for b < a.   (11.91)
The unreliability of the system is QL(1, n). The computational complexity for QL(1, n) is O(n).

Example 11.13 Consider the system in Example 11.12. With Algorithm Find_Mincuts we have obtained the following:

S1 = {2, 3, 4}, B(1) = 2, E(1) = 4, Q(1) = q2 q3 q4,
S2 = {4, 5},    B(2) = 4, E(2) = 5, Q(2) = q4 q5,
S3 = {5, 6},    B(3) = 5, E(3) = 6, Q(3) = q5 q6.
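The minimal cuts above can be fed into recursion (11.89)–(11.90). The sketch below is our own rendering (1-indexed arrays, index 0 unused); with all pj = 0.9 it reproduces the cuts of Example 11.12 and assembles the system unreliability the same way as Example 11.13.

```python
def find_mincuts(w, q, k, n):
    """Return 1-indexed lists B, E, Qc for the minimal cuts S_1..S_m."""
    B, E, Qc = [None], [None], [None]
    T, Q, b = 0, 1.0, 1
    for j in range(1, n + 1):
        T += w[j]
        Q *= q[j]
        if T >= k:
            E.append(j)
            while T - w[b] >= k:              # shrink to a minimal cut
                T -= w[b]; Q /= q[b]; b += 1
            B.append(b); Qc.append(Q)
            T -= w[b]; Q /= q[b]; b += 1      # start the next candidate cut
    return B, E, Qc

def rel_weighted_lin(w, p, k, n):
    """Reliability of a weighted Lin/Con/k/n:F system via (11.89)-(11.90)."""
    q = [None] + [1.0 - p[j] for j in range(1, n + 1)]
    B, E, Qc = find_mincuts(w, q, k, n)
    m = len(B) - 1
    if m == 0:
        return 1.0
    QL = [0.0] * (n + 1)                      # QL[i] = Q_L(1, i); zero below E(1)
    QL[E[1]] = Qc[1]                          # a single cut: Q_L(1, E(1)) = Q(1)
    for t in range(2, m + 1):
        for i in range(E[t - 1] + 1, E[t]):
            QL[i] = QL[E[t - 1]]              # plateau, equation (11.90)
        s = QL[E[t - 1]]
        for i in range(B[t] - B[t - 1]):      # recursion (11.89)
            term = (1.0 - QL[B[t - 1] + i - 1]) * p[B[t - 1] + i] * Qc[t]
            for l in range(B[t - 1] + i + 1, B[t]):
                term *= q[l]
            s += term
        QL[E[t]] = s
    return 1.0 - QL[E[m]]
```

For the weights of Example 11.12 and pj = 0.9, the routine finds B = (2, 4, 5), E = (4, 5, 6) and returns 1 − QL(1, 7).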
With equations (11.89) and (11.90), we have

QL(1, 1) = QL(1, 2) = QL(1, 3) = 0,
QL(1, 4) = Q(1) = q2 q3 q4,
QL(1, 5) = QL(1, 4) + Σ_{i=0}^{1} RL(1, 1 + i) p_{2+i} Q(2) ∏_{l=3+i}^{3} ql
         = q2 q3 q4 + p2 q3 q4 q5 + p3 q4 q5,
QL(1, 6) = QL(1, 5) + Σ_{i=0}^{0} RL(1, 3 + i) p_{4+i} Q(3) ∏_{l=5+i}^{4} ql
         = q2 q3 q4 + p2 q3 q4 q5 + p3 q4 q5 + p4 q5 q6,
QL(1, 7) = QL(1, 6),
RL(1, 7) = 1 − QL(1, 7).

11.11.2 Weighted Circular Consecutive-k-out-of-n:F Systems

When the weighted Con/k/n:F system is circular, Wu and Chen [248] provide an algorithm for reliability evaluation:
RC(1, n) = Σ_{(s,l): A < k} ( ∏_{i=1}^{s−1} qi ) ps RL(s + 1, l − 1) pl ( ∏_{j=l+1}^{n} qj ),   (11.92)

where RC(1, n) is the reliability of a weighted Cir/Con/k/n:F system with components 1, 2, . . . , n; RL(s + 1, l − 1) is the reliability of a weighted Lin/Con/k/n:F system with components s + 1, s + 2, . . . , l − 1; and the sum is over all pairs (s, l) such that s is the first working component, l is the last working component, and the total weight of the failed components outside s, . . . , l,

A = Σ_{i=1}^{s−1} wi + Σ_{j=l+1}^{n} wj,   (11.93)

is less than k. The computational complexity of equation (11.92) is O(nk) if n ≥ k and O(n²) if n ≤ k. Chang et al. [47] provide an improved algorithm with complexity O(Tn), where T ≤ min{n, (k − wmax)/wmin + 1}; thus the computational complexity of these equations is O(n × min{n, k}). We first define the following additional notation and assumptions.

Additional Notation
• wmin: minimum of all wi values for 1 ≤ i ≤ n
• wmax: maximum of all wi values for 1 ≤ i ≤ n
• T: max{i : Σ_{j=1}^{i−1} wj < k, 1 ≤ i ≤ n + 1}

Assumptions

1. The components are arranged such that component 1 has the highest weight, that is, w1 = max1≤i≤n{wi}.
2. Assume pj = 0 and wj = wj−n for j > n (positions beyond n in a linear window correspond to components already conditioned to be failed).

Based on these assumptions, Chang et al. [47] provide

RC(1, n) = Σ_{i=1}^{T} RL(i + 1, n + i − 1) × Pr(component i is the first working component)
         = 1   for T > n,
RC(1, n) = Σ_{i=1}^{T} RL(i + 1, n + i − 1) pi ∏_{j=1}^{i−1} qj   for T ≤ n.   (11.94)
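Equation (11.94) can be checked with a short routine. In the sketch below (our own construction), the linear reliabilities RL(i + 1, n + i − 1) are evaluated by brute-force enumeration rather than by the O(n) recursion, purely for clarity; wrapped positions j > n carry pj = 0, exactly as in assumption 2.

```python
from itertools import product

def fails_linear(states, w, k):
    """True if some consecutive failed run has total weight >= k."""
    run = 0
    for works, wt in zip(states, w):
        run = 0 if works else run + wt
        if run >= k:
            return True
    return False

def rel_weighted_circ(w, p, k):
    """Equation (11.94); component 1 is assumed to carry the largest weight.
    Brute-force R_L is exponential, so this is for small illustrative cases."""
    n = len(w)
    T, acc = 1, 0                             # T = max{i : w_1+...+w_{i-1} < k}
    for i in range(1, n + 2):
        if acc < k:
            T = i
        if i <= n:
            acc += w[i - 1]
    if T > n:
        return 1.0
    total, prefix_q = 0.0, 1.0
    for i in range(1, T + 1):                 # i = first working component
        idx = [(j - 1) % n for j in range(i + 1, n + i)]
        wp = [w[t] for t in idx]
        pp = [p[t] if j <= n else 0.0         # wrapped components are failed
              for j, t in zip(range(i + 1, n + i), idx)]
        rl = 0.0                              # R_L(i+1, n+i-1), brute force
        for states in product([0, 1], repeat=n - 1):
            pr = 1.0
            for s, pj in zip(states, pp):
                pr *= pj if s else 1.0 - pj
            if pr and not fails_linear(states, wp, k):
                rl += pr
        total += rl * p[i - 1] * prefix_q
        prefix_q *= 1.0 - p[i - 1]
    return total
```

For equal unit weights this reduces to an ordinary circular Con/k/n:F system, which gives a convenient sanity check against inclusion–exclusion.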
12 MULTISTATE SYSTEM MODELS
Thus far, we have assumed that a system and its components may only be in two possible states: either working or failed. Because of this assumption, the structure function of the system is a binary function of binary variables. As a result, the corresponding system reliability models are referred to as binary system reliability models. Binary system reliability models have been widely used in practical reliability engineering. They have played important roles in the reliability enhancement of various engineering systems. However, for some engineering systems, the binary assumption does not accurately represent the possible states that each of them may experience. As defined earlier, reliability is the probability that a device is able to perform its intended functions satisfactorily under specified conditions for a specified period of time. To describe the satisfactory performance of a device, we may use more than two levels of satisfaction, for example, excellent, average, and poor. To model systems and components whose performance can be modeled with more than two possible levels, multistate system reliability models have been proposed. In a multistate system model, both the system and its components may experience more than two levels of performance. In other words, we say that a system and its components may be in M + 1 possible states, 0, 1, 2, . . . , M, where 0 indicates the completely failed state, M the perfectly working state, and others degraded states. With such assumptions, we have a discrete multistate system model simply because the state of each component and the system can be represented by integer numbers. Continuous multistate system models have also been proposed wherein the state of a component or a system may be represented by a continuous variable that may take values in a closed interval. For example, readers may refer to Block and Savits [33]. 
In this chapter, we concentrate on multistate system reliability models wherein a system and/or its components may be in more than two possible states. We first
FIGURE 12.1 Consecutively connected system (source 0, components 1–4, sink t).
discuss the consecutively connected system (CCS), wherein the components are multistate while the system is binary. The two-way communication system is discussed next. Then, the concepts used in binary system reliability theory are extended to multistate systems. Several special multistate system reliability models are introduced. Finally, concepts and methods for performance evaluation of general multistate systems are discussed.
12.1 CONSECUTIVELY CONNECTED SYSTEMS WITH BINARY SYSTEM STATE AND MULTISTATE COMPONENTS

In Section 11.10, we covered consecutively connected systems with binary system and component states. In this section, we consider consecutively connected systems having multistate components. To illustrate, consider a consecutively connected system with source 0, sink t, and four failure-prone components, as shown in Figure 12.1. In this system, the transmission capabilities of components 1–4 are 3, 1, 2, and 1, respectively. The components in the consecutively connected system given in Figure 12.1 may be either binary or multistate. For example, we can assume that a component may experience only two possible states, working or failed. Under this assumption, component 1 is capable of reaching components 2, 3, and 4 when it is working and cannot perform any function when it is failed. We can also assume that a component may experience more than two possible states. Again take component 1 as an example. When it is in state 3, it is capable of reaching components 2, 3, and 4. When it is in state 2, it is capable of reaching only components 2 and 3. When it is in state 1, it is capable of reaching only component 2. When it is in state 0, it cannot reach any other component. Under both of these assumptions, we consider the system to have only two possible states: when there is a link of working components from the source to the sink, the system is working; otherwise the system is failed. When the components are multistate, we call such systems multistate consecutively connected systems (MCCSs).

12.1.1 Linear Multistate Consecutively Connected Systems

Hwang and Yao [107] consider a linear consecutively connected system with binary system states and multistate components. The system has components 0, 1, . . . , n + 1, where components 0 (the source) and n + 1 (the sink) are perfect while the other components are subject to degradation and failure.
These components are arranged along a line. Component i may be in one of ki + 1 possible states,
namely, {0, 1, 2, . . . , ki}. When it is in state zero, it is completely failed. When it is in state h for 1 ≤ h ≤ ki, it is directly connected to the immediately following h components. This model may be used to represent the situation wherein the component transmitting capability may be reduced due to its own deterioration or due to a shortage in the power supply. It should be noted that even if the components may be in more than two possible states, the system is still binary. The system is defined to be working if and only if there is a link from the source to the sink through working or partially working components.

Notation

• ki: transmission capability of component i, or the maximum possible number of following components that component i could reach directly
• K: max0≤i≤n{ki}
• pi,i: probability that component i is completely failed
• pi,j: probability that component i is directly connected to components i + 1, i + 2, . . . , j but not to components farther than j, for i < j ≤ i + ki; note that Σ_{j=i}^{i+ki} pi,j = 1
• f(i, j): probability that component 0 reaches (connects directly or indirectly to) component i and component j is the farthest component that is directly connected to one of the components 0, 1, . . . , i (1 ≤ i < j ≤ n + 1)
• RL(n): reliability of a linear MCCS with components 0, 1, 2, . . . , n, n + 1, where RL(n) = f(n, n + 1)
Hwang and Yao [107] provide the following equations for system reliability evaluation:

f(0, j) = p0,j for 0 < j ≤ k0,   (12.1)

f(i, j) = f(i − 1, j) Σ_{h=i}^{j} pi,h + pi,j Σ_{h=i}^{j−1} f(i − 1, h)
          for i = 1, 2, . . . , n, i < j ≤ max0≤u≤i{u + ku}.   (12.2)

Equation (12.2) is a recursive formula for evaluating the probability of the event (say, event A) that component 0 reaches (connects directly or indirectly to) component i and component j is the farthest component that is directly connected to one of the components 0, 1, . . . , i. This event, A, can be decomposed into two disjoint events, A1 and A2, such that A = A1 ∪ A2, where A1 represents the event that the source can reach component i − 1 directly or indirectly, component j is the farthest component that is directly connected to one of the components {0, 1, . . . , i − 1}, and component i cannot reach beyond component j; A2 represents the event that the source can reach component i − 1 directly or indirectly, the farthest component that can be reached directly by the components in {0, 1, . . . , i − 1} is among {i, i + 1, . . . , j − 1}, and component i can reach component j directly but cannot
reach beyond component j. With such definitions of events A, A1, and A2, equation (12.2) can be interpreted as Pr(A) = Pr(A1) + Pr(A2). The reliability of the multistate consecutively connected system is given by f(n, n + 1). The right-hand side of equation (12.2) uses the values of f(i − 1, h) for max0≤u≤i−1{u + ku} < h ≤ max0≤u≤i{u + ku}, which are not computed at preceding steps. However, these f(i − 1, h) values are obviously zero. The computational complexity for f(n, n + 1) using equation (12.2) is O(Kn).

Example 12.1 Evaluate the reliability of the linear MCCS shown in Figure 12.1. Let

p0,0 = 0,   p0,1 = 0.2, p0,2 = 0.8,
p1,1 = 0.1, p1,2 = 0.2, p1,3 = 0.3, p1,4 = 0.4,
p2,2 = 0.3, p2,3 = 0.7,
p3,3 = 0.1, p3,4 = 0.3, p3,5 = 0.6,
p4,4 = 0.2, p4,5 = 0.8.
To find the reliability of the system R L (4) = f (4, 5), we need to find the following probabilities first: f (1, 2), f (1, 3), f (1, 4), f (2, 3), f (2, 4), f (3, 4), f (3, 5), and f (4, 5). Other f (i, j) values required in calculation of f (4, 5) are equal to zero. Using equation (12.2), we have f (1, 2) = f (0, 2)
2
p1,h + p1,2 f (0, 1)
h=1
= p0,2 ( p1,1 + p1,2 ) + p1,2 p0,1 = 0.2800, f (1, 3) = p1,3 [ f (0, 1) + f (0, 2)] = p1,3 ( p0,1 + p0,2 ) = 0.3000, f (1, 4) = p1,4 [ f (0, 1) + f (0, 2)] = p1,4 ( p0,1 + p0,2 ) = 0.4000, f (2, 3) = f (1, 3) + p2,3 f (1, 2) = 0.4960, f (2, 4) = f (1, 4)
4
p2,h = f (1, 4) = 0.4000,
h=2
f (3, 4) = f (2, 4) + p3,4 f (2, 3) = 0.5488, f (3, 5) = f (2, 5) + p3,5
4 h=3
f (i − 1, h) = p3,5 f (2, 3) + f (2, 4) = 0.5376,
456
MULTISTATE SYSTEM MODELS
R L (4) = f (4, 5) = f (3, 5)
5
p4h + p4,5 f (3, 4)
h=4
= f (3, 5) + p4,5 f (3, 4) = 0.9766. Kossow and Preuss [119] propose another algorithm for the reliability evaluation of a linear MCCS, which is the same as the one defined by Hwang and Yao [107] except that the source is assumed to be failure prone. Additional Notation • •
Pi, j : probability that component i can directly reach component j or beyond i+k ( j ≥ i), Pi, j = l= j i pil for i < j ≤ i + ki , Pi,i = 1, Pi, j = 0 for j > i + ki Q i, j : = 1 − Pi, j for 1 ≤ i ≤ n and j ≥ i
Kossow and Preuss [119] provide the following equations:
R L (n) =
P01 n av (n)R L (n − v) + an+1 (n)
for n = 0, for 1 ≤ n < k0 ,
v=1
min{K ,n} av (n)R L (n − v)
(12.3)
for n ≥ k0 ,
v=1
where Pn,n+1 v−1 . Q n− j+1,n+1 Pn−v+1,n+1 av (n) = j=1 0
for v = 1, for 2 ≤ v ≤ kn−v+1 ,
(12.4)
for v > kn−v+1 .
In equation (12.4), av (n) is the probability that component n − v + 1 is the last component that can reach the sink directly. For example, a1 (n) is the probability that component n can reach the sink (n + 1) directly, a2 (n) is the probability that the component n − 1 is the last component that can reach the sink directly, and so on. An interpretation of equation (12.3) is that the probability that the sink can be reached by the source directly or indirectly is equal to the sum of the probabilities that the component n − v + 1 can be reached by the source directly or indirectly and component n − v + 1 is the last component that can reach the sink directly for v = 1, 2, . . . , min{K , n}. The complexity of equation (12.3) is O(K n).
CONSECUTIVELY CONNECTED SYSTEMS
0 FIGURE 12.2
1
2
3
457
4
Linear consecutively connected system for Example 12.2.
Example 12.2 Consider a linear MCCS structure shown in Figure 12.2. The system has five components including component 0 as the source and component 4 as the sink. Let p0,0 = 0.1,
p0,1 = 0.2,
p0,2 = 0.7,
p1,1 = 0.1,
p1,2 = 0.1,
p1,3 = 0.2,
p2,2 = 0.2,
p2,3 = 0.8,
p3,3 = 0.3,
p3,4 = 0.7.
p1,4 = 0.6,
Using equation (12.4) with n = 3, k0 = 2, P0,0 = 1, P1,1 = 1, P2,2 = 1, P3,3 = 1,
K P0,1 P1,2 P2,3 P3,4
= 3, = 0.9, = 0.9, = 0.8, = 0.7,
k1 = 3, P0,2 = 0.7, P1,3 = 0.8,
k2 = 1, P1,4 = 0.6,
we have a1 (3) = P3,4 = 0.7000, a2 (3) = (1 − P3,4 )P2,4 = 0.0000, a3 (3) = (1 − P3,4 )(1 − P2,4 )P1,4 = 0.1800, a1 (2) = P2,3 = 0.8000, a2 (2) = (1 − P2,3 )P1,3 = 0.1600, a1 (1) = P0,1 = 0.9000, a2 (1) = Q 1,2 P0,2 = 0.0700. With equation (12.3), we have R L (0) = P0,1 = 0.9000, R L (1) = a1 (1)R L (0) + a2 (1) = 0.8800, R L (2) = a1 (2)R L (1) + a2 (2)R L (0) = 0.8480,
k3 = 1,
458
MULTISTATE SYSTEM MODELS
R L (3) =
3 v=1
av (3)R L (3 − v)
= a1 (3)R L (2) + a2 (3)R L (1) + a3 (3)R L (0) = 0.7556. 12.1.2 Circular Multistate Consecutively Connected Systems A circular MCCS works if and only if there exist at least two (partially or completely) working components that can reach each other directly or indirectly [107]. Based on this definition, Hwang and Yao [107] extend the algorithm presented by Shanthikumar [224] for a system with binary components to a system with multistate components. The resulting algorithm has a complexity of O(mn 2 ), where m is the number of common components within two consecutive minimal cut sets. Zuo and Liang [261] develop another algorithm with a complexity of O(K mn), where K = max0≤i≤n {ki } and m is the number of elements in a minimal cut set with the least number of elements. The algorithm uses the algorithm by Hwang and Yao [107] to compute the reliability of a linear MCCS subsystem. Notation •
• •
• • • • • •
f (i, j, l): probability that component i can reach component j directly or indirectly (i < j) and component l is the farthest component directly connected to one of the components i, i + 1, . . . , j ( j < l ≤ maxi≤u≤ j {u + ku }) Q i, j : probability that component i is unable to reach component j directly j−1 ( j > i), Q i, j = h=i pi,h , Q i, j = 1 for j > i + ki R(i, j): probability that component i can reach component j directly or indirectly, where there is at least one other component between component i and component j, i ≤ j S: cut set with consecutive components m: number of elements in set S C j : set of components excluding those in S that can reach component j directly l j : number of elements in set C j C j (t): tth element in set C j RC (n): reliability of a circular MCCS
Then, the following equations can be obtained from Hwang and Yao [107]: for i < l ≤ i + ki , pi,l l−1 l p j,h + p j,l f (i, j − 1, h) f (i, j, l) = f (i, j − 1, l) (12.5) h= j h= j for i < j < l ≤ max {u + ku }. i≤u≤ j
CONSECUTIVELY CONNECTED SYSTEMS
The probability that i can reach j directly or indirectly is R(i, j) = f (i, j, l),
459
(12.6)
l> j
which is the reliability of a linear MCCS structure with components i, i + 1, . . . , j. In a circular MCCS system, we assume that each component is failure prone, that is, component i (the source) is also failure prone. Whether component j can be reached by component i does not depend on whether component j works or not. Thus, the working status of the last component in the linear subsystem does not affect the evaluation of R(i, j). We also allow j to be greater than, less than, or equal to i with at least two different components included in the subsystem. For example, consider a circular system with seven components (n = 7). Here, R(4, 1) is the reliability of a linear multistate consecutively connected subsystem with components 4, 5, 6, 7, and 1; R(2, 2) is the reliability of a linear multistate consecutively connected subsystem with components 2, 3, 4, 5, 6, 7, 1, and 2. In the following discussions, we assume that component j is equivalent to component j − n whenever j is greater than n. The system reliability of a circular MCCS can be expressed as RC (n) = R(i, i) + × 90
v=1
j−1 .
j=i+1
u=i
l j t−1 . t=1
where
i+m−1
v=1
Q u, j
Q C j (v), j
1 − Q C j (t), j R( j, C j (t)) ,
(12.7)
Q C j (v), j ≡ 1.
Example 12.3 Consider a circular MCCS structure with 10 components as shown in Figure 12.3. Assume the following given data:
p1, j p3, j p5, j p7, j p9, j
k1 = 2,
k2 = 3,
k3 = 1,
k4 = 3,
k5 = 3,
k6 = 1,
k7 = 2,
k8 = 1,
k9 = 2,
k10 = 2,
= (0.10, 0.20, 0.70), = (0.02, 0.98), = (0.05, 0.15, 0.35, 0.45), = (0.06, 0.34, 0.60), = (0.11, 0.22, 0.67),
p2, j p4, j p6, j p8, j p10, j
= (0.02, 0.18, 0.30, 0.50), = (0.01, 0.19, 0.20, 0.60), = (0.21, 0.79), = (0.30, 0.70), = (0.10, 0.20, 0.70).
In the given data, p7, j = (0.06, 0.34, 0.60) indicates that p7,7 = 0.06, p7,8 = 0.34, and p7,9 = 0.60. We have used p7,7 to represent the probability that component
460
MULTISTATE SYSTEM MODELS
10
1
2
9
3
8
4 7
FIGURE 12.3
6
5
Example of circular MCCS structure.
7 has a transmitting capability of 0 and p7,9 is the probability that component 7 has a transmitting capability of 2 (9 − 7). If one starts with component 6, then S = {6, 7, 8}, m = 3, C7 = {5, 4}, and C8 = {5}. Components 6, 5, and 4 can all reach component 7 directly. However, only components 5 and 4 are in set C7 because component 6 is a member of S. The components in C 7 are ordered in counterclockwise order of their indices. According to equation (12.7), the system reliability can be expressed as RC (10) = R(6, 6) + Q 6,7 [(1 − Q 5,7 )R(7, 5) + Q 5,7 (1 − P4,7 )R(7, 4)] + Q 6,8 Q 7,8 (1 − Q 5,8 )R(8, 5), where Q 6,7 = p6,6 = 0.21, Q 5,7 = p5,5 + p5,6 = 0.20, Q 4,7 = p4,4 + p4,5 + p4,6 = 0.40, Q 6,8 = p6,6 + p6,7 = 1.00, Q 7,8 = p7,7 = 0.06, Q 5,8 = p5,5 + p5,6 + p5,7 = 0.55, and R(6, 6), R(7, 5), R(7, 4), and R(8, 5) can be calculated with equations (12.5) and (12.6): RC (10) ≈ 0.544986 + 0.21 × [(1 − 0.20) × 0.696684 + 0.20 × (1 − 0.4) × 0.700151] + 1 × 0.06 × (1 − 0.55) × 0.581955 ≈ 0.695385. If one starts with component 9, then S = {9} and m = 1. The system reliability is more easily calculated with equation (12.6) as RC (10) = R(9, 9) ≈ 0.695385. Malinowski and Preuss [153] provide another definition of the circular MCCS. This definition allows ki > n for 1 ≤ i ≤ n and enables a simpler algorithm to be
CONSECUTIVELY CONNECTED SYSTEMS
461
developed. It is equivalent to the definition by Hwang and Yao [107] used earlier in this chapter when ki < n for all i. Definition 12.1 (Malinowski and Preuss [153]) A circular MCCS works if, for every component j (working or not), 1 ≤ j ≤ n, there exists a (partially or completely) working component i that can reach component j directly. Additional Notation • •
Q i, j : probability that component i cannot reach component j or beyond directly, i.e., Q i, j = pi,i + pi,i+1 + · · · + pi, j−1 si : multicomponent signal course consisting of components i, i + 1, . . . , n (i)
•
k0 : maximum number of components beyond component n that can be reached directly by any component in si . In other words, si may be able to reach components 1, 2, . . . , k0(i) directly.
•
p0,k : probability that si is able to reach component k directly but not go beyond,
(i)
(i)
0 ≤ k ≤ k0 •
•
•
(1)
L i : linear MCCS with si as the source and components 1, 2, . . . , i − 1 as the intermediate components plus an imaginary sink right after component i − 1, where 2 ≤ i ≤ n (2) L i : linear MCCS with si as the source and components 1, 2, . . . , i − 2 as the intermediate components plus an imaginary sink right after component i − 2, where 3 ≤ i ≤ n (α) (α) R L (L i ): reliability of L i , where α = 1, 2
Again we use the assumption that component j is equivalent to component j − n when j > n. The following equations are provided by Malinowski and Preuss [153] for evaluation of the reliability of a circular MCCS with n components: (i)
k 0 = max {k j − (n − j)},
(12.8)
i≤ j≤n
(i)
p0,0 =
n .
Q j,n+1 ,
(12.9)
j=i (i)
p0,k =
n .
Q j,n+k+1 −
j=i
k−1
(i)
p0, j ,
(i)
1 ≤ k ≤ k0 ,
(12.10)
j=0 (1)
RC (n) = R L (L 1 ) −
n 2
3 (2) (1) R L (L i ) − R L (L i ) .
(12.11)
i=2
As defined earlier, each L iα represents a sublinear consecutively connected system. (α)
The reliability of such linear subsystems, R L (L i ) for 1 ≤ α ≤ 2, can be calculated with equations (12.3) and (12.4).
462
MULTISTATE SYSTEM MODELS
1
2
6
3
5 FIGURE 12.4
4
Circular multistate consecutively connected system.
Example 12.4 Consider the circular MCCS structure shown in Figure 12.4. The system has six components. The following data are summarized: k1 = k2 = k4 = 2,
n = 6, p1,1 p2,2 p3,3 p4,4 p5,5 p6,6 Q 1,2 Q 2,3 Q 3,4 Q 4,5 Q 5,6 Q 6,7
= 0.6, = 0.5, = 0.5, = 0.3, = 0.1, = 0.2, = 0.6, = 0.5, = 0.5, = 0.3, = 0.1, = 0.2,
p1,2 p2,3 p3,4 p4,5 p5,6 p6,7 Q 1,3 Q 2,4 Q 3, j Q 4,6 Q 5,7 Q 6, j
= 0.1, = 0.1, = 0.5, = 0.1, = 0.1, = 0.8, = 0.7, = 0.6, =1 = 0.4, = 0.2, =1
k3 = k6 = 1,
k5 = 3,
p1,3 = 0.3, p2,4 = 0.4, p4,6 = 0.6, p5,7 = 0.1, Q 1, j Q 2, j for j Q 4, j Q 5,8 for j
=1 =1 ≥ 5, =1 = 0.3, ≥ 8.
p5,8 = 0.7, for j ≥ 4, for j ≥ 5, for j ≥ 7, Q 5, j = 1
for j≥9,
(i)
First, we use equation (12.8) to find k0 for 1 ≤ i ≤ 6: (1)
k0 = max {k j − (n − j)} = max{−3, −2, −2, 0, 2, 1} = 2, 1≤ j≤6 (2)
(3)
(4)
(5)
k0 = k0 = k0 = k0 = 2,
(6)
k0 = 1.
With equations (12.9) and (12.10), we obtain (1)
p0,0 =
6 . j=1
(1)
p0,1 =
6 . j=1
Q j,n+1 =
6 .
Q j,7 = 0.0400,
j=1 (1)
Q j,n+1+1 − p0,0 =
6 . j=1
(1)
Q j,8 − p0,0 = 0.2600,
CONSECUTIVELY CONNECTED SYSTEMS (1)
p0,2 =
6 .
(1)
463
(1)
Q j,9 − p0,0 − p0,1 = 0.7000.
j=1
Similarly, we have found the following: (2)
(3)
(4)
(5)
p0,0 = 0.2,
(6)
(2)
(3)
(4)
(5)
p0,1 = 0.8,
(2)
(3)
(4)
(5)
p0,0 = p0,0 = p0,0 = p0,0 = 0.04,
(6)
p0,1 = p0,1 = p0,1 = p0,1 = 0.26, p0,2 = p0,2 = p0,2 = p0,2 = 0.7. (α)
Based on the definitions of L i
for 1 ≤ i ≤ n and α = 1, 2, we have
(1)
(2)
R L (L 2 ) = R L (L 3 ),
(1)
(2)
R L (L 4 ) = R L (L 5 ).
R L (L 1 ) = R L (L 2 ), R L (L 3 ) = R L (L 4 ),
(1)
(2)
(1)
(2)
With equation (12.11), the system reliability can be expressed as (1)
(2)
(1)
RC (6) = R L (L 5 ) − R L (L 6 ) + R L (L 6 ). (α)
To calculate R L (L i ) for α = 1, 2, we use equations (12.3) and (12.4). For example, when i = 3 and α = 2, the corresponding linear subsystem is shown in Figure 12.5. In Figure 12.5, the source s consists of components 3, 4, 5, and 6. (1) In the subsystem L 5 , we have n = 4, k3 = 1,
(1)
k0 = 2, k4 = 2,
k1 = 2, k2 = 2, K = max{ki } = 2.
Thus, we have a1 (4) = P4,5 = 1 − Q 4,5 = 0.7000, a2 (4) = Q 4,5 P3,5 = 0.0000, a1 (3) = P3,4 = 0.5000, a2 (3) = Q 3,4 P2,4 = 0.2000, a1 (2) = P2,3 = 0.5000,
s FIGURE 12.5
1
2
System representing L 3(2) .
464
MULTISTATE SYSTEM MODELS
a2 (2) = Q 2,3 P1,3 = 0.1500, a1 (1) = P1,2 = 0.4000, (1)
a2 (1) = Q 1,2 P0,2 = 0.4200, (1)
(1)
R L (0) = P0,1 = 1 − p0,0 = 0.9600, R L (1) = a1 (1)R L (0) + a2 (1) = 0.8040, R L (2) = a1 (2)R L (1) + a2 (2)R L (0) = 0.5460, R L (3) = a1 (3)R L (2) + a2 (3)R L (1) = 0.4338, R L (4) = a1 (4)R L (3) + a2 (4)R L (2) = 0.3037, R L (L (1) 5 ) = R L (4) = 0.3037. (2)
Similarly, in the system L 6 , we have n = 4, R L (1) = a1 (1)R L (0) = 0.3200, R L (2) = a1 (2)R L (1) + a2 (2)R L (0) = 0.2800, R L (3) = a1 (3)R L (2) + a2 (3)R L (1) = 0.2040, R L (4) = a1 (4)R L (3) + a2 (4)R L (2) = 0.1428, (2)
R L (L 6 ) = R(4) = 0.1428. (1)
In the subsystem L 6 , we have n = 5, (1) R L (L 6 )
= R L (5) = a1 (5)R L (4) + a2 (5)R L (3) = 0.1408.
Finally, the reliability of the system is (1)
(2)
(1)
RC (6) = R L (L 5 ) − R L (L 6 ) + R L (L 6 ) = 0.3016. 12.1.3 Tree-Structured Consecutively Connected Systems Malinowski and Preuss [156, 158] investigate tree-structured and reverse treestructured MCCSs. They provide algorithms for the reliability evaluation of these systems and indicate that these system structures are generalizations of the consecutively connected system structures proposed by Shanthikumar [224]. Figure 12.6 shows an example of the tree structure. A tree-structured consecutively connected system has one source node (also called the root node), one or more leaf nodes, and some intermediate nodes. The direction of communication is from the root node to the leaf nodes via some intermediate nodes. The root node is at level
CONSECUTIVELY CONNECTED SYSTEMS
Root Node x0
Level 0
x1
x3
x2
x4
x5
x8
x7
x9
x11
465
Level 1
x6
x10
Level 2
Level 3
Level 4
A Leaf Node FIGURE 12.6
Tree-structured consecutively connected system.
0 (refer to Figure 12.6). The nodes at level 1 are the child nodes of the root node. Suppose that there is a node called node A at level j; nodes C, D, and E are at level j + 1 and are directly connected to node A ( j ≥ 0). We say that node A is the parent node of nodes C, D, and E and the nodes C, D, and E are the child nodes of node A. The nodes that do not have any child nodes are called leaf nodes. It is assumed that the leaf nodes can only receive signals. However, the root node and the intermediate nodes may be in several possible states. A node may only send signals toward the leaf nodes. In addition, it can only send signals to those nodes that are directly or indirectly connected to it. For example, in Figure 12.6, x0 is the root node; x3 , x 6 , x 7 , x8 , x10 , and x11 are leaf nodes; x4 is the parent node of x7 and x8 ; and, x7 and x 8 are the child nodes of node x4 . A reverse tree-structured system is shown in Figure 12.7. In such a system, the root node can only receive signals. The leaf nodes are the sources of signals. We still say that the root node is at level 0. The direction of communication is from leaf nodes to the unique root node. We can also use the concepts of parent nodes and child nodes. However, the definitions should be modified. The nodes at level j + 1 that are directly connected to a node at level j are called the parent nodes of the node at level j. In the reverse tree-structured system, several parent nodes may have the same child node. Such a reverse tree-structured system is considered working if all leaf nodes can reach the root node directly or indirectly. In the following, we outline the algorithm for reliability evaluation of a treestructured system as proposed by Malinowski and Preuss [158].
466
MULTISTATE SYSTEM MODELS
A Leaf Node
x8
x7
x3
x9
Level 3
x5
x4
x6
x2
x1
x0
Level 2
Level 1
Level 0
Root Node FIGURE 12.7
Reverse tree-structured consecutively connected system.
Notation • •
• • • • •
• • •
•
T : tree structure <: used to order the nodes of T . We say x < y if x is located above y and there is a path between x and y in the structure. For example, x 2 < x 9 , but x1 < x11 in Figure 12.6. ≤: x ≤ y if and only if x < y or x = y λ(x): set consisting of x and all nodes above x, that is, λ(x) = {v | v ≤ x}. For example, λ(x 9 ) = {x0 , x2 , x5 , x9 } in Figure 12.6. π(x): parent node of x. For example, π(x5 ) = x 2 in Figure 12.6. If x is a root node, π(x) = ∅. H (x): set of child nodes of x. We know that y ∈ H (x) if and only if π(y) = x. For example, H (x5 ) = {x9 , x10 } in Figure 12.6. If x is a leaf node, H (x) = ∅. l(x): level of the node x. By definition, l(x) = | {y | y ≤ x} | − 1, where | A | denotes the cardinality of set A. For example, l(x0 ) = 0 and l(x8 ) = 3 in Figure 12.6. ψ(x): ψ(x) = {y | x ≤ y}. For example, ψ(x5 ) = {x5 , x9 , x10 , x11 } in Figure 12.6. T (x): subtree of T with x as the root node and ψ(x) as the set of nodes ξ : family of subsets of nodes of T . A ∈ ξ if and only if either of the following conditions holds: (1) A = ∅ or (2) if x ∈ A, y ∈ A, and x = y, then x ≤ y and y ≤ x. For example, {x1 , x5 , x6 } ∈ ξ , but {x 1 , x2 , x5 } ∈ ξ since x2 < x 5 in Figure 12.6. ξx : subfamily of ξ . A ∈ ξx if and only if both of the following conditions hold: (1) A ∈ ξ and (2) A ∈ ψ(x) \ x. ξx indicates all admissible ranges that
CONSECUTIVELY CONNECTED SYSTEMS
•
•
•
•
•
467
component x can reach. For example, A = {x9 , x10 , x 6 } ∈ ξx2 in Figure 12.6 if x 2 could at most reach x9 and x10 . γ (x): range of the component x, i.e., the distance to the farthest node that can be reached directly by a signal from x. By definition, y = γ (x) if and only if y receives directly a signal from x and no component located below y does. It is assumed that γ (x) = x for a failed component x. (x): range of the multicomponent signal source consisting of x and all the nodes above x. For example, in Figure 12.6, if γ (x 0 ) = {x 5 , x 8 } and γ (x 1 ) = {x3 , x4 }, then (x 1 ) = {x3 , x5 , x8 }. The signal from either x0 or x 1 reaches directly the components x2 , x3 , x4 , x5 , x8 . ⊕: two-argument operator defined as A1 ⊕ A2 = (A1 ∪ A2 ) \ (B1 ∪ B2 ), where A1 , A2 ∈ ξ and B1 = {x | x ∈ A1 , x < y for at least one y, y ∈ A2 }, B2 = {x | x ∈ A2 , x < y for at least one y, y ∈ A1 } E x : event that each component belonging to ψ(x) receives, directly or indirectly, a signal from λ(π(x)). It is assumed that E x is a sure event if x is the root node. R: system reliability, i.e., the probability that each leaf node receives, directly or indirectly, a signal from the root node
Based on the definition of the system and the notation defined earlier, we have the following expression of the system reliability: 7 R = Pr Ey , (12.12) y∈H (v0 )
where v0 is the root node. In equation (12.12), all events E y are not independent since all child nodes of v0 receive a signal from the common source v0 . However, when the range of the root node is fixed, E y ’s are conditionally independent. Thus, we can rewrite equation (12.12) as . Pr(E y | γ (v0 ) = A) Pr(γ (v0 ) = A). (12.13) R= A∈ξv0
y∈H (v0 )
When a tree-structured system has only two levels, we can use equation (12.13) for the system reliability evaluation directly. However, when the system has more than two levels, the probability Pr(γ (v0 )) = A may be equal to zero. In this case, an algorithm is proposed by Malinowski and 9 Preuss [158]. The following equations are needed for calculating the probability y∈H (v) Pr(E y | γ (v0 ) = A). When x is a leaf node and v = π(x), we have 1 if x ∈ A, (12.14) Pr(E x | (v) = A) = 0 if x ∈ / A.
468
MULTISTATE SYSTEM MODELS
Equation (12.14) is obvious. When x is neither a root node nor a leaf node, v is the parent node of x, and A ∈ ξφ(v) , we have Pr(E x | (v) = A) 0 if A ∩ ψ(x) = ∅, .
= Pr(E y | (x) = A ∩ ψ(x) ⊕ B) Pr(γ (x) = B) B∈ξx y∈H (x) if A ∩ ψ(x) = ∅. (12.15) Equation (12.15) evaluates the reliability of a subtree wherein the parent node of x is the root node. For example, in Figure 12.6, let x = x5 ; then v = π(x) = x 2 , φ(v) = {x 0 , x2 }, and ψ(x5 ) = {x5 , x9 , x10 , x11 }. The term Pr(E x5 | (v) = A) is the probability that nodes x5 , x9 , x10 , and x 11 can receive a signal from x2 given the ranges of x0 and x2 are a set of A. If A ∩ ψ(x) = ∅, equation (12.15) is obvious since x does not receive a signal from φ(v). If A ∩ψ(x) = ∅, it means that node x receives a signal from φ(x) with probability 1. The left-hand side of equation (12.15) is equal to 7 E y | (v) = A . Pr y∈H (x)
The event E x is then decomposed into a series of events E y , where node y is one of the child nodes of x. Indeed, for a fixed A and if B ∈ ξx , then Pr(E y | (x) = (A ∩ ψ(x)) ⊕ B) = Pr((v) ∩ ψ(x) = A ∩ ψ(x)) Pr(γ (x) = B). By starting from the level of the leaf nodes, one continues to use the formulas until reaching the level of the root node and finding the system reliability. Algorithm (Malinowski and Preuss [158]) Step 0: Given a tree structure, the root node x 0 , and Pr(γ (xi ) = {y1 , y2 , . . . , yr }) for each node xi and all possible nodes y j ( j = 1, . . . , r ) that xi may be able to reach directly. Step 1: Assign L = max{l(x)}, that is, the system has L + 1 levels indicated by l = 0, 1, . . . , L. Step 2: on the level l, evaluate all probabilities
For each component x located Pr E π(x) | (x) ∩ ψ(x) = A , where A ∈ ξ and A ⊆ ψ(x). If x is a leaf node, use equation (12.14). If x is a nonleaf node, use equation (12.15).
CONSECUTIVELY CONNECTED SYSTEMS
469
Step 3: If l = 0, let l = l − 1; then go to step 2. Step 4: Evaluate the system reliability using equation (12.13). Example 12.5 Consider a tree-structured system shown in Figure 12.6. Let the nonleaf nodes have the following probabilities: Pr(γ (x 0 ) = ∅) = 0.10, Pr(γ (x 0 ) = {x 2 , x3 , x4 }) = 0.30, Pr(γ (x1 ) = ∅) = 0.10, Pr(γ (x 1 ) = {x 7 , x8 }) = 0.40, Pr(γ (x2 ) = ∅) = 0.15, Pr(γ (x 1 ) = {x 9 , x10 }) = 0.20, Pr(γ (x4 ) = ∅) = 0.10, Pr(γ (x 5 ) = ∅) = 0.05, Pr(γ (x1 ) = {x 9 , x10 }) = 0.70, Pr(γ (x9 ) = ∅) = 0.20,
Pr(γ (x 0 ) = {x1 , x2 }) = 0.30, Pr(γ (x 0 ) = {x1 , x5 , x6 }) = 0.30, Pr(γ (x 1 ) = {x3 , x4 }) = 0.50, Pr(γ (x 2 ) = {x5 , x6 }) = 0.65, Pr(γ (x 4 ) = {x7 , x8 }) = 0.90, Pr(γ (x 5 ) = {x11 }) = 0.25, Pr(γ (x 9 ) = {x11 }) = 0.80.
If x ∈ {x7 , x8 , x10 , x 11 }, that is, x is a leaf node, equation (12.14) yields Pr (E x | (π(x)) ∩ ψ(x) = A) = 1. If x is a nonleaf node, we have to find all possible subsets, A, first based on the range of each node. The following are the pairs of {x, A} for evaluating the probability Pr(E x | (π(x)) ∩ ψ(x) = A): x = x1 : x = x2 : x = x4 : x = x5 : x = x9 :
A = {x1 },
Pr(E x | (π(x)) ∩ ψ(x) = A) = 0.4500,
A = {x3 , x4 },
Pr(E x | (π(x)) ∩ ψ(x) = A) = 0.9400,
A = {x2 },
Pr(E x | (π(x)) ∩ ψ(x) = A) = 0.3640,
A = {x5 , x6 },
Pr(E x | (π(x)) ∩ ψ(x) = A) = 0.6810,
A = {x4 },
Pr(E x | (π(x)) ∩ ψ(x) = A) = 0.9000,
A = {x7 , x8 },
Pr(E x | (π(x)) ∩ ψ(x) = A) = 1.0000,
A = {x5 },
Pr(E x | (π(x)) ∩ ψ(x) = A) = 0.5600,
A = {x9 , x10 },
Pr(E x | (π(x)) ∩ ψ(x) = A) = 0.8500,
A = {x9 },
Pr(E x | (π(x)) ∩ ψ(x) = A) = 0.8000,
A = {x11 },
Pr(E x | (π(x)) ∩ ψ(x) = A) = 1.0000.
Other pairs of {x, A} are irrelevant to the system reliability since Pr(E x | (π(x)) ∩ ψ(x) = A) = 0. In the following, use x = x1 and A = {x 3 , x 4 } to show how to apply equation (12.15) to evaluate Pr(E x | (π(x)) ∩ ψ(x) = A):
470
MULTISTATE SYSTEM MODELS
Pr( E x | (π(x)) ∩ ψ(x) = A) = Pr(E x1 | (π(x 1 )) ∩ ψ(x 1 ) = {x 3 , x4 }) = Pr(E x3 | (x 1 ) ∩ ψ(x 3 ) = {x3 }) Pr(E x4 | (x 1 ) ∩ ψ(x 4 ) = {x4 }) × Pr(γ (x1 ) = φ) + Pr(E x3 | (x1 ) ∩ ψ(x 3 ) = {x3 }) × Pr(E x4 | (x1 ) ∩ ψ(x4 ) = {x4 }) Pr(γ (x 1 ) = {x3 , x4 }) + Pr(E x3 | (x 1 ) ∩ ψ(x 3 ) = {x 3 }) Pr(E x4 | (x 1 ) ∩ ψ(x 4 ) = {x7 , x8 }) × Pr(γ (x 1 ) = {x7 , x8 }) = 0.9400. Finally, the system reliability is equal to R = Pr(E x1 | (π(x 0 )) ∩ ψ(x1 ) = {x1 }) × Pr(E x2 | (π(x0 )) ∩ ψ(x2 ) = {x2 }) Pr(γ (x0 ) = ∅) + Pr(E x1 | (π(x0 )) ∩ ψ(x1 ) = {x1 }) × Pr(E x2 | (π(x0 )) ∩ ψ(x2 ) = {x5 , x6 }) Pr(γ (x0 ) = {x2 , x5 , x6 }) + Pr(E x1 | (π(x0 )) ∩ ψ(x1 ) = {x3 , x4 }) × Pr(E x2 | (π(x0 )) ∩ ψ(x2 ) = {x2 }) Pr(γ (x0 ) = {x 2 , x3 , x4 }) = 0.2352. 12.2 TWO-WAY CONSECUTIVELY CONNECTED SYSTEMS Hwang and Yao [107] argue that in some applications wherein the components are relay stations, communication between the source and the sink has to go both ways. Essentially, they say that the relay range of each station depends not only on the distance but also on the direction. For the Con/k/n:F system, the source can reach the sink if and only if the sink can reach the source. Thus, two-way reliability is the same as one-way reliability for a Con/k/n:F system. However, this is not necessarily the case for a consecutively connected system. If forward communication is independent of backward communication, then, the two-way reliability is simply the product of the two one-way system reliabilities. When the forward and backward communications are dependent, separate algorithms are needed for evaluation of the two-way reliability of consecutively connected systems. Malinowski and Preuss [157, 159] define two classes of two-way communication systems: two-way linear consecutively connected systems (LCCSs) and twoway circular consecutively connected systems (CCCSs) in which the components are assumed to be multistate. 
In a two-way LCCS, the system consists of n + 2 components labeled 0, 1, . . . , n + 1. All components are capable of receiving signals. Component i is in state (h, k) if it can send signals directly to h consecutive
TWO-WAY CONSECUTIVELY CONNECTED SYSTEMS
0
1
FIGURE 12.8
2
3
471
4
Two-way multistate LCCS.
components immediately preceding it and k components immediately following it, 0 ≤ h ≤ i and 0 ≤ k ≤ n − i + 1. Component 0 can only send signals to components following it while component n + 1 can only send signals to the components preceding it. The probability that component i is in state (h, k) is given for each admissible h and k. The states of different components are mutually independent. The system is considered working if the signal from component 0 can reach component n + 1 and the signal from component n + 1 can reach component 0. Figure 12.8 shows a two-way LCCS with five components, where component 0 is the source (or sink) and component 4 is the sink (or source). In a two-way CCCS, the system consists of n cyclically ordered components 1, 2, . . . , n, that is, component i + 1 succeeds component i, i ∈ {1, 2, . . . , n − 1} and component 1 succeeds component n (Figure 12.9). Each component is capable of sending a signal in both left and right directions. All components operate independently. The system is functioning if each component can be reached by at least one other component. In the following paragraph, we discuss the algorithm for two-way LCCS reliability evaluation proposed by Malinowski and Preuss [157]. Regarding the algorithm for two-way CCCS, readers may refer directly to Malinowski and Preuss [159]. Notation (h, k): state of a component, h ∈ [0, i] and k ∈ [0, n + 1 − i] for component i
i i-1
i+1
1
n-1
...
...
•
n FIGURE 12.9
Two-way multistate CCCS.
472 • • • • • • •
•
•
• •
MULTISTATE SYSTEM MODELS
γr (i): rightward range of component i, that is, the number of components following component i that component i can reach directly, 0 ≤ γr (i) ≤ n + 1 − i γl (i): leftward range of component i, that is, the number of components preceding component i that component i can reach directly, 0 ≤ γl (i) ≤ i h i , ki : maximum leftward and rightward ranges of component i, respectively, that is, Pr(γl (i) ≤ h i ) = Pr(γr (i) ≤ ki ) = 1 Hi : maximum leftward range of the multicomponent signal source {i, i +1, . . . , n + 1}, Hi = maxi≤ j≤n+1 {i − ( j − h j )} K i : maximum rightward range of the multicomponent signal source {0, 1, . . . , i}, K i = max0≤ j≤i {(k j + j) − i} pi (h, k): Pr(γl (i) = h, γr (i) = k), 0 ≤ h ≤ h i , 0 ≤ k ≤ ki (i)
Ak : event that component i is reached by component 0 directly or indirectly and component i + k is the rightmost component that is reached directly by the multicomponent signal source {0, 1, . . . , i} B (i) : event that the signal from the multicomponent signal source {i + 1, e + 2, . . . , n + 1} reaches component i and is transmitted directly or indirectly to component 0 (i)
E h : event that component i − h is the leftmost component that can be reached directly by a signal from the multicomponent signal source {i + 1, i + 2, . . . , n + 1} (i)
(i)
(i)
X h,k : conditional probability Pr Ak ∩ B (i) | E h k ≤ Ki , 0 ≤ i ≤ n R: reliability of the two-way LCCS
, 0 ≤ h ≤ Hi+1 − 1, 1 ≤
In the following algorithm, we pick component n + 1 for consideration first. By (i) the definition of X h,k , the system reliability is equal to R=
Hn+1 −1
(n)
X h,1 pn+1 (h + 1, 0).
(12.16)
h=0
If component 0 is considered first, we will get a similar formula. In equation (n) (12.16), X h,1 can be computed recursively using the following equations: (0)
X 0,k = p0 (0, k) for 1 ≤ k ≤ K 0 , min{k+1,K hi i−1 } (i−1) (i) pi (ν, k) = X X h,k
ν=λ(h)
+
hi ν=λ(h)
µ=1
X (i−1)
max{h−1,ν−1},k+1
(12.17)
max{h−1,ν−1},µ
min{k−1,k i}
pi (ν, µ)
µ=0
for 1 ≤ k ≤ K i , 0 ≤ h ≤ Hi+1 − 1, and 1 ≤ i ≤ n,
(12.18)
473
TWO-WAY CONSECUTIVELY CONNECTED SYSTEMS
where
1, λ(h) = 0,
h = 0, h ≥ 1.
(12.19)
The following should be noted in using equation (12.18): pi (ν, k) = 0
and the first summation is then zero,
for k > ki
(12.20) (i−1)
X max{h−1,ν−1},k+1 = 0
for k > K i−1 − 1
and the second summation is then zero. (12.21)
Algorithm (Malinowski and Preuss [157]) Step 0: Given ki , h i , and pi (h, k), 0 ≤ h ≤ h i , 0 ≤ k ≤ ki , 0 ≤ i ≤ n + 1. Step 1: Calculate K i and Hi : K 0 = k0 ,
K i = max{ki , K i−1 − 1}
for 1 ≤ i ≤ n + 1, (12.22)
Hn+1 = h n+1 ,
Hi = max{h i , Hi+1 − 1}
for 0 ≤ i ≤ n.
(12.23) (n)
Step 2: Define λ(h) = 1 if h = 0 and λ(h) = 0 for h ≥ 1. Calculate X h,1 for 0 ≤ h ≤ Hn+1 − 1 using equation (12.18). Step 3: Calculate the system reliability using equation (12.16). Example 12.6 Consider a four-component two-way LCCS. Let p0 (0, 0) = 0.1, p1 (0, 0) = 0.1, p2 (0, 0) = 0.1, p3 (0, 0) = 0.1,
p0 (0, 1) = 0.9, p1 (1, 1) = 0.5, p2 (1, 1) = 0.5, p3 (1, 0) = 0.9.
p1 (0, 2) = 0.4, p2 (2, 0) = 0.4,
First, we calculate all K i and Hi as follows: K 0 = 1,
K 1 = 2,
K 2 = 1,
K 3 = 0,
H0 = 0,
H1 = 1,
H2 = 1,
H3 = 1.
(i)
Then we calculate X h,k : (0)
X 0,1 = p0 (0, 1) = 0.9000, (1)
(0)
X 0,1 = p1 (1, 1)X 0,1 = 0.4500,
474
MULTISTATE SYSTEM MODELS (1)
X 0,2 = 0, (1)
X 1,1 = (1)
X 1,2 = (2)
X 0,1 =
1 ν=0 1 ν=0 2
(0)
p1 (ν, 1)X 0,1 = 0.4500, (0)
p1 (ν, 2)X 0,1 = 0.3600, p2 (ν, 1)
ν=1
2 µ=1
(1)
X ν−1,µ +
2 ν=1
X (1)
ν−1,2
1
p2 (ν, µ)
µ=0
(1) (1) (1) (1) = p2 (1, 1) X 0,1 + X 0,2 + X 0,2 p2 (1, 1) + X 1,2 p2 (2, 0) = 0.3690, (2)
R = X 0,1 p3 (1, 0) = 0.3321.
12.3 KEY CONCEPTS IN MULTISTATE RELIABILITY THEORY In a binary reliability system, the definition domains of the states of the system and its components are {0, 1}. A natural extension of this definition domain to the multistate context is to let the definition domain of the state of component i be {0, 1, . . . , Mi }, where Mi ≥ 1 for 1 ≤ i ≤ n. Similarly the definition domain of the state of the system may be chosen to be {0, 1, . . . , Ms }, where Ms ≥ 1. Under these extensions, the state of the component or of the system is a discrete random variable. Correspondingly, such multistate system reliability models are discrete models. If we let Mi be the perfect state of component i for 1 ≤ i ≤ n and Ms the perfect state of the system, we are allowing each component and the system to have a different number of possible states. This certainly provides the maximum flexibility for modeling a multistate system. However, it brings about complexity in reliability analysis. As a result, most researchers have concentrated on the study of multistate systems wherein each component and the system have the same number of possible states. In other words, the state space of each component and the system is taken to be {0, 1, . . . , M}, where M is a positive integer. Such a discrete multistate system reliability model has received the most attention in the literature of multistate reliability research. When M = 1, the discrete multistate system reliability model reduces to the binary system reliability model. As stated, state M is the perfect state; state 0 is the complete failure state; and states j for 1 ≤ j < M are degraded states between state 0 and state M. This means that a lower state level represents a lower performance level of the component or the system. There are also continuous-state system reliability models wherein the state of each component or the system is treated as a continuous random variable with a definition domain as the closed interval [0, 1]. 
In this book we limit ourselves to discrete models.
Assumptions

1. The state space of each component and the system is {0, 1, 2, . . . , M}.
2. The states of all components are independent random variables.
3. The state of the system is completely determined by the states of the components.
4. A lower state level represents a worse or equal performance of the component or the system.

Notation

• x_i: state of component i; x_i = j if component i is in state j, 0 ≤ j ≤ M
• x: an n-dimensional vector representing the states of all components, x = (x_1, x_2, . . . , x_n)
• φ(x): state of the system, also called the structure function of the system, 0 ≤ φ(x) ≤ M
• (j_i, x): vector x whose ith argument is set equal to j, where 0 ≤ j ≤ M and 1 ≤ i ≤ n
As given in Definition 4.2 for binary systems, a component is said to be irrelevant to the performance of a system if the state of the system is not at all affected by the state of this component. Otherwise, the component is said to be relevant. In other words, the state of the system remains the same when the state of an irrelevant component changes from state 0 to state 1. If there exists a component state vector x such that the state of the system is dictated by the state of a component, we say that this component is relevant. Since each component may only be in two possible states in a binary system, any change in the state of a component involves both possible states. Consider a binary series system with n components as an example. When components 1, 2, . . . , n − 1 are all in state 1 (the working state), the state of the system is dictated by component n. Thus, we conclude that component n is a relevant component in a binary series system.

In a multistate system, a component may be in any of M + 1 possible states. A change for the better in the state of a component may be from j to j + 1 for 1 ≤ j ≤ M − 1, or from j to l for 0 ≤ j < l ≤ M, or simply from 0 to M. Depending on how the state of the system changes when such changes in the state of a component occur, there are different degrees of relevancy of a component to the performance of the multistate system.

Definition 12.2 Component i is said to be strongly relevant to a multistate system with structure function φ if for every level j of component i there exists a vector (·_i, x) such that φ(j_i, x) = j and φ(l_i, x) ≠ j for l ≠ j, where 0 ≤ j ≤ M and 1 ≤ i ≤ n.

Definition 12.2 is based on El-Neweihi et al. [71]. It says that if component i is strongly relevant, then there must exist a component state vector x such that the state
of the system is exactly equal to the state of component i. The following example illustrates this concept.

Example 12.7 Consider a multistate system whose structure function is given by φ(x) = max{x_1, x_2, . . . , x_n}. In such a multistate system, the state of the best component dictates the state of the system. When x = (0, 0, . . . , 0, x_n), the state of component n completely dictates the state of the system. Based on Definition 12.2, we say that component n is strongly relevant to the system. Similarly, we can verify that every component is strongly relevant in this system.

Definition 12.3 Component i is said to be relevant to a multistate system with structure function φ if for every level j (1 ≤ j ≤ M) of component i there exists a vector (·_i, x) such that φ((j − 1)_i, x) < φ(j_i, x), where 1 ≤ i ≤ n.

Definition 12.3 is based on Griffith [86]. Based on this definition, a change in the state of a relevant component is able to change the state of the system. However, the state of this component does not necessarily dictate the state of the system completely. The following example illustrates this definition.

Example 12.8 Consider a multistate system with the structure function

φ(x) = ⌊(x_1 + x_2)/2⌋,

where x_1, x_2, and φ have definition domain {0, 1, 2, 3} and ⌊x⌋ denotes the largest integer less than or equal to x. Another way of representing the structure function of such a system is as follows:

φ(x)    x
0       (0, 0), (0, 1), (1, 0)
1       (1, 1), (0, 2), (0, 3), (2, 0), (3, 0), (1, 2), (2, 1)
2       (2, 2), (1, 3), (2, 3), (3, 1), (3, 2)
3       (3, 3)
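As a quick sanity check, Definitions 12.2 and 12.3 can be tested by brute force for small systems. The sketch below is our own illustration, not from the text, and the helper names are ours; it confirms that every component of the max structure of Example 12.7 is strongly relevant, while both components of Example 12.8 are relevant but neither is strongly relevant.

```python
from itertools import product


def phi(x):
    """Structure function of Example 12.8: floor((x1 + x2) / 2)."""
    return (x[0] + x[1]) // 2


def strongly_relevant(phi, i, n, M):
    # Definition 12.2: for every level j there is a vector x such that
    # phi(j_i, x) = j and phi(l_i, x) != j for every l != j.
    for j in range(M + 1):
        found = False
        for rest in product(range(M + 1), repeat=n - 1):
            def with_i(v, rest=rest):
                x = list(rest)
                x.insert(i, v)
                return tuple(x)
            if phi(with_i(j)) == j and all(phi(with_i(l)) != j
                                           for l in range(M + 1) if l != j):
                found = True
                break
        if not found:
            return False
    return True


def relevant(phi, i, n, M):
    # Definition 12.3: for every level j >= 1 there is a vector x with
    # phi((j-1)_i, x) < phi(j_i, x).
    for j in range(1, M + 1):
        hit = False
        for rest in product(range(M + 1), repeat=n - 1):
            lo = list(rest); lo.insert(i, j - 1)
            hi = list(rest); hi.insert(i, j)
            if phi(tuple(lo)) < phi(tuple(hi)):
                hit = True
                break
        if not hit:
            return False
    return True
```

Exhaustive checks of this kind are feasible only for small n and M, but they are a convenient way to test intuition about the relevancy definitions.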
It can be seen that under the condition that component 2 is in state 3, when component 1 changes its state from 1 to 0, the system state does change; however, the change in the system state is from 2 to 1 instead of from 1 to 0. When component 2 is in state 0, the system state changes from
1 to 0 when component 1 changes its state from 2 to 1. When component 2 is in state 3, the system state changes from 3 to 2 when component 1 changes its state from 3 to 2. This shows that under certain conditions a change in the state of component 1 will cause the system state to change. Thus, we conclude that component 1 is relevant to the system structure based on Definition 12.3. Similarly, we can verify that component 2 is relevant to the system structure. Note that neither component is strongly relevant to the system structure because neither component can completely dictate the state of the system.

Definition 12.4 Component i is said to be weakly relevant to a multistate system with structure function φ if there exists a vector (·_i, x) such that φ(0_i, x) < φ(M_i, x), where 1 ≤ i ≤ n.

Definition 12.4 is based on Griffith [86]. For a weakly relevant component i, we only require that φ(j_i, x) > φ((j − 1)_i, x) for at least one j value, where 1 ≤ j ≤ M. The following example illustrates such a component.

Example 12.9 Consider a two-component multistate system with the following structure function:

φ(x)    x
0       (0, 0), (1, 0), (2, 0), (3, 0)
1       (0, 1), (1, 1), (2, 1), (3, 1), (0, 2), (1, 2)
2       (2, 2), (3, 2)
3       (0, 3), (1, 3), (2, 3), (3, 3)
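The distinction between weak relevance and relevance can be checked mechanically. The sketch below is our own illustration (the function names are ours); it encodes the structure function of Example 12.9 and verifies that component 1 is weakly relevant under Definition 12.4 but not relevant under Definition 12.3.

```python
M = 3
states = range(M + 1)


def phi(x):
    # Structure of Example 12.9: component 2 dictates the system state,
    # except at level 2, where component 1 must also be at level 2 or above.
    x1, x2 = x
    if x2 != 2:
        return x2
    return 2 if x1 >= 2 else 1


def weakly_relevant_1():
    # Definition 12.4: some x2 with phi((0, x2)) < phi((M, x2)).
    return any(phi((0, x2)) < phi((M, x2)) for x2 in states)


def relevant_1():
    # Definition 12.3: for EVERY level j there must be an x2 with
    # phi((j - 1, x2)) < phi((j, x2)).
    return all(any(phi((j - 1, x2)) < phi((j, x2)) for x2 in states)
               for j in range(1, M + 1))
```

Here the check fails for j = 1 and j = 3 (component 1 has no influence at those levels), which is exactly why the component is only weakly relevant.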
From the given structure function, we can see the following:

1. When component 2 is in state 0, the system is in state 0 no matter what state component 1 is in.
2. When component 2 is in state 1, the system is in state 1 no matter what state component 1 is in.
3. When component 2 is in state 3, the system is in state 3 no matter what state component 1 is in.
4. When component 2 is in state 2, the system state decreases only when component 1 degrades from state 2 to state 1.

Based on Definition 12.4, we conclude that component 1 is weakly relevant to the system structure. Comparing the definitions of strongly relevant, relevant, and weakly relevant, we have the following interpretations and observations:
1. Given certain states of other components, the state of the system is equal to the state of a strongly relevant component.
2. Given certain states of other components, every state of a relevant component has nontrivial influence on the state of the system. In other words, the state of the system changes as the state of this relevant component changes. However, the state of the system does not have to be equal to the state of this component.
3. Given certain states of other components, a weakly relevant component has nontrivial influence on the state of the system. We do not require every state of the component to have such an influence; we only require that at least one state of such a component has this influence.
4. A strongly relevant component automatically satisfies the requirements for relevant and weakly relevant components. A relevant component automatically satisfies the requirements for a weakly relevant component.
5. A weakly relevant component may also be a relevant component or a strongly relevant component. A relevant component may also be a strongly relevant component.

We have defined the coherent system in the binary context in Definition 4.3. In Definition 4.3, there are two conditions for a system to be a coherent system: one is that the system structure function must be monotonically increasing in each argument, and the other is that every component must be relevant. In a multistate system, because there may be different degrees of relevancy for each component, there are different degrees of coherency. First we will introduce the concept of the multistate monotone system based on Griffith [86].

Definition 12.5 Let φ be a function with domain {0, 1, 2, . . . , M}^n and range {0, 1, 2, . . . , M}, where M and n are positive integers. The function φ represents a multistate monotone system (MMS) if it satisfies

1. φ(x) is increasing in x ≥ 0, and
2. min_{1≤i≤n} x_i ≤ φ(x) ≤ max_{1≤i≤n} x_i; in particular, φ(j, j, . . . , j) = j for 0 ≤ j ≤ M.
Based on Definition 12.5, the improvement of any component (or increase in the state of any component) does not degrade the performance of the system. A higher state level indicates a higher level of performance. In addition, when all components are in the same state, the system is also in this same state. For example, when all components are completely failed, the system must be completely failed, and when all components are working perfectly, the system must be working perfectly. Definition 12.6 A multistate monotone system with structure function φ(x) is strongly coherent, coherent, and weakly coherent if and only if every component is strongly relevant, relevant, and weakly relevant, respectively.
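Both conditions of Definition 12.5 can be verified exhaustively for small n and M. The following sketch is our own illustration (the helper name is ours):

```python
from itertools import product


def is_mms(phi, n, M):
    """Check Definition 12.5 by exhaustion: phi must be increasing in each
    argument and satisfy min(x) <= phi(x) <= max(x), which in particular
    forces phi(j, ..., j) = j."""
    for x in product(range(M + 1), repeat=n):
        if not (min(x) <= phi(x) <= max(x)):
            return False
        for i in range(n):
            if x[i] < M:
                y = list(x)
                y[i] += 1          # improve component i by one level
                if phi(tuple(y)) < phi(x):
                    return False   # improvement degraded the system
    return True
```

For instance, the floor-average structure of Example 12.8 and the max structure of Example 12.7 both pass, while a constant function fails the bounding condition.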
For other relevancy conditions in definitions of coherent systems, readers are referred to the reviews provided by Andrzejczak [12] and Huang [94]. There are two different definitions of minimal path vectors and minimal cut vectors in the multistate context. One is given in terms of vectors that result in the system being exactly in a certain state, while the other is given in terms of vectors that result in the system being in or above a certain state. We present both in the following.

Definition 12.7 (El-Neweihi et al. [71]) A vector x is a connection vector to level j if φ(x) = j. A vector x is a lower critical connection vector to level j if φ(x) = j and φ(y) < j for all y < x. A vector x is an upper critical connection vector to level j if φ(x) = j and φ(y) > j for all y > x.

A connection vector to level j defined here is a path vector that results in a system state of level j. If these vectors are known, one can use them in the evaluation of the probability distribution of the system state.

Definition 12.8 (Natvig [173]) A vector x is called a minimal path vector to level j if φ(x) ≥ j and φ(y) < j for all y < x. A vector x is called a minimal cut vector to level j if φ(x) < j and φ(y) ≥ j for all y > x.

In this definition, both minimal path vectors and minimal cut vectors are defined in terms of whether they result in a system state equal to or greater than j. A minimal path vector to level j based on Definition 12.8 may or may not be a connection vector to level j defined in Definition 12.7, as it may result in a system state higher than j. A minimal cut vector to level j may or may not be a connection vector to level j − 1, as it may result in a system state below j − 1. We find that the minimal path vectors and minimal cut vectors given in Definition 12.8 are easier to use than the connection vectors given in Definition 12.7.

Definition 12.9 (Xue [249]) Let φ(x) be the structure function of a multistate system.
The structure function of its dual system, φ^D(x), is defined to be

φ^D(x) = M − φ(M − x),

where M − x = (M − x_1, M − x_2, . . . , M − x_n). Based on this definition of duality, the following two results are immediate:

1. (φ^D(x))^D = φ(x).
2. Vector x is an upper critical connection vector to level j of φ if and only if M − x is a lower critical connection vector to level M − j of φ^D. Vector x is a lower critical connection vector to level j of φ if and only if M − x is an upper critical connection vector to level M − j of φ^D.
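Duality is easy to exercise numerically. The sketch below, our own illustration, builds φ^D from Definition 12.9, confirms the involution property (φ^D)^D = φ on a test structure, and checks the familiar fact that the dual of a multistate series system (min) is a multistate parallel system (max):

```python
from itertools import product

M, n = 3, 2


def dual(phi):
    # Definition 12.9: phi^D(x) = M - phi(M - x)
    return lambda x: M - phi(tuple(M - xi for xi in x))


def floor_avg(x):
    # Structure of Example 12.8, used here only as a test case.
    return (x[0] + x[1]) // 2


def series(x):
    # Multistate series system: phi(x) = min(x).
    return min(x)
```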
12.4 SPECIAL MULTISTATE SYSTEMS AND THEIR PERFORMANCE EVALUATION

Reliability is the most widely used performance measure of a binary system. It is the probability that the system is in state 1. Once the reliability of a system is given, we also know the probability that the system is in state 0. Thus, the reliability uniquely defines the distribution of a binary system in different states. In a multistate system, it is often assumed that the state distribution, that is, the distribution of each component in different states, is given. The performance of the system is represented by its state distribution. Thus, the most important performance measure in a multistate system is its state distribution. A state distribution may be given in terms of a probability distribution function (pdf), a cumulative distribution function (CDF), or a reliability function.

Notation

• p_{i,j}: probability that component i is in state j, 1 ≤ i ≤ n, 0 ≤ j ≤ M
• p_j: probability that a component is in state j when all components are i.i.d.
• P_{i,j}: probability that component i is in state j or above
• P_j: probability that a component is in state j or above when all components are i.i.d.
• Q_{i,j}: 1 − P_{i,j}, probability that component i is in a state below j
• Q_j: 1 − P_j
• R_{s,j}: Pr(φ(x) ≥ j)
• Q_{s,j}: 1 − R_{s,j}
• r_{s,j}: Pr(φ(x) = j)
Based on the defined notation, we have the following facts:

P_{i,0} = \sum_{j=0}^{M} p_{i,j} = 1,   1 ≤ i ≤ n,
p_{i,M} = P_{i,M},   1 ≤ i ≤ n,
p_{i,j} = P_{i,j} − P_{i,j+1},   1 ≤ i ≤ n,   0 ≤ j ≤ M − 1,
R_{s,0} = 1,   R_{s,M} = r_{s,M},
r_{s,j} = R_{s,j} − R_{s,j+1},   0 ≤ j ≤ M − 1.
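These identities translate directly into code. A minimal sketch (our own; the function names are ours) converts between the "state j or above" probabilities P_j and the exact-state probabilities p_j:

```python
def pmf_from_survival(P):
    """p_j = P_j - P_{j+1} for j < M, and p_M = P_M,
    where P[j] = Pr(state >= j) with P[0] = 1."""
    M = len(P) - 1
    return [P[j] - P[j + 1] for j in range(M)] + [P[M]]


def survival_from_pmf(p):
    """P_j = p_j + p_{j+1} + ... + p_M."""
    M = len(p) - 1
    return [sum(p[j:]) for j in range(M + 1)]
```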
12.4.1 Simple Multistate k-out-of-n:G Model

Notation

• R(k, n; j): probability that the k-out-of-n:G system is in state j or above

Huang et al. [98] provide a review of the multistate k-out-of-n models. In this section, we consider the simple multistate k-out-of-n models. The multistate series and parallel systems are treated as special cases of the simple multistate k-out-of-n:G system. The state of a multistate series system is equal to the state of the worst component in the system [71], that is,

φ(x) = min_{1≤i≤n} x_i.

The state of a multistate parallel system is equal to the state of the best component in the system [71], that is,

φ(x) = max_{1≤i≤n} x_i.

These definitions are natural extensions from the binary case to the multistate case. System performance evaluation for the defined multistate parallel and multistate series systems is straightforward:

R_{s,j} = \prod_{i=1}^{n} P_{i,j},   1 ≤ j ≤ M,   for a series system,   (12.24)
Q_{s,j} = \prod_{i=1}^{n} Q_{i,j},   1 ≤ j ≤ M,   for a parallel system.   (12.25)
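Equations (12.24) and (12.25) are one-liners in code. A small sketch, our own illustration, with P_matrix[i][j] holding P_{i,j}:

```python
from math import prod


def series_R(P_matrix, j):
    # Equation (12.24): R_{s,j} = product over i of P_{i,j}
    return prod(row[j] for row in P_matrix)


def parallel_R(P_matrix, j):
    # Equation (12.25): Q_{s,j} = product over i of Q_{i,j} = (1 - P_{i,j}),
    # and then R_{s,j} = 1 - Q_{s,j}.
    return 1 - prod(1 - row[j] for row in P_matrix)
```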
A simple extension of the binary k-out-of-n:G system results in the definition of a simple multistate k-out-of-n:G system. We call it simple because there are more general models of the k-out-of-n:G systems, to be discussed later.

Definition 12.10 (El-Neweihi et al. [71]) A system is a simple multistate k-out-of-n:G system if its structure function satisfies φ(x) = x_{(n−k+1)}, where x_{(1)} ≤ x_{(2)} ≤ · · · ≤ x_{(n)} is a nondecreasing arrangement of x_1, x_2, . . . , x_n.

Based on this definition, a k-out-of-n:G system has the following properties:

1. The multistate series and multistate parallel systems satisfy Definition 12.10.
2. The state of the system is determined by the worst state of the best k components.
3. Each state j for 1 ≤ j ≤ M has \binom{n}{k} minimal path vectors and \binom{n}{k−1} minimal cut vectors.

The numbers of minimal path and minimal cut vectors are the same for each state. We can say that the defined k-out-of-n:G system has the same structure at each system state j for 1 ≤ j ≤ M. In other words, the system is in state j or above if and only if at least k components are in state j or above for each j (1 ≤ j ≤ M). Because of this property, Boedigheimer and Kapur [35] actually define the k-out-of-n:G system as one that has \binom{n}{k} minimal path vectors and \binom{n}{k−1} minimal cut vectors to system state j for 1 ≤ j ≤ M.
System performance evaluation for the simple k-out-of-n:G system is straightforward. For any specified system state j, we can use the system reliability evaluation algorithms for a binary k-out-of-n:G system to evaluate the probability that the multistate system is in state j or above. For example, equation (7.26) and its boundary conditions can be used as follows for 1 ≤ j ≤ M:

R(k, n; j) = P_{n,j} R(k − 1, n − 1; j) + Q_{n,j} R(k, n − 1; j),   (12.26)
R(0, n; j) = 1,   (12.27)
R(n + 1, n; j) = 0.   (12.28)
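The recursion (12.26) with boundaries (12.27) and (12.28) can be sketched as follows; this is our own implementation, with P[i][j] holding P_{i,j}. For i.i.d. components it agrees with the binomial closed form for at least k successes out of n.

```python
from functools import lru_cache


def k_out_of_n_R(P, k, j):
    """R(k, n; j): probability that at least k of the n components are in
    state j or above, computed via recursion (12.26) with boundary
    conditions (12.27)-(12.28)."""
    @lru_cache(maxsize=None)
    def R(kk, n):
        if kk <= 0:
            return 1.0          # (12.27): zero components always suffice
        if kk > n:
            return 0.0          # (12.28): cannot exceed the number available
        Pnj = P[n - 1][j]       # component n works at level j with prob P_{n,j}
        return Pnj * R(kk - 1, n - 1) + (1 - Pnj) * R(kk, n - 1)  # (12.26)
    return R(k, len(P))
```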
Example 12.10 Derive the state distribution of a simple multistate 2-out-of-4:G system with the following i.i.d. component state distribution (M = 3):

P_{i,0} = 1,   P_{i,1} = 0.9,   P_{i,2} = 0.7,   P_{i,3} = 0.4   for i = 1, 2, 3, 4.

From the given data, we know the following:

p_{i,0} = 0.1,   p_{i,1} = 0.2,   p_{i,2} = 0.3,   p_{i,3} = 0.4   for i = 1, 2, 3, 4.
Using equations (12.26)–(12.28) for j = 1, 2, 3, respectively, we have

R(1, 1; 1) = 0.9000,   R(1, 2; 1) = 0.9900,   R(1, 3; 1) = 0.9990,
R(2, 2; 1) = 0.8100,   R(2, 3; 1) = 0.9720,   R(2, 4; 1) = 0.9963,

R(1, 1; 2) = 0.7000,   R(1, 2; 2) = 0.9100,   R(1, 3; 2) = 0.9730,
R(2, 2; 2) = 0.4900,   R(2, 3; 2) = 0.7840,   R(2, 4; 2) = 0.9163,

R(1, 1; 3) = 0.4000,   R(1, 2; 3) = 0.6400,   R(1, 3; 3) = 0.7840,
R(2, 2; 3) = 0.1600,   R(2, 3; 3) = 0.3520,   R(2, 4; 3) = 0.5248.
The calculated R(2, 4; 1), R(2, 4; 2), and R(2, 4; 3) represent the probabilities that the system is in state 1 or above, state 2 or above, and state 3 (or above), respectively, that is,

Pr(φ ≥ 1) = 0.9963,   Pr(φ ≥ 2) = 0.9163,   Pr(φ ≥ 3) = 0.5248,
Pr(φ = 0) = 0.0037,   Pr(φ = 1) = 0.0800,   Pr(φ = 2) = 0.3915,   Pr(φ = 3) = 0.5248.

12.4.2 Generalized Multistate k-out-of-n:G Model

The simple multistate k-out-of-n:G system model imposes the restriction that the system structure at each system level must be the same. In practical situations, a multistate system may have different structures at different system levels. For example, consider a three-component system with four possible states. The system could be a 1-out-of-3:G structure at level 1; in other words, it requires at least one component to be in state 1 or above for the system to be in state 1 or above. It may have a 2-out-of-3:G structure at level 2; in other words, for the system to be in state 2 or above, at least two components must be in state 2 or above. It may have a 3-out-of-3:G structure at level 3; namely, at least three components have to be in state 3 for the system to be in state 3. Such a k-out-of-n system model is more flexible for modeling real-life systems. Huang et al. [97] propose a definition of the generalized multistate k-out-of-n:G system and develop reliability evaluation algorithms for this multistate k-out-of-n:G system model.

Definition 12.11 (Huang et al. [97]) An n-component system is called a generalized multistate k-out-of-n:G system if φ(x) ≥ j (1 ≤ j ≤ M) whenever there exists an integer value l (j ≤ l ≤ M) such that at least k_l components are in state l or above.

In this definition, the k_j's do not have to be the same for different system states j (1 ≤ j ≤ M). This means that the structure of the multistate system may be different for different system state levels. Generally speaking, the k_j values are not necessarily in a monotone ordering. But the following two special cases of this definition will be particularly considered:

• When k_1 ≤ k_2 ≤ · · · ≤ k_M, the system is called an increasing multistate k-out-of-n:G system. In this case, for the system to be in a higher state level j or above, a larger number of components must be in state j or above. In other words, there is an increasing requirement on the number of components that must be in a certain state or above for the system to be in a higher state level or above. That is why we call it the increasing multistate k-out-of-n:G system.
• When k_1 ≥ k_2 ≥ · · · ≥ k_M, the system is called a decreasing multistate k-out-of-n:G system. In this case, for a higher system state level j, there is a decreasing requirement on the number of components that must be in state level j or above.
When k_j is a constant, that is, k_1 = k_2 = · · · = k_M = k, the structure of the system is the same for all system state levels. This reduces to the definition of the simple multistate k-out-of-n:G system. We call such systems constant multistate k-out-of-n:G systems. All the concepts and results of binary k-out-of-n:G systems can be easily extended to the constant multistate k-out-of-n:G systems. The constant multistate k-out-of-n:G system can be treated as a special case of the increasing multistate k-out-of-n:G system in our later discussions.

Example 12.11 (Huang et al. [97]) Suppose that a plant has five production lines for producing a certain product. The plant has four different production levels: full scale for maximum customer demand (state 3), average scale for normal customer demand (state 2), low scale when the customer demand is low (state 1), and zero
scale when the plant is shut down. All of the five production lines have to work full scale (at state 3) for the system to be in state 3. At least three lines have to work at least at the average scale for the system to be at least in state 2. At least two lines have to work at least at the low scale (state 1) for the system to be in state 1 or above. Such a system can be represented by an increasing multistate k-out-of-n:G system model with k_1 = 2, k_2 = 3, and k_3 = 5.

Example 12.12 (Huang et al. [97]) To illustrate an increasing multistate k-out-of-n:G system, consider a three-component system with k_1 = 1, k_2 = 2, and k_3 = 3. Both the system and the components may be in one of four possible states, namely, 0, 1, 2, and 3. The following table illustrates the relationship between system state and component states:

φ(x)    x
0       (0, 0, 0)
1       (1, 0, 0), (1, 1, 0), (1, 1, 1), (2, 0, 0), (2, 1, 0), (2, 1, 1), (3, 0, 0), (3, 1, 0), (3, 1, 1) +
2       (2, 2, 0), (2, 2, 1), (2, 2, 2), (3, 2, 0), (3, 2, 1), (3, 2, 2), (3, 3, 0), (3, 3, 1), (3, 3, 2) +
3       (3, 3, 3)
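Definition 12.11 is straightforward to encode. The sketch below is our own illustration (the names are ours); it evaluates φ as the highest level l whose k_l-out-of-n condition holds and tallies how many of the 4^3 component state vectors fall at each system level for k_1 = 1, k_2 = 2, k_3 = 3:

```python
from collections import Counter
from itertools import product


def phi_gen(x, ks):
    """Generalized multistate k-out-of-n:G structure (Definition 12.11):
    phi(x) = max{ l : at least ks[l-1] components are in state l or above },
    or 0 if no level qualifies."""
    M = len(ks)
    levels = [l for l in range(1, M + 1)
              if sum(xi >= l for xi in x) >= ks[l - 1]]
    return max(levels, default=0)


# Tally system states over all component state vectors of Example 12.12.
counts = Counter(phi_gen(x, (1, 2, 3)) for x in product(range(4), repeat=3))
```

With these increasing thresholds the tally reproduces the table above: a single vector at level 0 and at level 3, and 31 vectors (including permutations) at each of levels 1 and 2.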
In the table, the plus sign represents the permutations of the component states listed above the plus sign. For example, system state level 2 can result from component states (2, 2, 0) and their permutations, namely, (0, 2, 2) and (2, 0, 2). This is because all of the components in a k-out-of-n:G system perform the same functions. Because k_1 < k_2 < k_3 in this case, we have a simpler version of the proposed definition of the multistate k-out-of-n:G system. The system is in state 3 if all three components are in state 3. The system is in state 2 or above if at least two components are in state 2 or above. The system is in state 1 or above if at least one component is in state 1 or above. The system in this example has a series structure at system state 3 (3-out-of-3:G), a 2-out-of-3:G structure at system state 2, and a parallel structure at system state 1 (1-out-of-3:G).

Example 12.13 (Huang et al. [97]) To illustrate a decreasing multistate k-out-of-n:G system, consider a three-component system wherein both the system and the components may be in one of four possible states, 0, 1, 2, and 3. A decreasing k-out-of-n:G system satisfies the relationship between system state and component states as shown in the following table. In this example, we have k_1 = 3, k_2 = 2, and k_3 = 1:

φ(x)    x
0       (0, 0, 0), (1, 0, 0), (2, 0, 0), (1, 1, 0), (2, 1, 0) +
1       (1, 1, 1), (2, 1, 1) +
2       (2, 2, 0), (2, 2, 1), (2, 2, 2) +
3       (3, 0, 0), (3, 1, 0), (3, 1, 1), (3, 2, 0), (3, 2, 1), (3, 2, 2), (3, 3, 0), (3, 3, 1), (3, 3, 2), (3, 3, 3) +
In the table, the plus sign represents the permutations of the states of the components listed above the plus sign. In terms of the definition of a multistate k-out-of-n:G system we have provided, the system in this example is in state 3 if at least one component is in state 3 (k_3 = 1). The system is in state 2 or above if at least two components are in state 2 or above (k_2 = 2) or if at least one component is in state 3 (k_3 = 1). The system is in state 1 or above if all three components are in state 1 or above (k_1 = 3), or if at least two components are in state 2 or above (k_2 = 2), or if at least one component is in state 3 or above (k_3 = 1). We can see that in this example k_1 > k_2 > k_3 indicates a strictly decreasing multistate k-out-of-n:G system. We can say that the system in this example has a 1-out-of-3:G structure at system state 3, a 2-out-of-3:G structure at system state 2, and a 3-out-of-3:G structure at system state 1.

For an increasing multistate k-out-of-n:G system, that is, k_1 ≤ k_2 ≤ · · · ≤ k_M, Definition 12.11 can be rephrased as follows: φ(x) ≥ j if and only if at least k_j components are in state j or above. If at least k_j components are in state j or above (these components can be considered "working" as far as state level j is concerned), then the system will be in state j or above (the system is considered to be working) for 1 ≤ j ≤ M. The only difference between this case of Definition 12.11 and Definition 12.10 is that the number of components required to be in state j or above for the system to be in state j or above may change from state to state. Other characteristics of this case of the generalized multistate k-out-of-n:G system are exactly the same as those defined with Definition 12.10.
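For the decreasing case, the "at least" form of Definition 12.11 and the "exactly in state j" form given in Definition 12.12 describe the same system. The sketch below, our own illustration using the thresholds of Example 12.13, checks that the two forms agree on every component state vector:

```python
from itertools import product

ks = (3, 2, 1)        # k_1, k_2, k_3 of Example 12.13 (decreasing)
M, n = 3, 3


def phi(x):
    # Definition 12.11: phi(x) = max{ j : at least ks[j-1] components >= j }
    levels = [j for j in range(1, M + 1)
              if sum(xi >= j for xi in x) >= ks[j - 1]]
    return max(levels, default=0)


def exactly(x, j):
    # Definition 12.12: at least k_j components in state j or above, and at
    # most k_l - 1 components in state l or above for every l > j.
    if sum(xi >= j for xi in x) < ks[j - 1]:
        return False
    return all(sum(xi >= l for xi in x) <= ks[l - 1] - 1
               for l in range(j + 1, M + 1))
```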
Algorithms for binary k-out-of-n:G system reliability evaluation can also be extended to the increasing multistate k-out-of-n:G system for reliability evaluation:

R_j(k_j, n) = P_{n,j} R_j(k_j − 1, n − 1) + (1 − P_{n,j}) R_j(k_j, n − 1),   (12.29)

where R_j(b, a) is the probability that at least b out of a components are in state j or above. The following boundary conditions are needed for equation (12.29):

R_j(0, a) = 1   for a ≥ 0,   (12.30)
R_j(b, a) = 0   for b > a ≥ 0.   (12.31)

When all of the components have the same state probability distribution, that is, p_{i,j} = p_j for all i, the probability that the system is in state j or above, R_{s,j}, can be expressed as

R_{s,j} = \sum_{k=k_j}^{n} \binom{n}{k} P_j^k (1 − P_j)^{n−k}.   (12.32)
For a decreasing multistate k-out-of-n:G system, that is, k_1 ≥ k_2 ≥ · · · ≥ k_M, the wording of its definition is not as simple. The system is in level M if at least k_M components are in level M. The system is in level M − 1 or above if at least k_{M−1} components are in level M − 1 or above or at least k_M components are in level M. Generally speaking, the system is in level j or above (1 ≤ j ≤ M) if at least k_j components are in level j or above, or at least k_{j+1} components are in level j + 1 or above, or at least k_{j+2} components are in level j + 2 or above, . . . , or finally at least k_M components are in level M.

Definition 12.12 A decreasing multistate k-out-of-n:G system can also be stated in terms of the system being exactly in a certain state: φ(x) = j if and only if at least k_j components are in state j or above and at most k_l − 1 components are in state l or above for l = j + 1, j + 2, . . . , M, where j = 1, 2, . . . , M.

In the following, we will separate the case with i.i.d. components and the case with independent components. When all of the components are i.i.d., the following equation can be used to calculate the probability that the system is in state j:

r_{s,j} = \sum_{k=k_j}^{n} \binom{n}{k} Q_j^{n−k} \Big[ p_j^k + \sum_{l=j+1, k_l>1}^{M} β_l(k) \Big],   (12.33)

where β_l(k) is the probability that at least one and at most k_l − 1 components are in state l, at most k_u − 1 components are in state u for j + 1 ≤ u < l, the total number of components that are at states between j and l inclusive is k, and the system is in state j. As shown in equation (12.33), for a given k value, n − k components are in states below j and the remaining k components must be in state j or above. At the same time, all of these component states must make sure that the system is exactly in state j. In equation (12.33), we are summing up the probabilities that exactly k components are in state j or above without bringing the system state above j, for k = k_j, k_j + 1, . . . , n. The term β_l(k) is dependent on the value of k. The quantity in the brackets is the probability that at least k components are in state j or above without causing the system to be in a state above j. Within the brackets, p_j^k is the probability that all k components are exactly in state j, and β_{j+1}(k) is the probability that at least one and at most k_{j+1} − 1 components are in state j + 1 and the total number of components in states j and j + 1 is k. When l = j + 2, β_l(k) = β_{j+2}(k) is the probability that i_1 components are in state j + 2 for 1 ≤ i_1 ≤ k_{j+2} − 1, i_2 components are in state j + 1 for 0 ≤ i_2 ≤ k_{j+1} − 1 − i_1, and the total number of components in states j, j + 1, and j + 2 is k. If k_u = 1 for some u with j + 1 ≤ u ≤ M, then β_l(k) = 0 for l > u because k_l is nonincreasing. Let I_a = \sum_{m=1}^{a} i_m for a = 1, 2, . . . , l − j. We can use the following equation to calculate β_l(k):

β_l(k) = \sum_{i_1=1}^{k_l−1} \binom{k}{i_1} p_l^{i_1} \sum_{i_2=0}^{k_{l−1}−1−I_1} \binom{k−I_1}{i_2} p_{l−1}^{i_2} \sum_{i_3=0}^{k_{l−2}−1−I_2} \binom{k−I_2}{i_3} p_{l−2}^{i_3} × · · · × \sum_{i_{l−j}=0}^{k_{j+1}−1−I_{l−j−1}} \binom{k−I_{l−j−1}}{i_{l−j}} p_{j+1}^{i_{l−j}} × p_j^{k−I_{l−j}}.   (12.34)
The number of terms to be summed together in equation (12.34) is equal to the number of ways to assign k identical balls to l − j different cells with at least 1 and at most kl − 1 balls in cell l, at most kt − 1 balls in cells t for j < t < l, and the remaining balls in cell j. Thus, the computation time for βl (k) in equation (12.34) will be much smaller than K l− j , where K = max{k j , j = 1, 2, . . . , n}. In turn, the computation time for rs j in equation (12.33) will be much less than n K M . We have developed a computer program for given values of M for calculation of system state distributions. For most practical engineering problems, a limited state number M, for example M = 10, is big enough to describe the performance of the system and its components. Thus, the proposed algorithm is practical. Example 12.14 (Huang et al. [97]) Given a four-component multistate system, assume both the system and its components may be in state 0, 1, 2, 3, or 4. Let k1 = 4, k2 = 3, k3 = 2, and k4 = 1. The components are assumed to be i.i.d. with p0 = 0.1, p1 = 0.2, p2 = 0.3, p3 = 0.3, and p4 = 0.1. Use the above algorithm to calculate the system probabilities at all levels. We find Q 4 = 0.9, Q 3 = 0.6, Q 2 = 0.3, and Q 1 = 0.1. At level 4, j = 4, k 4 = 1. By equation (12.33), we have rs,4 =
4 4 k=1
k
k Q 4−k 4 p4 =
4 4 k=1
k
(0.9)4−k × 0.1k = 0.3439.
At level 3, j = 3, k3 = 2. By equation (12.33),we have rs,3 =
4 4 k=2
k
k Q 4−k 3 p3 =
4 4 k=2
k
(0.6)4−k × 0.3k = 0.2673.
488
MULTISTATE SYSTEM MODELS
At level 2, j = 2, k2 = 3. Since k3 > 1, we need to find the expression of β3 (k). Based on equation (12.34), 1 k
β3 (k) =
i1
i 1 =1
4 4
rs,2 =
k=3
k
p3i1 p2k−i1
k 0.3 × 0.3k−1 = k × 0.3k , = 1
2 3 p2k + β3 (k) Q 4−k 2
4 4 (0.3)4−k 0.3k + k × 0.3k = 0.1701. = k k=3
At level 1, j = 1, k1 = 4. Since k2 = 3 > 1 and k3 = 2 > 1, we need to find β2 (k) and β3 (k). By equation (12.34), β2 (k) =
2 4 i 1 =1
i1
p2i1 p1k−i1
and
β3 (k) =
1 4 i 1 =1
i1
p3i1
1 3
i 2 =0
i2
p2i2 p13−i2
.
By equation (12.33), rs,1 =
4 4 k=4
4
Q 4−4 1
p1k
+
3
βl (k) = 0.24 + β2 (4) + β3 (4) = 0.0856.
l=2
At level 0, we have r_{s,0} = 1 − r_{s,1} − r_{s,2} − r_{s,3} − r_{s,4} = 0.1331.

When the components are independent but not necessarily identically distributed, the following equation can be used to calculate the system probability at level j:

r_{s,j} = \sum_{k=k_j}^{n} \left[ R_e^j(k, n) + \sum_{l=j+1, k_l>1}^{M} \beta_k^j(l) \right],   (12.35)

where \beta_k^j(l) is the probability that at least one and at most k_l − 1 components are at state l (l > j), at most k_u − 1 components are at state u for j < u < l, the total number of components at state j or above is equal to k, n − k components are at states below j, and the system is in state j. In equation (12.35), \sum_{k=k_j}^{n} R_e^j(k, n) is the probability that at least k_j components are in state j and the other components are below state j:

R_e^j(k, n) = p_{n,j} R_e^j(k − 1, n − 1) + Q_{n,j} R_e^j(k, n − 1),   (12.36)
with the boundary conditions

R_e^j(k, k) = \prod_{i=1}^{k} p_{i,j},   (12.37)

R_e^j(1, k) = \sum_{i=1}^{k} Q_{1,j} Q_{2,j} \cdots Q_{i-1,j} \times p_{i,j} \times Q_{i+1,j} \cdots Q_{k,j}.   (12.38)

To calculate \beta_k^j(l), we use the following equation:

\beta_k^j(l) = \sum_{i_1=1}^{k_l-1} \sum_{i_2=0}^{k_{l-1}-1-I_1} \sum_{i_3=0}^{k_{l-2}-1-I_2} \cdots \sum_{i_{l-j}=0}^{k_{j+1}-1-I_{l-j-1}} R((l^{i_1}, (l-1)^{i_2}, \ldots, (j+1)^{i_{l-j}}, j^{k-I_{l-j}}), n),   (12.39)

where I_a = \sum_{m=1}^{a} i_m for a = 1, 2, \ldots, l − j. In equation (12.39), R((l^{i_1}, (l-1)^{i_2}, \ldots, (j+1)^{i_{l-j}}, j^{k-I_{l-j}}), n) is the probability that there are exactly i_1 components at level l, i_2 components at level l − 1, \ldots, i_{l-j} components at level j + 1, k − I_{l-j} components at level j, and the remaining n − k components at states below j. For simplicity, let R((l^{i_1}, (l-1)^{i_2}, \ldots, (j+1)^{i_{l-j}}, j^{k-I_{l-j}}), n) = R((l \sim j), n). We can calculate R((l \sim j), n) using the following recursive relation:

R((l \sim j), n) = p_{n,l} R((l^{i_1-1}, *), n − 1) + p_{n,l-1} R(((l-1)^{i_2-1}, *), n − 1) + \cdots + p_{n,j+1} R(((j+1)^{i_{l-j}-1}, *), n − 1) + p_{n,j} R((j^{k-I_{l-j}-1}, *), n − 1) + Q_{n,j} R((l \sim j), n − 1),   (12.40)

where R((l^{i_1-1}, *), n − 1) \equiv R((l^{i_1-1}, (l-1)^{i_2}, \ldots, (j+1)^{i_{l-j}}, j^{k-I_{l-j}}), n − 1) and a similar convention is followed for the other notation in the equation. The boundary conditions for equation (12.40) are as follows:

R((l^0, (l-1)^0, \ldots, i^1, \ldots, (j+1)^0, j^0), 1) = p_{1,i},   (12.41)

R((l^0, (l-1)^0, \ldots, (j+1)^0, j^0), 1) = Q_{1,j},   (12.42)

R((l^0, (l-1)^0, \ldots, (j+1)^0, j^0), n) = \prod_{k=1}^{n} Q_{k,j},   (12.43)

R((l^0, (l-1)^0, \ldots, i^n, \ldots, (j+1)^0, j^0), n) = \prod_{k=1}^{n} p_{k,i},   (12.44)

R((l^0, (l-1)^0, \ldots, i^1, \ldots, (j+1)^0, j^0), n) = \sum_{k=1}^{n} Q_{1,j} Q_{2,j} \cdots Q_{k-1,j} p_{k,i} Q_{k+1,j} \cdots Q_{n,j}.   (12.45)
The computation time for r_{s,j} in the non-i.i.d. case with equation (12.35) is much less than n^2 K^M, where K = \max_{1 \le i \le n}\{k_i\}.
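The recursion (12.36) with boundary condition (12.37) can be sketched directly in code. In the sketch below (our own function names; the component probabilities are made up for illustration), Re(k, n) is the probability that exactly k of the first n independent, non-identical components are at state j and the rest are below state j. We close the recursion with the all-components-below-j case, which plays the same role as the closed form (12.38):

```python
from functools import lru_cache

# Sketch of recursion (12.36)-(12.37) for independent, non-identical components.
# p[i] = P(x_i = j) and Q[i] = P(x_i < j) at the level j under study, components
# indexed 1..n (index 0 unused). The numbers below are assumed for illustration.
p = [None, 0.30, 0.25, 0.20, 0.35]
Q = [None, 0.50, 0.45, 0.60, 0.40]

@lru_cache(maxsize=None)
def Re(k, n):
    if k == n:                              # boundary (12.37): all n at state j
        out = 1.0
        for i in range(1, n + 1):
            out *= p[i]
        return out
    if k == 0:                              # all n components below state j
        out = 1.0
        for i in range(1, n + 1):
            out *= Q[i]
        return out
    # recursion (12.36): pivot on the state of component n
    return p[n] * Re(k - 1, n - 1) + Q[n] * Re(k, n - 1)

print(round(Re(2, 4), 6))                   # 0.10915
```

Summing Re(k, n) over k ≥ k_j gives the first term of (12.35).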
12.4.3 Generalized Multistate Consecutive-k-out-of-n:F System

The consecutively connected system models discussed earlier in this chapter allow the components to be in more than two different states while the system is allowed to be binary. Hiam and Porat [93] provide a Bayes reliability model of the Con/k/n:F system in which both the system and its components are assumed to have more than two possible states while k is assumed to be constant. When k is constant, the system has the same reliability structure at all system states. However, a multistate system may have different structures at different system states or levels. The following example illustrates this point.

Example 12.15 (Huang et al. [96]) A batch of products may be sorted into one of three classes based on quality level: grade A, grade B, and rejected. The following sampling procedure is used to classify the product items: if 3 consecutive items in a sample of 10 do not meet the grade A standard, a subsequent inspection is conducted under the grade B standard; otherwise, the batch is labeled grade A. If 5 consecutive items of the 10 are judged to be lower than grade B, the batch is rejected; otherwise, it is labeled grade B. For such a problem, we can define a multistate consecutive-k-out-of-n:F system with the label of the batch as the system state and the sampled items as components. Both the system and the components have three possible states: state 2 (grade A), state 1 (grade B), and state 0 (rejected). At system state level 2, the system has a Con/3/10:F structure, and at system state level 1 it has a Con/5/10:F structure.

Motivated by Example 12.15, Huang et al. [96] propose a definition of a generalized multistate Con/k/n:F system. In their definition, a different number of consecutive components need to be below level j for the system to be below level j for different j values (1 ≤ j ≤ M).
The required number of consecutive component “failures” is dependent on the system state level under consideration. Examples are given to illustrate the ideas of the proposed definition. System performance evaluation algorithms are presented for the proposed multistate Con/k/n:F system. In the following discussions of multistate systems, we sometimes still use the term “failure.” This term now has dynamic meanings. When we are concerned about state level j, we can say the system is failed or a component is failed if it is below state level j. Of course, when the value of j changes, the meaning of failed changes too.
Definition 12.13 (Huang et al. [96]) φ(x) < j (j = 1, 2, . . . , M) if at least k_l consecutive components are in states below l for all l such that j ≤ l ≤ M. An n-component system with such a property is called a multistate Con/k/n:F system.

In this definition, the k_j's do not have to be the same for different system states j (1 ≤ j ≤ M). This means that the structure of the multistate system may be different for different system state levels. The following two special cases of this definition will be considered:

• When k_1 ≥ k_2 ≥ · · · ≥ k_M, the system is called a decreasing multistate Con/k/n:F system. In this case, for the system to be below a higher state level j, a smaller number of consecutive components must be below state j. In other words, as j increases, there is a decreasing requirement on the number of consecutive components that must be below state j for the system to be below state level j.

• When k_1 ≤ k_2 ≤ · · · ≤ k_M, the system is called an increasing multistate Con/k/n:F system. In this case, for the system to be below a higher state level j, a larger number of consecutive components must be below state j. In other words, as j increases, there is an increasing requirement on the number of consecutive components that must be below state j for the system to be below state level j.
When k_j is constant, that is, k_1 = k_2 = · · · = k_M = k, the structure of the system is the same for all of the system state levels. This reduces to the definition of the multistate Con/k/n:F system provided by Hiam and Porat [93]. We call such a system a constant multistate Con/k/n:F system. The constant multistate Con/k/n:F system is considered to be a special case of the decreasing multistate Con/k/n:F system.

Example 12.16 (Huang et al. [96]) To illustrate a decreasing multistate Con/k/n:F system, consider a three-component system wherein both the system and the components may be in one of three possible states: 0, 1, and 2. In this example, we have k_1 = 2 and k_2 = 1. The system state and the component states have the relationships shown in the following table:

φ(x) = 0: (0, 0, 0), (1, 0, 0), (0, 0, 1), (2, 0, 0), (0, 0, 2)
φ(x) = 1: (0, 1, 0), (1, 1, 0)+, (1, 1, 1), (0, 2, 0), (2, 1, 0)+, (2, 1, 1)+, (2, 2, 0)+, (2, 2, 1)+
φ(x) = 2: (2, 2, 2)
In the table, the plus sign represents all the permutations of the elements of the component state vector x. For example, system state level 1 can result from component states (1, 1, 0) and their permutations, namely, (0, 1, 1) and (1, 0, 1). In terms of the definition of a multistate Con/k/n:F system that we have provided, the system in this example is below state 2 if and only if at least one (consecutive) component is below state 2 (k_2 = 1). The system is below state 1 if and only if at least two consecutive components are below state 1 (k_1 = 2). We can see that in this example k_1 > k_2, indicating a strictly decreasing multistate Con/k/n:F system. We can say that the system in this example has a Con/1/3:F structure at system state level 2 and a Con/2/3:F structure at system state level 1.

Example 12.17 (Huang et al. [96]) To illustrate an increasing multistate Con/k/n:F system, consider a three-component system with k_1 = 1, k_2 = 2, and k_3 = 3. Both the system and the components may be in one of four possible states, namely, 0, 1, 2, and 3. The following table illustrates the relationship between the system state and the component states:

φ(x) = 0: (0, 0, 0), (1, 0, 0)+, (1, 1, 0)+, (2, 0, 0), (0, 0, 2), (2, 1, 0), (2, 0, 1), (1, 0, 2), (0, 1, 2)
φ(x) = 1: (1, 1, 1), (2, 1, 1), (1, 1, 2)
φ(x) = 2: (0, 2, 0), (1, 2, 0), (1, 2, 1), (0, 2, 1), (2, 2, 0)+, (2, 2, 1)+, (2, 2, 2)
φ(x) = 3: (3, 0, 0)+, (3, 1, 0)+, (3, 1, 1)+, (3, 2, 0)+, (3, 2, 1)+, (3, 2, 2)+, (3, 3, 0)+, (3, 3, 1)+, (3, 3, 2)+, (3, 3, 3)
Again, the plus sign represents all permutations of the elements of the component state vector x. The system is below state 3 if and only if at least three (consecutive) components are below state 3. The system is below state 2 if and only if at least two consecutive components are below state 2 and at least three (consecutive) components are below state 3. The system is below state 1 if and only if at least one (consecutive) component is below state 1, at least two consecutive components are below state 2, and at least three (consecutive) components are below state 3. The system in this example has a parallel structure at system state 3 (3-out-of-3:F), a Con/2/3:F structure at system state level 2, and a series structure at system state level 1 (1-out-of-3:F).

Case I. Decreasing Multistate Con/k/n:F Systems: k_1 ≥ k_2 ≥ · · · ≥ k_M

In this case, the definition of a multistate Con/k/n:F system is equivalent to the following: φ(x) < j if and only if at least k_j consecutive components have x_i < j, for j = 1, 2, . . . , M. Those components with x_i < j are considered failed with respect to state level j. The system is considered failed with respect to state level j if φ < j. The algorithms for binary Con/k/n:F system reliability evaluation provided by Hwang [100] can be used to find Pr(φ < j) for j = 1, 2, . . . , M of a multistate Con/k/n:F system. Once Pr(φ < j) is found for j = 1, 2, . . . , M, we can easily find Pr(φ = j) for j = 0, 1, . . . , M. The following equation is based on Hwang [100]:

F_j(n; k_j) = F_j(n − 1; k_j) + \left[ 1 − F_j(n − k_j − 1; k_j) \right] P_{n-k_j, j} \prod_{m=n-k_j+1}^{n} Q_{m,j},   (12.46)

where F_j(a; b) is the probability that at least b consecutive components are below state j in an a-component system, and j may take values from 1 to M. Equation (12.46) can be applied recursively with the following boundary conditions:

F_j(a; b) = 0,  b > a,  j = 1, 2, . . . , M,   (12.47)

P_{0,j} = 1,  j = 1, 2, . . . , M.   (12.48)
By recursively applying equation (12.46), we can find the probabilities that the system is below state j for j = 1, 2, . . . , M and then find the probabilities that the system is in state j for j = 0, 1, . . . , M using the following equations:

Pr(φ < j) = F_j(n; k_j)  for j = 1, 2, . . . , M,   (12.49)

Pr(φ = 0) = Pr(φ < 1),   (12.50)

Pr(φ = M) = 1 − Pr(φ < M),   (12.51)

Pr(φ = j) = Pr(φ < j + 1) − Pr(φ < j)  for j = 1, . . . , M − 1.   (12.52)
Note that the constant multistate Con/k/n:F system can be treated as a special case of the decreasing Con/k/n:F system with k_1 = k_2 = · · · = k_M.

Example 12.18 (Huang et al. [96]) To illustrate the evaluation of the system state distribution of a decreasing multistate Con/k/n:F system, consider Example 12.16, wherein n = 3 and M = 2. Assume that all of the components are i.i.d. with the common component state distribution p_0 = 0.1, p_1 = 0.4, p_2 = 0.5. Calculate the system state distribution. We have

Q_1 = p_0 = 0.1,  Q_2 = p_0 + p_1 = 0.5,  Q_3 = 1,
P_0 = 1,  P_1 = p_1 + p_2 = 0.9,  P_2 = p_2 = 0.5.

For j = 2, we have k_2 = 1. Using equation (12.46), we have
F_2(3; 1) = F_2(2; 1) + [1 − F_2(1; 1)] P_{2,2} Q_{3,2} = Q_{1,2} + P_{1,2} Q_{2,2} + P_{1,2} P_{2,2} Q_{3,2} = 0.8750.

For j = 1, we have k_1 = 2. Again using equation (12.46), we have

F_1(3; 2) = F_1(2; 2) + [1 − F_1(0; 2)] P_{1,1} Q_{2,1} Q_{3,1} = Q_{1,1} Q_{2,1} + P_{1,1} Q_{2,1} Q_{3,1} = 0.0190.

The system state distribution can be calculated as follows:

Pr(φ < 2) = F_2(3; 1) = 0.8750,
Pr(φ < 1) = F_1(3; 2) = 0.0190,
Pr(φ = 2) = 1 − Pr(φ < 2) = 0.1250,
Pr(φ = 1) = Pr(φ < 2) − Pr(φ < 1) = 0.8560,
Pr(φ = 0) = F_1(3; 2) = 0.0190.

Case II. Increasing Multistate Con/k/n:F Systems: k_1 ≤ k_2 ≤ · · · ≤ k_M

We assume that at least one of the inequalities in this case is a strict inequality. Based on the definitions of minimal path vectors provided earlier, a minimal path vector to level j could also be a minimal path vector to level j + 1 or even higher levels. Consider the system structure given in Example 12.17. One of the minimal path vectors to system level 1 is (0, 2, 0) because φ(0, 2, 0) ≥ 1 and φ(x) < 1 for all x < (0, 2, 0). At the same time, component state vector (0, 2, 0) is also a minimal path vector for system state level 2. As a result, we are unable to use binary Con/k/n:F system reliability evaluation formulas for the evaluation of the system state distribution under Case II. New methods need to be developed for this purpose. Alternatively, bounding techniques may be used to establish bounds for the probability that the system will be in each possible state. Given the minimal path vectors or minimal cut vectors of a multistate system, either the inclusion–exclusion (IE) or the sum-of-disjoint-products (SDP) method can be used to establish the bounds.
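Returning to Case I, recursion (12.46) is straightforward to implement; the following sketch (our own function and variable names) reproduces the two probabilities computed in Example 12.18:

```python
# A sketch of recursion (12.46) for Case I (decreasing multistate Con/k/n:F).
# F(n, kj) = Pr(at least kj consecutive components are below state j); P[m] and
# Q[m] are Pr(x_m >= j) and Pr(x_m < j), 1-indexed with P[0] = 1 (eq. 12.48).
def F(n, kj, P, Q):
    if n < kj:                       # boundary (12.47)
        return 0.0
    tail = 1.0
    for m in range(n - kj + 1, n + 1):
        tail *= Q[m]                 # components n-kj+1 .. n all below state j
    return F(n - 1, kj, P, Q) + (1.0 - F(n - kj - 1, kj, P, Q)) * P[n - kj] * tail

# Example 12.18: n = 3 i.i.d. components with p0, p1, p2 = 0.1, 0.4, 0.5
Q1, P1 = [1.0, 0.1, 0.1, 0.1], [1.0, 0.9, 0.9, 0.9]   # level j = 1
Q2, P2 = [1.0, 0.5, 0.5, 0.5], [1.0, 0.5, 0.5, 0.5]   # level j = 2
print(round(F(3, 1, P2, Q2), 4), round(F(3, 2, P1, Q1), 4))   # 0.875 0.019
```

The two printed values are Pr(φ < 2) and Pr(φ < 1), from which (12.50)–(12.52) give the state distribution.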
12.5 GENERAL MULTISTATE SYSTEMS AND THEIR PERFORMANCE EVALUATION

Notation

• x_{i,j}: binary indicator, x_{i,j} = 1 if x_i ≥ j and 0 otherwise
• x^j: binary indicator vector, x^j = (x_{1,j}, x_{2,j}, . . . , x_{n,j})
• φ^j(x): binary indicator, φ^j = 1 if φ(x) ≥ j and 0 otherwise
• ψ^j: binary structure function for system state j
• ψ: multistate structure function of binary variables
• φ^D: dual structure function of φ
Many researchers have investigated the properties of multistate systems and provided limited means for their performance evaluation. In this section, we provide a coverage of these results. Some special multistate system models introduced earlier will be used to illustrate the concepts and methods to be presented.

Definition 12.14 (Hudson and Kapur [99]) Two component state vectors x and y are said to be equivalent if and only if there exists a j such that φ(x) = φ(y) = j, j = 0, 1, . . . , M. We use the notation x ↔ y to indicate that these two vectors are equivalent.

In a series multistate system with two components (n = 2) and three states (M = 2), the following component state vectors and the permutations of the elements of each vector will cause the system to be in state 1 and are thus equivalent to one another: (1, 1), (1, 2). In a parallel multistate system with three components (n = 3) and four states (M = 3), the following component state vectors and the permutations of the elements of each vector will cause the system to be in state 2 and are thus equivalent to one another: (2, 2, 2), (2, 2, 1), (2, 2, 0), (2, 1, 1), (2, 1, 0), (2, 0, 0). In a generalized k-out-of-n:G system, all permutations of the elements of a component state vector are equivalent to that vector since the positions of the components in the system are not important. For example, consider a multistate k-out-of-n:G system with three components (n = 3) and four possible states (M = 3). Component state vector (1, 2, 3) and its permutation vectors, namely, (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), and (3, 2, 1), are equivalent to one another.

Based on Definition 12.14, if x ↔ y and y ↔ z, then x ↔ z. This means that equivalence of state vectors is transitive. Obviously, x ↔ x. All component state vectors that result in the same system state level are equivalent to one another. We say that these equivalent vectors belong to the same class, called an equivalence class.
Definition 12.15 (Huang and Zuo [95]) A multistate coherent system is called a dominant system if and only if its structure function φ satisfies: φ(y) > φ(x) implies either (1) y > x or (2) y > z, x ↔ z, and x ≠ z.

This dominance condition says that if φ(y) > φ(x), then vector y must be larger than a vector that is in the same equivalence class as vector x. A vector is larger than another vector if every element of the first vector is at least as large as the corresponding element of the second vector and at least one element of the first vector is strictly larger. For example, vector (2, 2, 0) is larger than (2, 1, 0), but not larger than (0, 1, 1). We use the word "dominant" to indicate that vector y dominates vector x even though we may not necessarily have
y > x. If a multistate system does not satisfy Definition 12.15, we call it a nondominant system.

The increasing k-out-of-n:G system is a dominant system, while a decreasing k-out-of-n:G system is a nondominant system. Consider a decreasing k-out-of-n:G system with n = 3, M = 3, k_1 = 3, k_2 = 2, and k_3 = 1. Then we have φ(3, 0, 0) = 3, φ(2, 2, 0) = 2, and φ(1, 1, 1) = 1. The dominance conditions given in Definition 12.15 are not satisfied since (3, 0, 0) is not larger than (2, 2, 0)* and (2, 2, 0) is not larger than (1, 1, 1)*, even though φ(3, 0, 0) > φ(2, 2, 0) > φ(1, 1, 1). The asterisk is used to indicate the vector or any one of its permutations.

Example 12.19 (Huang and Zuo [95]) Consider a two-component multistate system. Both the system and its components may be in three possible states: 0, 1, and 2. Table 12.1 shows one structure function of such a system (called system A), which has a series structure at level 2 because the system is at level 2 if and only if both components are at level 2. The system has a parallel structure at level 1 because the system is in state 1 or above if and only if at least one of the components is in state 1 or above. Based on Definition 12.15, we conclude that system A is a dominant system. Table 12.2 gives a different structure function of a two-component system (called system B). System B has a parallel structure at level 2 and a series structure at level 1. Since vector (2, 0), which results in system state 2, is not larger than or equal to any vector resulting in system state 1, system B is a nondominant system.

TABLE 12.1 System A: Dominant System
φ(x) = 0: (0, 0)
φ(x) = 1: (1, 0), (0, 1), (1, 1), (2, 0), (0, 2), (2, 1), (1, 2)
φ(x) = 2: (2, 2)

TABLE 12.2 System B: Nondominant System
φ(x) = 0: (0, 0), (1, 0), (0, 1)
φ(x) = 1: (1, 1)
φ(x) = 2: (2, 0), (0, 2), (2, 1), (1, 2), (2, 2)

Let φ and φ′ be the structure functions of two different binary coherent systems with the same number of components. If φ(x) ≥ φ′(x) holds for all x, then we say that structure φ is stronger than structure φ′ [22]. Among binary systems with n components, the series structure is the weakest and the parallel structure is the strongest because \prod_{i=1}^{n} x_i ≤ φ(x) ≤ 1 − \prod_{i=1}^{n} (1 − x_i). In a dominant system, let φ^l and φ^j denote the binary structure functions for system levels l and j, respectively, where l > j. For any component state vector x, it is always true that φ^l(x) ≤ φ^j(x) based on Definition 12.15. Otherwise, if φ^l(x) > φ^j(x), that is, φ^l(x) = 1 and φ^j(x) = 0, then we will have φ(x) ≥ l and φ(x) < j, which
contradict each other. As a result, we conclude that the structure of a dominant system changes from strong to weak as its system level increases.

With respect to any given system level j, the states of a multistate system can be divided into two separate groups: "working" if φ(x) ≥ j and "failed" otherwise. Similarly, component i is said to be working when x_i ≥ j and failed otherwise. Since j may take different values, working and failure have different meanings for different j values. We can say that the meanings of working and failure are dynamic or context dependent. The dichotomy of the states of the system and the states of the components can be done at each state level for any multistate system. However, the difference between a dominant system and a nondominant system is described below. In a dominant system, the minimal path vectors to system state j cause the system to be exactly in state j, where 1 ≤ j ≤ M. In a nondominant system, a minimal path vector to system state j may cause the system to be in a state higher than j, where 1 ≤ j ≤ M. In Example 12.19, the minimal path vectors to level 1 of system A are (1, 0) and (0, 1), which both result exactly in system level 1. The only minimal path vector to level 2 is (2, 2), which also results exactly in system level 2. The minimal path vectors to level 1 of system B are (1, 1), (2, 0), and (0, 2); vectors (2, 0) and (0, 2) do not cause the system to be in state 1. A key problem in the multistate context is how to find the system state distribution Pr(φ(x) ≥ j) for all 1 ≤ j ≤ M. For a nondominant system, binary system reliability evaluation algorithms cannot be used directly for evaluation of its system state distribution because the minimal path vectors to some system states may cause the system to be in a higher state.
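The dominance condition of Definition 12.15 can be checked mechanically on a small structure-function table. The sketch below (our own helper names) classifies systems A and B of Example 12.19:

```python
# A sketch of a dominance check per Definition 12.15 (our own helper names).
# y is "larger" than z if y >= z componentwise with at least one strict
# inequality; dominance requires every y with phi(y) > phi(x) to be larger
# than some vector in the equivalence class of x (possibly x itself).
def larger(y, z):
    return y != z and all(a >= b for a, b in zip(y, z))

def dominant(table):
    vecs = list(table)
    for y in vecs:
        for x in vecs:
            if table[y] > table[x] and not any(
                    larger(y, z) for z in vecs if table[z] == table[x]):
                return False
    return True

# Systems A and B of Example 12.19 (Tables 12.1 and 12.2)
A = {(0, 0): 0, (1, 0): 1, (0, 1): 1, (1, 1): 1, (2, 0): 1,
     (0, 2): 1, (2, 1): 1, (1, 2): 1, (2, 2): 2}
B = {(0, 0): 0, (1, 0): 0, (0, 1): 0, (1, 1): 1, (2, 0): 2,
     (0, 2): 2, (2, 1): 2, (1, 2): 2, (2, 2): 2}
print(dominant(A), dominant(B))   # True False
```

System B fails the check at y = (2, 0), x = (1, 1): φ(y) > φ(x), yet (2, 0) is not larger than any vector in state 1.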
Whether binary system reliability evaluation algorithms can be used directly for evaluation of the state distribution of a dominant system depends on whether the system has a binary image. In the following, we introduce the concept of a binary-imaged system. In general, dominant systems can be divided into two types: those with a binary image and those without, where having a "binary image" means that the system can be treated as a binary system and thus binary reliability evaluation algorithms can be applied. As illustrated in the previous paragraph, a nondominant system is not a binary-imaged system because its component state vectors cannot be divided into two separate groups. For instance, system B in Example 12.19 is not a binary-imaged system since it is nondominant, even though it has a parallel structure at level 2 and a series structure at level 1. In other words, the dominance property has to be satisfied first for the system to have a binary image. Ansell and Bendell [14] defined a multistate coherent system with a binary image by

x^j = y^j ⇒ φ^j(x) = φ^j(y)  for any j.   (12.53)
They claim that if a system satisfies condition (12.53), then it can be treated as a binary system. However, this condition is too strong because some systems are, in fact, binary imaged even though they do not satisfy equation (12.53). Huang and Zuo [95] propose a definition of the binary-imaged dominant system as follows.

Definition 12.16 A multistate system is a binary-imaged system if and only if its structure indicator function satisfies either (1) x^j = y^j ⇒ φ^j(x) = φ^j(y) or (2) x^j = z^j and z ↔ y ⇒ φ^j(x) = φ^j(y).

This definition implies that a binary-imaged system must be a dominant system. In Example 12.19, system A is a binary-imaged system based on Definition 12.16. However, system B is not a binary-imaged system. For example, let x = (1, 0) and y = (2, 0); with regard to level j = 1, we have x^j = y^j = (1, 0) but φ^j(x) = 0 ≠ φ^j(y) = 1.

A dominant system is not always binary imaged, since the structure of a multistate system is much more complex. For example, a binary system with two components has only two possible structures: series or parallel. Many other structures can be constructed out of two components in the multistate context. Example 12.20 is used to illustrate this point.

Example 12.20 Consider a two-component multistate system. Both the system and its components may be in three possible states: 0, 1, and 2. Table 12.3 shows another structure function of such a system (called system C), which is neither series nor parallel at any level but is a dominant system. System D in Table 12.4 is a dominant system too, but its structures are neither series nor parallel. In this example, both systems C and D are dominant, but they are not binary-imaged systems.

TABLE 12.3 System C: Dominant System
φ(x) = 0: (0, 0), (0, 1)
φ(x) = 1: (1, 0), (1, 1), (0, 2), (1, 2)
φ(x) = 2: (2, 0), (2, 1), (2, 2)

TABLE 12.4 System D: Dominant System
φ(x) = 0: (0, 0)
φ(x) = 1: (1, 0), (0, 1), (1, 1), (2, 0), (0, 2)
φ(x) = 2: (2, 1), (1, 2), (2, 2)

An equivalent definition of the dominant system using binary indicator structure functions is as follows.
Definition 12.17 A multistate system is a dominant system if and only if its structure indicator function satisfies: φ^j(x) = φ^j(y) implies either (1) x^j = y^j or (2) x^j = z^j and z ↔ y, for j = 1, 2, . . . , M.

Usually, the structure function φ of a multistate system can be expressed in terms of n × M binary variables as follows:

φ(x) = ψ(x_{1,0}, x_{1,1}, \ldots, x_{1,M}, \ldots, x_{n,0}, x_{n,1}, \ldots, x_{n,M}).   (12.54)

For a binary-imaged system, we can express its binary structure function φ^j for each state j as

φ^j(x) = ψ^j(x_{1,j}, x_{2,j}, \ldots, x_{n,j}).   (12.55)

Equation (12.55) indicates that the indicator structure function of a binary-imaged multistate system at each given state depends on only n binary variables. Existing binary algorithms can therefore be used to evaluate the probability that a binary-imaged system is at state j or above for any level j. For instance, in Example 12.19, the minimal path vectors with respect to level 1 are (1, 0) and (0, 1), and the minimal path vector with respect to level 2 is (2, 2). We can write φ^1(x) = ψ^1(x_{1,1}, x_{2,1}) and φ^2(x) = ψ^2(x_{1,2}, x_{2,2}).

In a binary system, the system reliability function of n components can be expressed in terms of the system reliability function of n − 1 components as

h(p) = p_{i,1} h(1_i; p) + (1 − p_{i,1}) h(0_i; p).   (12.56)

For a binary-imaged multistate system, the same principle can be used:

h^j(P^j) = P_{i,j} h(1_i; P^j) + (1 − P_{i,j}) h(0_i; P^j).   (12.57)

Once h^j(P^j) = Pr(φ ≥ j) is calculated for all j values, we can find Pr(φ = j) as follows:

Pr(φ = j) = h^j(P^j) − h^{j+1}(P^{j+1}).   (12.58)
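Pivotal decomposition as in (12.56) and (12.57) can be sketched generically. In the sketch below (helper names are ours), h conditions on the components in index order, which amounts to applying (12.56) recursively to any binary structure function:

```python
# A sketch of pivotal decomposition (12.56)/(12.57) (our own helper names).
# h(phi, p) computes E[phi] for a binary structure function phi by conditioning
# on each component in turn; with p = (P_{1,j}, ..., P_{n,j}) and phi the
# level-j structure, it returns h^j(P^j) = Pr(system is at state j or above).
def h(phi, p, fixed=()):
    if len(fixed) == len(p):
        return float(phi(fixed))
    q = p[len(fixed)]          # reliability of the next component
    return q * h(phi, p, fixed + (1,)) + (1 - q) * h(phi, p, fixed + (0,))

series = min          # binary series structure
parallel = max        # binary parallel structure
print(round(h(parallel, (0.9, 0.9, 0.9)), 4),   # 0.999 = 1 - (1 - 0.9)**3
      round(h(series, (0.2, 0.2, 0.2)), 4))     # 0.008 = 0.2**3
```

Full enumeration costs 2^n evaluations, so this sketch is only for small illustrations; efficient binary algorithms exploit the structure instead.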
In the following, we use an example to illustrate how to apply the binary algorithm for system performance evaluation of a binary-imaged multistate system.
FIGURE 12.10 Dominant system with binary image for Example 12.21: reliability block diagrams of components 1, 2, and 3 at state 1 (parallel), state 2 (parallel–series), and state 3 (series).
Example 12.21 A multistate coherent system consists of three i.i.d. components. Both the system and the components are allowed to have four possible states: 0, 1, 2, and 3. Assume p_0 = 0.1, p_1 = 0.3, p_2 = 0.4, and p_3 = 0.2. The structures of the system are shown in Figure 12.10. Obviously, the system has a parallel structure at level 1, a mixed parallel–series structure at level 2, and a series structure at level 3. The structures change from strong to weak as the level of the system increases. The minimal path vectors to level 1 are (1, 0, 0) and its permutations; to level 2, (2, 0, 2) and (0, 2, 2); and to level 3, (3, 3, 3). The system is a binary-imaged system. The following table illustrates the relationship between the system state and the component states:

φ(x) = 0: (0, 0, 0)
φ(x) = 1: (3, 1, 1), (3, 1, 0), (3, 0, 0), (2, 1, 1), (2, 1, 0), (2, 0, 0), (1, 1, 1), (1, 1, 0), (1, 0, 0) +; (3, 3, 1), (3, 3, 0), (3, 2, 1), (2, 3, 1), (3, 2, 0), (2, 3, 0), (2, 2, 1), (2, 2, 0)
φ(x) = 2: (3, 3, 2), (3, 2, 2), (2, 2, 2) +; (3, 1, 3), (1, 3, 3), (3, 0, 3), (0, 3, 3), (3, 1, 2), (1, 3, 2), (1, 2, 3), (2, 1, 3), (0, 2, 3), (3, 0, 2), (0, 3, 2), (2, 0, 3), (1, 2, 2), (2, 1, 2), (2, 0, 2), (0, 2, 2)
φ(x) = 3: (3, 3, 3)
In the table, the plus sign represents all of the permutations of the component state vectors listed before it within the same column. To calculate the system reliabilities at all levels, we can extend the binary algorithms. With P_1 = 0.9, P_2 = 0.6, and P_3 = 0.2,

R_{s,3} = P_{1,3} × P_{2,3} × P_{3,3} = P_3^3 = 0.0080,
R_{s,2} = [1 − (1 − P_{1,2})(1 − P_{2,2})] P_{3,2} = 2P_2^2 − P_2^3 = 0.5040,
R_{s,1} = 1 − (1 − P_{1,1})(1 − P_{2,1})(1 − P_{3,1}) = 1 − (1 − P_1)^3 = 0.9990,
r_{s,3} = R_{s,3} = 0.0080,
r_{s,2} = R_{s,2} − R_{s,3} = 0.4960,
r_{s,1} = R_{s,1} − R_{s,2} = 0.4950,
r_{s,0} = 1 − R_{s,1} = 0.0010.
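The reliabilities of Example 12.21 can be cross-checked by enumerating all 4^3 component state vectors against the level structures of Figure 12.10; this brute-force sketch is ours:

```python
from itertools import product

# Brute-force check of Example 12.21 (our own sketch). The level structures
# are read off Figure 12.10: parallel at level 1, components (1 or 2) in
# series with component 3 at level 2, series at level 3.
p = [0.1, 0.3, 0.4, 0.2]                       # P(component state = s)

def phi(x):
    if min(x) >= 3:
        return 3
    if (x[0] >= 2 or x[1] >= 2) and x[2] >= 2:
        return 2
    if max(x) >= 1:
        return 1
    return 0

r = [0.0] * 4
for x in product(range(4), repeat=3):
    r[phi(x)] += p[x[0]] * p[x[1]] * p[x[2]]
print([round(v, 4) for v in r])                # [0.001, 0.495, 0.496, 0.008]
```

The output matches r_{s,0}, r_{s,1}, r_{s,2}, r_{s,3} computed via (12.57) and (12.58).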
As shown in Example 12.21, a commonly used approach for multistate system reliability evaluation is to extend the results for binary system reliability evaluation. The minimal path vectors and minimal cut vectors play important roles in the reliability evaluation of binary systems. Usually, finding all the minimal path vectors or minimal cut vectors of a multistate system is difficult. Sometimes the enumeration method may be easier to use for simple systems [249]. For example, in Example 12.19, the minimal path vectors to level 1 of system B include (1, 1), (2, 0), and (0, 2), where φ(1, 1) = 1 but φ(0, 2) = φ(2, 0) = 2. Some minimal path vectors to level 1 are also minimal path vectors to level 2. This often happens in a nondominant system.

Theorem 12.1 (Huang and Zuo [95]) Let P_1^j, P_2^j, \ldots, P_r^j be the minimal path vectors to level j of a dominant system; then φ(P_i^j) = j for i = 1, 2, \ldots, r.

Proof Assume that there exists an m (1 ≤ m ≤ r) such that φ(P_m^j) = l > j. By the definition of the dominant system, if φ(P_m^j) = l > φ(P_i^j) = j for i = 1, \ldots, r and i ≠ m, then either P_m^j > P_i^j or there exists some z ↔ P_i^j such that P_m^j > z. This contradicts the fact that P_m^j is a minimal path vector to level j.

Theorem 12.2 (Huang and Zuo [95]) Let K_1^j, K_2^j, \ldots, K_s^j be the minimal cut vectors to level j of a dominant system; then φ(K_i^j) = j − 1 for i = 1, 2, \ldots, s.

Proof Similar to that for Theorem 12.1.

For a nondominant system, there exists at least one level j such that at least one of the minimal path vectors P_1^j, P_2^j, \ldots, P_r^j satisfies φ(P_i^j) > j for some i. There also exists at least one level j such that at least one of its minimal cut vectors K_1^j, K_2^j, \ldots, K_k^j satisfies φ(K_i^j) < j − 1 for some i.

Since all the minimal path vectors to level j of a dominant system result in system level j, we can dichotomize the system. For a nondominant system, however, some
minimal path vectors may have φ(P_i^j) > j. A minimal path vector to level j could also be a minimal path vector to level j + 1, j + 2, and so on. This is why we generally cannot dichotomize a nondominant system.

Example 12.22 Consider system D in Example 12.20. Clearly, the system has a parallel structure at system level 1 because the system is at level 1 or above if at least one component is at state 1 or above. The system has neither a series nor a parallel structure at level 2. But we notice that the system is at level 2 if at least one component is at state 2, with the restriction that no component is allowed to be below state 1. It is analogous to a "parallel" structure. Let us see how the binary algorithm for parallel systems can be extended to such a system. Recall a binary parallel system with n components. If the reliability of component i is p_i and the unreliability is q_i, then the system reliability can be calculated as

R_s = 1 − \prod_{i=1}^{n} (1 − p_i) = 1 − \prod_{i=1}^{n} q_i.   (12.59)
Now, if the minimal path vectors to level j of a multistate system are (j, \ldots, j, s, \ldots, s) and its permutations, where s may be nonzero, we can treat it as a parallel structure. With regard to system level j, no component is allowed to have a state below s; that is, those states below s can be ignored when calculating R_{s,j}. The total system reliability of such a "multistate system," when ignoring states below s, is \prod_{i=1}^{n} P_{i,s} rather than 1. Thus the following formula can be used to calculate R_{s,j}:

R_{s,j} = \prod_{i=1}^{n} P_{i,s} − \prod_{i=1}^{n} (Q_{i,j} − Q_{i,s}).   (12.60)

Now in Example 12.20, assume p_{i,0} = 0.1, p_{i,1} = 0.2, and p_{i,2} = 0.7 in system D. The system reliabilities are calculated as

R_{s,1} = 1 − \prod_{i=1}^{2} Q_{i,1} = 0.9900,
R_{s,2} = \prod_{i=1}^{2} P_{i,1} − \prod_{i=1}^{2} (Q_{i,2} − Q_{i,1}) = 0.7700,
r_{s,2} = R_{s,2} = 0.7700,
r_{s,1} = R_{s,1} − R_{s,2} = 0.2200,
r_{s,0} = 1 − r_{s,1} − r_{s,2} = 0.0100.

12.6 SUMMARY

Multistate system structures are more complex than binary system structures, and there are still many unresolved research issues involving multistate system reliability analysis. In this chapter, we have attempted to provide a systematic introduction to
multistate system reliability theory. We have also extended many concepts used in binary system reliability analysis to the multistate context. Many other research results on multistate system reliability exist in the literature. However, we are unable to include all of them primarily because of the complex nature of multistate systems and the degree of difficulty for most readers of this book. Advanced readers may refer to Amari and Misra [11], Aven [16], Barlow and Wu [23], Block et al. [32], Block and Savits [34], Brunell and Kapur [42], Kuhnert [126], Meng [166, 167], Natvig [173], Wood [245], Xue and Yang [250], and Yu et al. [253].
APPENDIX LAPLACE TRANSFORM
In reliability analysis of repairable systems, we often need to solve differential equations. The Laplace transform is a powerful technique for solving them. In this appendix, we briefly review the method of Laplace transform for solving ordinary differential equations involving real-valued functions. All results are given without proofs. The conditions given in this section for most results are sufficient conditions; they are given for use in reliability analysis only. For advanced coverage of the Laplace transform, readers are referred to Schiff [219]. Many Laplace transform formulas have been tabulated, for example, in Abramowitz and Stegun [3].

Definition and Properties

Suppose that f(t) is a real-valued function of time t ≥ 0 and s is a real-valued parameter. The Laplace transform of the function f(t) is defined as

F(s) = L(f(t)) = ∫_0^∞ e^{−st} f(t) dt = lim_{τ→∞} ∫_0^τ e^{−st} f(t) dt,  if the limit exists.    (A.1)

The Laplace transform operation defined in equation (A.1), denoted by the symbol L, is performed on the function f(t) to produce a new function F(s). The parameter s has a definition domain too; the limit in equation (A.1) exists only if s is in this domain. In many cases, we do not list the specific definition domain of s in performing the Laplace transform.

Example A.1 Consider the function f(t) = 1 for t ≥ 0; then L(f(t)) = 1/s. This can be verified as follows:

L(f(t)) = ∫_0^∞ e^{−st} f(t) dt = ∫_0^∞ e^{−st} dt = [e^{−st}/(−s)]_0^∞ = 1/s,  s > 0.
Thus, we have the following formula for later use:

L(1) = 1/s,  s > 0.    (A.2)

The result given in Example A.1 can be generalized to

L(t^n) = n!/s^{n+1},  n = 0, 1, 2, . . . ,  s > 0.    (A.3)
For the function f(t) = t^v, where v may take any real value greater than −1, the following general formula can be used:

L(t^v) = Γ(v + 1)/s^{v+1},  v > −1,  s > 0,    (A.4)

where Γ(v + 1) is the gamma function evaluated at the point v + 1.

The Laplace transform operation has the linearity property, that is,

L(c_1 f_1(t) + c_2 f_2(t)) = c_1 L(f_1(t)) + c_2 L(f_2(t)),    (A.5)

where c_1 and c_2 are arbitrary constants, provided that L(f_1(t)), L(f_2(t)), and L(c_1 f_1(t) + c_2 f_2(t)) all exist.

The following Laplace transform results are listed for later reference:

L(sin ωt) = ω/(s^2 + ω^2),    (A.6)
L(cos ωt) = s/(s^2 + ω^2),    (A.7)
L(a_0 + a_1 t + a_2 t^2 + ··· + a_n t^n) = ∑_{i=0}^{n} a_i i!/s^{i+1}.    (A.8)
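Formulas (A.2), (A.3), (A.6), and (A.7) can be reproduced symbolically. Here is a minimal sketch using the third-party sympy library (our choice of tool; the book itself uses tabulated formulas):

```python
import sympy as sp

t, s, w = sp.symbols('t s omega', positive=True)
L = lambda f: sp.laplace_transform(f, t, s, noconds=True)

F_const = L(sp.Integer(1))   # (A.2): 1/s
F_cubic = L(t**3)            # (A.3) with n = 3: 3!/s^4 = 6/s^4
F_sin = L(sp.sin(w*t))       # (A.6): omega/(s^2 + omega^2)
F_cos = L(sp.cos(w*t))       # (A.7): s/(s^2 + omega^2)
```

`noconds=True` suppresses the convergence condition on s (e.g., s > 0 for (A.2)) that sympy otherwise returns alongside the transform.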
Inverse Laplace Transform

If L(f(t)) = F(s), the inverse Laplace transform of F(s) gives the original function f(t):

f(t) = L^{−1}(F(s)),  t ≥ 0.    (A.9)

As long as f(t) is continuous for t ≥ 0, it has a unique Laplace transform F(s), and the inverse Laplace transform of the function F(s) results in a unique continuous function f(t) for t ≥ 0. This result makes the method of Laplace transform very useful in solving differential equations. The inverse Laplace transform also has the linearity property, namely,

L^{−1}(c_1 F_1(s) + c_2 F_2(s)) = c_1 f_1(t) + c_2 f_2(t),    (A.10)

where c_1 and c_2 are arbitrary constants, provided that L^{−1}(F_1(s)), L^{−1}(F_2(s)), and L^{−1}(c_1 F_1(s) + c_2 F_2(s)) all exist.
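The linearity property (A.10) can be illustrated with sympy (again our choice of tool): inverting F(s) = 2/s + 3/s^2 term by term should give 2 + 3t for t ≥ 0.

```python
import sympy as sp

t, s = sp.symbols('t s', positive=True)

# Invert F(s) = 2/s + 3/s^2; by (A.10) and (A.2)-(A.3) this is 2 + 3t.
# sympy may attach a Heaviside(t) factor; on t >= 0 it equals 1,
# so the substitution below is a harmless safeguard.
f = sp.inverse_laplace_transform(2/s + 3/s**2, s, t).subs(sp.Heaviside(t), 1)
```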
Translation Theorems

Two theorems are available to deal with the shifting of functions in Laplace transforms. One theorem deals with the shifting of F(s) by a distance of |a| in the s domain. The other theorem deals with the shifting of f(t) by a distance of a > 0 to the right in the t domain, where the shifted function takes a value of zero for t < a. They are both called the translation theorems. For proofs of these two theorems, readers are referred to Schiff [219].

Theorem A.1 (First Translation Theorem)  If F(s) = L(f(t)) for s > 0, then

L(e^{at} f(t)) = F(s − a),  s > a.    (A.11)

Theorem A.2 (Second Translation Theorem)  If F(s) = L(f(t)), then

L(f(t − a)) = e^{−as} F(s),  a > 0,    (A.12)

where f(t − a) ≡ 0 for t < a.

Example A.2 As we know, L(t) = 1/s^2 for s > 0. Applying equation (A.11), we get

L(t e^{at}) = 1/(s − a)^2,  s > a.    (A.13)

A Laplace transform formula generalized from Example A.2 is

L(t^n e^{at}) = n!/(s − a)^{n+1},  n = 0, 1, 2, . . . ,  s > a.    (A.14)

A corresponding inverse Laplace transform formula is

L^{−1}(1/(s − a)^{n+1}) = t^n e^{at}/n!,  t ≥ 0.    (A.15)
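Theorem A.1 and formula (A.13) are easy to spot-check with sympy for concrete shift values (a = 2 and a = 3 below are our arbitrary choices, not the book's):

```python
import sympy as sp

t, s = sp.symbols('t s', positive=True)
L = lambda f: sp.laplace_transform(f, t, s, noconds=True)

# (A.13) with a = 2: L(t e^{2t}) should be 1/(s - 2)^2.
F1 = L(t * sp.exp(2*t))

# Theorem A.1 applied to sin t (whose transform is 1/(s^2 + 1)) with a = 3:
# L(e^{3t} sin t) should be F(s - 3) = 1/((s - 3)^2 + 1).
F2 = L(sp.exp(3*t) * sp.sin(t))
```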
Differentiation and Integration of Laplace Transforms

Differentiation and integration techniques may be used to derive Laplace transforms and inverse Laplace transforms. If L(f(t)) exists, then

L((−1)^n t^n f(t)) = d^n F(s)/ds^n,  n = 1, 2, . . . ,    (A.16)

when s is greater than a certain constant. If L(f(t)) exists and lim_{t→0+} f(t)/t exists, then

L(f(t)/t) = ∫_s^∞ F(x) dx    (A.17)

when s is greater than a certain constant.
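Equation (A.16) with n = 1 says L(−t f(t)) = dF(s)/ds, i.e., L(t f(t)) = −F′(s). A sympy check with f(t) = sin t (our example, not the book's):

```python
import sympy as sp

t, s = sp.symbols('t s', positive=True)

F = sp.laplace_transform(sp.sin(t), t, s, noconds=True)   # F(s) = 1/(s^2 + 1)

# (A.16) with n = 1: both sides should equal 2s/(s^2 + 1)^2.
lhs = sp.laplace_transform(t * sp.sin(t), t, s, noconds=True)
rhs = -sp.diff(F, s)
```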
Partial Fractions

Generally speaking, when F(s) is expressed as a quotient of two polynomials, namely,

F(s) = P(s)/Q(s),    (A.18)

where P(s) is a lower order polynomial than Q(s) and P(s) and Q(s) do not have any common factors, then F(s) can be decomposed into a sum of the following partial fractions:

1. For each nonrepeated linear factor of the form as + b of Q(s), there is a corresponding partial fraction of the form

A/(as + b),

where A is a constant.

2. For each repeated linear factor of the form (as + b)^n with n > 1, there are corresponding partial fractions of the form

A_1/(as + b) + A_2/(as + b)^2 + ··· + A_n/(as + b)^n,

where A_i is a constant for i = 1, 2, . . . , n.

3. For each nonrepeated quadratic factor of the form as^2 + bs + c, there is a corresponding partial fraction of the form

(As + B)/(as^2 + bs + c),

where A and B are constants.

4. For each repeated quadratic factor of the form (as^2 + bs + c)^n with n > 1, there are corresponding partial fractions of the form

(A_1 s + B_1)/(as^2 + bs + c) + (A_2 s + B_2)/(as^2 + bs + c)^2 + ··· + (A_n s + B_n)/(as^2 + bs + c)^n,

where A_i and B_i are constants for i = 1, 2, . . . , n.

The inverse Laplace transforms of the partial fractions are usually known or can be easily derived. Once F(s) is decomposed into a sum of partial fractions, equation (A.5) can be used to find the inverse Laplace transform of F(s) from the inverse Laplace transforms of these partial fractions.

Example A.3 Consider the function

F(s) = (2s + 4)/(s^5 − 3s^4 + 2s^3).
We can express F(s) in the form

F(s) = (2s + 4)/(s^3 (s − 1)(s − 2)) = A_1/s + A_2/s^2 + A_3/s^3 + A_4/(s − 1) + A_5/(s − 2).

Then, we have

2s + 4 = A_1 s^2 (s − 1)(s − 2) + A_2 s(s − 1)(s − 2) + A_3 (s − 1)(s − 2) + A_4 s^3 (s − 2) + A_5 s^3 (s − 1).

Setting s = 0 gives A_3 = 2. Setting s = 1 gives A_4 = −6. Setting s = 2 gives A_5 = 1. Equating the coefficients of s on the two sides gives A_2 = 4. Equating the coefficients of s^2 gives A_1 = 5. As a result, we have

L^{−1}(F(s)) = 5 L^{−1}(1/s) + 4 L^{−1}(1/s^2) + 2 L^{−1}(1/s^3) − 6 L^{−1}(1/(s − 1)) + L^{−1}(1/(s − 2))
            = 5 + 4t + t^2 − 6e^t + e^{2t}.

If the polynomial Q(s) in the denominator of the F(s) function can be expressed in the form

Q(s) = (s − a_1)(s − a_2) ··· (s − a_n),  where a_i ≠ a_j for i ≠ j,

the decomposition of F(s) can be expressed as

F(s) = A_1/(s − a_1) + A_2/(s − a_2) + ··· + A_n/(s − a_n).    (A.19)

Multiplying both sides of equation (A.19) by s − a_i and letting s → a_i, we can find the constant A_i for i = 1, 2, . . . , n:

A_i = lim_{s→a_i} (s − a_i) F(s).    (A.20)

The inverse Laplace transform of F(s) can be expressed as

f(t) = ∑_{i=1}^{n} A_i e^{a_i t}.    (A.21)
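The decomposition in Example A.3 can be verified mechanically. The following sympy sketch (our tooling choice) uses `apart` for the partial fractions and then inverts the transform:

```python
import sympy as sp

t, s = sp.symbols('t s', positive=True)

F = (2*s + 4) / (s**5 - 3*s**4 + 2*s**3)

# Partial fraction decomposition; should match
# A1 = 5, A2 = 4, A3 = 2, A4 = -6, A5 = 1 from Example A.3.
decomp = sp.apart(F, s)

# Invert F(s); the Heaviside substitution is a safeguard for t >= 0.
f = sp.inverse_laplace_transform(F, s, t).subs(sp.Heaviside(t), 1)
# Expect f(t) = 5 + 4t + t^2 - 6 e^t + e^{2t}.
```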
Derivatives and Differential Equations

If the Laplace transform of f(t) exists and f(t) and f′(t) are continuous for t > 0, the Laplace transform of f′(t) can be expressed as

L(f′(t)) = sL(f(t)) − f(0+)    (A.22)

for s in a certain domain, provided that f(0+) = lim_{t→0+} f(t) exists. Similar results exist for higher order derivatives of f(t). Suppose that f(t), f′(t), . . . , f^{(n)}(t) are continuous for t > 0; then the Laplace transform of f^{(n)}(t) can be expressed as

L(f^{(n)}(t)) = s^n L(f(t)) − s^{n−1} f(0+) − s^{n−2} f′(0+) − ··· − f^{(n−1)}(0+)    (A.23)

for s in a certain domain, where n ≥ 2. These results can be used in solving differential equations.

Asymptotic Values

The Laplace transform is a very effective technique for solving differential equations involving a function f(t). If we are only interested in the limiting values of f(t) as t → 0 or as t → ∞, we do not have to find the explicit expression of f(t) first and then take the limit. The following two properties of the Laplace transform can be used for this purpose without requiring the explicit expression of f(t). If F(s) = L(f(t)) and f(t) and f′(t) are continuous for t > 0, then

f(0+) ≡ lim_{t→0+} f(t) = lim_{s→∞} sF(s),    (A.24)

f(∞) ≡ lim_{t→∞} f(t) = lim_{s→0} sF(s)    (A.25)
provided these limits exist.

Example A.4 Find x_0(∞), x_1(∞), and x_2(∞) of the following system of differential equations:

x_0′(t) = −λ_0 x_0(t) + µ_1 x_1(t),
x_1′(t) = λ_0 x_0(t) − (λ_1 + µ_1) x_1(t) + µ_2 x_2(t),
x_2′(t) = λ_1 x_1(t) − µ_2 x_2(t)

with boundary conditions x_0(0) = 1, x_1(0) = 0, and x_2(0) = 0. Taking the Laplace transform of both sides of each equation yields

sL(x_0(t)) − x_0(0) = −λ_0 L(x_0(t)) + µ_1 L(x_1(t)),
sL(x_1(t)) − x_1(0) = λ_0 L(x_0(t)) − (λ_1 + µ_1) L(x_1(t)) + µ_2 L(x_2(t)),
sL(x_2(t)) − x_2(0) = λ_1 L(x_1(t)) − µ_2 L(x_2(t)).

Applying boundary conditions and rearranging, we have

(s + λ_0) L(x_0(t)) − µ_1 L(x_1(t)) = 1,
−λ_0 L(x_0(t)) + (s + λ_1 + µ_1) L(x_1(t)) − µ_2 L(x_2(t)) = 0,
−λ_1 L(x_1(t)) + (s + µ_2) L(x_2(t)) = 0.
Writing this system of linear equations in matrix form, we have AL(x(t)) = B, where

A = [ s + λ_0    −µ_1             0
      −λ_0       s + λ_1 + µ_1    −µ_2
      0          −λ_1             s + µ_2 ],

L(x(t)) = [ L(x_0(t))
            L(x_1(t))
            L(x_2(t)) ],

B = [ 1
      0
      0 ].

We can find the solution of this system of linear equations using Cramer's rule,

L(x_i(t)) = |A_i| / |A|,

where |A| is the determinant of the matrix A and A_i is the matrix obtained from the matrix A by replacing the (i + 1)st column by the vector B for i = 0, 1, 2:

|A| = (s + λ_0)(s + λ_1 + µ_1)(s + µ_2) − λ_0 µ_1 (s + µ_2) − λ_1 µ_2 (s + λ_0),
|A_0| = (s + λ_1 + µ_1)(s + µ_2) − λ_1 µ_2,
|A_1| = λ_0 (s + µ_2),
|A_2| = λ_0 λ_1,

L(x_0(t)) = |A_0|/|A| = [(s + λ_1 + µ_1)(s + µ_2) − λ_1 µ_2] / [(s + λ_0)(s + λ_1 + µ_1)(s + µ_2) − λ_0 µ_1 (s + µ_2) − λ_1 µ_2 (s + λ_0)],

L(x_1(t)) = |A_1|/|A| = λ_0 (s + µ_2) / [(s + λ_0)(s + λ_1 + µ_1)(s + µ_2) − λ_0 µ_1 (s + µ_2) − λ_1 µ_2 (s + λ_0)],

L(x_2(t)) = |A_2|/|A| = λ_0 λ_1 / [(s + λ_0)(s + λ_1 + µ_1)(s + µ_2) − λ_0 µ_1 (s + µ_2) − λ_1 µ_2 (s + λ_0)].
If we need to find the functions x0 (t), x 1 (t), and x2 (t), inverse Laplace transforms must be performed. Since we are only required to find the limiting values of these functions as t → ∞, we can apply equation (A.25):
x_0(∞) = lim_{s→0} sL(x_0(t))
       = lim_{s→0} s[(s + λ_1 + µ_1)(s + µ_2) − λ_1 µ_2] / [(s + λ_0)(s + λ_1 + µ_1)(s + µ_2) − λ_0 µ_1 (s + µ_2) − λ_1 µ_2 (s + λ_0)]
       = lim_{s→0} [(s + λ_1 + µ_1)(s + µ_2) − λ_1 µ_2 + s(2s + λ_1 + µ_1 + µ_2)] / [(s + λ_1 + µ_1)(s + µ_2) + (s + λ_0)(s + µ_2) + (s + λ_0)(s + λ_1 + µ_1) − λ_0 µ_1 − λ_1 µ_2]
       = µ_1 µ_2 / (µ_1 µ_2 + λ_0 µ_2 + λ_0 λ_1),

x_1(∞) = lim_{s→0} sL(x_1(t))
       = lim_{s→0} sλ_0 (s + µ_2) / [(s + λ_0)(s + λ_1 + µ_1)(s + µ_2) − λ_0 µ_1 (s + µ_2) − λ_1 µ_2 (s + λ_0)]
       = λ_0 µ_2 / (µ_1 µ_2 + λ_0 µ_2 + λ_0 λ_1),

x_2(∞) = lim_{s→0} sL(x_2(t))
       = lim_{s→0} sλ_0 λ_1 / [(s + λ_0)(s + λ_1 + µ_1)(s + µ_2) − λ_0 µ_1 (s + µ_2) − λ_1 µ_2 (s + λ_0)]
       = λ_0 λ_1 / (µ_1 µ_2 + λ_0 µ_2 + λ_0 λ_1).

In deriving the above limits, we had to take the first derivatives of the numerators and the denominators with respect to s, respectively.
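Example A.4 can be reproduced end to end with sympy: solve AL(x(t)) = B symbolically and apply the final-value property (A.25). The symbol names below are ours; sympy's `limit` handles the 0/0 form that required L'Hôpital's rule above.

```python
import sympy as sp

s = sp.symbols('s')
l0, l1, m1, m2 = sp.symbols('lambda_0 lambda_1 mu_1 mu_2', positive=True)

# Coefficient matrix A and right-hand side B from Example A.4.
A = sp.Matrix([
    [s + l0, -m1,         0     ],
    [-l0,     s + l1 + m1, -m2  ],
    [0,      -l1,          s + m2],
])
B = sp.Matrix([1, 0, 0])

X = A.solve(B)  # X[i] = L(x_i(t)), the transforms found above by Cramer's rule

# Final-value property (A.25): x_i(inf) = lim_{s -> 0} s * L(x_i(t)).
x_inf = [sp.limit(s * X[i], s, 0) for i in range(3)]

den = m1*m2 + l0*m2 + l0*l1
# x_inf should equal [m1*m2/den, l0*m2/den, l0*l1/den], as derived in the text.
```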
REFERENCES
[1] B. S. Abbas and W. Kuo. Stochastic effectiveness models for human-machine systems. IEEE Transactions on Systems, Man and Cybernetics, 20(4):826–834, 1990. [2] J. A. Abraham. An improved method for network reliability. IEEE Transactions on Reliability, R-28(1):58–61, 1979. [3] M. Abramowitz and I. A. Stegun. Handbook of Mathematical Functions. Dover, New York, 1965. [4] K. Adachi and M. Kodama. k-out-of-n:G system with simultaneous failure and three repair policies. Microelectronics and Reliability, 19:351–361, 1979. [5] K. K. Aggarwal, J. S. Gupta, and K. B. Misra. A new method for system reliability evaluation. Microelectronics and Reliability, 12:435–440, 1973. [6] S. Akhtar. Comment on modeling a shared-load k-out-of-n:G system. IEEE Transactions on Reliability, R-41(2):189, 1992. [7] S. Akhtar. Reliability of k-out-of-n:G systems with imperfect fault-coverage. IEEE Transactions on Reliability, R-43(1):101–106, 1994. [8] P. D. Alevizos, S. G. Papastavridis, and P. Sypasas. Reliability of cyclic m-consecutive-k-out-of-n:F system. In H. Pham and M. H. Hamza (Eds.), Proceedings of IASTED Conference on Reliability, Quality Control and Risk Assessment. ACTA Press, Calgary, Canada, 1992, pp. 140–143. [9] S. V. Amari, J. B. Dugan, and R. B. Misra. A separable method for incorporating imperfect fault-coverage into combinatorial models. IEEE Transactions on Reliability, R-48(3):267–274, 1999. [10] S. V. Amari, J. B. Dugan, and R. B. Misra. Optimal reliability of systems subject to imperfect fault-coverage. IEEE Transactions on Reliability, R-48(3):275–284, 1999. [11] S. V. Amari and R. B. Misra. Comments on: dynamic reliability analysis of coherent multi-state systems. IEEE Transactions on Reliability, R-46(4):460–461, 1997. [12] K. Andrzejczak. Structure analysis of multi-state coherent systems. Optimization, 25:301–316, 1992. [13] J. E. Angus. On computing MTBF for a k-out-of-n:G repairable system. IEEE Transactions on Reliability, R-37(3):312–313, 1988.
[14] J. I. Ansell and A. Bendell. On alternative definitions of multi-state coherent systems. Optimization, 18(1):119–136, 1987. [15] I. Antonopoulou and S. Papastavridis. Fast recursive algorithm to evaluate the reliability of a circular consecutive-k-out-of-n:F system. IEEE Transactions on Reliability, R-36(1):83–84, 1987. [16] T. Aven. Reliability evaluation of multi-state systems with multi-state components. IEEE Transactions on Reliability, R-34(5):473–479, 1981. [17] D. S. Bai, W. Y. Yun, and S. W. Chung. Redundancy optimization of k-out-of-n systems with common-cause failures. IEEE Transactions on Reliability, R-40(1):56–59, 1991. [18] S. K. Banerjee and K. Rajamani. Closed form solution for delta-star and star-delta conversions of reliability networks. IEEE Transactions on Reliability, R-25(2):118–119, 1976. [19] A. D. Barbour, O. Chryssaphinou, and M. Roos. Compound Poisson approximation in systems reliability. Naval Research Logistics, 43:251–264, 1996. [20] R. E. Barlow and K. D. Heidtmann. Computing k-out-of-n system reliability. IEEE Transactions on Reliability, R-33:322–323, 1984. [21] R. E. Barlow, L. C. Hunter, and F. Proschan. Optimum redundancy when components are subject to two kinds of failure. Journal of Society for Industrial and Applied Mathematics, 11(1):64–73, 1963. [22] R. E. Barlow and F. Proschan. Statistical Theory of Reliability and Life Testing: Probability Models. Holt, Rinehart and Winston, New York, 1975. [23] R. E. Barlow and A. S. Wu. Coherent systems with multi-state components. Mathematics of Operations Research, 3(4):275–281, 1978. [24] L. A. Baxter and F. Harche. Note: On the greedy algorithm for optimal assembly. Naval Research Logistics, 39:833–837, 1992. [25] L. A. Baxter and F. Harche. On the optimal assembly of series-parallel systems. Operations Research Letters, 11:153–157, 1992. [26] L. A. Belfore. An O(n(log2(n))^2) algorithm for computing the reliability of k-out-of-n:G & k-to-l-out-of-n:G systems.
IEEE Transactions on Reliability, R-44(1):132–136, 1995. [27] Y. Ben-Dov. Optimal reliability design of k-out-of-n redundant systems subjected to two kinds of failure. Journal of the Operational Research Society, 31(8):743–749, 1980. [28] W. B. Beyer. CRC Standard Mathematical Tables and Formulae, 29th ed., CRC Press, Boca Raton, FL, 1991. [29] R. E. Billinton and R. N. Allan. Reliability Evaluation of Engineering Systems: Concepts and Techniques. Plenum, New York, 1983. [30] Z. W. Birnbaum. On the importance of different components in a multi-component system. In P. R. Krishnaiah, (Ed.), Multivariate Analysis, Vol. 2, Academic Press, New York, 1969, pp. 581–592. [31] Z. W. Birnbaum, J. D. Esary, and S. C. Saunders. Multi-component systems and structures and their reliability. Technometrics, 3(1):55–77, 1961. [32] H. W. Block, W. S. Griffith, and T. H. Savits. L-superadditive structure functions. Applied Probability, 21:919–929, 1989. [33] H. W. Block and T. H. Savits. Continuous multi-state structure functions. Operations Research, 32(3):703–714, 1982.
[34] H. W. Block and T. H. Savits. A decomposition for multi-state monotone systems. Journal of Applied Probability, 19:391–402, 1982. [35] R. A. Boedigheimer and K. C. Kapur. Customer-driven reliability models for multistate coherent systems. IEEE Transactions on Reliability, R-43(1):46–50, 1994. [36] T. K. Boehme, A. Kossow, and W. Preuss. A generalization of consecutive-k-out-of-n:F system. IEEE Transactions on Reliability, R-41(3):451–457, 1992. [37] P. J. Boland, F. Proschan, and Y. L. Tong. Optimal arrangement of components via pairwise rearrangements. Naval Research Logistics, 36:807–815, 1989. [38] R. C. Bollinger. Direct computation for consecutive-k-out-of-n:F systems. IEEE Transactions on Reliability, R-31(5):444–446, 1982. [39] R. C. Bollinger and A. A. Salvia. Consecutive-k-out-of-n:F networks. IEEE Transactions on Reliability, R-31(1):53–55, 1982. [40] R. C. Bollinger and A. A. Salvia. Consecutive-k-out-of-n:F system with sequential failures. IEEE Transactions on Reliability, R-34(1):43–45, 1985. [41] R. N. Bracewell. The Fourier Transform and Its Applications, 3rd ed. McGraw-Hill, Boston, 2000. [42] R. D. Brunell and K. C. Kapur. Customer-centered reliability methodology. In Proceedings of the Annual Reliability and Maintainability Symposium. IEEE, Piscataway, New Jersey, 1997, pp. 286–292. [43] A. K. Chakravarty, J. B. Orlin, and U. G. Rothblum. A partitioning problem with additive objective with an application to optimal inventory groupings for joint replenishment. Operations Research, 30:1018–1022, 1982. [44] A. K. Chakravarty, J. B. Orlin, and U. G. Rothblum. Consecutive optimizers for a partition problem with applications to optimal inventory groups for joint replenishment. Operations Research, 33:821–834, 1985. [45] G. J. Chang, L. Cui, and F. K. Hwang. Reliabilities of Consecutive-k Systems. Kluwer, Boston, 2000. [46] H. W. Chang and F. K. Hwang. Existence of invariant series consecutive-k-out-of-n:G systems. 
IEEE Transactions on Reliability, R-48(3):306–308, 1999. [47] J. C. Chang, R. J. Chen, and F. K. Hwang. A fast reliability algorithm for the circular consecutive-weighted-k-out-of-n:F system. IEEE Transactions on Reliability, R-47(4):472–474, 1998. [48] A. Chao and W. D. Hwang. Bayes estimation of reliability for special k-out-of-m:G systems. IEEE Transactions on Reliability, R-32(4):370–373, 1983. [49] M. T. Chao and J. C. Fu. A limit theorem of certain repairable systems. Annals of the Institute of Statistical Mathematics, 41(4):809–818, 1989. [50] M. T. Chao and J. C. Fu. The reliability of large series systems under Markov structure. Advances in Applied Probability, 23:894–908, 1991. [51] M. T. Chao and G. D. Lin. Economical design of large consecutive-k-out-of-n:F systems. IEEE Transactions on Reliability, R-33(5):411–413, 1984. [52] A. A. Chari. Optimal redundancy of k-out-of-n:G system with two kinds of CCFS. Microelectronics and Reliability, 34(6):1137–1139, 1994. [53] I. R. Chen. Software-based k-out-of-n systems: Theory and example. In Proceedings of 1991 Southeastcon, Vol. 1. IEEE, Piscataway, New Jersey, 1991, pp. 193–197. [54] D. T. Chiang and S. C. Niu. Reliability of consecutive-k-out-of-n:F system. IEEE Transactions on Reliability, R-30(1):87–89, 1981.
[55] O. Chrysaphinou and S. Papastavridis. Limit distribution for a consecutive k-out-of-n:F system. Advances in Applied Probability, 22:491–493, 1990. [56] C. S. Chung and J. Flynn. Optimal replacement policies for k-out-of-n systems. IEEE Transactions on Reliability, R-38(4):462–467, 1989. [57] W. K. Chung. Reliability analysis of a k-out-of-n:G redundant system with multiple critical errors. Microelectronics and Reliability, 30(5):907–910, 1990. [58] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms, 8th ed. MIT Press, Cambridge, MA, 1992. [59] D. R. Cox and D. Oakes. Analysis of Survival Data. Chapman & Hall, New York, 1984. [60] C. Derman, G. J. Lieberman, and S. M. Ross. On optimal assembly of systems. Naval Research Logistics Quarterly, 19:569–574, 1972. [61] C. Derman, G. J. Lieberman, and S. M. Ross. Assembly of systems having maximum reliability. Naval Research Logistics Quarterly, 21:1–12, 1974. [62] C. Derman, G. J. Lieberman, and S. M. Ross. On the consecutive-k-out-of-n:F system. IEEE Transactions on Reliability, R-31(1):57–63, 1982. [63] B. S. Dhillon and O. C. Anude. Common-cause failure analysis of a k-out-of-n:G system with repairable units. Microelectronics and Reliability, 34(3):429–442, 1994. [64] B. S. Dhillon and O. C. Anude. Common-cause failure analysis of a k-out-of-n:G system with non-repairable units. International Journal of System Science, 26(10):2029– 2042, 1995. [65] C. Dichirico and C. Singh. Reliability analysis of transmission lines with common mode failures when repair times are arbitrarily distributed. IEEE Transactions on Power Systems, 3(3):1012–1019, 1988. [66] S. A. Doyle, J. B. Dugan, and F. A. Patterson-Hine. A combinatorial approach to modeling imperfect coverage. IEEE Transactions on Reliability, R-44(1):87–94, 1995. [67] D. Z. Du and F. K. Hwang. Optimal consecutive-2-out-of-n systems. Mathematics of Operations Research, 11(1):187–191, 1986. [68] D. Z. Du and F. K. Hwang. 
A direct algorithm for computing reliability of a consecutive-k cycle. IEEE Transactions on Reliability, R-37(1):70–72, 1988. [69] D. Z. Du and F. K. Hwang. Optimal assembly of an s-stage k-out-of-n system. SIAM Journal of Discrete Mathematics, 3(3):349–354, 1990. [70] J. B. Dugan and K. S. Trivedi. Coverage modeling for dependability analysis of faulttolerant systems. IEEE Transactions on Reliability, R-38(6):775–787, 1989. [71] E. El-Neweihi, F. Proschan, and J. Sethuraman. Multi-state coherent system. Journal of Applied Probability, 15:675–688, 1978. [72] E. El-Neweihi, F. Proschan, and J. Sethuraman. Optimal allocation of components in parallel-series and series-parallel systems. Journal of Applied Probability, 23(3):770– 777, 1986. [73] A. Erdelyi. Tables of Integral Transforms, Vol. 1. McGraw-Hill, New York, 1954. [74] J. D. Esary and F. Proschan. A reliability bound for systems of maintained, interdependent components. Journal of American Statistical Association, 65(329):329–338, 1970. [75] W. Feller. An Introduction to Probability Theory and Its Applications, 3rd ed., Vol. 1. Wiley, New York, 1968. [76] J. Flynn, C. S. Chung, and D. Chiang. Replacement policies for a multicomponent reliability system. Operations Research Letters, 7(2):167–172, 1988.
[77] L. Fratta and U. G. Montanari. A Boolean algebra method for computing the terminal reliability in a communication network. IEEE Transactions on Circuit Theory, CT-20:203–211, 1973. [78] J. E. Freund and R. E. Walpole. Mathematical Statistics, 3rd ed. Prentice-Hall, Englewood Cliffs, NJ, 1980. [79] J. C. Fu. Reliability of consecutive-k-out-of-n:F systems with (k − 1)-step Markov dependence. IEEE Transactions on Reliability, R-35(5):602–606, 1986. [80] J. C. Fu and B. Hu. On reliability of a large consecutive-k-out-of-n:F system with (k − 1)-step Markov dependence. IEEE Transactions on Reliability, R-36(1):75–77, 1987. [81] J. C. Fu and M. V. Koutras. Poisson approximation for two-dimensional patterns. Annals of the Institute of Statistical Mathematics, 46(2):179–192, 1994. [82] J. C. Fu and W. Y. Lou. On reliabilities of certain large linearly connected engineering systems. Statistics and Probability Letters, 12:291–296, 1991. [83] G. Ge and L. Wang. Exact reliability formula for consecutive k-out-of-n:F systems with homogeneous Markov dependence. IEEE Transactions on Reliability, R-39:600–602, 1990. [84] A. P. Godbole, L. K. Potter, and J. K. Sklar. Improved upper bounds for the reliability of d-dimensional consecutive-k-out-of-n:F systems. Naval Research Logistics, 45:219–230, 1998. [85] I. P. Goulden. Generating functions and reliabilities for consecutive k-out-of-n:F systems. Utilitas Mathematica, 32(1):141–147, 1987. [86] W. S. Griffith. Multi-state reliability models. Journal of Applied Probability, 17:735–744, 1980. [87] W. S. Griffith. On consecutive-k-out-of-n failure systems and their generalizations. In A. P. Basu (Ed.), Reliability and Quality Control. Elsevier Science, Amsterdam, 1986, pp. 157–165. [88] W. S. Griffith and Z. Govindarajulu. Consecutive-k-out-of-n failure systems: Reliability and availability, component importance, and multi-state extensions. American Journal of Mathematical and Management Sciences, 5(1,2):125–160, 1985. [89] D. L.
Grosh. Comments on the delta-star problem. IEEE Transactions on Reliability, R-32(4):391–394, 1983. [90] H. Gupta and J. Sharma. A method of symbolic steady-state availability evaluation of k-out-of-n:G system. IEEE Transactions on Reliability, R-28(1):56–57, 1979. [91] H. Gupta and J. Sharma. State transition matrix and transition diagram of k-out-of-n:G system with spares. IEEE Transactions on Reliability, R-30(4):395–397, 1981. [92] K. D. Heidtmann. Improved method of inclusion-exclusion applied to k-out-of-n systems. IEEE Transactions on Reliability, R-31(1):36–40, 1982. [93] M. Hiam and Z. Porat. Bayes reliability modeling of a multistate consecutive k-out-of-n:F system. In Proceedings of the Annual Reliability and Maintainability Symposium. IEEE, Piscataway, New Jersey, 1991, pp. 582–586. [94] J. Huang. Reliability evaluation and analysis of multi-state systems. Ph.D. thesis, University of Alberta, Edmonton, Alberta, Canada, 2001. [95] J. Huang and M. J. Zuo. Dominant multi-state systems. IEEE Transactions on Reliability, submitted for publication.
[96] J. Huang, M. J. Zuo, and Z. Fang. Multi-state consecutive-k-out-of-n systems. IIE Transactions on Quality and Reliability Engineering, submitted for publication. [97] J. Huang, M. J. Zuo, and Y. H. Wu. Generalized multi-state k-out-of-n:G systems. IEEE Transactions on Reliability, R-49(1):105–111, 2000. [98] J. Huang, M. Zuo, and W. Kuo. Multi-state k-out-of-n systems. In H. Pham (Ed.), Reliability Engineering Handbook. Springer-Verlag, London, 2002. [99] J. C. Hudson and K. C. Kapur. Reliability analysis for multi-state systems with multi-state components. IIE Transactions, R-32(2):183–185, 1983. [100] F. K. Hwang. Fast solutions for consecutive-k-out-of-n:F system. IEEE Transactions on Reliability, R-31(5):447–448, 1982. [101] F. K. Hwang. Simplified reliabilities for consecutive-k-out-of-n:F systems. SIAM Journal on Algebraic and Discrete Methods, 7(2):258–264, 1986. [102] F. K. Hwang. Invariant permutations for consecutive-k-out-of-n cycles. IEEE Transactions on Reliability, R-38(1):65–67, 1989. [103] F. K. Hwang. Optimal assignment of components to a two-stage k-out-of-n system. Mathematics of Operations Research, 14(2):376–382, 1989. [104] F. K. Hwang, L. R. Cui, J. C. Chang, and W. D. Lin. Comments on "Reliability and component importance of a consecutive-k-out-of-n systems" by Zuo. Microelectronics and Reliability, 40(6):1061–1063, 2000. [105] F. K. Hwang and D. Shi. Redundant consecutive-k-out-of-n:F systems. Operations Research Letters, 6(6):293–296, 1987. [106] F. K. Hwang and P. E. Wright. An O(k^3 log(n/k)) algorithm for the consecutive-k-out-of-n:F system. IEEE Transactions on Reliability, R-44(1):128–131, 1995. [107] F. K. Hwang and Y. C. Yao. Multistate consecutively-connected systems. IEEE Transactions on Reliability, R-38(4):472–474, 1989. [108] F. K. Hwang and Y. C. Yao. A direct argument for Kaplansky's theorem on cyclic arrangement and its generalization. Operations Research Letters, 10:241–243, 1991. [109] Math Works.
Using Matlab Version 5. Math Works, Natick, MA, 1996. [110] M. Iosifescu. Finite Markov Processes and Their Applications. Wiley, New York, 1980. [111] S. Janson. Poisson approximation for large deviations. Random Structures and Algorithms, 1:221–230, 1990. [112] C. E. Jaske and F. A. Simonen. Creep-rupture properties for use in the life assessment of fired heater tubes. In K. Natesan and D. J. Tillack (Eds.), Proceedings of the First International Conference on Heat-Resistant Materials, ASM International, Materials Park, OH, 1991, pp. 485–493. [113] K. H. Jung. MTBF and MSFT for a k-out-of-n:G system with two types of forced outages. Reliability Engineering and System Safety, 35(2):117–125, 1992. [114] K. C. Kapur and L. R. Lamberson. Reliability in Engineering Design. Wiley, New York, 1977. [115] A. Kaufman, D. Grouchko, and R. Cruon. Mathematical Models for the Study of the Reliability of Systems. Academic, New York, 1977. [116] H. Kim and K. G. Shin. Modeling of externally-induced/common-cause faults in faulttolerant systems. In Proceedings of Digital Avionics Systems Conference, IEEE, Piscataway, New Jersey, 1994, pp. 402–407. [117] D. E. Knuth. The Art of Computer Programming, Vol. 1: Fundamental Algorithms. Addison-Wesley, Reading, MA, 1969.
[118] J. M. Kontoleon. Reliability determination of a r -successive-out-of-n:F system. IEEE Transactions on Reliability, R-29:437, 1980. [119] A. Kossow and W. Preuss. Reliability of linear consecutively-connected systems with multistate components. IEEE Transactions on Reliability, R-44(3):518–522, 1995. [120] M. V. Koutras. On a Markov chain approach for the study of reliability structures. Journal of Applied Probability, 33(2):357–367, 1996. [121] M. V. Koutras, G. K. Papadopoulos, and S. G. Papastavridis. Reliability of 2dimensional consecutive-k-out-of-n:F systems. IEEE Transactions on Reliability, R42(4):658–661, 1993. [122] M. V. Koutras, G. K. Papadopoulos, and S. G. Papastavridis. Note: Pairwise rearrangements in reliability structures. Naval Research Logistics, 41:683–687, 1994. [123] M. V. Koutras, G. K. Papadopoulos, and S. G. Papastavridis. A reliability bound for 2dimensional consecutive-k-out-of-n:F systems. Nonlinear Analysis, Methods, and Applications, 30(6):3345–3348, 1997. [124] B. Ksir. Comment on: 2-dimensional consecutive k-out-of-n:F models. IEEE Transactions on Reliability, R-41(4):575, 1992. [125] B. Ksir and M. Boushaba. Reliability bounds and direct computation of the reliability of a consecutive k-out-of-n:F system with Markov dependence. Microelectronics and Reliability, 33(3):313–317, 1993. [126] I. Kuhnert. Component relevancy in multi-state reliability models. IEEE Transactions on Reliability, R-44(1):95–96, 1995. [127] H. Kumamoto and E. H. Henley. Probabilistic Risk Assessment and Management for Engineers and Scientists, 2nd ed. IEEE, New York, 1996. [128] W. Kuo. Bayesian availability using gamma distributed priors. IEEE Transactions, 17(2):132–140, 1985. [129] W. Kuo, W. K. Chien, and T. Kim. Reliability, Yield, and Stress Burn-in. Kluwer Academic, Boston, 1998. [130] W. Kuo and T. Kim. An overview of manufacturing yield and reliability modeling for semiconductor products. Proceedings of the IEEE, 87(8):1329–1344, 1999. [131] W. 
Kuo and T. Kim. Semiconductor device manufacture yield and reliability modeling. In J. G. Webster (Ed.), Encyclopedia of Electrical and Electronics Engineering. Wiley, New York, 2001. [132] W. Kuo, V. R. Prasad, F. A. Tillman, and C. L. Hwang. Optimal Reliability Design: Fundamentals and Applications. Cambridge University Press, Cambridge, U.K., 2001. [133] W. Kuo and V. R. Prasad. An annotated overview of system reliability optimization. IEEE Transactions on Reliability, R-49(2):176–187, 2000. [134] W. Kuo, W. Zhang, and M. Zuo. A consecutive-k-out-of-n:G system: The mirror image of a consecutive-k-out-of-n:F system. IEEE Transactions on Reliability, R-39(2):244– 253, 1990. [135] P. H. Kvam. A parametric mixture-model for common-cause failure data (of nuclear power plants). IEEE Transactions on Reliability, R-47(1):30–34, 1998. [136] Y. Lam and Y. L. Zhang. Analysis of repairable consecutive-2-out-of-n:F systems with Markov dependence. International Journal of Systems Science, 30(8):799–809, 1990. [137] M. Lambiris and S. G. Papastavridis. Exact reliability formulas for linear and circular consecutive-k-out-of-n:F systems. IEEE Transactions on Reliability, R-34(2):124–126, 1985.
[138] T. S. Lau. The reliability of exchangeable binary systems. Statistics and Probability Letters, 13(2):153–158, 1992.
[139] A. Lesanovsky. Systems with two dual failure modes—a survey. Microelectronics and Reliability, 33(10):1597–1626, 1993.
[140] D. Lin and M. J. Zuo. Reliability evaluation of a linear k-within-(r, s)-out-of-(m, n):F lattice system. Probability in the Engineering and Informational Sciences, 14(4):435–443, 2000.
[141] F. H. Lin and W. Kuo. Reliability importance and invariant optimal allocation. Journal of Heuristics, 8(2):155–172, 2002.
[142] F. H. Lin, W. Kuo, and F. Hwang. Structure importance of consecutive-k-out-of-n systems. Operations Research Letters, 25:101–107, 1999.
[143] M. S. Lin, M. S. Chang, and D. J. Chen. A generalization of consecutive k-out-of-n:G systems. IEICE Transactions on Information and Systems, E83-D(6):1309–1313, 2000.
[144] M. Lipow. A simple lower bound for reliability of k-out-of-n:G systems. IEEE Transactions on Reliability, R-43(4):656–658, 1994.
[145] H. Liu. Reliability of a load-sharing k-out-of-n:G system: Non-iid components with arbitrary distributions. IEEE Transactions on Reliability, R-47(3):279–284, 1998.
[146] M. O. Locks. Recursive disjoint products, inclusion-exclusion, and min-cut approximations. IEEE Transactions on Reliability, R-29(5):368–371, 1980.
[147] M. O. Locks. Recursive disjoint products: A review of three algorithms. IEEE Transactions on Reliability, R-31(1):33–35, 1982.
[148] M. O. Locks. Comments on: Improved method of inclusion-exclusion applied to k-out-of-n systems. IEEE Transactions on Reliability, R-33(4):321–323, 1984.
[149] M. O. Locks. A minimizing algorithm for sum of disjoint products. IEEE Transactions on Reliability, R-36:445–453, 1987.
[150] F. S. Makri and Z. M. Psillakis. Bounds for reliability of k-within two-dimensional consecutive-r-out-of-n failure systems. Microelectronics and Reliability, 36(3):341–345, 1996.
[151] J. Malinowski and W. Preuss. On the reliability of generalized consecutive systems—a survey. International Journal of Reliability, Quality and Safety Engineering, 2(2):187–201, 1995.
[152] J. Malinowski and W. Preuss. A recursive algorithm evaluating the exact reliability of a consecutive-k-within-m-out-of-n:F system. Microelectronics and Reliability, 35(12):1461–1465, 1995.
[153] J. Malinowski and W. Preuss. Reliability of circular consecutively-connected systems with multi-state components. IEEE Transactions on Reliability, R-44(3):532–534, 1995.
[154] J. Malinowski and W. Preuss. Lower and upper bounds for the reliability of connected-(r, s)-out-of-(m, n):F lattice systems. IEEE Transactions on Reliability, R-45(1):156–160, 1996.
[155] J. Malinowski and W. Preuss. A recursive algorithm evaluating the exact reliability of a circular r-within-consecutive-k-out-of-n:F system. Microelectronics and Reliability, 36(10):1389–1394, 1996.
[156] J. Malinowski and W. Preuss. Reliability evaluation for tree-structured systems with multi-state components. Microelectronics and Reliability, 36(1):9–17, 1996.
[157] J. Malinowski and W. Preuss. Reliability of a 2-way linear consecutively connected system with multi-state components. Microelectronics and Reliability, 36(10):1483–1488, 1996.
[158] J. Malinowski and W. Preuss. Reliability of reverse-tree-structured systems with multi-state components. Microelectronics and Reliability, 36(1):1–7, 1996.
[159] J. Malinowski and W. Preuss. Reliability of a two-way circular consecutively connected system with multi-state components. Microelectronics and Reliability, 37(8):1255–1258, 1997.
[160] D. M. Malon. Optimal consecutive-2-out-of-n:F component sequencing. IEEE Transactions on Reliability, R-33(5):414–418, 1984.
[161] D. M. Malon. Optimal consecutive-k-out-of-n:F component sequencing. IEEE Transactions on Reliability, R-34(1):46–49, 1985.
[162] D. M. Malon. When is greedy module assembly optimal? Naval Research Logistics, 37:847–854, 1990.
[163] U. Manber. Introduction to Algorithms: A Creative Approach. Addison-Wesley, Reading, MA, 1989.
[164] A. W. Marshall and I. Olkin. Inequalities: Theory of Majorization and Its Applications. Academic, New York, 1979.
[165] P. W. McGrady. The availability of a k-out-of-n:G network. IEEE Transactions on Reliability, R-34(5):451–452, 1985.
[166] F. C. Meng. Component-relevancy and characterization results in multi-state systems. IEEE Transactions on Reliability, R-33(4):284–288, 1993.
[167] F. C. Meng. Characterizing the Barlow-Wu structure functions. IEEE Transactions on Reliability, R-42(3):478–483, 1994.
[168] K. B. Misra. Reliability Analysis and Prediction: A Methodology Oriented Treatment. Elsevier, Amsterdam, 1992.
[169] M. Modarres. What Every Engineer Should Know about Reliability and Risk Analysis. Marcel Dekker, New York, 1993.
[170] M. Morrison and S. Munshi. Availability of a v-out-of-(m + r):G system. IEEE Transactions on Reliability, R-30(2):200–201, 1981.
[171] M. S. Moustafa. Transient analysis of reliability with and without repair for k-out-of-n:G systems with two failure modes. Reliability Engineering and System Safety, 53(1):31–35, 1996.
[172] T. Nakagawa. Optimization problems in k-out-of-n systems. IEEE Transactions on Reliability, R-34(3):248–250, 1985.
[173] B. Natvig. Two suggestions of how to define a multi-state coherent system. Applied Probability, 14:391–402, 1982.
[174] J. Naus. An extension of the birthday problem. American Statistician, 22:27–29, 1968.
[175] W. Nelson. Accelerated Testing: Statistical Models, Test Plans, and Data Analysis. Wiley, New York, 1990.
[176] J. Newton. Comment on: Modeling a shared-load k-out-of-n:G system. IEEE Transactions on Reliability, R-42(1):140, 1993.
[177] S. Papastavridis and M. V. Koutras. Consecutive-k-out-of-n:F systems with maintenance. Annals of the Institute of Statistical Mathematics, 44:605–612, 1992.
[178] S. Papastavridis and M. V. Koutras. Bounds for reliability of consecutive-k-within-m-out-of-n systems. IEEE Transactions on Reliability, R-42(1):156–160, 1993.
[179] S. Papastavridis and M. E. Sfakianakis. Optimal arrangement and importance of the components in a consecutive-k-out-of-r-from-n:F system. IEEE Transactions on Reliability, R-40:277–279, 1991.
[180] S. G. Papastavridis. Upper and lower bounds for the reliability of a consecutive-k-out-of-n:F system. IEEE Transactions on Reliability, R-35(5):607–610, 1986.
[181] S. G. Papastavridis. A limit theorem for the reliability of a consecutive-k-out-of-n system. Advances in Applied Probability, 19:746–748, 1987.
[182] S. G. Papastavridis. The most important component in a consecutive-k-out-of-n:F system. IEEE Transactions on Reliability, R-36(2):266–268, 1987.
[183] S. G. Papastavridis. Lifetime distribution of circular consecutive-k-out-of-n:F systems with exchangeable lifetimes. IEEE Transactions on Reliability, R-38(4):460–461, 1989.
[184] S. G. Papastavridis. m-consecutive-k-out-of-n:F system. IEEE Transactions on Reliability, R-39:386–388, 1990.
[185] S. G. Papastavridis and M. Lambiris. Reliability of a consecutive-k-out-of-n:F system for Markov-dependent components. IEEE Transactions on Reliability, R-36(1):78–79, 1987.
[186] E. Parzen. Stochastic Processes. Holden-Day, San Francisco, 1962.
[187] H. Pham. Optimal system size for k-out-of-n systems with competing failure modes. Mathematical and Computer Modelling, 15(6):77–81, 1991.
[188] H. Pham. On the optimal design of k-out-of-n:G subsystems. IEEE Transactions on Reliability, R-41(4):572–574, 1992.
[189] H. Pham. Optimal design of k-out-of-n redundant systems. Microelectronics and Reliability, 32:119–126, 1992.
[190] H. Pham. Optimal cost-effective design of triple-modular-redundancy-with-spares systems. IEEE Transactions on Reliability, R-42(3):369–374, 1993.
[191] H. Pham. Reliability analysis for dynamic configurations of systems with three failure modes. Reliability Engineering and System Safety, 63(1):13–23, 1999.
[192] H. Pham and D. M. Malon. Optimal design of systems with competing failure modes. IEEE Transactions on Reliability, R-43(2):251–254, 1994.
[193] H. Pham and M. Pham. Optimal designs of {k, n − k + 1}-out-of-n:F systems (subject to 2 failure modes). IEEE Transactions on Reliability, R-40(5):559–562, 1991.
[194] H. Pham and S. J. Upadhyaya. The efficiency of computing the reliability of k-out-of-n systems. IEEE Transactions on Reliability, R-37(5):521–523, 1988.
[195] M. J. Phillips. k-out-of-n:G systems are preferable. IEEE Transactions on Reliability, R-29(2):166–169, 1980.
[196] V. R. Prasad, W. Kuo, and K. O. Kim. Maximization of percentile of system life through component redundancy allocation. IIE Transactions, 33(12):1071–1079, 2001.
[197] V. R. Prasad, K. P. K. Nair, and Y. P. Aneja. Optimal assignment of components to parallel-series and series-parallel systems. Operations Research, 39(3):407–414, 1991.
[198] V. R. Prasad and M. Raghavachari. Optimal allocation of interchangeable components in a series-parallel system. IEEE Transactions on Reliability, R-47(3):255–260, 1998.
[199] W. Preuss. On the reliability of generalized consecutive systems. Nonlinear Analysis, Theory, Methods and Applications, 30(8):5425–5429, 1997.
[200] A. Reibman and H. Zaretsky. Modeling fault coverage and reliability in a fault-tolerant network. In Proceedings of 1990 Global Telecommunications Conference, Vol. 2. IEEE, Piscataway, New Jersey, 1990, pp. 689–692.
[201] J. Riordan. An Introduction to Combinatorial Analysis. Wiley, New York, 1978.
[202] T. Risse. On the evaluation of the reliability of k-out-of-n systems. IEEE Transactions on Reliability, R-36(4):433–435, 1987.
[203] A. Rosenthal. Note on closed form solutions for delta-star and star-delta conversion of reliability networks. IEEE Transactions on Reliability, R-27(2):110–111, 1978.
[204] A. Rosenthal and D. Frisque. Transformations for simplifying network reliability calculations. Networks, 7(1):97–111, 1977. See also Errata, Vol. 7, No. 1, 1977, p. 382.
[205] S. M. Ross. Applied Probability Models with Optimization Applications. Holden-Day, San Francisco, 1970.
[206] S. M. Ross. Stochastic Processes, 2nd ed. Wiley, New York, 1996.
[207] J. Rupe and W. Kuo. Performability of systems based on renewal process model. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 28(5):691–698, 1998.
[208] A. M. Rushdi. Utilization of symmetric switching functions in the computation of k-out-of-n system reliability. Microelectronics and Reliability, 26(5):973–987, 1986.
[209] A. M. Rushdi. Comment on: An efficient non-recursive algorithm for computing the reliability of k-out-of-n systems. IEEE Transactions on Reliability, R-40(1):60–61, 1991.
[210] A. M. Rushdi and K. A. Al-Hindi. A table for the lower boundary of the region of useful redundancy for k-out-of-n systems. Microelectronics and Reliability, 33(7):979–992, 1993.
[211] R. K. Sah. An explicit closed-form formula for profit-maximizing k-out-of-r systems subject to two kinds of failures. Microelectronics and Reliability, 30(6):1123–1130, 1990.
[212] R. K. Sah and J. E. Stiglitz. Qualitative properties of profit-making k-out-of-n systems subject to two kinds of failures. IEEE Transactions on Reliability, R-37(5):515–520, 1988.
[213] A. A. Salvia. Simple inequalities for consecutive-k-out-of-n:F networks. IEEE Transactions on Reliability, R-31(5):450, 1982.
[214] A. A. Salvia and W. C. Lasher. 2-dimensional consecutive-k-out-of-n:F models. IEEE Transactions on Reliability, R-39(3):382–385, 1990.
[215] A. K. Sarje and E. V. Prasad. An efficient non-recursive algorithm for computing the reliability of k-out-of-n systems. IEEE Transactions on Reliability, R-38(2):234–235, 1989.
[216] M. R. Satam. Consecutive k-out-of-n:F system: A comment. IEEE Transactions on Reliability, R-40(1):62, 1991.
[217] T. H. Savits. Some multivariate distributions derived from a non-fatal shock model. Journal of Applied Probability, 25(2):383–390, 1988.
[218] E. M. Scheuer. Reliability of an m-out-of-n system when component failure induces higher failure rates in survivors. IEEE Transactions on Reliability, R-37(1):73–74, 1988.
[219] J. L. Schiff. The Laplace Transform: Theory and Applications. Springer-Verlag, New York, 1999.
[220] W. G. Schneeweiss. Reliability Modeling. LiLoLe-Verlag GmbH, Hagen, Germany, 2001.
[221] M. Sfakianakis, S. Kounias, and A. Hillaris. Reliability of a consecutive-k-out-of-r-from-n:F system. IEEE Transactions on Reliability, R-41(3):442–447, 1992.
[222] J. G. Shanthikumar. Recursive algorithm to evaluate the reliability of a consecutive-k-out-of-n:F system. IEEE Transactions on Reliability, R-31(5):442–443, 1982.
[223] J. G. Shanthikumar. Lifetime distribution of consecutive-k-out-of-n:F systems with exchangeable lifetimes. IEEE Transactions on Reliability, R-34(5):480–483, 1985.
[224] J. G. Shanthikumar. Reliability of systems with consecutive minimal cutsets. IEEE Transactions on Reliability, R-36(5):546–550, 1987.
[225] J. Shao and L. R. Lamberson. Markov model for k-out-of-n:G systems with built-in test. Microelectronics and Reliability, 31(1):123–131, 1991.
[226] J. Shao and L. R. Lamberson. Modeling a shared-load k-out-of-n:G system. IEEE Transactions on Reliability, R-40(2):205–209, 1991.
[227] J. She and M. G. Pecht. Reliability of a k-out-of-n warm-standby system. IEEE Transactions on Reliability, R-41(1):72–75, 1992.
[228] J. Shen and M. Zuo. Optimal design of series consecutive-k-out-of-n:G systems. Reliability Engineering and System Safety, 45:277–283, 1994.
[229] S. H. Sheu and C. M. Kuo. Optimal age replacement policy of a k-out-of-n system with age-dependent minimal repair. Operations Research, 28(1):85–96, 1994.
[230] S. H. Sheu and C. M. Kuo. Optimization problems in k-out-of-n systems with minimal repair. Microelectronics and Reliability, 44:77–82, 1994.
[231] S. H. Sheu and C. T. Liou. Optimal replacement of a k-out-of-n system subject to shocks. Microelectronics and Reliability, 32(5):649–655, 1992.
[232] H. Singh and G. Vijayasree. Preservation of partial orderings under the formation of k-out-of-n systems of i.i.d. components. IEEE Transactions on Reliability, R-40(3):273–276, 1991.
[233] R. C. Suich and R. L. Patterson. k-out-of-n:G systems: Some cost considerations. IEEE Transactions on Reliability, R-40(3):259–264, 1991.
[234] R. C. Suich and R. L. Patterson. Minimize system cost by choosing optimal subsystem reliability & redundancy. In Proceedings of Annual Reliability and Maintainability Symposium. IEEE, Piscataway, New Jersey, 1993, pp. 293–297.
[235] B. N. Sur. Reliability evaluation of k-out-of-n redundant system with partially energized stand-by units. Microelectronics and Reliability, 36(3):379–383, 1996.
[236] F. A. Tillman, C. L. Hwang, and W. Kuo. Optimization of Systems Reliability, 3rd ed. Marcel Dekker, New York, 1988.
[237] Y. L. Tong. A rearrangement inequality for the longest run, with an application to network reliability. Journal of Applied Probability, 22:386–393, 1985.
[238] Y. L. Tong. Some new results on the reliability of circular consecutive-k-out-of-n:F system. In A. P. Basu (Ed.), Reliability and Quality Control. North-Holland, New York, 1986, pp. 395–400.
[239] U.S. Nuclear Regulatory Commission (NRC). Fault Tree Handbook. NUREG-0492. NRC, Washington, DC, 1986.
[240] J. K. Vaurio. An implicit method for incorporating common-cause failures in system analysis. IEEE Transactions on Reliability, R-47(2):173–180, 1998.
[241] S. D. Wang and C. H. Sun. Transformations of star-delta and delta-star reliability networks. IEEE Transactions on Reliability, R-45(1):120–126, 1996.
[242] V. K. Wei, F. K. Hwang, and V. T. Sos. Optimal sequencing of items in a consecutive-2-out-of-n system. IEEE Transactions on Reliability, R-32(1):30–33, 1983.
[243] H. S. Wilf. Algorithms and Complexity. Prentice-Hall, Englewood Cliffs, NJ, 1986.
[244] J. M. Wilson. An improved minimizing algorithm for sum of disjoint products. IEEE Transactions on Reliability, R-39(1):42–45, 1990.
[245] P. A. Wood. Multi-state block diagrams and fault trees. IEEE Transactions on Reliability, R-34(3):236–240, 1985.
[246] J. S. Wu and R. J. Chen. An O(kn) algorithm for a circular consecutive-k-out-of-n:F system. IEEE Transactions on Reliability, R-41(2):303–305, 1992.
[247] J. S. Wu and R. J. Chen. An algorithm for computing the reliability of a weighted-k-out-of-n system. IEEE Transactions on Reliability, R-43:327–328, 1994.
[248] J. S. Wu and R. J. Chen. Efficient algorithms for k-out-of-n & consecutive-weighted-k-out-of-n:F system. IEEE Transactions on Reliability, R-43(4):650–655, 1994.
[249] J. Xue. On multi-state system analysis. IEEE Transactions on Reliability, R-34(4):329–337, 1985.
[250] J. Xue and K. Yang. Symmetric relations in multi-state systems. IEEE Transactions on Reliability, R-44(4):689–693, 1995.
[251] H. Yamamoto and M. Miyakawa. Reliability of a linear connected-(r, s)-out-of-(m, n):F lattice system. IEEE Transactions on Reliability, R-44(2):333–336, 1995.
[252] Y. C. Yeh. Triple-triple redundant 777 primary flight computer. In Proceedings of 1996 Aerospace Applications Conference, Vol. 1. IEEE, Aspen, Colorado, 1996, pp. 293–307.
[253] K. Yu, I. Koren, and Y. Guo. Generalized multi-state monotone coherent systems. IEEE Transactions on Reliability, R-43(2):242–250, 1994.
[254] R. S. Zakaria, H. T. David, and W. Kuo. A counter-intuitive aspect of component importance in linear consecutive-k-out-of-n systems. IIE Transactions, 24(5):147–154, 1992.
[255] Y. L. Zhang and T. P. Wang. Repairable consecutive-2-out-of-n:F system. Microelectronics and Reliability, 36(5):605–608, 1996.
[256] Y. L. Zhang, M. J. Zuo, and R. C. M. Yam. Reliability analysis for a circular consecutive-2-out-of-n:F repairable system with priority in repair. Reliability Engineering and System Safety, 68(2):113–120, 2000.
[257] M. Zuo. Reliability and component importance of a consecutive-k-out-of-n system. Microelectronics and Reliability, 33(2):243–258, 1993.
[258] M. Zuo. Reliability and design of 2-dimensional consecutive-k-out-of-n systems. IEEE Transactions on Reliability, R-42(3):488–490, 1993.
[259] M. Zuo. Reliability of linear and circular consecutively-connected systems. IEEE Transactions on Reliability, R-42(3):484–487, 1993.
[260] M. Zuo and W. Kuo. Design and performance analysis of consecutive-k-out-of-n structure. Naval Research Logistics, 37:203–230, 1990.
[261] M. Zuo and M. Liang. Reliability of multi-state consecutively-connected systems. Reliability Engineering and System Safety, 44:173–176, 1994.
[262] M. Zuo, D. Lin, and Y. Wu. Reliability evaluation of combined k-out-of-n:F, consecutive-kc-out-of-n:F and linear connected-(r, s)-out-of-(m, n):F system structures. IEEE Transactions on Reliability, R-49(1):99–104, 2000.
[263] M. Zuo and J. Shen. System reliability enhancement through heuristic design. In C. G. Soares, Y. Murotsu, A. Pittaluga, J. S. Spencer, and B. Stahl (Eds.), Proceedings of the 11th International Conference on Offshore Mechanics and Arctic Engineering, Vol. II: Safety and Reliability. The American Society of Mechanical Engineers, New York, 1992, pp. 301–304.
BIBLIOGRAPHY
A. M. Abouammoh and M. A. Al-Kadi. Component relevancy in multi-state reliability models. IEEE Transactions on Reliability, 40(3):370–374, 1991.
A. M. Abouammoh and M. A. Al-Kadi. Multi-state coherent systems of order k. Microelectronics and Reliability, 35(11):1415–1421, 1995.
K. K. Aggarwal, K. B. Misra, and J. S. Gupta. A fast algorithm for reliability evaluation. IEEE Transactions on Reliability, R-24(2):83–85, 1975.
K. K. Aggarwal and S. Rai. Reliability evaluation in computer communication networks. IEEE Transactions on Reliability, R-30(1):32–35, 1981.
R. K. Agnihotri, G. Singhal, and S. K. Khandelwal. Stochastic analysis of a two-unit redundant system with two types of failure. Microelectronics and Reliability, 32(7):901–904, 1992.
S. Aki and K. Hirano. Estimation of parameters in discrete distributions of order k. Annals of the Institute of Statistical Mathematics, 41(1):47–61, 1989.
J. Ansell and A. Bendell. On the optimality of k-out-of-n:G system. IEEE Transactions on Reliability, R-31(2):206–210, 1982.
T. Bai and R. Zhang. Building of a multi-state reliability model of large generating units. Journal of Shanghai Jiaotong University, 31(6):33–37, 1997.
S. K. Banerjee and K. Rajamani. Parametric representation of probability in two dimensions—a new approach in system reliability evaluation. IEEE Transactions on Reliability, R-21:56–60, 1972.
A. D. Barbour, L. Holst, and S. Janson. Poisson Approximation. Oxford University Press, 1991.
A. Behr, L. Camarinopoulos, and G. Pampoukis. Domination of k-out-of-n systems. IEEE Transactions on Reliability, R-44(4):705–708, 1995.
M. R. Bhuiyan and R. N. Allan. Modeling multi-state problems in sequential simulation of power system reliability studies. IEE Proceedings: Generation, Transmission and Distribution, 142(4):343–349, 1994.
J. Biernat. A unified algorithm of reliability estimation of k-out-of-n systems implied by the structure function. Microelectronics and Reliability, 33(3):395–401, 1993.
M. T. R. Blackwell. The effect of short production runs on CSP-1. Technometrics, 19:259–263, 1977.
P. J. Boland and E. El-Neweihi. Measures of component importance in reliability theory. Computers and Operations Research, 22(4):455–463, 1995.
P. J. Boland, F. Proschan, and Y. L. Tong. Linear dependence in consecutive k-out-of-n:F systems. Probability in the Engineering and Informational Sciences, 4:391–397, 1990.
P. J. Boland, F. Proschan, and Y. L. Tong. A stochastic ordering of partial sums of independent random variables and of some random processes. Journal of Applied Probability, 29(3):645–654, 1992.
R. C. Bollinger. Reliability and runs of ones. Mathematics Magazine, 57:34–37, 1984.
R. C. Bollinger. Strict consecutive-k-out-of-n:F systems. IEEE Transactions on Reliability, R-34(1):50–52, 1985.
R. C. Bollinger. An algorithm for direct computation in consecutive-k-out-of-n:F system. IEEE Transactions on Reliability, R-35(5):611–612, 1986.
W. D. S. Borges and F. W. Rodrigues. An axiomatic characterization of multi-state coherent structure. Mathematics of Operations Research, 8(3):435–438, 1983.
J. S. Brunner. A recursive method for calculating binary integration detection probabilities. IEEE Transactions on Aerospace and Electronic Systems, 26(6):1034–1035, 1990.
R. L. Bulfin and C. Y. Liu. Optimal allocation of redundant components for large systems. IEEE Transactions on Reliability, R-34(3):241–247, 1985.
D. A. Butler. Bounding the reliability of multi-state systems. Operations Research, 30(3):530–544, 1982.
G. Cafaro, F. Corsi, and F. Vacca. Multi-state Markov models and structural properties of the transition-rate matrix. IEEE Transactions on Reliability, R-35(2):192–200, 1986.
E. R. Canfield and W. P. McCormick. Asymptotic reliability of consecutive k-out-of-n systems. Journal of Applied Probability, 29:142–155, 1992.
J. Cao and Y. Wu. Reliability analysis of a multi-state system with a replaceable repair facility. Acta Mathematicae Applicatae Sinica, 4(3):113–121, 1988.
F. U. Chan, L. K. Chan, and G. D. Lin. On consecutive k-out-of-n:F system. European Journal of Operational Research, 36:207–216, 1988.
G. J. Chang and F. K. Hwang. Optimal consecutive-k-out-of-n:F systems under a fixed budget. Probability in the Engineering and Informational Sciences, 2(1):63–73, 1988.
M. T. Chao, J. C. Fu, and M. V. Koutras. Survey of reliability studies of consecutive-k-out-of-n:F and related systems. IEEE Transactions on Reliability, 44:120–127, 1995.
C. A. Charalambides. On discrete distributions of order k. Annals of the Institute of Statistical Mathematics, 38(3):557–568, 1986.
G. Chaudhuri, K. Hu, and N. Afshar. A new approach to system reliability. IEEE Transactions on Reliability, R-50(1):75–84, 2001.
C. C. Chen and K. M. Koh. Principles and Techniques in Combinatorics. World Scientific, Singapore, 1992.
R. W. Chen. Correction to: Failure distribution of consecutive-k-out-of-n:F systems. IEEE Transactions on Reliability, R-38(4):474, 1989.
R. W. Chen and F. K. Hwang. Failure distributions of consecutive-k-out-of-n:F systems. IEEE Transactions on Reliability, R-34(4):338–341, 1985.
R. W. Chen, F. K. Hwang, and W. C. W. Li. A reversible model for consecutive-2-out-of-n:F systems with node and link failures. Probability in the Engineering and Informational Sciences, 8:189–200, 1994.
D. T. Chiang and R. F. Chiang. Relayed communication via consecutive-k-out-of-n:F system. IEEE Transactions on Reliability, R-35(1):65–67, 1986.
O. Chrysaphinou and S. Papastavridis. Reliability of a consecutive k-out-of-n:F system in a random environment. Journal of Applied Probability, 27:452–458, 1990.
O. Chrysaphinou, S. G. Papastavridis, and P. Sypass. Relayed communication via parallel redundancy. IEEE Transactions on Reliability, R-38(4):454–456, 1989.
W. K. Chung. An availability calculation for k-out-of-n redundant system with common-cause failures and replacement. Microelectronics and Reliability, 20(5):517–519, 1980.
W. K. Chung. A k-out-of-n redundant system with common-cause failures. IEEE Transactions on Reliability, R-29(4):344, 1980.
W. K. Chung. A k-out-of-n three state unit redundant system with common-cause failure and replacements. Microelectronics and Reliability, 21(4):589–591, 1981.
W. K. Chung. A k-out-of-n:G redundant system with cold standby units and common-cause failures. Microelectronics and Reliability, 24(4):691–695, 1984.
W. K. Chung. A reliability model for a k-out-of-n:G redundant system with multiple failure modes and common cause failures. Microelectronics and Reliability, 27(4):621–623, 1987.
W. K. Chung. An availability analysis of a k-out-of-n:G redundant system with dependent failure rates and common cause failures. Microelectronics and Reliability, 28(3):391–393, 1988.
W. K. Chung. A k-out-of-n:G redundant system with dependent failure rates and common-cause failures. Microelectronics and Reliability, 28(2):201–203, 1988.
W. K. Chung. Reliability analysis of a k-out-of-n:G vehicle fleet. Microelectronics and Reliability, 29(4):549–553, 1989.
W. K. Chung. A reliability analysis of a k-out-of-n redundant system with the presence of chance common-cause shock failures. Microelectronics and Reliability, 32(10):1395–1399, 1992.
W. K. Chung. Reliability analysis of a k-out-of-n:G redundant system in the presence of chance with multiple critical errors. Microelectronics and Reliability, 33(3):331–334, 1993.
O. A. Cocca. Some experiences using MIL-STD-105 in the United States Air Force. Naval Research Logistics Quarterly, 32:11–16, 1985.
C. J. Colbourn. The Combinatorics of Network Reliability. Oxford University Press, New York, 1987.
J. W. Cooley and J. W. Tukey. An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation, 19:297–301, 1965.
D. R. Cox. Renewal Theory. Methuen, London, 1962.
H. F. Davis. Fourier Series and Orthogonal Functions. Allyn and Bacon, Boston, 1963.
B. S. Dhillon. A 4-unit redundant system with common-cause failures. IEEE Transactions on Reliability, R-26:373–374, 1977.
B. S. Dhillon. A common cause failure availability model. Microelectronics and Reliability, 17:583–584, 1978.
B. S. Dhillon. A 4-unit redundant system with common-cause failures. IEEE Transactions on Reliability, R-28(3):267, 1979.
G. M. Dillard. A moving-window detector for binary integration. IEEE Transactions on Information Theory, 13(1):2–6, 1967.
D. Z. Du and F. K. Hwang. Optimal consecutive-2 systems of lines and cycles. Networks, 15:439–447, 1985.
D. Z. Du and F. K. Hwang. Optimal assignments for consecutive-2 graphs. SIAM Journal on Algebraic and Discrete Methods, 8(3):510–518, 1987.
D. Z. Du and F. K. Hwang. Reliabilities of consecutive-2 graphs. Probability in the Engineering and Informational Sciences, 1:293–298, 1987.
A. T. Duncan, A. B. Mundel, A. B. Godfrey, and V. A. Patridge. LQL indexed plans that are compatible with the structure of MIL-STD-105D. Journal of Quality Technology, 12(1):40–46, 1980.
J. D. Echard. Estimation of radar detection and false alarm probabilities. IEEE Transactions on Aerospace and Electronic Systems, 27(2):255–259, 1991.
T. Egeland. Mixing dependence and system reliability. Scandinavian Journal of Statistics, 19(2):173–183, 1992.
B. B. Fawzi and A. G. Hawkes. Availability of an R-out-of-N system with spares and repairs. Journal of Applied Probability, 28(2):379–408, 1991.
J. C. Fu. Reliability of a large consecutive-k-out-of-n:F system. IEEE Transactions on Reliability, R-34(2):127–130, 1985.
J. C. Fu. Bounds for reliability of large consecutive-k-out-of-n:F systems with unequal component reliability. IEEE Transactions on Reliability, R-35(3):316–319, 1986.
J. C. Fu. Poisson convergence in reliability of a large linearly connected system as related to coin tossing. Statistica Sinica, 3:261–275, 1992.
J. C. Fu and M. V. Koutras. Distribution theory of runs: a Markov chain approach. Journal of the American Statistical Association, 89(427):1050–1058, 1994.
E. Fujiwara and K. Matsuoka. Fault-tolerant k-out-of-n logic unit networks. Systems and Computers in Japan, 19(9):21–31, 1988.
J. B. Fussell. How to hand-calculate system reliability and safety characteristics. IEEE Transactions on Reliability, R-24(3):169–174, 1975.
J. P. Gadani. System effectiveness evaluation using star and delta transformations. IEEE Transactions on Reliability, R-30(1):43–47, 1981.
I. N. Gibra. Economic design of attribute control charts for multiple assignable causes. Journal of Quality Technology, 13(2):93–99, 1981.
A. P. Godbole. Degenerate and Poisson convergence criteria for success runs. Statistics and Probability Letters, 10(3):247–255, 1990.
A. P. Godbole. Poisson approximations for runs and patterns of rare events. Advances in Applied Probability, 23(4):851–865, 1991.
A. P. Godbole. Approximate reliabilities of m-consecutive-k-out-of-n:failure systems. Statistica Sinica, 3:321–327, 1993.
L. R. Goel and P. Gupta. Analysis of a k-out-of-n unit system with two types of failure and preventive maintenance. Microelectronics and Reliability, 24(5):877–880, 1984.
I. P. Goulden and D. M. Jackson. Combinatorial Enumeration. Wiley, New York, 1983.
I. P. Goulden and D. M. Jackson. Algorithmic connection for circular permutation enumeration. Studies in Applied Mathematics, 70(1):121–139, 1984.
K. K. Govil. Maintainability and availability calculations for series, parallel and r-out-of-n configurations. Microelectronics and Reliability, 23(5):785–787, 1983.
I. Greenberg. The first occurrence of n successes in N trials. Technometrics, 12(3):627–634, 1970.
H. Gupta and J. Sharma. A delta-star transformation approach for reliability evaluation. IEEE Transactions on Reliability, R-27(3):212–214, 1979.
R. Gupta and A. Chaudhary. Analysis of reliability bounds for series, parallel, and k-out-of-n system configuration. Microelectronics and Reliability, 34(1):183–185, 1992.
S. V. Gurov, L. V. Utkin, and I. B. Shubinsky. Optimal reliability allocation of redundant units and repair facilities by arbitrary failure and repair distributions. Microelectronics and Reliability, 35(12):1451–1460, 1995.
W. A. Hailey. Minimum sample size single sampling plan: A computerized approach. Journal of Quality Technology, 12(4):230–235, 1980.
A. Hald. The determination of single sampling attribute plans with given producer's and consumer's risk. Technometrics, 9(3):401–415, 1967.
K. D. Heidtmann. Author reply. IEEE Transactions on Reliability, R-33:322, 1984.
A. H. Hevesh. Comments on: Steady-state availability of k-out-of-n:G system with single repair. IEEE Transactions on Reliability, R-33(4):324, 1984.
T. Hidaka. Approximated reliability of r-out-of-n:F system with common cause failure and maintenance. IEEE Transactions on Reliability, R-32(6):817–832, 1992.
I. D. Hill. The design of MIL-STD-105D sampling tables. Journal of Quality Technology, 5(2):80–83, 1973.
A. Hoyland and M. Rausand. System Reliability Theory: Models and Statistical Methods. Wiley, 1994.
J. Huang and J. Daughton. Yield optimization in wafer scale circuits with hierarchical redundancies. Integration, the VLSI Journal, 33:43–51, 1986.
J. Huang and L. Wu. A new approach to make a policy decision on mining equipment maintenance. Quarterly of CIMR, 10(1):29–34, 1990.
J. C. Hudson and K. C. Kapur. Modules in coherent multi-state systems. IEEE Transactions on Reliability, R-15(2):127–135, 1983.
J. C. Hudson and K. C. Kapur. Reliability bounds for multi-state systems with multi-state components. Operations Research, 33(1):153–160, 1985.
R. P. Hughes. A new approach to common cause failure. Reliability Engineering, 17(3):211–236, 1987.
G. S. Hura. On the determination of all sets and minimal cutsets between any two nodes of a graph through Petri nets. Microelectronics and Reliability, 23(3):471–475, 1983.
F. K. Hwang. Optimal partitions. Journal of Optimization Theory and Applications, 34(1):1–10, 1981.
F. K. Hwang. Relayed consecutive-k-out-of-n:F lines. IEEE Transactions on Reliability, R-37:512–514, 1988.
F. K. Hwang. Comments on strict consecutive system. IEEE Transactions on Reliability, 40(3):264, 270, 1991.
F. K. Hwang. Do component reliabilities depend on system size? IEEE Transactions on Reliability, R-41(4):486–487, 1992.
F. K. Hwang. An O(kn)-time algorithm for computing the reliability of a circular consecutive-k-out-of-n system. IEEE Transactions on Reliability, R-42(1):161–162, 1993.
F. K. Hwang and S. Papastavridis. Binary vectors with exactly k non-overlapping m-tuples of consecutive ones. Discrete Applied Mathematics, 30:83–86, 1991.
F. K. Hwang and D. Shi. Optimal relayed mobile communication systems. IEEE Transactions on Reliability, 38(4):457–459, 1989.
F. K. Hwang, J. Sun, and E. Y. Yao. Optimal set partitioning. SIAM Journal on Algebraic and Discrete Methods, 6(1):163–170, 1985.
F. K. Hwang and Y. C. Yao. On failure rates of consecutive-k-out-of-n:F systems. Probability in Engineering and Applied Mathematics, 4(1):57–71, 1990.
S. Iyer. Distribution of time to failure of consecutive-k-out-of-n:F systems. IEEE Transactions on Reliability, R-39(1):97–101, 1990.
S. Iyer. Distribution of the lifetime of consecutive-k-within-m-out-of-n:F systems. IEEE Transactions on Reliability, R-41(3):448–450, 1992.
S. P. Jain and K. Gopal. Recursive algorithm for reliability evaluation of k-out-of-n:G system. IEEE Transactions on Reliability, R-34(2):144–146, 1985.
S. P. Jain and K. Gopal. Reliability of k-to-l-out-of-n systems. Reliability Engineering, 12:175–179, 1985.
J. M. Jobe and H. T. David. Buehler confidence bounds for a reliability-maintainability measure. Technometrics, 34(2):214–222, 1992.
K. H. Jung, H. Kim, and Y. Ko. Reliability of a k-out-of-n:G system with common-mode outages. Reliability Engineering and System Safety, 41(1):5–11, 1993.
S. C. Kao. Computing reliability from warranty. In Proceedings of the Statistical Computing Section, American Statistical Association, 1982, pp. 309–312.
A. Kaplan and E. MacDonald. Instantaneous switching procedure for MIL-STD-105D. Journal of Quality Technology, 1(3):172–174, 1969.
S. Karlin and J. McGregor. The differential equations of birth-and-death processes and the Stieltjes moment problem. Transactions of the American Mathematical Society, 85(2):489–546, 1957.
J. Karpinski. A multi-state system under an inspection and repair policy. IEEE Transactions on Reliability, R-35(1):76–77, 1986.
R. L. Kenyon and R. J. Newell. Steady-state availability of k-out-of-n:G system with single repair. IEEE Transactions on Reliability, R-32(2):188–190, 1983.
S. W. Kiu. MTTF and MTFF of a k-out-of-n system. Microelectronics and Reliability, 27(5):913–922, 1987.
J. M. Kontoleon. Analysis of a dynamic redundant system. IEEE Transactions on Reliability, R-27:116–119, 1978.
J. M. Kontoleon. Optimum allocation of components in a special 2-port network. IEEE Transactions on Reliability, R-27(2):112–115, 1978.
J. M. Kontoleon. Reliability improvement of multiterminal networks. IEEE Transactions on Reliability, R-29(5):75–76, 1980.
I. Kopocinska. Breakdown processes of conservative systems. Complex Systems, 10:301–309, 1996.
E. Korczak. Reliability analysis of multi-state monotone systems. Safety and Reliability Assessment, 2:671–682, 1993.
A. Kossow. Correction to: Reliability of consecutive-k-out-of-n:F systems with nonidentical component reliabilities. IEEE Transactions on Reliability, R-38(4):482, 1989.
A. Kossow and W. Preuss. Failure probability of strict consecutive-k-out-of-n:F systems. IEEE Transactions on Reliability, R-36(5):551–553, 1987.
A. Kossow and W. Preuss. Reliability of consecutive-k-out-of-n:F systems with nonidentical component reliabilities. IEEE Transactions on Reliability, R-38(2):229–233, 1989.
S. Kounias and M. Sfakianakis. The reliability of a linear system and its connection with the birthday problem. Statistics Applications, 3:531–543, 1991.
S. Kounias and K. Sotirakoglou. Bonferroni bounds revisited. Journal of Applied Probability, 26:1–9, 1989.
M. V. Koutras. Consecutive-k,r-out-of-n:F systems. Microelectronics and Reliability, 37(4):597–603, 1997.
M. V. Koutras. Consecutive dual failure mode systems. Submitted to International Journal of Reliability, Quality and Safety Engineering, 1998.
M. V. Koutras and S. G. Papastavridis. Application of the Stein-Chen method for bounds and limit theorems in the reliability of coherent structures. Naval Research Logistics, 40:617–631, 1993.
M. V. Koutras and S. G. Papastavridis. On the number of runs and related statistics. Statistica Sinica, 1993.
L. Kronsjo. Algorithms: Their Complexity and Efficiency, 2nd ed., Wiley, Chichester, 1987.
W. Kuo and B. S. Abbas. Performance modeling for multiple human-machine systems. Journal of Analysis of Modeling and Simulation, 11:57–71, 1993.
W. Kuo (Ed.). Quality through Engineering Design. Elsevier Science, Amsterdam, 1993.
W. Kuo, H. H. Lin, Z. Xu, and W. Zhang. Reliability optimization with the Lagrange-multiplier and branch-and-bound technique. IEEE Transactions on Reliability, R-36(5):624–630, 1987.
Y. Lam and Y. L. Zhang. Repairable consecutive-k-out-of-n:F system with Markov dependence. Naval Research Logistics, 47(1):18–39, 2000.
H. E. Lambert. Measures of importance of events and cut sets in fault trees. In R. E. Barlow, J. B. Fussel, and N. D. Singpurwalla (Eds.), Reliability and Fault Tree Analysis, Society for Industrial and Applied Mathematics, Philadelphia, 1975, pp. 77–100.
A. Lesanovsky. Multi-state Markov models for systems with dependent units. IEEE Transactions on Reliability, R-37(5):505–511, 1988.
G. Levitin and A. Lisnianski. Optimization of imperfect preventive maintenance for multistate systems. Reliability Engineering and System Safety, 67(2):193–203, 2000.
E. E. Lewis. Introduction to Reliability Engineering. Wiley, New York, 1987.
H. H. Lin and W. Kuo. A comparison of heuristic reliability optimization methods. In Proceedings of the 1987 International Industrial Engineering Conference, Washington, D.C., 1987, pp. 583–589.
D. G. Linton. Life distributions and degradation for a 2-out-of-n:F system. IEEE Transactions on Reliability, R-30(1):82–84, 1981.
D. G. Linton. Generalized reliability results for 1-out-of-n:G repairable systems. IEEE Transactions on Reliability, R-38:468–471, 1989.
D. G. Linton and J. G. Saw. Reliability analysis of the k-out-of-n:F system. IEEE Transactions on Reliability, R-23(2):97–103, 1974.
A. Lisnianski, G. Levitin, and H. Ben-Haim. Structure optimization of multi-state system with time redundancy. Reliability Engineering and System Safety, 67:103–122, 2000.
M. O. Locks. System reliability analysis: A tutorial. Microelectronics and Reliability, 18:335–345, 1979.
M. O. Locks. Recent developments in computing of system reliability. IEEE Transactions on Reliability, R-34(5):425–436, 1985.
M. O. Locks. Reliability, Maintainability, and Availability Assessment, 2nd ed., ASQC Quality Press, Milwaukee, Wisconsin, 1995.
M. Mahmoud, N. A. Mokhles, and E. U. Saleh. Probabilistic analysis of k-out-of-n:F three-state redundant system with common-cause failures and replacements. Microelectronics and Reliability, 28(5):729–742, 1988.
K. Mak. A note on Barlow-Wu structure functions. Operations Research Letters, 8(1):43–44, 1989.
N. J. McCormick. Asymptotic unavailability for an m-out-of-n system without testing. Reliability Engineering, 17(3):189–191, 1987.
W. P. McCormick and J. H. Reeves. Approximating the distribution of an extreme statistic based on estimates from a generating function. Stochastic Processes and Their Applications, 27:307–316, 1988.
F. C. Meng. Contributions to multistate reliability theory. Ph.D. thesis, University of Illinois, Champaign, Illinois, 1989.
J. Mi. Bolstering components for maximizing system lifetime. Naval Research Logistics, 45:497–509, 1998.
M. S. Moustafa. Availability of k-out-of-n:G systems with M failure modes. Microelectronics and Reliability, 36(3):385–388, 1996.
D. N. Naik. Estimating the parameters of a 2-out-of-3:F system. IEEE Transactions on Reliability, R-30(5):464–465, 1981.
K. Nakashima and H. Matsunaga. Optimal redundancy of systems for minimizing the probability of dangerous errors. IEICE Transactions on Fundamentals, E77-A(1):228–236, 1994.
B. Natvig. A suggestion of a new measure of importance of system components. Stochastic Processes and Their Applications, 9:319–330, 1979.
B. Natvig. New light on measures of importance of system components. Scandinavian Journal of Statistics, 12(1):43–54, 1985.
J. Naus. Probabilities for a generalized birthday problem. Journal of the American Statistical Association, 69:810–815, 1974.
J. B. Nelson. Minimal-order models for false-alarm calculations on sliding windows. IEEE Transactions on Aerospace and Electronic Systems, AES-14(2):351–363, 1978.
J. Newton. Comment on: Reliability of k-out-of-n:G systems with imperfect fault-coverage. IEEE Transactions on Reliability, R-44(1):137–138, 1995.
A. V. Nikolov. k-out-of-n:G system with multiple correlated failures. Microelectronics and Reliability, 26(6):1073–1076, 1986.
L. Oh and W. Kuo (co-editors). Design for Reliability, special issue of IEEE Transactions on Reliability, R-44, 1995.
F. Ohi and T. Nishida. On multi-state coherent systems. IEEE Transactions on Reliability, R-33(4):284–288, 1984.
S. Osaki. Stochastic System Reliability Modeling. World Scientific, Singapore, 1985.
Z. Pan and Y. Tai. Variance importance of system components by Monte Carlo. IEEE Transactions on Reliability, R-37(4):421–423, 1988.
S. Papastavridis. A Weibull limit for the reliability of consecutive-k-within-r-out-of-n system. Advances in Applied Probability, 20(3):690–692, 1988.
S. Papastavridis and O. Chrysaphinou. A limit theorem for the number of non-overlapping occurrences of a pattern in a sequence of independent trials. Journal of Applied Probability, 25:428–431, 1988.
S. G. Papastavridis. Algorithms for strict consecutive k-out-of-n:F system. IEEE Transactions on Reliability, R-35:613–615, 1986.
S. G. Papastavridis. The number of failed components in a consecutive-k-out-of-n:F system. IEEE Transactions on Reliability, R-38(3):338–340, 1989.
S. G. Papastavridis and O. Chrysaphinou. An approximation for large consecutive-k-out-of-n:F systems. IEEE Transactions on Reliability, R-37(4):386–387, 1988.
S. G. Papastavridis and J. Hadjichristos. Mean time to failure for a consecutive-k-out-of-n:F system. IEEE Transactions on Reliability, R-36(1):85–86, 1987.
S. G. Papastavridis and J. Hadjichristos. Formulas for the reliability of a consecutive-k-out-of-n:F system. Journal of Applied Probability, 26:772–779, 1988.
S. G. Papastavridis and M. V. Koutras. Consecutive-k-out-of-n systems. In K. B. Misra (Ed.), New Trends in System Reliability Evaluation, Elsevier, Amsterdam, 1993, pp. 228–248.
H. Pham. Reliability analysis of a high voltage system with dependent failures and imperfect coverage. Reliability Engineering and System Safety, 37:25–28, 1992.
A. Philippou, C. Georghiou, and G. Philippou. Fibonacci polynomials of order k, multinomial expansion and probability. International Journal of Mathematics and Mathematical Sciences, 6:545–550, 1983.
A. N. Philippou. Distributions and Fibonacci polynomials of order k, longest runs, and reliability of consecutive-k-out-of-n:F systems. In A. N. Philippou, G. E. Bergum, and A. F. Horadam (Eds.), Fibonacci Numbers and Their Applications, D. Reidel, Dordrecht, 1986, pp. 203–227.
A. N. Philippou, C. Georghiou, and G. N. Philippou. Fibonacci-type polynomials of order k with probability applications. Fibonacci Quarterly, 23(2):100–105, 1985.
A. N. Philippou and F. S. Makri. Longest success runs and Fibonacci-type polynomials. Fibonacci Quarterly, 23(4):338–346, 1985.
A. N. Philippou and F. S. Makri. Successes, runs and longest runs. Statistics and Probability Letters, 1(4):171–175, 1986.
A. N. Philippou and F. S. Makri. Closed formulas for the failure probability of a strict consecutive-k-out-of-n:F system. IEEE Transactions on Reliability, R-36(1):80–82, 1987.
A. N. Philippou and A. A. Muwafi. Waiting for the k-th consecutive success and the Fibonacci sequence of order k. Fibonacci Quarterly, 20:28–32, 1982.
J. M. Pollard. The fast Fourier transform in a finite field. Mathematics of Computation, 25:365–374, 1971.
V. R. Prasad, W. Kuo, and K. O. Kim. Optimal allocation of s-identical, multi-functional spares in a series system. IEEE Transactions on Reliability, R-48(2):118–126, 1999.
W. Preuss and T. K. Boehme. On reliability analysis of consecutive-k-out-of-n:F systems and their generalizations—a survey. In G. Anastassiou and S. T. Rachev (Eds.), Proceedings of the First International Conference on Approximations, Probability and Related Fields, Plenum Press, New York, 1994.
S. Rai, A. K. Sarje, E. V. Prasad, and A. Kumar. Two recursive algorithms for computing the reliability of k-out-of-n systems. IEEE Transactions on Reliability, R-36(2):261–265, 1987.
J. G. Rau. Optimization and Probability in Systems Engineering. Van Nostrand Reinhold, New York, 1970.
C. R. Reddy. Optimization of k-out-of-n systems subject to common-cause failures with repair provision. Microelectronics and Reliability, 33(2):175–183, 1993.
S. W. Roberts. Properties of control chart zone tests. Bell System Technical Journal, 37:83–114, 1958.
E. L. Robinson. Effect of temperature variation on the long-time rupture strength of steels. Transactions of ASME, 74:777–781, 1952.
S. M. Ross. Multivalued state component systems. The Annals of Probability, 7(2):379–383, 1979.
A. M. Rushdi. How to hand-check a symbolic reliability expression. IEEE Transactions on Reliability, R-32:402–408, 1983.
A. M. Rushdi. Efficient computation of k-to-l-out-of-n system reliability. Reliability Engineering, 17:157–163, 1987.
A. M. Rushdi. On the analysis of ordinary and strict consecutive k-out-of-n:F systems. IEEE Transactions on Reliability, R-37:57–64, 1988.
A. M. Rushdi. A conditional probability treatment of strict consecutive k-out-of-n:F systems. Microelectronics and Reliability, 29(4):581–586, 1989.
A. M. Rushdi. Some open questions on: Strict consecutive k-out-of-n:F systems. IEEE Transactions on Reliability, R-39(3):380–381, 1990.
A. M. Rushdi and F. M. A. Dehiawi. Optimal computation of k-to-l-out-of-r system reliability. Microelectronics and Reliability, 27(5):875–896, 1987.
M. Santha and Y. Zhang. Consecutive-2 systems on trees. Probability in the Engineering and Informational Sciences, 1(4):441–456, 1987.
B. Saperstein. The generalized birthday problem. Journal of the American Statistical Association, 67(338):425–428, 1972.
B. Saperstein. On the occurrence of n successes within N Bernoulli trials. Technometrics, 15(4):809–818, 1973.
B. Saperstein. Note on a clustering problem. Journal of Applied Probability, 12(3):629–632, 1975.
A. K. Sarje. On the reliability computation of a k-out-of-n system. Microelectronics and Reliability, 33(2):267–269, 1993.
P. Sarmah and A. D. Dharmadhikari. Analysis of 1-out-of-n:G adjustable operating system. Microelectronics and Reliability, 23(23):477–480, 1983.
D. V. Sarwate. Computation of binary integration detection probabilities. IEEE Transactions on Aerospace and Electronic Systems, 27(6):894–897, 1991.
M. Sasaki, Y. Nakai, and T. Yuge. Correction to: Mean time to failure for a consecutive-k-out-of-n:F system. IEEE Transactions on Reliability, R-41(2):306, 1992.
M. R. Satam. Correction to: Mean time to failure for a consecutive k-out-of-n:F system. IEEE Transactions on Reliability, R-41(3):440–441, 1992.
Z. Schechner. A load-sharing model: The linear breakdown rule. Naval Research Logistics Quarterly, 31:137–144, 1984.
E. G. Schilling and D. J. Sommers. Two-point optimal narrow limit plans with applications to MIL-STD-105D. Journal of Quality Technology, 13(2):85–92, 1981.
E. G. Schilling and J. H. Sheesley. The performance of MIL-STD-105D under the switching rules—part 1. Industrial Quality Control, 10(2):76–83, 1978.
E. G. Schilling and J. H. Sheesley. The performance of MIL-STD-105D under the switching rules—part 2. Industrial Quality Control, 10(3):104–113, 1978.
M. Sfakianakis, S. Kounias, and A. Hillaris. Reliability of a consecutive k-out-of-r-from-n:F system. IEEE Transactions on Reliability, R-41(3):442–447, 1992.
M. Sfakianakis and S. G. Papastavridis. Reliability of a general consecutive-k-out-of-n:F system. IEEE Transactions on Reliability, R-42(3):491–496, 1993.
J. G. Shanthikumar. Bounding network reliability using consecutive minimal cut sets. IEEE Transactions on Reliability, R-37(1):45–49, 1988.
J. Shen. Optimal design of series and parallel consecutive-k-out-of-n systems. Master’s thesis, University of Alberta, Edmonton, Alberta, Canada, 1992.
J. Shen and M. Zuo. A necessary condition for optimal consecutive-k-out-of-n:G system design. Microelectronics and Reliability, 34(3):485–493, 1994.
E. M. Sheuer. Reliability of an m-out-of-n system when component failure induces higher failure rates in survivors. IEEE Transactions on Reliability, R-37(1):73–74, 1988.
S. C. Shih. Poisson limit of reliability in consecutive-k-out-of-n:F systems. Technical report, Institute of Statistics, Academia Sinica, Taiwan, 1985.
A. M. Shooman. Methods for communication-network reliability analysis: Probabilistic graph reduction. In Proceedings of the Annual Reliability and Maintainability Symposium, 1992, pp. 441–448.
D. Simmons, N. Ellis, H. Fujihara, and W. Kuo. Software Measurement: A Visualization Toolkit for Project Control and Process Improvement. Prentice Hall, Upper Saddle River, New Jersey, 1998.
C. Singh and M. D. Kankam. Comments on “Closed form solutions for delta-star and star-delta conversion of reliability networks”. IEEE Transactions on Reliability, R-25(5):336–339, 1976.
H. Singh and S. Asgarpoor. Reliability evaluation of flow networks using delta-star transformations. IEEE Transactions on Reliability, R-35(4):472–477, 1991.
M. Sobel and V. R. R. Uppuluri. On Bonferroni-type inequalities of the same degree for the probability of unions and intersections. Annals of Mathematical Statistics, 43:1549–1558, 1972.
S. Soh and S. Rai. CAREL: Computer aided reliability evaluator for distributed computing networks. IEEE Transactions on Parallel and Distributed Systems, 2(2):199–213, 1991.
V. K. Srivastava and A. Fahim. An enhanced integer simplicial optimization method for minimum cost spares for k-out-of-n systems. IEEE Transactions on Reliability, R-40(3):265–270, 1991.
A. Subramanian and K. Usha. 1-out-of-2:F systems exposed to a damage process. IEEE Transactions on Reliability, R-29(3):284–285, 1980.
C. Thibeault, Y. Savaria, and J. L. Houle. Fast prediction of the optimum number of spares in defect-tolerant integrated circuits. In Proceedings of the IEEE Workshop on Defect and Fault Tolerance in VLSI Systems, 1990, pp. 119–130.
S. J. Upadhyaya and H. Pham. Analysis of noncoherent systems and an architecture for the computation of the system reliability. IEEE Transactions on Computers, 42:484–493, 1993.
L. V. Utkin. Uncertainty importance of multi-state system components. Microelectronics and Reliability, 33(13):2021–2029, 1993.
N. VenuGopal, S. S. Ahamed, and R. Reddy. Optimal repair stage for k-out-of-n systems. Microelectronics and Reliability, 29(1):17–19, 1989.
Z. X. Wang and D. R. Guo. Special Functions. World Scientific, Singapore, 1989.
J. S. Wu and R. J. Chen. Efficient algorithm for reliability of a circular consecutive-k-out-of-n:F system. IEEE Transactions on Reliability, R-42(1):163–164, 1993.
J. S. Wu and R. J. Chen. Reliability of consecutive-weighted-k-out-of-n:F systems. In A. Godbole and S. Papastavridis (Eds.), Runs and Patterns in Probability: Selected Papers, Kluwer, Amsterdam, 1994, pp. 205–211.
J. Xue and K. Yang. Dynamic reliability analysis of coherent multi-state systems. IEEE Transactions on Reliability, R-44(4):683–688, 1995.
G. L. Yang. A renewal look at switching rules in the MIL-STD-105D sampling system. Journal of Applied Probability, 27:183–192, 1990.
Y. C. Yao and F. K. Hwang. Multistate consecutively connected systems. IEEE Transactions on Reliability, R-38(4):472–474, 1989.
R. H. Yeh. Optimal inspection and replacement policies for multistate deteriorating systems. European Journal of Operational Research, 96(2):248–259, 1996.
Y. B. Yoo. A comparison of algorithms for terminal pair reliability. IEEE Transactions on Reliability, R-37(2):210–215, 1988.
Q. Zhang and Q. Mei. Reliability analysis for a real non-coherent system. IEEE Transactions on Reliability, R-36(4):436–439, 1987.
W. Zhang, C. Miller, and W. Kuo. Application and analysis for consecutive-k-out-of-n:G structure. Reliability Engineering and System Safety, 33:189–197, 1991.
M. Zuo and J. Huang. User’s manual for Furnace. Technical report, Department of Mechanical Engineering, University of Alberta, Edmonton, Alberta, Canada, 1997.
M. Zuo and W. Kuo. Reliability design for high system utilization. International Journal of Modeling and Simulation, 11(1):7–11, 1991.
M. J. Zuo, R. Jiang, and R. Yam. Approaches for reliability modeling of continuous-state devices. IEEE Transactions on Reliability, R-48(1):9–18, 1999.
INDEX
k-out-of-n system, 231, 281
  combined, 429
  constant multistate, 483
  decreasing multistate, 483
  generalized multistate, 482
  increasing multistate, 483
  simple multistate, 480
  s-stage, 401
  two-stage, 401
k-terminal network, 141
n-step transition probability, 48
r repair facilities, 122
(r, s)/(m, n) system, 384
  k-within-, 416
  linear, 432
  2-d linear, 435
rth moment
  about the mean, 15
  about the origin, 15
active redundancy, 266, 295
addition law, 157
admissibility condition, 215
algorithm efficiency, 64
alternating renewal process, 45, 105, 106
arithmetic operations, 63, 69, 238, 413
as bad as old, 44, 106
as good as new, 33, 44, 105
asymptotic complexity, 63
  lower bounds, 66
  tight bounds, 65
  upper bounds, 64
availability, 33, 104, 110, 114
availability function, 104, 106, 111, 382
average value, 15
axioms of probability, 9
balanced algorithm, 230
basic reordering operation, 213
bathtub curve, 34, 38
Bernoulli trial, 21
binary image, 498
binary-imaged system, 498
binary system, 452
birth and death process, 53, 106
birth rate, 52, 54
bivariate distribution, 16
Bonferroni inequalities, 153
Boolean algebra, 7, 96, 151
bottom up approach, 224
bounding summations, 73
bounds
  asymptotic lower, 66
  asymptotic tight, 65
  asymptotic upper, 64
  lower, 16, 153, 250, 345, 391
  min–max, 185
  upper, 63, 153, 250, 334, 391, 411
bridge structure, 94, 96, 99, 145
BRO, 213
burn-in failures, 35
CDF, 11, 13, 480
central moment, 15
chance failures, 35
Chapman–Kolmogorov equation, 48
circular consecutively connected systems, 458
class P problems, 67
coherent, 478
coherent system, 90, 91, 97, 350, 374, 478, 496
combined k-out-of-mn:F system, 432, 435
common-cause failures, 58, 86, 124, 302, 310
complement, 7
complete failure state, 474
complexity analysis, 62
component reliability vector, 101
components, 32, 85
component state, 87
component state vector, 87
conditional distribution, 18, 31, 49
conditional probability, 9, 10, 145, 413, 418
conditional reliability, 33, 36, 295
connection matrix, 148
connection vector, 479
consecutive-k-out-of-n system, 325
  k-within-, 407
  m-, 405
  -kc-out-of-n, 429
  1-d Con/kc/n:F, 435
  circular, 325
  circular m-, 407
  constant multistate, 491
  decreasing multistate, 491
  generalized multistate, 490
  increasing multistate, 491
  linear, 325
  multidimensional, 384
  redundant, 387, 405
  series, 424
  weighted, 447
consecutively connected system, 438, 453
consecutive partition, 402
constant failure rate, 35, 37, 110, 381
continuous distributions, 27
continuous multistate system, 452
convolution, 14
counting process, 40
critical components, 104, 378, 381
cubic rate of growth, 67
cumulative distribution function, 11, 13, 480
cumulative failure rate function, 34
cumulative hazard function, 34
cut, 93
  minimal cut, 93, 94, 97, 127, 148, 197, 235, 378
  minimal cut set, 93, 345, 447, 458
  minimal cut vector, 93, 479
  set, 93
  vector, 93
death rate, 54
decomposition theorem, 11
decreasing failure rate, 34
definition domain, 474
degraded states, 474
delta structure, 172
dependence, 111
dependent failures, 124
DFR, 34
discrete distributions, 20
discrete multistate system, 452
discrete random variable, 11
distribution
  Bernoulli, 21
  beta, 28
  binomial, 21
  bivariate normal, 31
  discrete uniform, 20
  exponential, 14, 36
  gamma, 38
  geometric, 12, 23
  hypergeometric, 24
  lognormal, 39
  multinomial, 26
  multivariate hypergeometric, 27
  negative binomial, 23
  normal, 30
  Poisson, 25, 36, 43
  standard normal, 30
  standard uniform, 28
  uniform, 28
  Weibull, 109
divide-and-conquer algorithm, 76
dominance condition, 214
dominant system, 495
dormant failures, 129
dual, 90, 94, 252, 383
dual failure mode, 311
duality, 479
dual system, 252, 369, 479
dynamic measure, 103
early failures, 35
equivalent, 67
equivalent component state vectors, 495
Esary–Proschan method, 183
evaluation of summations, 69
event decomposition, 339, 383, 420, 423
exchange principle, 207, 229
expansion method, 77
expected life, 33, 105
expected value, 15, 33
factoring theorem, 11
failure rate function, 33
fault diagnosis, 196
first moment about the origin, 15
frequency interpretation, 8
gamma function, 28, 38, 505
general distribution, 106
generating function, 71, 167
generation of minimal cuts, 148, 152
generation of minimal paths, 148, 149
graph-coloring problem, 68
greedily assembled, 403
greedy module assembly rule, 227
guess-and-prove method, 80
hazard function, 33
homogeneous Markov processes, 47
homogeneous Poisson process, 41
homogeneous process, 40
homogeneous transition probabilities, 47, 50
HPP, 41
IE method, 153, 181, 235, 251, 494
IFR, 35
imperfect fault coverage, 124, 277, 295
imperfect repair, 43
imperfect sensing and switching, 136, 138
importance
  Birnbaum, 193, 350, 383, 398, 414
  component, 192
  criticality, 195
  criticality importance in terms of system failure, 195
  criticality importance in terms of system success, 195
  position, 193
  reliability, 193, 328, 350, 352, 362
  structural, 192, 194, 356
inadmissible permutation, 207
increasing failure rate, 35
independent events, 9
independent increments, 40
independent random variable, 19
infant mortality, 35
inherently intractable, 68
inner loop, 159
integration, 72, 74, 190
interarrival time, 40
invariant optimal arrangement, 209
invariant optimal assignment, 209
invariant optimal design, 209, 350, 356, 366, 396, 425
irrelevant, 90, 439, 440
isotonic, 200
joint conditional distribution, 20
joint marginal distribution, 19
joint probability distribution function, 16
Kolmogorov’s forward equations, 53, 54, 56
Laplace transform, 504
lattice
  subadditive, 203
  superadditive, 203
  system, 384
linear consecutively connected systems, 453
linearly connected systems, 165
linear rate of growth, 67
logic function, 96, 98, 395
long ordered, 402
lower critical connection vector, 479
majorization, 199
marginal distribution, 18
Markov chain imbeddable structures, 164
Markov chains, 165
  continuous-time, 50
  discrete-time, 46
  finite state, 49
  homogeneous, 47, 50
  imbedded, 49, 50, 344
  stationary, 47, 50
Markovian property, 47
master method, 82
mathematical induction, 80
maximum ordered, 403
mean time between failures, 105
mean time to repair, 105
measures of performance, 100
median, 14
memoryless property, 36, 106
minimal cut set, 59
minimal repair, 43, 106, 308
minimum ordered, 403
MIS, 164, 415
mission time, 101
modular decomposition, 98, 186
module, 97, 124
monotonic, 200
MTBF, 33, 105, 274
MTTF, 33, 36, 104, 106, 107, 112, 126, 270
MTTR, 33, 105, 106
multidimensional, 386
multistate monotone system, 478
multistate parallel system, 481
multistate series system, 481
multistate system, 452
multivariate distribution, 16, 19
mutually exclusive, 6
NDTO, 214
network diagram, 141, 144
NHPP, 43
node, 141
  child, 465
  intermediate, 464
  leaf, 464
  parent, 465
  root, 464
node removal method, 149
nondeterministic polynomial, 68
nondominant system, 496
nondominated totally ordered, 214
nonhomogeneous Poisson process, 43, 106, 308
nonrepairable system, 103, 303
NP, 68
  complete, 68
  hard, 68
one-step transition probability, 46, 47
optimal designs, 396
ordered allocation, 214, 402
ordered partition, 402
order of growth, 64
order of magnitude, 64
ordinary moment, 15
outer loop, 158
pairwise independent, 9
pairwise rearrangement, 207
parallel reductions, 141
parallel–series, 210, 403
parallel–series system model, 124
parallel structure, 92, 98, 107, 189
parallel system, 88, 90, 94, 96, 102, 103, 169, 171, 209
parallel system model, 112
path, 93
  minimal path, 93, 96, 148, 197, 235
  minimal path set, 93, 363
  minimal path vector, 93, 479
  set, 93
  vector, 93
pdf, 11, 12, 480
percentile, 14, 36
perfect repair, 33, 106
perfect sensing and switching, 130, 137
perfect state, 170, 474
performance, 85, 480
performance measure, 109, 480
permutation equivalent, 207
pivotal decomposition, 89, 96, 145, 233, 240
point-to-point probabilities, 172
polynomial time algorithms, 67
preference relation set, 214
probability density function, 11, 12, 480
probability mass function, 11
probability of entering a state, 52
pure birth process, 52
quadratic rate of growth, 67
random variable, 11, 347, 474
rate of occurrence, 42
rate of transition, 51, 382
receiving capability, 441
recurrence, 76
recurrence relations, 75
recursion tree, 79
recursive algorithms, 75
redundancy, 89, 112, 189
redundancy at component level, 92
redundancy at subsystem level, 92
redundancy design, 398
redundant components, 89
redundant system structure, 112
relationship
  O relationship, 64
  Ω relationship, 66
  Θ relationship, 65
  ∼ relationship, 66
relative criticality, 197, 207, 350
relevancy conditions, 479
relevant, 90, 439, 440, 476
reliability block diagram, 86
reliability bounds, 180
renewal function, 45
renewal process, 44, 309
renewal reward process, 46, 309
repairable devices, 33
repairable system, 104, 504
reverse tree-structured systems, 465
rule of elimination, 11
sample space, 5
sampling without replacement, 25
sampling with replacement, 22
Schur concave, 201
Schur convex, 201, 228
SDP method, 157, 182, 237, 343, 387, 494
sensing and switching mechanism, 129
series
  arithmetic, 70
  cubic, 71
  geometric, 70
  harmonic, 73, 75
  quadratic, 70, 75
series–parallel system, 127, 222, 404
series reductions, 141
series structure, 92, 98, 107, 189
series system, 88, 90, 93, 96, 102, 103, 168, 209
shock-free probability, 212
short ordered, 402
single repair facility, 115
slow algorithms, 67
sojourn time, 52, 106
space complexity, 62
special multistate systems, 480
spread out, 200
standard bridge structure, 146
standard deviation, 15
standby, 129
  n-component cold, 131
  cold, 129, 130, 264, 277
  hot, 129
  redundancy, 129
  two-component cold, 136
  warm, 129, 137, 265
star structure, 172
state distribution, 480
state space, 40
state transition diagram, 52
static measures, 102
stationary increments, 40
stationary Markov processes, 47, 167
stationary process, 40
stationary transition probabilities, 47, 50
steady-state availability, 105, 106, 111, 115, 273, 382
stochastic process, 40
  continuous-time, 40
  discrete-time, 40
strength of a structure, 91
strictly Schur convex, 201
stronger structure, 91, 496
strongly coherent, 478
strongly relevant, 475
structure function, 87, 395
subjective interpretation, 8
subsystem a-hazard vector, 213
supercomponent, 141
survivor function, 33, 42
symmetric components, 207
symmetric function, 201
symmetric set, 201
three-dimensional, 384
time complexity, 62
top-down heuristics, 223
totally ordered allocation, 214
transformation
  delta–star, 171, 178
  star–delta, 171
transition probability matrix, 47, 166, 244, 344, 416
traveling-salesman problem, 68
tree-structured consecutively connected systems, 464
two-component parallel system, 114
two-dimensional, 384
two-stage systems, 227
two-terminal network, 141
two-way circular consecutively connected system, 470
two-way consecutively connected systems, 470
two-way linear consecutively connected system, 470
variant optimal design, 362, 398
weaker structure, 91
weakly coherent, 478
weakly relevant, 477
wearout period, 35
worst-case scenario, 63
YM algorithm, 388