Queueing Networks and Markov Chains
Queueing Networks and Markov Chains Modeling and Performance Evaluation with Computer Science Applications
Gunter Hermann
de Meer,
Belch,
Stefan
and Kishor
Greiner, S. Trivedi
A Wiley-Interscience
JOHN WILEY New York / Chichester
/ Weinheim
Publication
& SONS, INC.
/ Brisbane
/ Singapore
/ Toronto
Copyright 1998 by John Wiley & Sons, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic or mechanical, including uploading, downloading, printing, decompiling, recording or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 605 Third Avenue, New York, NY 10158-0012, (212) 850-6011, fax (212) 850-6008, E-Mail:
[email protected]. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold with the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional person should be sought. ISBN 0-471-20058-1. This title is also available in print as ISBN 0-471-19366-6 For more information about Wiley products, visit our web site at www.Wiley.com. Library of Congress Cataloging in Publication Data: Queueing networks and Markov chains: modeling and performance evaluation with computer science applications / Gunter Bolch . . . [et al.]. p. cm. “A Wiley-Interscience publication.” Includes bibliographical references and index. ISBN 0-471-19366-6 1. Electronic digital computers — Evaluation. 2. Markov processes. 3. Queuing theory. I. Bolch, Gunter. QA76.9.E94Q48 1998 97-11959 004.20 40 01519233 — dc21 CIP Printed in the United States of America. 10 9 8 7 6 5 4 3 2
Contents
Preface 1 Introduction 1.1 Motivation 1.2 Basics of Probability and Statistics 1.2.1 Random Variables 1.2.1.1 Discrete Random Variables 1.2. 1.2 Continuous Random Variables 1.2.2 Multiple Random Variables 1.2.2.1 Independence 1.2.2.2 Conditional Probability 1.2.2.3 Important Relations 1.2.2.4 The Central Limit Theorem 1.2.3 Transforms 1.2.3.1 z- Transform 1.2.3.2 Laplace Transform 1.2.4 Parameter Estimation 1.2.4.1 Method of Moments 1.2.4.2 Maximum-Likelihood Estimation 1.2.4.3 Confidence Intervals 1.2.5 Order Statistics
xv 1 1 5 5 5 7 19 20 22 23 24 25 25 26 27 28 29 29 30
vi
CONTENTS
1.2.6
Distribution
of Sums
2
Markov Chains 2.1 Markov Processes 2.1. I Stochastic and Markov Processes 2, I .2 Markov Chains 2. I .2.1 Discrete- Time Markov Chains 2.1.2.2 Continuous- Time Markov Chains 2.1.2.3 Recapitulation 2.2 The Modeling Process 2.2.1 Modeling Life-cycle Phases 2.2.2 Performance Measures 2.2.2.1 A Simple Example 2.2.2.2 Markov Reward Models 2.2.2.3 A Case Study 2.2.3 Generation Methods 2.2.3.1 Petri Nets 2.2.3.2 Generalized Stochastic Petri Nets 2.2.3.3 Stochastic Reward Nets 2.2.3.4 GSPN/SRN Analysis 2.2.3.5 A Larger Example
3
Steady-State Solutions of Markov Chains 3.1 Symbolic Solution: Birth-Death Process 3.2 Hessenberg Matrix: Non-Markovian Queues 3.2.1 Non-Exponential Service Times 3.2.2 Server with Vacations 3.2.2.1 Polling Systems 3.2.2.2 Analysis 3.3 Numerical Solution: Direct Methods 3.3.1 Gaussian Elimination 3.3.2 The Grassmann Algorithm 3.4 Numerical Solution: Iterative Methods 3.4.1 Convergence of Iterative Methods 3.4.2 Power Method 3.4.3 Jacobi’s Method 3.4.4 Gauss-Seidel Method 3.4.5 The Method of Successive Over-Relaxation
31 35 35 35 37 37 49 55 56 56 61 61 64 70 80 81 86 87 91 98 103 105 106 107 112 113 114 118 119 125 132 132 133 136 139 140
CONTENTS
3.5
The Relaxation Parameter 3.4.5.1 3.4.5.2 The Test of Convergence 3.4.5.3 Overflow and Underflow 3.4.5.4 The Algorithm Comparison of Numerical Solution Methods 3.5.1 Case Studies
vii 141 143 143
144 144 146
4
Steady-State Aggregation/ Disaggregation Methods 4.1 Courtois’s Approximate Method 4.1.1 Decomposition 4.1.2 Applicability 4.1.3 Analysis of the Substructures 4.1.4 Aggregation and Unconditioning 4.1.5 The Algorithm 4.2 Takahashi’s Iterative Method 4.2.1 The Fundamental Equations 4.2.2 Applicability 4.2.3 The Algorithm 4.2.4 Application 4.2.4.1 Aggregation 4.2.4.2 Disaggregation 4.2.5 Final Remarks
153 153 154 160 162 163 165 166 167 169 170 170 172 173 174
5
Transient Solution of Markov Chains 5. I Transient Analysis Using Exact Methods 5.1.1 A Pure Birth Process 5.1.2 A Two-State CTMC 5.1.3 Solution Using Laplace Transforms 5.1.4 Numerical Solution Using Uniformixation 5.1.4.1 The Instantaneous Case 5.1.4.2 Stiffness Tolerant Uniformization 5.1.4.3 The Cumulative Case 5.1.5 Other Numerical Methods 5.1.5.1 Ordinary Differential Equations 5.1.5.2 Weak Lumpability 5.2 Aggregation of Stiff Markov Chains 5.2.1 Outline and Basic Definitions 5.2.2 Aggregation of Fast Recurrent Subsets
177 178 178 181
184 184
184 186 187 189 189 189 190 191 192
...
VIII
CONTENTS
52.3 5.2.4 5.2.5
5.2.6 5.2.7
Aggregation of Fast Transient Subsets Aggregation of Initial State Probabilities Disaggregations 5.2.5.1 Fast Transient States 5.2.5.2 Fast Recurrent States The Algorithm An Example: Server Breakdown and Repair
195 196 197 197 197 198 200
6 Single Station Queueing Systems 6.1 Kendall’s Notation 6.2 Performance Measures 6.3 The M/M/l System 6.4 The M/M/m System 6.5 The M/M/m System 6.6 The M/M/l/K Finite Capacity System 6.7 Machine Repairman Model 6.8 Closed Tandem Network 6.9 The M/G/l System 6.10 The GI/M/l System 6.11 The GI,M/m System 6.12 The GI/G/l System 6.13 The M/G/m System 6.14 The GI/G/m System 6.15 Priority Queueing 6.15.1 System without Preemption 6.15.2 Conservation Laws 6.15.3 System with Preemption 6.15.4 System with Time-Dependent Priorities 6.16 The Asymmetric System 6.16.1 Approximate Analysis 6.16.2 Exact Analysis 6.16.2.1 Analysis of M/M/m Loss Systems 6.16.2.2 Extending to Non-Lossy System 6.16.3 Exact Analysis of an Asymmetric M/M/2 System 6.17 Systems with Batch Service
209 210 212 214 217 218 219 221 222 223 225 228 229 230 233 237 239 242 242 243 248 248 249 250 251
7
263
Queueing Networks
253 258
CONTENTS
ix
Definitions and Notation 7.1.1 Single Class Networks 7.1.2 Multiclass Networks Performance Measures 7.2.1 Single Class Networks 7.2.2 Multiclass Networks Product-Form Queueing Networks 7.3.1 Global Balance 7.3.2 Local Balance 7.3.3 Product-Form 7.3.4 Jackson Networks 7.3.5 Gordon/Newell Networks 7.3.6 BCMP Networks 7.3.6.1 The Concept of Chains 7.3.6.2 BCMP Theorem
265 265 267 268 268 271 273
Algorithms for Product-Form Networks 8.1 The Convolution Algorithm 8.1 .l Single Class Closed Networks 8. I. 2 Multiclass Closed Networks 8.2 The Mean Value Analysis 8.2.~ Single Class Closed Networks 8.2.2 Multiclass Closed Networks 8.2.3 Mixed Networks 8.2.4 Networks with Load-Dependent Service 8.2.4.1 Closed Networks 8.2.4.2 Mixed Networks 8.3 The RECAL Method 8.4 Flow Equivalent Server Method 8.4.1 FES Method for a Single Node 8.4.2 FES Method for Multiple Nodes 8.5 Summary
311 313 313 320 326 327 335
9 Approximation Algorithms for Product-Form Networks 9.1 Approximations Based on the MVA 9.1.1 Bard-Schweitzer Approximation 9.1.2 Self- Correcting Approximation Technique 9.1.2.1 Single Server Nodes
379 380 380 385 385
7.1
7.2
7.3
8
274 277 283
284 288 295 296 300
342 347 348 352 360 368 368 373 376
X
CONTENTS
9.2
9.3
9.4
9.5 9.6
9.1.2.2 Multiple Server Nodes 9.1.2.3 Extended SCAT Algorithm Summation Method 9.2.1 Single Class Networks 9.2.2 Multiclass Networks Bottapprox Method 9.3.1 Initial Value of X 9.3.2 Single Class Networks 9.3.3 Multiclass Networks Bounds Analysis 9.4.1 Asymptotic Bounds Analysis 9.4.2 Balanced Job Bounds Analysis Networks with Variabilities in Workload Summary
10 Algorithms for Non-Product-Form Networks 10.1 Non-Exponential Distributions 10.1.1 Diffusion Approximation 10.1.1.1 Open Networks 10.1.1.2 Closed Networks 10.1.2 Maximum Entropy Method 10.1.2.1 Open Networks 10.1.2.2 Closed Networks 10.1.3 Decomposition for Open Networks 10.1.4 Methods for Closed Networks 10.1.4.1 Robustness for Closed Networks 10.1.4-Z Marie’s Method 10.1.4.3 Extended SUM and BOTT 10.1.5 Closing Method for Mixed Networks 10.2 Different Service Times at FCFS Nodes 10.3 Priority Networks 10.3.1 Extended MVA (PRIOMVA) 10.3.2 The Method of Shadow Server 10.3.2.1 The Original Shadow Technique 10.3.2.2 Extensions of the Shadow Technique 10.3.3 Extended SUM 10.4 Simultaneous Resource Possession 10.4.1 Memory Constraints
389 393 397
400 403 405 405
405 408 410 411 414 418 420 421 423 423 424 427 430 431 433 439
448 448 452 463 464 469 470 470 478 478
480 494 497 498
CONTENTS
10.5 10.6
IO. 7
10.8
10.4.2 I/O Subsystems 10.4.3 Method of Surrogate Delays 10.4.4 Serialization Programs with Internal Concurrency Parallel Processing 10.6.1 Asynchronous Tasks 10.6.2 Fork Join Systems 10.6.2.1 Modeling 10.6.2.2 Performance Measures 10.6.2.3 Analysis Networks with Asymmetric Nodes 10.7.1 Closed Networks 10.7.1.1 Asymmetric SUM (ASUM) 10.7.1.2 Asymmetric MVA (AMVA) 10.7.1.3 Asymmetric SCAT (ASCAT) 10.7.2 Open Networks Networks with Blocking 10.8. I Different Blocking Types 10.8.2 Solution for Networks with Two Nodes
xi
501 504 505 506 507 507 517 518 519 521 533 534 534 535 536 537 547 548 549
11 Optimization 11.1 Optimization Problems and Cost Functions 11.2 Optimization based on the Summation Method 11.2.1 Maximization of the Throughput 11.2.2 Minimization of Cost 11.2.3 Minimization of the Response Time 11.3 Optimization based on the Convolution Algorithm 11.3.1 Maximization of the Throughput 11.3.2 Optimal Design of Storage Hierarchies
557 558 559 560 561 562 564 564 566
12 Performance Analysis 12.1 PEPSY 12.1.1 Structure 12.1.1.1 12.1.1.2 12.1.1.3 12.1.2 Different 12.1.3 Example
571 572 573 573 573 574 574 575
Tools of PEPSY
The Input File The Output File Control Files Programs in PEPSY of using PEPSY
xii
CONTENTS
12.1.4 Graphical User Interface XPEPSY 12.2 SPNP 12.2.1 SPNP Features 12.2.2 The CSPL Language 12.2.3 iSPN 12.3 MOSES 12.3.1 The Model Description Language MOSEL 12.3.2 Examples 12.3.2.1 Central-Server Model 12.3.2.2 Fault Tolerant Multiprocessor System 12.4 SHARPE 12.4.1 Central-Server Queueing Network 12.4.2 M/M/m/K System 12.4.3 M/M/l/K System with Server Failure and Repair 12.5 Characteristics of Some Tools 13 Applications 13.1 Case Studies of Queueing Networks 13.1.1 Multiprocessor Systems 13.1.1.1 Tightly Coupled Systems 13.1.1.2 Loosely Coupled Systems 13.1.2 Client Server Systems 13.1.3 Communication Systems 13.1.3.1 Description of the System 13.1.3.2 Queueing Network Model 13.1.3.3 Model Parameters 13.1.3.4 Results 13.1.4 UNIX Kernel 13.1.4.1 Model of the UNIX Kernel 13.1.4.2 Analysis 13.1.5 Flexible Production Systems 13.1.5.1 An Open Network Model 13.1.5.2 A Closed Network Model 13.2 Case Studies of Markov Chains 13.2.1 Wafer Production System 13.2.2 Polling Systems 13.2.3 Client Server Systems
577 579 579 580 585 588 588 590 590 591 594 594 596 597 600 603 603 603 603 606 607 608 608 611 612 617 620 621
624 630 630 634 638 639
641 645
CONTENTS
13.24
ISDN Channel 13.2.4.1 The Baseline Model 13.2.4.2 Markov Modulated Poisson Processes 13.2.5 ATM Network under Overload i3.3 Case Studies of Hierarchical Models 13.3.1 A Multiprocessor with Different Cache Strategies 13.3.2 Performability of a Multiprocessor System
. .. XIII
650 650 653 658 665 666 674
Glossary
679
Bibliography
691
Index
719
Preface Queuemg networks and Markov chains are commonly used for the performance and reliability evaluation of computer, communication, and manufacturing systems. Although there are quite a few books on the individual topics of queueing networks and Markov chains, we have found none that covers both of these topics. The purpose of this book, therefore, is to offer a detailed treatment of queueing systems, queueing networks, and continuous and discrete-time Markov chains. In addition to introducing the basics of these subjects, we have endeavored to: l
Provide some in-depth numerical solution algorithms.
0 Incorporate a rich set of examples that demonstrate the application of the different paradigms and corresponding algorithms. l
Discuss stochastic Petri nets as a high-level description language, thereby facilitating automatic generation and solution of voluminous Markov chains.
l
Treat in some detail approximation els.
l
Describe and apply four software packages throughout the text.
l
Provide problems as exercises.
met hods that will handle large mod-
This book easily lends itself to a course on performance evaluation in the computer science and computer engineering curricula. It can also be used for a course on stochastic models in mathematics, operations research and industrial engineering departments. Because it incorporates a rich and comprehensive set of numerical solution methods comparatively presented, the text may also
xvi
PREFACE
well serve practitioners in various fields of applications as a reference book for algorithms. With sincere appreciation to our friends, colleagues, and students who so ably and patiently supported our manuscript project, we wish to publicly acknowledge: l
J&g Barner and Stephan Kijsters for their painstaking work in keying the text and in laying out the figures and plots.
l
Peter Bazan, who assisted both with the programming of many examples and comprehensive proofreading.
l
Hana Sevcikova, who lent a hand in solving many of the examples and contributed with proofreading.
l
Janos Sztrik, for his comprehensive proofreading.
l
Doris Ehrenreich, who wrote the first version of the section on communication systems.
l
Markus Decker, who prepared the first draft of the mixed queueing networks section.
l
Those who read parts of the manuscript and provided many useful comment s, including: Khalid Begain, Oliver Diisterhoft, Ricardo Fricks, Swapna Gokhale, Thomas Hahn, Christophe Hirel, Graham Horton, Steve Hunter, Demetres Kouvatsos, Yue Ma, Raymond Marie, Varsha Mainkar, Victor Nicola, Cheul Woo Ro, Helena Szczerbicka, Lorrie Tomek, Bernd Wolfinger, Katinka Wolter, Martin Zaddach, and Henry Zang.
Gunter Belch and Stefan Greiner are grateful to Fridolin Hofmann, and Hermann de Meer is grateful to Bernd Wolfinger, for their support in providing the necessary freedom from distracting obligations. Thanks are also due to Teubner B.G. Publishing house for allowing us to borrow sections from the book entitled Leistungsbewertung von Rechensystemen (originally in German) by one of the coauthors, Gunter Belch. In the present book, these sections are integrated in Chapters 1 and 7 through 10. We also thank Andrew Smith, Lisa Van Horn, and Mary Lynn of John Wiley & Sons for their patience and encouragement. The financial support from the SFB (Collaborative Research Centre) 182: “Multiprocessor and Network Configurations” of the DFG (Deutsche Forschungsgemeinschaft) is acknowledged. Finally, a web page has been set up for further information regarding the book. The URL is http://www4.cs.fau.de/QNMC/ GUNTER BOLCH, STEFAN GREINER,
HERMANN
DE MEER, KISHOR
S. TRIVEDI
Queueing Networks and Markov Chains
Introduction
1.1
MOTIVATION
Information processing system designers need methods for the quantification of system design factors such as performance and reliability. Modern computerr communicationI’ and production line systems process complex workloads with random service demands. Probabilistic and statistical methods are commonly employed for the purpose of performance and reliability evaluation. The purpose of this book is to explore major probabilistic modeling techniques for the performance analysis of information processing systems. Statistical methods are also of great importance but we refer the reader to other sources [JainSlI’Triv82] for this topic. Although we concentrate on performance analysisI’we occasionally consider reliabilityI”availabilityI’and combined performance and reliability analysis. Performance measures that are commonly of interest include throughputI’resource utilizationrloss probabili@and delay (or response time). The most direct method for performance evaluation is based on actual measurement of the system under study. HoweverI’during the design phaser the system is not available for such experimentsrand yet performance of a given design needs to be predicted to verify that it meets design requirements and to carry out necessary trade-offs. HenceI’abstract models are necessary for performance prediction of designs. The most popular models are based on discrete-event simulation (DES). DES can be applied to almost all problems of interestrand system details to the desired degree can be captured in such simulation models. F’urthermoreI’many software packages are available that facilitate the construction and execution of DES models. 1
2
INTRODUCTION
The principal drawback of DES models, however, is the time taken to run such models for large, realistic systems particularly when results with high accuracy (i.e., narrow confidence intervals) are desired. A cost-effective alternative to DES models, analytic models can provide relatively quick answers to “what if” questions and can provide more insight into the system being studied. However, analytic models are often plagued by unrealistic assumptions that need to be made in order to make them tractable. Recent advances in stochastic models and numerical solution techniques, availability of software packages, and easy access to workstations with large computational capabilities have extended the capabilities of analytic models to more complex systems. Analytical models can be broadly classified into state space models and non-state space models. Most commonly used state space models are Markov chains. First introduced by A. A. Markov in 1907, Markov chains have been in use in performance analysis since around 1950. In the past decade, considerable advances have been made in the numerical solution techniques, methods of automated state space generation, and the availability of software packages. These advances have resulted in extensive use of Markov chains in performance and reliability analysis. A Markov chain consists of a set of states and a set of labeled transitions between the states. A state of the Markov chain can model various conditions of interest in the system being studied. These could be the number of jobs of various types waiting to use each resource, the number of resources of each type that have failed, the number of concurrent tasks of a given job being executed, and so on. After a sojourn in a state, the Markov chain will make a transition to another state. Such transitions are labeled either with probabilities of transition (in case of discrete-time Markov chains) or rates of transition (in case of continuous-time Markov chains). Long run (steady-state) dynamics of Markov chains can be studied using a system of linear equations with one equation for each state. Transient (or time dependent) behavior of a continuous-time Markov chain gives rise to a system of first-order, linear, ordinary differential equations. Solution of these equations results in state probabilities of the Markov chain from which desired performance measures can be easily obtained. The number of states in a Markov chain of a complex system can become very large and, hence, automated generat ion and efficient numerical solution met hods for underlying equations are desired. A number of concise notations (based on queueing networks and stochastic Petri nets) have evolved, and software packages that automatically generate the underlying state space of the Markov chain are now available. These packages also carry out efficient solution of steady-state and transient behavior of Markov chains. In spite of these advances, there is a continuing need to be able to deal with larger Markov chains and much research is being devoted to this topic. If the Markov chain has nice structure, it is often possible to avoid the generation and solution of the underlying (large) state space. For a class of queueing networks, known as product-form queueing networks (PFQN),
MOTIVATION
3
it is possible to derive steady-state performance measures without resorting to the underlying state space. Such models are therefore called non-state space models. Other examples of non-state space models are directed acyclic task precedence graphs [SaTr87] and fault-trees [STP96]. Other examples of methods exploiting Markov chains with “nice” structure are matrix-geometric methods [Neut81]. We do not discuss these model types in this book due to space limitations. Relatively large PFQN can be solved by means of a small number of simpler equations. However, practical queueing networks can often get so large that approximate methods are needed to solve such PFQN. Furthermore, many practical queueing networks (so-called non-product-form queueing networks, NPFQN) do not satisfy restrictions implied by product-form. In such cases, it is often possible to obtain accurate approximations using variations of algorithms used for PFQNs. Other approximation techniques using hierarchical and fixed-point iterative methods are also used. The flowchart shown in Fig. 1.1 gives the organization of this book. The rest of this chapter, Section 1.2, covers the basics of probability and statistics. In Chapter 2, Markov chains basics are presented together with generation methods for them. Exact steady-state solution techniques for Markov chains are given in Chapter 3 and their aggregation/disaggregation counterpart in Chapter 4. These aggregation/disaggregation solution techniques are useful for practical Markov chain models with very large state spaces. Transient solution techniques for Markov chajns are introduced in Chapter 5. Chapter 6 deals with the description and computation of performance measures for single-station queueing systems in steady state. A general description of queueing networks is given in Chapter 7. Exact solution methods for PFQN are described in detail in Chapter 8 while approximate solution techniques for PFQN are described in Chapter 9. Solution algorithms for different types of NPFQN ( sue h as networks with priorities, non-exponential service times, blocking, or parallel processing) are presented in Chapter 10. The solution algorithms introduced in this book can also be used for optimization problems as described in Chapter 11. For the practical use of modeling techniques described in this book, software packages (tools) are needed. Chapter 12 is devoted to the introduction of a queueing network tool, a stochastic Petri net tool, a tool based on Markov chains and a toolkit with many model types, and the facility for hierarchical modeling is also introduced. Each tool is described in some detail together with a simple example. Throughout the book we have provided many example applications of different algorithms introduced in the book. Finally, Chapter 13 is devoted to several large real-life applications of the modeling techniques presented in the book.
4
INTRODUCTION
Fig. 1.1 Flowchart mance problem.
describing
how to find the appropriate
chapter for a given perfor-
BASICS
1.2
BASICS OF PROBABILITY
OF PROBABILITY
AND
STATISTICS
5
AND STATISTICS
We begin by giving a brief overview of the more important definitions and results of probability theory. The reader can find additional details in books such as [AllegO, Fe1168,Koba78, Triv82]. We assumethat the reader is familiar with the basic properties and notations of probability theory. 1.2.1
Random Variables
A random variable is a function that reflects the result of a random experiment. For example, the result of the experiment “toss a single die” can be described by a random variable that can assumethe values one through six. The number of requests that arrive at an airline reservation system in one hour or the number of jobs that arrive at a computer system are also examples of a random variable. So is the time interval between the arrivals of two consecutive jobs at a computer system, or the throughput in such a system. The latter two examples can assumecontinuous values, whereas the first two only assumediscrete values. Therefore, we have to distinguish between continuous and discrete random variables. 1.2.1.1 Discrete Random Variables A random variable that can only assume discrete values is called a discrete random variable, where the discrete values are often non-negative integers. The random variable is described by the possible values that it can assumeand by the probabilities for each of these values. The set of these probabilities is called the probability mussfunction (pmf) of this random variable. Thus, if the possible values of a random variable X are the non-negative integers, then the pmf is given by the probabilities: pk = P(x
= k),
for k = 0, 1,2.. . ,
(1-l)
the probability that the random variable X assumesthe value Ic. The following is required: P(X = k) 2 0, c
P(X = k) = 1.
all k
For example, the following pmf results from the experiment “toss a single die” : P(X=k)=i,
forIc=1,2
,...,
6.
The following are other examples of discrete random variables: l
Bernoulli random variable: Consider a random experiment that has two possible outcomes, such as tossing a coin (Ic = 0,l). The pmf of the random
random variable X is given by: P(X l
= 0) = 1 -p
Binomial random is carried out n random variable The pmf of X is
and
with 0 < p < 1.
(1.2)
i
~“(1 -p)+lc,
0
k = O,l,...,
n.
w
The experiment with two possible outGeometric random variable: comes is carried out several times, where the random variable X now represents the number of trials it takes for the outcome 1 to occur (the current trial included). The pmf of X is given by: P(X
l
= 1) = p,
variable: The experiment with two possible outcomes times where successive trials are independent. The X is now the number of times the outcome 1 occurs. given by:
P(X= k)= l
P(X
= k) = p(l
Poisson random variable: pmf ) is given by:
- PI”-‘7
k = 1,2,. . . .
(1.4
The probability of having k events (Poisson
j’(X = k) = @&.e-,
k = 0, 1,2, . . . ; a > 0.
.
(1.5)
The Poisson and geometric random variables are very important to our topic; we will encounter them very often. Several important parameters can be derived from a pmf of a discrete random variable: l
Mean value or expected value: X=E[X]=~k?(X=k). all
(l-6)
k
The function of a random variable is another random variable with the expected value of:
all l
k
nth moments: X” = E[Xn] = c all
k”. P(X = k),
W)
k
that is, the nth moment is the expected value of the nth power of X. The first moment of X is simply the mean of X.
BASICS
l
OF PROBABILITY
AND
STATISTKS
7
nth central moment:
(X- xp = E[(X- E[X]y]= X(/k - X)“.P(X= k).
W)
all k
The nth central moment is the expected value of the nth power of the difference between X and its mean. The first central moment is equal to zero. l
The second central moment is called the variance of X: 0: = var(X)
= (X - 3Q2 = X2 -X2,
where crx is called the standard l
The coefficient of variation
(1.10)
deviation.
is the normalized cx=Y-.
standard
deviation:
*X
(1.11)
X
Information on the average deviation of a random variable from its expected value is provided by cx , ox, and var( X) . If cx = ox = var( X) = 0, then the random variable assumesa fixed value with probability one. Tab/e Random
Bernoulli Binomial Geometric Poisson
1.1 Propertiesof severaldiscreterandomvariables Variables
Parameter
P
w
52;
P
P&P)
w
w(l-P)
1 P
a
var(X)
1-P P 1-P
-
1-P
G
P2
a
cy
w
1-P
-1 Q
Table 1.1 gives a list of random variables, their mean values, variances, and the squared coefficients of variation for some important discrete random variables. A random variable X that can assume 121.2 Continuous Random Variables all values in the interval [a, b], where --00 5 a < b 5 +oo, is called a continuous random variable. It is described by its distribution function (also called CDF or cumulative distribution function): (1.12)
8
lNTRODlJCT/ON
which specifies the probability that the random variable X takes values less than or equal to x, for every x. Prom Eq. (1.12) we get for 2 < y: F,(x)
5 F,(Y)7
P(x < x 5 Y) = F,(Y)
- F, (4.
density function (pdf) f x (x ) can be used instead of the disThe probability tribution function, provided the latter is differentiable:
fx(x)
(1.13)
= y.
properties of the pdf are: fx (4 2 0 for all 2, 00 s -CO P(Xl
I
fx (4dx = 1,
x
L
x2)
=
x2 fx (x)dx, s Xl
~(x
= x) =
fx(x)dx
s
= 0,
2 cc
P(X
> x3) =
fx(x)dx.
s 23
Note that for so-called defective random variables [STP96] we have:
s
fx(x)dx
< 1.
-CC
The density function of a continuous random variable is analogous to the pmf of a discrete random variable. The formulae for the mean value and moments of continuous random variables can be derived from the formulae for discrete random variables by substituting the pmf by the pdf and the summation by an integral: l
Mean value or expected value: 03 x
= E[X]
=
s
x* fx (4dx
(1.14)
BASICS
OF PROBABILITY
AND
STATISTICS
9
and: (1.15)
l
nth moment: cc X”
= E[X”]
=
xn*
fx
(1.16)
(x)dx.
s --rx) l
nth central moment:
(X-X)” =E[(X -E[X]yy =px-T)“fx(x)dx. (1.17)
-CO
0 Variance: 0: = var(X) with ox as the standard 0 Coefficient
= (X - IT)’ = x2-
X2,
(1.18)
deviation.
of variation: cx
=- OX -.
(1.19)
X
A very well known and important continuous distribution function is the normal distribution. The CDF of a normally distributed random variable X is given by: F,(x)
= -$+J
] exp (-‘“LT2) --oo
du,
(1.20)
and the pdf by:
fx(2) = r&
exp (- (x2:?2)
.
X
The standard CDF:
normal distribution
is defined by setting x = 0 and ax = 1: @(x) = $= T. 1 exp (-z)
du,
--03
pdf:
c)(x) = -&.
II-
exp (-;)
.
(1.21)
10
INTRODUCTION
-3
-2
-1
0
1
2
3
X
Fig. 1.2
pdf of the standard
normal
random variable.
A plot of the preceding pdf is shown in Fig. 1.2. For an arbitrary normal distribution we have:
and
fx(x)
= 4
7
respectively. Other important continuous random variables are described as follows. (a) Exponential
Distribution
The exponential distribution is the most important and also the easiest to use distribution in queueing theory. Interarrival times and service times can often be represented exactly or approximately using the exponential distribution. The CDF of an exponentially distributed random variable X is given by Eq. (1.22):
o<x
with x =
X’ 1 2
if X represents interarrival times,
(1.22)
if X represents service times.
Here X or p denote the parameter of the random variable. In addition, for an exponentially distributed random variable with parameter X the following relations hold:
BASICS
OF PROBABILITY
pdf:
fx(z)
mean:
FE
variance:
var(X)
coefficient
of variation:
= Xe-
AND AX
STATISTICS
11
,
;, = $,
cx = 1.
Thus the exponential distribution is completely determined by its mean value. The importance of the exponential distribution is based on the fact that it is the only continuous distribution that possesses the memoryless property: P(XLU+tIX>u)=l-exp
(
-f
>
= P(X
5 t).
(1.23)
As an example for an application of Eq. (1.23), consider a bus stop with the following schedule: Buses arrive with exponentially distributed interarrival times and identical mean x. Now if you have already been waiting in vain for u units of time for the bus to come, the probability of a bus arrival within the next t units of time is the same as if you had just shown up at the bus stop, that is, you can forget about the past or about the time already spent waiting. Another important property of the exponential distribution is its relation to the discrete Poisson random variable. If the interarrival times are exponentially distributed and successive interarrival times are independent with identical mean F, then the random variable that represents the number of buses that arrive in a fixed interval of time [0, t) has a Poisson distribution with parameter a = t/s?. Two additional properties of the exponential distribution can be derived from the Poisson property: 1. If we merge n Poisson processeswith distributions for the interarrival times 1 - eSxtt, 1 -< i < - n, into one single process, then the result is a Poisson process for which the interarrival times have the distribution 1 - eext with X = Cy=, Xi (seeFig. 1.3).
Fig. 1.3 Merging of Poissonprocesses.
2. If a Poisson process with interarrival time distribution 1 - e- xt is split into n processesso that the probability that the arriving job is assigned
12
1NTRODUCTlON
to the ith process is qi, 1 5 i 5 n, then the ith subprocess has an interarrival time distribution of 1 - e-qiXt, i.e., n Poisson processeshave been created, as shown in Fig 1.4. ax / x -e
q2x . . .
Fig. 1.4 Splitting of a Poisson process.
The exponential distribution has many useful properties with analytic tractability, but is not always a good approximation to the observed distribution. Experiments have shown deviations. For example, the coefficient of variation of the service time of a processoris often greater than one, and for a peripheral device it is usually lessthan one. This observed behavior leads directly to the need to consider the following other distributions: (b) Hyperexponential
Distribution,
H,+
This distribution can be used to approximate empirical distributions with a coefficient of variation larger than one. Here k is the number of phases.
Fig. 1.5
A random variable with Hk distribution.
Figure 1.5 shows a model with hyperexponentially distributed time. The model is obtained by arranging k phaseswith exponentially distributed times and rates ~1,,22,. . . ,pk in parallel. The probability that the time span is given by the jth phase is qj , where C:=r qj = 1. However, only one phase can be occupied at any time. The resulting CDF is given by:
l+(x)
= &$(lj=l
e--I-Lq,
x 2 0.
(1.24)
BASICS
OF PROBABILITY
AND
STATISTICS
13
pdf:
mean:
variance:
coefficient of variation:
cx =
2&3-l i
j=l
2 1. ’
For example, the parameters PI, t32 of an Hz distribution can be estimated to approximate an unknown distribution with given mean x and sample coefficient of variation cx as follows: (1.25)
(1.26) The parameters qr and q2 can be assignedany values that satisfy the restrictions ql, q2 2 O,ql +qz = 1 and pi, ~2 > 0. (c) Erlang-k
Distribution,
Ek
Empirical distributions with a coefficient of variation less than one can be approximated using the Erlang-Ic distribution. Here Ic is the number of exponential phases in series. Figure 1.6 shows a model of a time duration that has an EI, distribution. It contains Ic identical phases connected in series, each with exponentially distributed time. The mean duration of each phase is x/i?, where x denotes the mean of the whole time span.
Fig. 1.6
A random variable with Ek distribution.
If the interarrival times of some arrival processlike our bus stops are identical exponentially distributed, it follows that the time between the first arrival and the (Ic + 1)th arrival is Erlang-k distributed.
14
INTRODUCTION
The CDF is given by: Fx(x)
x 2 0,k = 1,2....
= 1 -e
pdf:
fx(x)
=
kdkpx)k-l (k -
2 >
e-kpx,
I)!
0?k =
(1.27)
1 7 2 Y”‘,
1
xc
variance:
1, I-L var(X) = -+$,
coefficient of variation:
cx = 5
mean:
1
5 1.
If the sample mean x and the sample coefficient of variation cx are given, then the parameters k and ,IAof the corresponding Erlang distribution are estimated by: (1.28) and: 1 c$ Icy’
p=-
(d)
Hypoexponential
(1.29)
Distribution
The hypoexponential distribution ariseswhen the individual phasesof the Ek distribution are allowed to have assigned different rates. Thus, the Erlang distribution is a special case of the hypoexponential distribution. For a hypoexponential distributed random variable X with two phasesand the rates ~1 and ,u2 (~1 # ~2), we get the CDF as: l+(x)
= l-
P2
-e P2 -
-P1X +
Pl
Pl
P2
pdf:
fx(x)
mean:
XC
variance:
var(X)
coefficient of variation:
cx=h5Z
-
c-P2X
7
= s(emp2x
= -j!j + i,
fP2
0.
- evplx),
;+-$
Pl
x >
(1.30)
Pl
*
x > 0,
BASICS
OF PROBABILITY
AND
STATISTICS
15
The values of the parameters ~1 and ~2 of the hypoexponential CDF can be estimated given the sample mean x and sample coefficient of variation by:
with 0.5 5 c$ 5 1.
P2,
For a hypoexponential distribution pk, we get [Bega93]: *** >
with
k phases and the phase rates ~1,
k
pdf:
fx(x)
= ca&e;Aix,
2 > 0,
i=l k
with ai =
rI j=l,jzi
& pj - I&’
l 5 2 L Icy
mean:
coefficient
(e) Gamma
of variation:
cx =
Distribution
Another generalization of the Erlang-k distribution for arbitrary coefficient of variation is the gamma distribution. The distribution function is given by: X
cl+’ Fx(x)
=
(cxpu)“-l r(~)
f emapudu,
J
0, ct > 0,
(1.31)
0
with I’(a) =
x 2
u~-‘~~-~cZU,
a! > 0.
Jrn0
If a = k is a positive integer, then I’(k) = (k - l)! Thus the Erlang-k distribution can be considered as a special case of the gamma distribution:
16
INTRODUCTION
pdf: y=
variance:
1, P var(X) = --&,
coefficient of variation:
cx=-.
mean:
A
It can be seen that the parameters Q and ,Qof the gamma distribution can easily be estimated from cx and x: and
1
a=-.
CL (f)
Generalized
fig. 1.7
Erlang
Distribution
A random
variable with generalized
Erlang
distribution.
Rather complex distributions can be generated by combining hyperexponential and Erlang-Ic distributions; these are known as generalized Erlang distributions. An example is shown in Fig. 1.7 where 72parallel levels are depicted and each level j contains a series of rj phases connected, each with exponentially distributed time and rate rjpj. Each level j is selected with probability j. The pdf is given by: x 2 0.
(1.32)
Another type of generalized Erlang distribution can be obtained by lifting the restriction that all the phases within the same level have the same rate, or, alternatively, if there are non-zero exit probabilities assigned such
that remaining
phases can be skipped.
These generalizations
complicated equations that are not further described here.
lead to very
BASICS
(g) Cox Distribution,
cr,
(Branching
OF PROBABILITY
Erlang
AND
17
STATISTICS
Distribution)
In [Cox55] the principle of the combination of exponential distributions is generalized to such an extent that any distribution that possessesa rational Laplace transform can be represented by a sequence of exponential phases with possibly complex probabilities and complex rates. Figure 1.8 shows a model of a Cox distribution with k exponential phases.
Fig. 1.8 A randomvariable with Ck distribution.
The model consists of k phases in series with exponentially distributed times and rates ~1, ~2,. . . , ,~k. After phase j, another phase j + 1 follows with probability aj and with probability bj = 1 - aj the total time span is completed. As described in [SaCh81], there are two cases that must be distinguished when using the sample mean value x and the sample coefficient of variation cx to estimate the parameters of the Cox distribution: Case I: cx 5 1 To approximate a distribution with cx 5 1, we suggestusing a special Cox distribution with (see Fig. 1.9): /Lj
=/A
j=
l,...,k,
j=2,...,k-1.
CLj=l
From the probabilistic definitions, we obtain: x
mean:
=
h
+
k(l
-
bl)
I-L variance:
var(X)
squared coefficient of variation:
=
2 cx
=
7
k + bl(k - 1) (bl(1 - k) + k - 2) 7
P2 k + bl(k - 1) (bi(1 - k) + k - 2) [bl
+
k(l
-
h)12
’
In order to minimize the number of stages, we choose k such that: rli
k= k- . IG I
(1.33)
18
INTRODUCTION
Fig. 1.9
A random
variable with ck distribution
and cx 5 1
Then we obtain the parameters 131and ,LLfrom the preceding equations: b = 2kc$ + (ii - 2) - j/k2 + 4 - 41cc; 7 1 2(c$ + l)(k - 1) k - bl * (k - 1) P=
x
(1.34)
(1.35)
*
Case 2: cx > 1 For the case cx > 1, we use a Cox-2 model (see Fig. 1.lO)
Fig. 1.10
A random variable with Cox-2 distribution.
From classical formulae, we obtain: mean: variance:
var(x)
= l-4 + a/-w - a> ’ d 94
squared coefficient of variation:
2 _ Pi + ad@ - a> (~2 +wI)~ * cx -
We have two equations for the three parameters ~1, ~2, and a, and therefore obtain an infinite number of solutions. To obtain simple equations we choose: (1.36)
p1=$ and obtain from the preceding equations: 1 and a=ZC2. X
(1.37)
BASICS
(h) Weibull
OF PROBABILITY
AND
STATISTICS
19
Distribution
The Weibull distribution is at present the most widely used failure distribution. Its CDF is given by: Fx(x)
= 1 - exp(-(Xx)“),
pdf:
fx(x)
mean:
.x=;r
(1.38)
2 2 0.
= ~X(XX)“-~
exp(-(Xx)“),
X > 0,
1+: ) ( ) r(1 + 2/cu) cc = {I?(1 + l/o.)}2 - l*
squared coefficient of variation:
with the shape parameter a! > 0 and the scale parameter X > 0. (i) Lognormal
Distribution
The lognormal distribution is given by:
171(2)=m(l+y), pdf: mean:
f+)
= ---&
w+-@(x)
x>o. - J92/2a2>,
(1.39)
x > 0,
X = exp(X + a2/2),
squared coefficient of variation:
c$- = exp(a2) - 1,
where the shape parameter a! is positive and the scaleparameter X may assume any real value. The parameters a and X can easily be calculated from c$ and x: Q = -\/ln(cz + l), Q2 (1.40) 2 The importance of this distribution arises from the fact that the product of n mutually independent random variables has a log-normal distribution in the limit n -+ 00. In Table 1.2 the formulae for the expectation E[X], the variance war(X), and the coefficient of variation cx for some important distribution functions are summarized. Furthermore, in Table 1.3 formulae for estimating the parameters of these distributions are given. X=lnX--.
1.2.2
Multiple
Random
Variables
In some cases, the result of one random experiment determines the values of several random variables, where these values may also affect each other. The joint probability mass .function of the discrete random variables
20
INTRODUCTION
Table 1.2 Expectation important distributions.
E[X],
variance
var(X),
and coefficient
Distribution
Parameter
-WI
var(X)
Exponential
-1 P
1
CL cl,k k=l,Z,...
-1 P
kcL2
-cl! P
CL2
Erlang
Gamma
Hypoexponential
Pl
cx 1
CY
o<
L+L P2
2
4
k
Hyperexponential
k,I.Lz,qz
x:
k
2/A2cq1>1
= ;
i=l
i=l
&,X2,-.
cx of
/-L2 1
L+'
PlTP2
of variation
4
, X, is given by:
P(X1 = 21,x2 = x2,. . . ,xn = XJ
(1.41)
and represents the probability that Xi = xi, X2 = x2, . . . X, = x,. continuous case, the joint distribution function:
In the (1.42)
~x(x)=~(x1121,x2122,...,x7L<~~)
represents the probability that Xi 5 xi, X2 5 x2, . . . X, 5 x,, where X = X,) is the n-dimensional random variable and x = (xi, x2, . . . , x,). (X l,***, A simple example of an experiment with multiple discrete random variables is tossing a pair of dice. The following random variables might be determined: Xi
number that shows on the first die,
X2
number that shows on the second die, sum of the numbers of both dice.
X3=&+&,
independence The random variables Xi, X2, . . . , X, are called (statistically) independent if, for the continuous case:
1.2.2.1
P(XlLXl,X:! 5x2,...,Xn <xn) = P(X1 5 x1)* P(X2 < X2)‘. . . * P(X,
5 x,),
(1.43)
or the discrete case: P(X1=21,X2
=x2,...,Xn
=x,)
= P(X1 = Xl). P(X2 = X2)‘. , . * P(X,
= 2,).
(1.44)
BASICS
Table 1.3
Formulae
Distribution
for estimating
OF PROBABILITY
the parameters
Parameter
AND
of important
Calculation
STATISTICS
distributions.
of the Parameters
Exponential
Erlang
k
=
ceil(l/c$)
P
=
l/(+
cl,k
/c=1,2,...
kx)
Gamma
Hypoexponential
CL1= + [l+q-l
Hyperexponential
Wd
Pl,P2,Ql,Q2
= $ [l+&q-1
p2
41 +42
k
=
bl
=
=
1,/K!
> 0
ceil(l/c$) 2kc;+k-2-
cox
(cx
5 1)
2(C$
k, bz, puz bp p1
cox
(cx
> 1)
k,b,/a,m
7 + l)(k
+4-4kc, - 1)
=
b3 = . . . = bk-1
=
k - bl. (k - 1) /&’ = . . . = /.hk = x
k
=
2
b
=
c$[l-,/*I
=o,
k
bl, = 1
21
22
INTRODUCTION
Otherwise, they are (statisticuZZy) dependent. In the preceding example of tossing a pair of dice, the random variables Xr and X2 are independent, whereas Xr and Xa are dependent on each other. For example:
since there are 36 different possible results altogether probable. Because the following is also valid:
that are all equally
=-t6,
P(x,=i)*P(x2=j)+;
P(Xi = i, X2 = j) = P(Xi = i)- P(X2 = j) is true, and therefore Xr and X2 are independent. On the other hand, if we observe the dependent variables Xi and Xs, then:
since Xr and X2 are independent. P(X1 = 2) = ;
Having:
and
P(X3=4)=P(X1=1,Xz=3)+P(X1=2,X2=2) +P(X,=3,Xz=1)=$& results in P(Xr = 2). P(Xa = 4) = 6. & = $, which shows the dependency of the random variables Xr and X3 : P(X1 = 2,X3 = 4) # P(X1 = 2)+(X3 1.2.2.2
Conditional
A conditional probability:
Probability P(X1
= 4).
=x1
1 x2 =22,.*.,X,=x,)
is the probability for Xr = 21 under the conditions X2 = x2, X3 = x3, etc. Then we get: P(X1
= Xl =
1 x2 =
x2,.
, . ) x,
= xn)
P(X1=21,X2 =x2 )...) x, =x,) P(X2 =x2,...,xn
(1.45)
=x,) *
For continuous random variables we have: P(X1
I =
P(Xl
Xl
1 x2
L
1X1,X2 P(X2
x2,.
. . ) x,
5
x?J
Lx2,...Jn L
x2,-..&I
(1.46)
LGL) I
xn)
*
BASICS
OF PROBABILITY
AND
STATISTICS
23
We demonstrate this also with the preceding example of tossing a pair of dice. The probability that Xa = j under the condition that Xi = i is given by:
= j,x, = i) P(X3 = j ) x1 = i) = P(X3 P(X1= i) * For example, with j = 4 and i = 2: l/36 1 P(X3 = 4 ) x1 = 2) = = l/6 6’ If we now observe both random variables Xi and X2, we will see that because of the independence of Xi and X2,
P(X1 =j)
x2 =i)
=
P(X1=
j, x2 = i) P(X2 = i) ’
= P(X1=
j)*P(X2 P(X2 = i)
= P(X1=
= i) ’
j).
1.2.2.3 important Relations The following relations concerning multiple random variables will be used frequently in the text: l
The expected value of a sum of random variables is equal to the sum of the expected values of these random variables. If cl, . . . , c, are arbitrary constants and Xi, X2, . . . , X, are (not necessarily independent) random variables, then:
rn
1
n
E c cixi = c CiE[XJ. i i=l 1 i=l l
(1.47)
If the random variables Xi, X2, . . . , X, are stochastically independent, then the expected value of the product of the random variables is equal to the product of the expected values of the random variables:
rn
ln
(1.48)
l
The covariance of two random variables X and Y is a way of measuring the dependency between X and Y. It is defined by: cov[X, Y] = E [(X - E[X])(Y = E[X.Y]
- E[Y])] = (X - r?)(Y - L) -- E[X]. E[Y] = XY - X-Y. (1.49)
24
INTRODUCTION
If X = Y, then the covariance is equal to the variance: cov[X, X] = x2 - X2 = var(X) If X and Y are statistically
independent,
= a$.
then Eq. (1.48) gives us:
E[X* Y] = E[X]LqY], and Eq. (1.49):
cov(X, Y) = 0. In this case, X and Y are said to be uncorrelated. l
The correlation coeficient of standard deviations:
is the covariance
cor[X, Y] = Therefore,
to the product
cov[X, Y] Ux’Uy -
(1.50)
if X = Y, then: cor[X,X]
l
normalized
= 2
= 1.
The variance of a sum of random variables can be expressed covariance:
using the
n
var
[c
i=l
ciXi
1= 2
For two independent
cT var(Xi)
+ 2.2
i=l
i=l
(uncorrelated)
[c
i=l
ciXi
cicj COV[Xi, Xj].
(1.51)
random variables we get:
n
var
f: j=i+1
= 2
1
cf var(Xi).
(1.52)
i=l
1.2.2.4 The Central Limit Theorem Let Xr , X2, . . . , X, be independent, identically distributed random variables with an expected value of E[X,] = z, and a variance of var( Xi) = a$. Then their arithmetic mean is defined by:
sn=;gxi. i=l We have E[Sn] = x and var(&) = OS/n. Regardless of the distribution of Xi, the random variable S, approaches, with increasing n, a normal distribution with a mean of x and a variance of var(X) /n: (1.53)
BASKS
1.2.3
OF PROBABILITY
AND
25
STATISTICS
Transforms
Determining the mean values, variance, or moments of random variables is often rather tedious. However, these difficulties can sometimes be avoided by using transforms, since the pmf of a discrete or the density function of a continuous random variable are uniquely determined by their x- or Laplace transform, respectively. Thus if two random variables have the same transform, they also have the same distribution function and vice versa. In many cases, the parameters of a random variable are, therefore, more easily computed from the transform. 1.2.3.1 x-Transform Let X be a discrete random variable with the probability massfunction pk = P(X = k) for k = 0, 1,2,. . . , then the x-transform of X is defined by: G,(z)
= Fp/+z”
with jz[ 5 1.
(1.54)
k=O
The function G,(z) is also called the probability generating function. Table 1.4 specifies the probability generating functions of some important discrete random variables. If the x-transform G, (x) of a random variable is given, then the probabilities pk can be calculated immediately: (1.55)
PO = G,(O),
1 d”G,(z) pk=
jJ’
k = 1,2,3,. . * *
7
&k
(1.56)
z=o
And for the moments: (1.57) (1.58) x"=
k-l
d”G, (4 dx’”
z=l
ak,,i’ 3,
+x
k=3,4,...,
(1.59)
i=l
with
3ak,l = (-l)“(k ak,k-l ak,i
- I)!,
k = 3,4,. . . ,
k(k - 1) 2 ’
=
~
=
ak-l,i-1
-
k = 3,4,. . . ) (k
-
l)ak-l,i,
k = 4,5, . . . and i = 2, . . . , k - 2.
26
INTRODUCTION
Table 1.4 Generating functions of some discrete random variables Random
Variable
Parameter
Binomial
G,
n,p
Geometric
P
Poisson
a
The generating function product of the generating CF=, Xi we get:
u
(4
- P> +
P4”
PZ
1 - (1 -
p)z
pb-1)
of a sum of independent random variables is the functions of each random variable, so for X =
G,(z) = fi G&).
(1.60)
i=l
1.2.3.2 Lap/ace Transform The Laplace transform plays a similar role for non-negative continuous random variables as the z-transform does for discrete random variables. If X is a continuous random variable with the density function fx(z), then the Lap&e transform (LT) of the density function (pdf) of X is defined by:
L,(s)
= mfx(x)e-s5dz, I
1emsI< _ 1,
(1.61)
0
where s is a complex parameter. Lx(s) is also called the Laplace-Stieltjes transform (LST) of the distribution function (CDF) and can also be written 03 Lx(s) =
I
e-““dFx(z).
(1.62)
0
LST of a random variable X will also be denoted by X” (s). The moments can again be determined by differentiation: ,
k=1,2
,....
(1.63)
s=o The Laplace transforms of several pdfs of well-known continuous random variables as well as their moments are given in Table 1.5.
BASICS
Table 1.5 Random
Laplace transform Variable
OF PROBABILITY
AND
and moments of several continuous
Parameter
27
STATISTICS
random variables X”
L&)=x-(s)
Exponential k(k
Erlang
k,X
+ 1). . . (k + n - 1) An
CY(LY+ 1). . . (CL:+ n - 1) An
Gamma
Hyperexponential
x
Hypoexponential
Analogous to the x-transform, the Laplace transform of a sum of independent random variables is the product of the Laplace transform of these random variables: (1.64) i=l
More properties and details of Laplace and z-transforms are listed in [Klei75]. 1.2.4
Parameter
Estimation
Probability calculations discussedin this book require that the parameters of all the relevant distributions are known in advance. For practical use, these parameters have to be estimated from measured data. In this subsection, we briefly discussstatistical methods of parameter estimation. For further details see [Triv82]. Definition 1.1 Let the set of random variables Xi, X2, . . . , X, be given. This set is said to constitute a random sample of size n from a population with associated distribution function Fx(x). It is presupposed that the random variables are mutually independent and identically distributed with distribution function (CDF) Fxi (x) = Fx(x) V’i, x. We are interested in estimating the value of some parameter 6’ (e. g. , the mean or the variance) of the population based on the random samples. Definition 1.2 A function 6 = 6(X1, X2, . . . , X,) is called an estimator of the parameter 8, and an observed value 6 = 6(x,, 22 . . . , x~) is known as
28
INTRODUCTION
an estimate of 8. An estimator
6 is considered
unbiased, if:
E[QXl,
x2, * * . )X,)] = 8. It can be shown that the sample mean X defined as:
,2exi
(1.65) i=l
is an unbiased estimator S2 defined as:
of the population
mean p, and the sample variance
s2 = 1 &Xi n - l i=l is an unbiased estimator
- 7q2
of the population
(1.66)
variance 02,
Definition 1.3 An estimator 6 of a parameter 8 is said to be consistent if 6 converges in probability to 8 (so-called stochastic convergence); that is: lim P(lC3 - 81 2 e) = OV/E> 0.
n+oo
1.2.4.1
Method
of Moments
The Icth sample moment of a random variable
X is defined as: n x’” ML =cx--$
k= 1,2 ,....
(1.67)
i=l
To compute one or more parameters of the distribution of X, using the method of moments, we equate the first few population moments with their corresponding sample moments to obtain as many equations as the number of unknown parameters. These equations can then be solved to obtain the required estimates. The method yields estimators that are consistent, but they may be biased. As an example, suppose the time to failure of a computer system has a gamma distributed pdf: pz”-le-x~ fx(4
,
=
k= 1,2 ,...,
(1.68)
wd
with parameters X and a. We know that the first two population moments are given by: and I_L~= 5 + ~1~. Pl = ; Then the estimates i and & can be obtained by solving:
Hence: &!=
Ml2 M2 - M12’
and i =
Ml
M2 - M12’
(1.69)
BASICS OF PROBABILITY
AND STATISTICS
29
The principle of this method is to 1.2.4.2 Maximum- Likelihood Estimation select as an estimate of 6’the value for which the observed sample is most likely to occur. We define the likelihood function as the product of the marginal densities:
,13,) is the vector of parameters to be estimated. Under where t9 = (&,&,... some regularity conditions, the maximum-likelihood estimate of 8 is the solution of the simultaneous equations: ALP) -
dei
=o,
i=
1,2 ,...) Ic.
As an example, assume the interarrival time, X, of a system is exponentially distributed with arrival rate X. To estimate the arrival rate X from a random sample of n interarrival times, we define: L(X) = fiAexp(-Xxi,) i=l
= X’“exp(-AF
zi),
(1.70)
i=l
dL = nAna dX
from which we get the maximum-likelihood estimator of the arrival rate that is equal to the reciprocal of the sample mean x. 1.2.4.3 Confidence Intervals So far we have concentrated on obtaining the point estimates of a desired parameter. However, it is very rare in practice for the point estimate to actually coincide with the exact value of the parameter 8 under consideration. In this section, we instead consider the evaluation of an interval estimate. We construct an interval, called the confidence interval, in such a way that we have a certain confidence that this interval contains the true value of the unknown parameter 8. In other words, if the estimator 6 satisfies the condition:
P@-~~<~&+E~)=Y,
(1.72)
then the interval A(8) = (6 - cl,6 + ~2) is a 100 x y percent confidence interval for the parameter 8. Here y is called the confidence coefficient (the probability that the confidence interval contains S). Consider obtaining confidence intervals when using the sample mean x as the estimator for the population mean. If the population distribution is normal with unknown mean ,Q and a known variance g2, then we can show that the sample mean is normally distributed with mean ,Qand variance 02/n, so that the random variable 2 = (57 - ,LL)/ (a/+) has a standard normal distribution N(0, 1). To determine a 100 x y percent confidence interval for
30
/NTROoUCnON
the population mean ,LL,then we find a number z,12 (using N(0, 1) tables) such that P(Z > z,,s) = a/2, where a = 1 - y. Then we get: < 2 <
PC-%/2
&/2)
(1.73)
=Y*
We then obtain the 100 x y percent confidence interval as: < (X-P)I(+m
-G/2
<
(1.74)
G/2
or: (1.75) The assumption that the population is normally distributed does not always hold. But when the sample size is large, by central limit theorem, the statistic CT- PL)l(~ldT n is asymptotically normal (under appropriate conditions). Also, Eq. (1.75) requires the knowledge of the population variance u2. When a2 is unknown, we use the sample variance S2 to get the approximate confidence interval for p: (1.76) When the sample size is relatively small and the population is normal distributed, then the random variable T = (x - p)/(S/fi) has the student t distribution with n - 1 degrees of freedom. We can then obtain the lOO( 1 - o) percent confidence interval of /J from: x
- L1;,/2-
where P(T > tn-pa,2) 1.2.5
Order
$
< p < x
+ t,-pap-
g
(1.77)
= a/2.
Statistics
Let X1, X2 . . . , X, be mutually independent, identically distributed continuous random variables, each having the distribution function F(.) and density f(.). Let Yr, Ys,. . . , Y, be random variables obtained by permuting the set ,X, so as to be in increasing order. Thus, for instance: x1,x2,... Yr = min{Xr,
X2,. . . ,X,}
and: Y, = max{Xr,X2,.
. . ,Xn}.
The random variable Yk is generally called the kth -order statistic. Because X, are continuous random variables, it follows that Yr < YZ < x1,x2,..., . . . < Y, (as opposed to Yr 5 Y2 < . . . < Yn) with a probability of one.
BASICS
OF PROBABILITY
AND
STATISTICS
31
As examples of the use of order statistics, let Xi be the time to failure of the ith component in a system of n independent components. If the system is a series system, then Yr will be the system lifetime. Similarly, Y, will denote the lifetime of a parallel system and Y,-,+I will be the lifetime of an m-out-of-n system. The distribution function of Yk is given by:
FYdY)
= 2
&%)[l
-oo
- F(y)]“-?
(1.78)
j=k
In particular, the distribution functions of Y, and Yr can be obtained from Eq. (1.78) as:
--oo< Y< 00,
= F%m
Fy,(Y)
and F&(Y) = 1 - [I - F(y)]“, 1.2.6
Distribution
-oo < y < 00.
of Sums
Assume that X and Y are continuous random variables with joint probability density function f. Suppose we are interested in the density of a random variable 2 = @(X, Y). The distribution of 2 may be written as: F&)
= P(Z 5 x) =
fh JJ A*
Y)dXdY,
(1.79)
where A, is a subset of R2 given by:
Az =
{(x,Y>
I Q(x, Y> L 21
= @-“((-co,
x]).
A function of special interest is 2 = X + Y with
AZ = {(x, Y)
I x + Y 5 4,
which is the half-plane to the lower left of the line x+y = x, shown in Fig. 1.11. Then we get:
JJf(x, = f(x, JJ -m -ccl
Fz(z) =
Y)dXdY
00
z-x
YPYdX*
32
INTRODUCTION
Fig. 1.11
Area of integration
for the convolution
of X and Y.
Making a variable substitution y = t - LC,we get: co rz(~)
z
= JJ -co .z
-
x)dtdx
f(x,t
-
x) dxdt
00
= JJ -cm
=
f(x,t -cm
-co
Jfz(Wt.
Thus the density of 2 is given by:
fz(~> = 7 f(x, x - x)dx, -CO
-co<x<m.
(1.80)
Now if X and Y are independent random variables, then f (x, y) = fx (x) fu (y), and Formula (1.80) reduces to:
f.h>
= 7 fx(x)fy(x --00
- x)dx,
-oo<X<~.
(1.81)
Further, if both X and Y are nonnegative random variables, then:
f&d
= /- fx(x)fu(z 0
- x)dx,
o
(1.82)
BASICS
OF PROBABILITY
AND
STATISTICS
33
This integral is known as the convolution of fx and fy . Thus, the pdf of the sum of two non-negative independent random variables is the convolution of their individual pdfs. If xr, x2,..., X, are mutually independent, identically distributed exponential random variables with parameter X, then the random variable Xi + x2 + * * * + X, has an Erlang-r distribution with parameter X. Consider an asynchronous transfer mode (ATM) switch with cell interarrival times that are independent from each other and exponentially distributed with parameter X. Let Xi be the random variable denoting the time between the (i - 1)st and ith arrival. Then 2, = Xi +X2 +. . . +X, is the time until the rth arrival and has an Erlang-r distribution. Another way to obtain this result is to consider N,, the number of arrivals in the interval (0, t]. As pointed out earlier, Nt has a Poisson distribution with parameter At. Since the events [Z, > t] and [N, < ~1 are equivalent, we have: P(&
> t) = P(N, < r)
which implies that: F&(t)
= P(.G = l-c,,e
I t) ?--I @t)j
-At >
j=o
*
which is the Erlang-r distribution function. If X N EXP(Xr), Y N EXP(&), X and Y are independent, and Xi # X2, then 2 = X + Y has a two-stage hypoexponential distribution with two phasesand parameters Xi and X2. To seethis: 2> 0
(by Eq. (1.82))
= Jjle-AlXX2e-X2(z-z)dZ .I 0
WQ + -ewXIZ
h&i = -e-‘2’ x1-
x2
x2 -x1
(see Eq. (1.30)).
34
INTRODUCTION
Fig. 1.12
Hypoexponential
A more general
version
distribution
as a series of exponential
stages.
of this result follows: Let 2 = 2 Xi, where i=l
. , X, are mutually independent and Xi is exponentially distributed with parameter Xi (Xi # Xj for i # j). Then 2 has a hypoexponential distribution with r phasesand the density function: x1,x2,..
(1.83) where: r ai=
rI j.l
- XLi Xj - Ai ’
l
(1.84)
j#i Such a phase type distribution can be visualized as shown in Fig. 1.12 (see also Section 1.2.1.2).
2 Markov Chains 2.1
MARKOV
PROCESSES
Markov processes provide very flexible, powerful, and efficient means for the description and analysis of dynamic (computer) system properties. Performance and dependability measures can be easily derived. Moreover, Markov processes constitute the fundamental theory underlying the concept of queueing systems. In fact, the notation of queueing systems has been viewed sometimes as a high-level specification technique for (a sub-class of) Markov processes. Each queueing system can, in principle, be mapped onto an instance of a Markov process and then mathematically evaluated in terms of this process. But besides highlighting the computational relation between Markov processes and queueing systems, it is worthwhile pointing out also that fundamental properties of queueing systems are commonly proved in terms of the underlying Markov processes. This type of use of Markov processes is also possible even when queueing systems exhibit properties such as non-exponential distributions that cannot be represented directly by discrete-state Markov models. Markovizing methods, such as embedding techniques or supplementary variables, can be used in such cases. Here Markov processes serve as a mere theoretical framework to prove the correctness of computational methods applied directly to the analysis of queueing systems. For the sake of efficiency, an explicit creation of the Markov process is preferably avoided. 2.1.1
Stochastic
and Markov
Processes
There exist many textbooks like those from King [KingSO], Trivedi [Triv82], Allen [AllegO], G ross and Harris [GrHa85], Cinlar [Cin1?5], Feller (his two
volume classic [FeUSS]), and Howard [Howa71] that provide excellent intro35
36
MARKOV CHAINS
ductions into the basics of stochastic and Markov processes. Besides the theoretical background, many motivating examples are also given in those books. Consequently, we limit discussion here to the essentials of Markov processes and refer to the literature for further details. Markov processes constitute a special, perhaps the most important, subclass of stochastic processes, while the latter can be considered as a generalization of the concept of random variables. In particular, a stochastic process provides a relation between the elements of a possibly infinite family of random variables. A series of random experiments can thus be taken into consideration and analyzed as a whole. Definition 2.1 A stochastic process is defined as a family of random variables {Xt : t E T} where each random variable Xt is indexed by parameter t E 57, which is usually called the time parameter if T G IK+ = [0, co). The set of all possible values of Xt (for each t E T) is known as the state space S of the stochastic process. If a countable, discrete-parameter set T is encountered, the stochastic process is called a discrete-parameter process and 7’ is commonly represented by (a subset of) No = {O,l,. . .}; otherwise we call it a continuous-parameter process. The state space of the stochastic process may also be continuous or discrete. Generally, we restrict ourselves here to the investigation of discrete state spaces and in that case refer to the stochastic processes as chains, but both continuous- and discrete-parameter processes are considered. Continuous-parameter stochastic processes can be probaDefinition 2.2 bilistically characterized by the joint (cumulative) distribution function (CDF) Fx(s; t) for a given set of random variables {Xt, , Xt, , . . . , Xt,}, parameter vector t = (tl, t2, . . . , tn) E Iw”, and state vector s = (si, ~2,. . . , sn) E Iw”, where tl < t2 < . . . < t,: Fx(s;t)
=P(Xt,
<
Sl,&
I
S2,-.Jt,
I
sn>.
(24
The joint probability density function, pdf:
PFx(s; t) fx(s; t) = as&~ . . . ds, is defined correspondingly if the partial derivatives exist. If the so-called Markov property is imposed on the conditional CDF of a stochastic process, a Markov process results: A stochastic process {Xt : t E T} constitutes a Murkov Definition 2.3 process if for all 0 = to < tl < . . . < t, < tn+l and all si E S the conditional CDF of Xt,+l depends only on the last previous value Xt, and not on the earlier values Xt, , X,, , . . . , Xtnel : P(Xt =
n+l L &+1
p(&,+l
I
I Xtn
= %,xt,-l
%+1
I Xtn
= 4.
= h-1,.
* * A,
= so) cw
MARKOV PROCESSES
37
This most general definition of a Markov process can be adopted to special cases. In particular, we focus here on discrete state spaces and on both discrete- and continuous-parameter Markov processes. As a result, we deal primarily with continuous-time Markov chains (CTMC), and with discretewhich are introduced in the next section. time Markov chains (DTMC), Finally, it is often sufficient to consider only systems with a time independent, i.e., time-homogeneous, pattern of dynamic behavior. Note that timehomogeneous system dynamics is to be discriminated from stationary system behavior, which relates to time independence in a different sense. The former refers to the stationarity of the conditional CDF while the latter refers to the stationarity of the CDP itself, Definition 2.4 Letting to = 0 without loss of generality, a Markov process is said to be time-homogeneous if the conditional CDF of Xt,+l does not depend on the observation time, that is, it is invariant with respect to time epoch t,: P(&n+l
2.1.2
Markov
I
%+1
I xt,
= sn) = qG,+&
I
h&+1 I x0
= Sn).
(2.3)
Chains
Equation (2.2) d escribes the well-known Markov property. Informally this can be interpreted in the sense that the whole history of a Markov chain is summarized in the current state Xtn. Equivalently, given the present, the future is conditionally independent of the past. Note that the Markov property does not prevent the conditional distribution from being dependent on the time variable t,. Such a dependence is prevented by the definition of homogeneity (see Eq. (2.3)). A um‘q ue characteristic is implied, namely, the sojourn time distribution in any state of a homogeneous Markov chain exhibits the memoryless property. An immediate, and somewhat curious, consequence is that the mean sojourn time equals the mean residual and the mean elapsed time in any state and at any time [Triv82]. If not explicitly stated otherwise, we consider Markov processes with discrete state spaces only, that is, Markov chains, in what follows. Note that in this case we are inclined to talk about probability mass functions, pmf, rather than probability density functions, pdf. Refer back to Sections 1.2.1.1 and 1.2.1.2 for details. 2.1.2.1 Discrete-Time Markov Chains We are now ready to proceed to the formal definition of Markov chains. Discrete-parameter Markov chains are considered first, that is, Markov processes restricted to a discrete, finite, or countably infinite state space, S, and a discrete-parameter space T. For the sake of convenience, we set T 2 lVe. The conditional pmf reflecting the Markov property for discrete-time Markov chains, corresponding to Eq. (2.2), is summarized in the following definition:
MARKOV CHAINS
A given stochastic process (X0, Xi,. . . , Xn+i, . . .} at the Definition 2.5 consecutive points of observation 0, 1, . . . , n + 1 constitutes a DTMC if the following relation 011the conditional pmf, that is, the Markov property, holds for all n E Ne and all .si E S:
P(X,+1 = s,+1 1x, = s,, X,-l
= s,-1, . . . ,x0 = so) (2.4
=
q&a+1
=
ha+1
I xx
=
sn).
Given an initial state se, the DTMC evolves over time, that is, step by step, according to one-step transition probabilities. The right-hand side of Eq. (2.4) reveals the conditional pmf of transitions from state s, at time step n to state s,+i at time step (n + 1). Without loss of generality, let an d wri t e conveniently the following shorthand not ation for S = {0,1,2,...} the conditional pmf of the process’s one-step transition from state i to state j at time n: p,‘:‘(n) = P(X,+l
= s,+l = j 1X, = sn = i).
P-5)
In the homogeneous case, when the conditional pmf is independent of epoch n, Eq. (2.5) reduces to: p!!) = p!!)(n) = P(X n+l = j 1X, = i) = P(X1 = j 1X0 = i), 2.7 a.7
Vn E T. (2.6)
For the sake of convenience, we usually drop the superscript, so that pij = ~13) refers to a one-step transition probability of a homogeneous DTMC. Starting with state i, the DTMC will go to some state j (including the possibility of j = i), so that it follows that Cj pij = 1, where 0 5 pij 5 1. The one-step transition probabilities pij are usually summarized in a nonnegative, stochastic’ transition matrix P:
p
= p(l)
=
bij]
=
PO0
PO1
po2
**’
;;;
;;;
g:;
: 1:
I
.
. . .
. . .
.
.
.I
Graphically, a finite-state DTMC is represented by a state transition diagram, a finite directed graph, where state i of the chain is depicted by a vertex, and a one-step transition from state i to state j by an edge marked with one-step transition probability pij. As an example, consider the one-step transition probability matrix in Eq. (2.7) with state space S = (0, 1) and the corresponding graphical representation in Fig. 2.1.
‘The
elements
in each row of the matrix
sum
up to
1.
MARKOV PROCESSES
Example 2.1 The one-step transition probability DTMC in Fig. 2.1 is given by:
39
matrix of the two-state
Conditioned on the current DTMC state, a transition is made from state 0 to state 1 with probability 0.25, and with probability0.75, the DTMC remains in state 0 at the next time step. Correspondingly, a transition occurs from state 1 to state 0 with probability 0.5, and with probability 0.5 the chain remains in state 1 at the next time step.
Fig. 2.1
Example of a discrete-time Markov chain referring to Eq. (2.7).
Repeatedly applying one-step transitions generalizes immediately to n-step transition probabilities. More precisely, let p$‘(Ic, Z) denote the probability that the Markov chain transits from state i at time Ic to state j at time Z in exactly n = 1 - Ic steps:
Again, the theorem of total probability applies for any given state i and any given time values k and Z such that Cj piy)(b, 1) = 1, where 0 < piy)(k, 1) 5 1. This fact, together with the Markov property, immediately leads us to a procedure for computing the n-step transition probabilities recursively from the one-step transition probabilities: The transition of the process from state i at time Ic to state j at time Zcan be split into subtransitions from state i at time k to an intermediate state2 h, say, at time m and from there, independently of the history that led to that state, from state h at time m to state j at time I, where k < m < Z and n = Z - k. This condition leads to the well known system of Chapman-Kolmogorov equations: p$)(k, Z) = c
p,(;-“) (k,m)pL3ym)(m 7Z)7 0 5 k < m < 1.
cw
hES
Note that the conditional independence assumption, i.e., the Markov property, is reflected by the product of terms on the right-hand side of Eq. (2.9).
2The Markov chain must simply traverse some state at any time.
40
MARKOV CHAINS
Similar to the one-step case, the n-step transition probabilities can be simplified for homogeneous DTMC such that pi:) = p$) (Ic, Z) depend only on the difference n = I - k and not on the actual values of k and I: p!,“’ = P(x~+,
= j 1XI, = i) = P(X,
= j 1X0 = i),
Under this condition, the Chapman-Kolmogorov DTMC simplifies to: p!?) %.I
=
C
(2.10)
Eq. (2.9) for homogeneous
0 <
p,(;l”)p~-m’,
‘dk E T.
m
<
n.
&S
Because Eq. (2.11) holds for all m < n, let m = 1 and get
(n)-_c Pi, (1) (n-1) Pij P&j * hES
With Pen) as the matrix of n-step transition probabilities ply’, Eq. (2.11) can be rewritten in matrix form for the particular case of m = 1 as Pen) = p(l)p(n-1) = pp(n-1). A pp 1ying this procedure recursively results in the following equation3: ~(4 = pp(n--1) = pn.
(2.12)
The n-step transition probability matrix can be computed by the (n - l)-fold multiplication of the one-step transition matrix by itself. Referring back to the example in Fig. 2.1 and Eq. (2.7), the Example 2.2 four-step transition probability matrix, for instance, can be derived according to Eq. (2.12): p(4) = pp(“) = (‘b;;
=
= p2pw og2Prz)
0.65625 0.67188
= (y-$7;
0.34375 0.32813
y-y;;)
0.66406 0.66797
PPW
(2*13)
0.33594 0.33203 > *
Ultimately, we wish to compute the pmf of the random variable X,, that is, the probabilities v;(n) = P(X, = i) that the DTMC is in state i, at time step n. These probabilities, called transient state probabilities at time n, will then allow us to derive the desired performance measures. Given the n-step transition probability matrix P cn), the vector of the state probabilities at time
31t is important
to keep in mind
that
P cn)
= Pn holds
only
for homogeneous
DTMC.
MARKOV PROCESSES
n, y(n) =
@0(n),
vi(n),
v2(n),
, . .>
the initial probability vector
can be obtained by unconditioning
y(O)
=
(vo(O),
e(O),
vz(O),
41
Pen) on
- * .>:
y(n) = v(O)P (4 = v(o)Pn = v(?2 - l)P.
(2.14)
Note that both v(n) and v(0) are represented as row vectors in Eq. (2.14). Assume that the DTMC from Eq. (2.7) under investigation Example 2.3 is initiated in state 1, then the initial probability vector v(l)(O)=(O, 1) is to be applied in the unconditioning according to Eq. (2.14). With the already computed four-step transition probabilities in Eq. (2.13) the corresponding pmf y(l) (4) can be derived: “(l)(4) Example
2.4
0.66797 0.66406
= (0,l)
Alternatively,
0.33203 = (0.66406,0.33594). 0.33594 >
with another initial probability
vector:
“(2)(O) = ($, ;> = (0.66,0.33) , according to Eq. (2.14), we would get: ~(~~(4) = (0.66,0.33) as the probabilities at time step 4. Of particular importance are homogeneous DTMC on which a so-called stationary probability vector can be imposed in a suitable way: State probabilities u = (~0, ~1,. . . vi,. . .) of a discrete-time Definition 2.6 Markov chain are said to be stationary, if any transitions of the underlying DTMC according to the given one-step transition probabilities P = bij] have no effect on these state probabilities, that is, vj = ‘& uipij holds for all states j E S. This relation can also be expressed in matrix form: u = VP,
c
vi = 1.
(2.15)
iES
Note that according to the preceding definition, more than one stationary pmf can exist for a given, unrestricted, DTMC. By substituting the one-step transition matrix from Eq. (2.7) Example 2.5 in Eq. (2.15), it can easily be checked that: uc2) = (;, 3) = (0.66,0.33) , is a stationary probability
vector while Y(‘) = (0,l) is not.
42
MARKOV CHAINS
For an efficient analysis, we are interested in the limiting Definition 2.7 state probabilities 6 as a particular kind of stationary state probabilities, which are defined by: fi = &nrV(n)
= $r& v(0)P(nJ = y(O) lim Pen) = y(O)p. n-+03
(2.16)
As n + cc, we may require both the n-step transition Definition 2.8 probability matrix Pen) and the state probability vector v(n) to converge independently of the initial probability vector v(0) to %’ and V, respectively. Also, we may only be interested in the case where the state probabilities Vi > 0, Vi E S, are strictly positive and xi Yi = 1, that is, 5 constitutes a pmf. If all these restrictions apply to a given probability vector, it is said to be the unique steady-state probability vector of the DTMC. Example 2.6 tion probabilities
Returning to Example 2.1, we note that the n-step transiconverge as n -+ 00:
With this result, the limiting state probability vector G, Example 2.7 which is independent of any initial probability vector Y(O), can be derived according to Eq. (2.16):
2, = (0.66,0.33). Example 2.8 Since all probabilities in the vector (0.66,0.33) are strictly positive, this vector constitutes the unique steady-state probability vector of the DTMC. Eventually, the limiting state probabilities become independent of time steps, such that once the limiting probability vector is reached, further transitions of the DTMC do not change this vector, i.e., it is stationary. Note that such a probability vector does not necessarily exist for all DTMCs. If Eq. (2.16) holds and fi is independent of v (0)) it follows that the limit p(n) = [p!“‘] is independent of time n and of index i. All rows of 6 would be identiczl, that is, the rows would match element by element. Furthermore, the jt h element j&j of row i equals zP~for all i E S:
(2.17)
If the unique steady-state probability vector of a DTMC exists, it can be determined by the solution of the system of linear Eqs. (2.15), so that I? need
MARKOV PROCESSES
43
not be determined explicitly. From Eq. (2.14), we have y(n) = v(n - l)P. If the limit exists, we can take it on both sides of the equation and get: lim v(n) = ZI = lim v(n - l)P = C/p.
n+cc
(2.18)
n+cc
In the steady-state case no ambiguity can arise, so that, for the sake of convenience, we may drop the annotation and refer to steady-state probability vector by using the notation u instead of t. Steady-state and stationarity coincide in that case, i.e., there is only a unique stationary probability vector. The computation of the steady-state probability vector u of a DTMC is usually significantly simpler and less expensive than a time-dependent computation of v(n). It is therefore the steady-state probability vector of a DTMC that is preferably taken advantage of in modeling endeavors. But a steadystate probability vector does not exist for all DTMCS.~ Additionally, it is not always appropriate to restrict the analysis to the steady-state case, even if it does exist. Under some circumstances time-dependent, i.e., transient, analysis would result in more meaningful information with respect to an application. Transient analysis has special relevance if short-term behavior is of more importance than long-term behavior. In modeling terms, “short term” means that the influence of the initial state probability vector Y(O) on v(n) has not yet disappeared by time step n. Before continuing, some simple example DTMCs are presented to clarify the definitions of this section. The following four one-step transition matrices are examined for the conditions under which stationary, limiting state, and steady-state probabilities exist for them. Example 2.9 Consider the DTMC shown in Fig. 2.2 with the one-step transition probability matrix (TPM): p=
Fig. 2.2
l
l (0
O 1> *
(2.19)
Example of a discrete-time Markov chain referring to Eq. (2.19).
For this one-step TPM, an infinite number of stationary probability vectors exists: Any arbitrary probability vector is stationary in this case according to Eq. (2.15).
*The conditions under following section.
which
DTMCs
converge
to steady-state
is precisely
stated
in the
MARKOV CHAINS
l
The n-step TPM Pen) converges in the limit to: lim PC”) = fi =
n-boo
t
i
(
. >
Furthermore, all n-step TPMs are identical: P = P = Pen),
Vn E T.
The limiting state probabilities z/ do exist and are identical to the initial probability vector Y(O) in all cases: fi = Y(O)P = u(0). l
A unique steady-state probability vector does not exist for this example.
Example TPM:
2.10
Next, consider the DTMC
p=
Fig. 2.3
l
shown in Fig. 2.3 with the
O l ( 1 0> *
(2.20)
Example of a discrete-time Markov chain referring to Eq. (2.20).
For this one-step TPM, a stationary probability vector, which is unique in this case, does exist according to Eq. (2.15): u = (0.5,0.5).
l
The n-step transition matrix P cn) does not converge in the limit to any p. Therefore, the limiting state probabilities t do not exist.
l
Consequently, a unique steady-state probability
Example
2.11
Consider the DTMC shown in Fig. 2.4 with the TPM: p =
0.5 ( 0.5
l
vector does not exist.
0.5 0.5 > 4
For this one-step TPM a unique stationary probability according to Eq. (2.15): u = (0.5,0.5) .
(2.21) vector does exist
MARKOV PROCESSES
Fig. 2.4
Example of a discrete-time Markov chain referring to Eq. (2.21).
Note that this is the same unique stationary probability the different DTMC in Eq. (2.20). l
45
vector as for
The n-step TPM Pen) converges in the limit to:
Furthermore, all n-step TPMs are identical: P = @ = Pen),
Vn E T.
The limiting state probabilities fi do exist, are independent of the initial probability vector, and are unique: 2, = u(O)P = (0.5,0.5). l
A unique steady-state probability vector does exist. All probabilities are strictly positive and identical to the stationary probabilities, which can be derived from the solution of Eq. (2.15): u = ti = (0.5,0.5) .
Example
2.12
Now consider the DTMC shown in Fig. 2.5 with the TPM: (2.22)
Fig. 2.5
l
Example of a discrete-time Markov chain referring to Eq. (2.22).
For this one-step TPM a unique stationary probability according to Eq. (2.15): u = (0,l).
vector does exist
46
MARKOV CHAINS
l
The n-step TPM Pen) converges in the limit to
Furthermore, all n-step TPMs are identical: P=p=P(“),
‘inET.
The limiting state probability vector ti does exist, is independent of the initial probability vector, and is identical to the unique stationary probability vector u: ti = u = (0,l) . l
A unique steady-state probability vector does not exist for this example. The elements of the unique stationary probability vector are not strictly positive.
We proceed now to identify necessary and sufficient conditions for the existence of a steady-state probability vector of a DTMC. The conditions can be given immediately in terms of properties of the DTMC. 2.1.2.1.1 C/as.sificationsof DTMC DTMCs are categorized based on the classifications of their constituent states. Definition 2.9 Any state j is said to be reachable from any other state i, where i, j E S, if it is possible to transit from state i to state j in a finite number of steps according to the given transition probability matrix. For some integer n 2 1, the following relation must hold for the n-step transition probability: p(n) ij
>
O7
3n,n > 1.
(2.23)
A DTMC is called irreducible if all states in the chain can be reached pairwise from each other, i.e., V’i, j E S, 3n, n 2 1 : p$’ > 0. A state i E S is said to be an absorbing state5 if and only if no other state of the DTMC can be reached from it, i.e., pii = 1. Note that a DTMC containing at least one absorbing state cannot be irreducible. If countably infinite state models are encountered, we have to discriminate more accurately how states are reachable from each other. The recurrence time and the probability of recurrence must also be taken into account. 5Absorbing states play an important role in the modeling of dependable systems where transient analysis is of primary interest.
MARKOV PROCESSES
47
Let f!“’ called the n-step recurrence probability, denote Definition 2.10 the conditional probabilitj 0; the first return to state i E S in exactly n 2 1 steps after leaving state i. Then, the probability fi of ever returning to state i is given by:
fi = 2 f,!“‘.
(2.24)
n=l
Any state i E S to which the DTMC will return with probability fi = 1, is called a recurrent state; otherwise, if fi < 1, i is called a transient state. Given a recurrent state i, the mean recurrence time rni of state i of a DTMC is given by: co
rni = Cf n i(n) *
(2.25)
n=l
If the mean recurrence time is finite, that is, rni < co, i is called positive recurrent or recurrent non-null; otherwise, if rni = 00, state i is said to be recurrent null. For any recurrent state i E S, let di denote the period of state i, then di is the greatest common divisor of the set of positive integers n such that pi:) > 0. A recurrent state i is called aperiodic if its period di = 1, and periodic with period di if di > 1. It has been shown by Feller [Fe11681that the states of an irreducible DTMC are all of the same type. Hence, all states are periodic, aperiodic, transient, recurrent null, or recurrent non-null. If one of the states i of an irreducible DTMC is aperiodic Definition 2.11 then so are all the other states j E S, that is, dj = 1,Vj E S, and the DTMC itself is called aperiodic; otherwise it is said to be periodic with unique period d. An irreducible, aperiodic, discrete-time Markov chain with all states i being recurrent non-null with finite mean recurrence time rni is called an ergodic Markov chain. We are now ready to summarize the main results for the classification of discrete-time Markov chains: l
The states of a finite-state, non-null.
irreducible Markov chain are all recurrent
l
Given an aperiodic DTMC, the limits fi = limn-+oo v(n) do exist.
l
For any irreducible and aperiodic DTMC, the limit ti exists and is independent of the initial probability vector Y(O).
b For an ergodicDTMC, the limit V = (GO,fi1, fiz, . . .) exists and comprises the unique steady-state probability
vector u.
MARKOV CHAINS
48
l
The steady-state probabilities vi > 0, i E S, of an ergodic Markov chain can be obtained by solving the system of linear Eq. (2.15) or, if the (finite) mean recurrence times rni are known, by exploiting the relation: 1 vi = rni ’
vi E s.
(2.26)
If finite-state DTMCs are investigated, the solution of Eq. (2.15) can be obtained by applying standard methods for the solution of linear systems. In the infinite case, either generating functions can be applied or special structure of the one-step transition probability matrix P(l) = P may be exploited to find the solution in closed form. An example of the latter technique is given in Section 3.1 where we investigate the important class of birth-death processes. The special (tridiagonal) structure of the matrix will allow us to derive closedform solutions for the state probabilities that are not restricted to any fixed matrix size so that the limiting state probabilities of infinite state DTMCs are captured by the closed-form formulae as well. 2.1.2.1.2 DTMC State Sojourn Times The state sojourn times - the time between state changes - play an important role in the characterization of DTMCs. Only homogeneous DTMCs are considered here. We have already pointed out that the transition behavior reflects the memoryless property, that is, it only depends on the current state and neither on the history that led to the state nor on the time already spent in the current state. At every instant of time, the probability of leaving current state i is independently given by (1 - pii) = Cizj pij. Applying this repeatedly leads to a description of a random experiment in form of a sequence of Bernoulli trials with probability of success (I- pii), where “success” denotes the event of leaving current state i. Hence, the sojourn time Ri during a single visit to state i is a geometrically distributed random variable6 with pmf:
P(Ri = k) = (1 - pi,)pfi-l,
vlc E IQ+.
(2.27)
We can therefore immediately conclude that the expected sojourn time E[Ri], that is, the mean number of time steps the process spends in state i per visit, is: (2.28)
Accordingly, the va,riance var[&] given by:
of the sojourn time per visit in state i is (2.29)
6Note that distribution,
the geometric distribution i.e., it is the only discrete
is the discrete-time equivalent of the exponential distribution with the memoryless propert,y.
MARKOV PROCESSES
49
2. I .2.2 Continuous-Time Markov Chains Continuous- and discrete-time Markov chains provide different yet related modeling paradigms, each of them having their own domain of applications. For the definition of CTMCs we refer back to the definition of general Markov processes in Eq. (2.2) and specialize it to the continuous parameter, discrete state space case. CTMCs are distinct from DTMCs in the sense that state transitions may occur at arbitrary instants of time and not merely at fixed, discrete time points, as is the case with DTMCs. Therefore, we use a subset of the set of non-negative real numbers lR$ to refer to the parameter set 7’ of a CTMC, as opposed to Ne for DTMCs: Definition 2.12 A given stochastic process {Xt : t E T} constitutes a CTMC if for arbitrary ti E lR$, with 0 = te < ti < . +. < t, < tn+i, Vn E N, and ‘dsi E S = Ne for the conditional pmf, the following relation holds: fy&,+l =
= P(&,+l
%+1 =
I xtn
=
sn,&,-~
%+1
I Xtn
=
=
%-1,
* * * Jt,
=
so>
(2.30)
4.
Similar to Eq. (2.4) for DTMCs, Eq. (2.30) expresses the Marlcow property of continuous-time Markov chains. If we further impose homogeneity, then because the exponential distribution is the only continuous-time distribution that provides the memoryless property, the state sojourn times of a CTMC are necessarily exponentially distributed. Again, the right-hand side of Eq. (2.30) is referred to as the transition probability 7 pij(u, V) of the CTMC to travel from state i to state j during the period of time [u, w), with u, w E T and u 5 u: Pij (U, v) = P(X,
= j 1 X, = i).
(2.31)
For u = w we define: Pij (UT U) =
1, i = j, 0, otherwise.
(2.32)
If the transition probabilities pii(u, V) depend only on the time difference t = w - u and not on the actual values of u and w, the simplified transition probabilities for time-homogeneous CTMC result: Pij(t)
= Pij(O, t) = P(Xu+t
= j I -&A = i) = P(Xt
= j 1 x0 = i),
VJUE T.
(2.33) Given the transition probabilities pij(u, w) and the probabilities ri(u) of the CTMC at time u, the unconditional state probabilities ~j (v), j E S of the
7Note that, as opposed to the discrete-time transition steps considered here.
case, there is no fixed, discrete number of
50
MARKOV CHAINS
process at time v can be derived: Vu,v E T (u 5 w).
?tv> = Ouija,
(2.34)
iES
With P(u, V) = [pij (u, v)] as the matrix of the transition probabilities, for any pair of states i, j E S and any time interval [u, v), u, w E T, from the parameter domain, and the vector X(U) = (71-e(u),7rr(u),~(u), . . .) of the state probabilities at any instant of time u, Eq. (2.34) can be given in vectormatrix form: n(v) = rr(u)P(u, w),
Vu, u E T (u 5 v).
(2.35)
Note that for all u E T, P(u,u) = I is the identity matrix. In the time-homogeneous case, Eq. (2.34) reduces to:
TAt)= ~P,wm iES
or in vector-matrix
= ~Pij(O&r~(O), iES
(2.36)
= 7r(O)P(OJ).
(2.37)
notation: n(t) = T(O)P(i)
Similar to the discrete-time case (Eq. (2.9)), the Chapman-Kolmogorov Eq. (2.38) for the transition probabilities of a CTMC can be derived from Eq. (2.30) by applying again the theorem of total probability: PdU~4
0 5 u < w < 21.
= ~Pik(W)P&v):
(2.38)
kES
But, unlike the discrete time case, Eq. (2.38) cannot be solved easily and used directly for computing the state probabilities. Rather, it has to be transformed into a system of differential equations which, in turn, leads us to the required results. For this purpose, we define the instantaneous transition rates qij(t) (i # j) of the CTMC traveling from state i to state j. These transition rates are related to conditional transition probabilities. Consider the period of time [t, t + At), where At is chosen such that Cj,-s qij(t)At + o(k) = 18. The non-negative, finite, continuous functions qij(t) can be shown to exist under rather general conditions. For all states i, j , i # j , we define: f&j(t)=
lim
At-+0
qii(t) = lim At-+0
pij (6 t + At)
At
,
i#j,
pii(t, t + At) - 1 At
’
(2.39) (2.40)
‘The notation o(At) is defined such that limAt+o q = 0; that is, we might substitute any function for o(At) that approaches zero faster than the linear function At.
MARKOV PROCESSES
51
If the limits do exist, it is clear from Eqs. (2.39) and (2.40) that, since c jEs pij(t, t + At) = 1, at any instant of time t:
Cqij(t)=o,
ViES.
(2.41)
jCS
The quantity -qii (t) can be interpreted as the total rate at which state i is exited (to any other state) at time t. Accordingly, q~ (t), (i # j), denotes the rate at which the CTMC leaves state i in order to transit to state j at time t. As an equivalent interpretation, we can regard qij(t)At + o(At) as the transition probability pij(t, t + At) of the Markov chain to transit from state i to state j in [t, t + At), where At is chosen appropriately. Having these definitions, we return to the Chapman-Kolmogorov Eq. (2.38). Substituting v + At for v in (2.38) and subtracting both sides of the original Eq. (2.38) from the result gives us: pd”~
v + At> - P&-d
= c
P&J,
w)[;or~j(w, v + At) - ~,z&J, v)].
(2.42)
kES
Dividing both sides of Eq. (2.42) by At, taking limat+e of the resulting quotient of differences, and letting w --+ v, we derive a differential equation, the well known Kolmogorov’s forward equation: d”i$;‘“’
= ~p&h,v)qQ(v),
0 5 u < v.
(2.43)
kES
In the homogeneous case, we let t = v - u and get from Eqs. (2.39) and (2.40) time-independent transition rates qij = qij(t),Vi, j E S, such that simpler versions of the Kolmogorov’s forward differential equation for homogeneous CTMCs result: (2.44) Instead of the forward Eq. (2.43), we can equivalently derive and use the Kolmogorov’s backward equation for further computations, both in the homogeneous and non-homogeneous cases, by letting w -+ u in Eq. (2.42) and taking limat+o to get: 8Pij (u, v) dU
=
~pkj(u,+hk(U), kES
0 5 u < v.
(2.45)
Differentiating Eq. (2.34) on both sides gives Eq. (2 $47)) using the Kolmogorov’s (forward) Eq. (2.43) yields Eq. (2.48), and then, again, applying Eq. (2.34) to Eq. (2.49), we derive the differential equation for the uncondi-
52
MARKOV CHAINS
tional state probabilities ~j(v), ‘dj E S, at time v in Eq. (2.50): d Ci@
d~J+J) -----z
Pij(U, V)%(?-q
dv
(2.46)
dV
II
c
8PG (% 4 ~ @) i
ZZ iES c(c =
c
Pi&
c
+7kj(V)
qkj(u>
qkj (+k
n&)
(2.48)
)
kES
kES =
(2.47)
dV
iES
&ik(U, iES
v)%(u)
(2.49) (2.50)
(v).
kES
In the time-homogeneous case, a simpler version of Eq. (2.50) results by assuming t = v - ‘u and using time-independent transition rates qzj. So we get the system of differential Eqs. (2.51): dTi (4 &t = CQi.jni(t),
Vj E S,
(2.51)
iES
which is repeatedly used throughout this text. Usually, we prefer vectormatrix form rather than the notation used in Eq. (2.51). Therefore, for the homogeneous case, we define the infinitesimal generator matrix Q of the transition probability matrix P(t) = [pij(O, t)] = bij(t)] by referring to Eqs. (2.39) and (2.40). The matrix Q:
Q = [qiJ,
K j E S,
(2.52)
contains the transition rates qij from any state i to any other state j, where i # j, of a given continuous-time Markov chain. The elements qii on the main diagonal of Q are defined by qii = - Cj jfi qij. With the definition in Eq. (2.52), Eq. (2.51) can be given in vector-matrix form as:
+)
da = - &
= rr(t)Q.
(2.53)
For the sake of completeness, we include also the matrix form of the Kolmogorov differential equations in the time-homogeneous case. The Kolmogorev’s forward Eq. (2.44) can be written as: = P(t)&. The Kolmogorov’s matrix form as:
(2.54)
backward equation in the homogeneous case results in
Ii)(t) = T
= QP(t).
(2.55)
MARKOV PROCESSES
53
As in the discrete-time case, often the steady-state probability vector of a CTMC is of primary interest. The required properties of the steady-state probability vector, which is also called the equilibrium probability vector, are equivalent to the discrete time case. For all states i E S, the steady-state probabilities ri are: 1. Independent of time t 2. Independent of the initial state probability
vector ?r(O)
3. Strictly positive, 7ri > 0 4. Given as the time limits, ri = limt,, ri(t) = limt+W pji(t), of the state probabilities 7ri( t) and of the transition probabilities pji (t), respectively If existing for a given CTMC, the steady-state probabilities are independent of time, we immediately get: lim 3.2 t+oo dt
= 0
’
(2.56)
Under Condition (2.56), the differential Eq. (2.51) for determining the unconditional state probabilities resolves to a much simpler system of linear equations: Vj E S.
0 = Cqijri,
(2.57)
iES
In vector-matrix
form, we get accordingly: O=nQ.
(2.58)
In analogy to the discrete-time case, we call a CTMC for Definition 2.13 which a unique steady-state probability vector exists, an ergodic CTMC. The strictly positive steady-state probabilities can be gained by the unique solution of Eq. (2.58), when an additional normalization condition is imposedg. To express it in vector form, we introduce the unit vector 1 = [l, 1,. . . , llT so that the following relation holds: 7rl=
c
7Ti= 1.
(2.59)
iES
Another possibility for determining the steady-state probabilities ri for all states i E S of a CTMC, is to take advantage of a well-known relation ‘Note that besides the trivial solution 7r2 = 0, Vi E S, any vector, obtained by multiplying a solution of Eq. (2.58) by an arbitrary real-valued constant, would also yield a solution of Eq. (2.58).
MARKOV CHAINS
between the ;TT~ and the mean recurrence time lo Ik& < 00, that is, the mean time elapsed between two successive visits of the CTMC to state i: 1 ri = - &Jqii ’
vi E s.
(2.60)
In the time-homogeneous case, we can derive from Eq. (2.41) that for any j E S qjj = - &fj qji, and from Eq. (2.57) we get xi,ifj qijri = -qjj’rrj. Putting these together immediately yields the system of global balance equations:
c
qij7ri = 77-j c
i,i#j
qji,
VjES.
(2.61)
i,i#j
On the left-hand side of Eq. (2.61), the total flow from any other state i E S into state j is captured. On the right-hand side, the total flow out of state j into any other state i is summarized. The flows are balanced in steady state, i.e., they are in equilibrium. The conditions under which a CTMC is called ergodic are similar to those for a DTMC. Therefore, we can briefly summarize the criteria for classifying CTMCs and for characterizing their states. 2.1.2.2.1
Classifications
of CTMC
Definition 2.14 As for DTMCs, we call a CTMC irreducible if every state i is reachable from every other state j, where i, j E S; that is, Vi, j, i # j, 3 : pji(t) > 0. In oth er words, no proper subset 3 c S, 3 # S, of state space S exists, such that CjE3 ciEs,g qji = 0. Definition 2.15 An irreducible, homogeneous CTMC is called ergodic if and only if the unique steady-state probability vector 7r exists. As opposed to DTMCs, CTMCs cannot be periodic. Therefore, it can be shown that for an irreducible, homogeneous CTMC: l
The limits iii = limt+oo ni(t) = Zimt,,pji(t) exist Vi, j E S and are independent of the initial probability vector r(0). l1
l
The steady-state probability vector X, if existing, can be uniquely determined by the solution of the linear system of Eq. (2.58) constrained by normalization Condition (2.59).
l
A unique steady-state, or equilibrium, probability irreducible, homogeneous CTMC is finite.
l
The mean recurrence times AL&are finite for all states i E S, A4i < co, if the steady-state probability vector exists.
vector 7r exists, if the
loIn contrast to DTMCs, where lowercase notation is used to refer to recurrence time, uppercase notation is used for CTMCs. llThe limits do not necessarily constitute a steady-state probability vector.
MARKOV PROCESSES
55
2.1.2.2.2 CT/UC State So;ourn Times We have already mentioned that the distribution of the state sojourn times of a homogeneous CTMC must have the memoryless property. Since the exponential distribution is the only continuous distribution with this property, the random variables denoting the sojourn times, or holding times, must be exponentially distributed. Note that the same is true for the random variable referred to as the residual state holding time, that is, the time remaining until the next state change occurs. l2 Furthermore, the means of the two random variables are equal to l/(-qi;). Let the random variable IQ denote either the sojourn time or the residual time in state i, then the CDF is given by:
FRi(r) = 1 - eqzir,
(2.62)
r 2 0.
The mean value of I&, the mean sojourn time or the mean residual time, is given by:
q&l = --$
(2.63)
where qii is defined in Eq. (2.40). 2.1.2.3 Recapitulation We have introduced Markov chains and indicated their modeling power. The most important feature of homogeneous Markov chains is their unique memoryless property that makes them remarkably attractive. Both continuous- and discrete-time Markov chains have been defined and their properties discussed. The most important algorithms for computation of their state probabilities are discussed in following chapters. Different types of algorithms are related to different categories of Markov chains such as ergodic, absorbing, finite, or infinite chains. Furthermore, the algorithms can be divided into those applicable for computing the steady-state probabilities and those applicable for computing the time-dependent state probabilities. Others provide approximate solutions, often based on an implicit transformation of the state space. Typically, these methods fall into the categories of aggregation/disaggregation techniques. Note that this modeling approximation has to be discriminated from the mathematical properties of the core algorithms, which, in turn, can be numerically exact or approximate, independent of their relation to the underlying model. Typical examples include round-off errors in direct methods such as Gaussian elimination and convergence errors in iterative methods for the solution of linear systems.
12The
residual
time
is often
referred
to as the forward
recurrence
time.
56
MARKOV CHAINS
2.2
THE
MODELING
PROCESS
So far, we have limited our discussion to the mathematical foundations and properties of DTMCs and CTMCs. Since we follow an application-oriented approach, it is worthwhile to provide explicit guidelines on how to use the theoretical concepts in a practical manner. DTMCs and CTMCs are extensively used for performance and dependability analysis of computer systems, and the same fundamental techniques can be employed in different contexts. But the context strongly determines the kind of information that is meaningful in a concrete setting. Note that the context is not simply given by a plain technical system or a given configuration that is to be modeled. Additional requirements of desired qualities are explicitly or implicitly specified that need to be taken into account. As an illustrative example, consider the outage problem of computer systems. For a given configuration, there is no ideal way to represent it without considering the application context and defining respective goals of analyses, which naturally ought to be reflected in the model structure. In a real-time context, such as flight control, even the shortest outage might have catastrophic implications for the system being controlled. Therefore, an appropriate model must be very sensitive to such a (hopefully) rare event of short duration. In contrast, the total number of jobs processed or the work accomplished during flight time is probably a less important performance measure for such a safety-critical system. If we look at a transaction processing system, however, short outages are less significant for the success but throughput is of predominant importance. So it is not useful to represent effects of short interruptions, which are of less importance in these circumstances. There are cases where the right choice of measures is less obvious. Therefore, guidelines are equally useful for the practitioner and the beginner. Later in this chapter, a framework based on A4urlcov reward models (MRMs) is presented providing recipes for a selection of the right model type and the definition of an appropriate performance measure. But before going into details of model specification, the modeling process is sketched from a global view point, thereby identifying the major phases in the modeling life-cycle and the interdependencies between these phases. 2.2.1
Modeling
Life-cycle
Phases
There can be many reasons for modeling a given system. Two of the important ones are: 1. Existing systems are modeled for a better understanding, for analyses of deficiencies such as identification of potential bottlenecks, or for upgrading studies. 2. Models are used during the design of future systems in order to check
whet her requirements are met.
THE MODELING PROCESS
57
Different levels of detail are commonly entailed in the two cases. Since for projection and design support fewer details are known in advance, the models tend to be more abstract so that analytical/numerical techniques are more appropriate in these instances.
Control
Fig. 2.6
A simplified view of the modeling process.
In general, the modeling process can be structured into phases as depicted in Fig. 2.6. First, there are certain requirements to be met by the application; these have an impact on what type of model is used. At this point we use the term “requirements” in a very broad sense and use it to refer to everything related to the evaluation of accomplishment levels, be it task or system oriented. As an example, consider the earlier discussion on the significance of outage phases for the expressiveness of the model, which cannot be
MARKOV CHAINS
58
determined a priori without reference to an application context. More details related to this issue are discussed in the following sections. A high-level description of the model is the first step to be accomplished. Either information about a real computer system is used to build the model, or experiences gained in earlier modeling studies are implicitly used. Of course, this process is rather complicated and needs both modeling and system application-specific expertise. Conceptual validation of the correctness of the high-level model is accomplished in an iterative process of step-wise refinement, which is not represented in full detail in Fig. 2.6. Conceptual validation can be further refined [NaFi67] into “face validation,” where the involved experts try to reach consensus on the appropriateness of the model on the basis of dialogs, and into the “validation of the model assumptions,” where implicit or explicit assumptions are cross-checked. Some of the crucial properties of the model are checked by answering the following questions [HMTSl]: l
Is the model logically correct, complete, or overly detailed?
l
Are the distributional assumptions justified? How sensitive are the results to simplifications in the distributional assumptions?
l
Are other stochastic properties, such as independence assumptions,valid?
l
Is the model represented on the appropriate level? Are those features included that are most significant for the application context?
Note that a valid model is not necessarily complete in every respect, but it is one that includes the most significant effects in relation to the system requirements. It leaves higher-order effects out or simplifies them drastically. In order to produce a useful model, leaving out may be of the same importance as the validity of underlying stochastic assumptions. Once confidence is gained in the validity of the high-level model description, the next step is to generate a computational model. Sometimes the available solution techniques are inadequate for solving the high-level model. Some of these difficulties arise from: l
Stochastic dependencies and correlated processes.
l
Non-exponential distributions.
l
l
Effects of stiffness, where classes of events are described in a single model that occur at extremely different rates. System requirements that are too sophisticated leading to an overly complicated model and system measures that cannot be computed easily, such as dist,ribution functions capturing response times or cumulative work.
Note that even discrete-event simulation (DES) techniques might suffer from difficulties such as largeness or stiffness. For instance, it is a non-trivial
THE MODELING PROCESS
59
problem to simulate with high confidence scenarios entailing relatively rare events while others are occurring much more often. Large or complex problems, in contrast, require the input of excessively numerous parameter sets so that correspondingly numerous simulation runs need to be executed. Many techniques have been suggested to overcome some of the problems. Among the most important ones are: l
Hybrid approaches that allow the combination of different solution techniques to take advantage of and to combine their strengths. Examples are mixed simulation and analytical/numerical approaches, or the combination of fault trees, reliability block diagrams, or reliability graphs, and Markov models [STP96]. Also product-form queueing networks and stochastic petri nets or non-product-form networks can be combined. More generally, this approach can be characterized as intermingling of state-space based and non-state-space based methods [STP96].
l
Phase-type expansions are used to replace and approximate non-exponential distributions. As a trade-off, model size and computational complexity need to be taken into account.
l
Sometimes aggregation/disaggregation techniques can be used to eliminate undesirable model properties. An approach to eliminate stiffness for transient analysis is introduced and discussed in detail in Chapter 5.
l
Transformations can be applied leading from one domain of representation to another. Space, for example, is sometimes traded with time to replace one cumulative measure by another that is easier to calculate.
l
DES of rare events is achieved by artificially speeding up these events in a controlled manner and taking some correction measures afterwards. This technique is called importance sampling [NNHGSO].
Finally, the computational model may be prohibitively large due to the complexity of the problem at hand so as to preclude a direct use of the solution techniques. A low-level description is often out of the question due to the sheer size of the model and the error-prone process of creating it by hand. What is needed are techniques allowing for a compact or high-level representation of the lower-level model to be analyzed. Usually the higher-level representation languages are referred to as specification techniques and come with paradigms providing application specific structures that can be of help to practitioners. Two primary methods to deal with large models: l
Many high-level specification techniques, queueing systems, generalized stochastic Petri nets (GSPNs), and stochastic reward nets (SRNs), being the most prominent representatives, have been suggested in the literature to automate the model generation [HaTr93]. While GSPNs/SRNs are covered in more detail in Section 2.2.3, our major theme is queueing systems. Both approaches can be characterized as tolerating largeness
60
MA RKOV CHAINS
of the underlying computational for generating them. l
models and providing effective means
Another way to deal with large models is to avoid the creation of such models from the beginning. The major largeness-avoidance technique we discuss in this book is that of product-form queueing networks. The main idea is that t,he structure of the underlying CTMC allows an efficient solution that obviates the need of the generation, storage, and solution of the large state space. The second method of avoiding largeness is to separate the originally single large problem into several smaller problems and to combine submodel results into an overall solution. Both approximate and exact techniques are known for dealing with such multilevel models. The flow of information needed among submodels may be acyclic, in which case we have a hierarchical model [STP96]. If the flow of needed information is not acyclic, a fixed-point iteration may be necessary [CiTr93]. Other well-known techniques applicable for limiting model sizes are state truncations [BVDT88, GCSS86] and state lumping [NicoSO].
Ideally, the computational model is automatically generated from a highlevel description based on a formal and well-understood approach such as GSPNs or SRNs. In this case, verification is implicitly provided through the proven correctness and completeness of the techniques and tools employed. In practice, heuristics are often applied and, even worse, model generation is carried out manually. In this case, verification of the correctness of the computational model with respect to a given high-level description is of importance. But because approximations are often used, correctness is not given by a one-to-one relation between computational model and high-level description. Thus errors incurred need to be explicitly specified or bounded. Note that the distinction between a high-level description and a computational model may sometimes be conceptual when the two representations coincide. Both the issues of model generation and analytical/numerical solution methods are the main topics of this book. Once computational results have been produced, they need to be validated against data collected through measurements, if available, and against system requirements. The validation results are used for modification of the existing (high-level) model or for a proof of its validity. Of course, many iterations may be required for this important task. Note that a change in the model might ultimately result in a change in the system design. There is a strong relation between measurements and modeling. Measurement data are to be used for model validation. Furthermore, model parumeterixution relies heavily on input from measurement results. In case of a system design where the computer system does not yet exist, input usually comes from measurements of earlier studies on similar systems. Conversely, measurement studies are better planned and executed if guided by a model and by requirements placed on the computer system to be measured. In prac-
THE MODELING PROCESS
61
tice, measurements and models are not always applied on the same system. Of course, measurements cannot be used directly for the design of computer systems. Consequently, models-based solutions ought to be applied. Measurements, on the other hand, can provide the most accurate information on already existing systems. Measurement studies are often resource demanding, both with respect to hardware and software, and may require a great deal of work. The best approach, as always, is to combine the strengths of both techniques as much as possible. In the following, we examine the issue of model formulation and the formal definition of measures based on given system requirements. While the general discussion of this section is not limited to a specific model type, in what follows we narrow our view and concentrate on CMTCs. 2.2.2
Performance
Measures
We begin by introducing a simple example and then provide an introduction to Markov reward models as a means to obtain performance measures.
Meaning
Parameter l/Y 116 lb l/P C
mean time to failure mean time to repair mean time to reboot mean time to recover coverage probability
Fig. 2.7
State i E {0,1,2} RC RB
A simple model of a multiprocessor
Meaning i working processors recovery reboot
system.
2.2.2.1 A Simple Example As an example adapted from Heimann, Mittal and Trivedi [HMTSI] , consider a multiprocessor system with n processor elements processing a given workload. Each processor is subject to failures with a mean time to failure (MTTF), l/y. In case of a failure, recovery can successfully be performed with probability c. Typically, recovery takes a brief period of time with mean l/p. Sometimes, however, the system does not successfully recover from a processor failure and suffers from a more severe impact. In this case, we assume the system needs to be rebooted with longer average duration of l/a. Probability c is called the coverage factor and is usually close
62
MARKOV CHAINS
to 1. Unsuccessful recovery is most commonly caused by error propagation when the effect of a failure is not sufficiently shielded from the rest of the system. Failed processors need to be repaired, with the mean time to repair (MTTR) of l/S. Only one processor can be repaired at a time and repair of one processor does not affect the proper working of the remaining processors. If no processor is running correctly, the whole system is out of service until first repair is completed. Neither reboot nor recovery is performed when the last processor fails. If all times are assumed to be independent, exponentially distributed random variables, then a CTMC can be used to model the scenario. In Fig. 2.7 an example is given for the case of n = 2 processors and state space S = (2, RC, RB, l,O}. Since CTMC in Fig. 2.7 is ergodic, the unique steady-state probability vector z = (~2, ARC, ~TTRB, ~1, ~0) is the solution of the Eqs. (2.58) and (2.59). From Fig. 2.7 the infinitesimal generator matrix can be derived:
Q=
-27 0 o s 0 i
2cy -P o 0 0
2(1-
c)y
0 --a 0 0
Clearly, the model in Fig. 2.7 is already an abstract representation of the system. With Eqs. (2.53) and (2.58), transient and steady-state probability vectors could be computed, but it is not yet clear what kind of measures should be calculated because we have said nothing about the application context and the corresponding system requirements. We take this model as a high-level description, which may need further elaboration so that a computational model can be generated from it. We consider four classes of system requirements for the example. 1. System availability is the probability of an adequate level of service, or, in other words, the long-term fraction of time of actually delivered service. Usually, short outages can be accepted, but interruptions of longer duration or accumulated outages exceeding a certain threshold may not be tolerable. Accordingly, the model in Fig. 2.7 must be evaluated with respect to requirements from the application context. First, tolerance thresholds must be specified as to what extent total outages can be accepted. Second, the states in the model must be partitioned into two sets: one set comprising the states where the system is considered “up,” i.e., the service being actually delivered, and the complementary set comprising the states where the system is classified as “down.” In our example, natural candidates for down states are in the set: (0, RC, RI?}. But not all of them are necessarily classified as down states. Since reconfiguration generally takes a very short time, applications may well not be susceptible to such short interruptions. As a consequence, the less significant state RC could even be eliminated in the generation of the
THE MODELING PROCESS
63
computational model. Finally, the measures to obtain from the computational model must be decided. For example, the transient probability of the system being in an up state at a certain time t conditioned on
measures. 2. System reliability is the probability of uninterrupted service exceeding a certain length of time. By definition, no interruption of service at all can be tolerated. But note that it still needs to be exactly specified as to what kind of event is to be considered as an interruption. In the most restrictive application context, reconfiguration (RC) might not be acceptable. In contrast, nothing prevents the assumption to be made that even a reboot (RB) can be tolerated and only the failure of the last component leads to the single down state (0). As a third alternative, reconfiguration may be tolerated but not reboot and not the failure of the last component. The three possible scenarios are captured in Fig. 2.8. The model structures have also been adapted by introducing absorbing down states that reflect the fact that down states are considered as representing catastrophic events.
0‘2 A
0,lVariant
Variant
1
2
Variant 3 Fig. 2.8
Model variants with absorbing states capturing reliability
requirements.
3. System performance takes the capacity of different configurations into account. Typical measures to be calculated are the utilization of the resources or system throughput. Other measures of interest relate to
64
MA RKOV CHAINS
the frequency with which certain incidents occur. With respect to the model in Fig. 2.7, the individual states need to be characterized by their contribution to a successful task completion. The higher the degree of parallelism, the higher the expected accomplishment will be. But it is a non-trivial task to find the right performance indices attributable to each particular state in the computational model. The easiest way would be to assign a capacity proportional to the number of working processors. But because each such state represents a whole configuration, where each system resource and the imposed workload can have an impact on the overall performance, more accurate characterization is needed. One way would be to execute separate modeling studies for every possible configuration and to derive some more detailed measures such as statedependent effective throughputs or response time percentiles. Another way would be to expand the states and to replace some of them with a more detailed representation of the actual configuration and the workload. This approach will lead to a model of enormous size.
4. Tusk completion is reflected in the probability that a user will receive service at the required quality, or in other words, in the proportion of users being satisfied by the received service and its provided quality. Many different kinds of measures could be defined in this category. It could be the proportion of tasks being correctly processed or the probability that computation-time thresholds are not exceeded. With advanced applications such as continuous media (e.g., audio and video streams), measures are investigated relating the degree of user’s satisfaction to the quality of the delivered service. Usually, such applicationoriented measures are composed of different constituents such as timeliness, delay-variation, loss, and throughput measures. In this subsection, we have indicated how system requirements affect model formulation. Furthermore, different types of performance measures were motivated and their relation to the model representation outlined. Next, we will show how the performance measures derived from system requirements can be explicitly specified and integrated into the representation of the comput at ional model. 2.2.2.2 Markov Reward Models MRMs provide a unifying framework for an integrated specification of model structure and system requirements. We consider the explicit specification of system requirements as an essential part of the computational model. Once the model structure has been defined so that the infinitesimal generator matrix is known, the basic equations can be written depending on the given system requirements and the structure of the matrix. For the sake of completeness, we repeat here the fundamental Eqs. (2.53),
THE MODELING PROCESS
65
(2.58) and (2.59) for the analysis of CTMCs:
(t> = WQ, -d7r dt
@)
O=rQ,
= no,
7rl= 1,
where n(t) is the transient state probability vector of the CTMC and 7r is the steady-state probability vector (assuming it exists). In addition to the transient state probabilities, sometimes cumulative probabilities are of interest. Let: L(t) =
s
(2.64)
+)dw
0
then Li(t) denotes the expected total time the CTMC spends in state i during the interval [0, t). A more convenient way to calculate the cumulative state probabilities is by solution of differential Eq. (2.65) [RST89]:
$
=
L(t)Q +r(o),
L(O) = 0.
(2.65)
Closely related to the vector of cumulative state probabilities is the vector describing the time-average behavior of the CTMC [SmTrSO]: M(t) = ;L(t).
(2.66)
With I denoting the identity matrix, M(t) can be seen to satisfy the differential Eq. (2.67):
M(0) = 0.
(2.67)
With these definitions, most of the interesting measures can be defined. But the special case of models containing absorbing states, like those in Fig. 2.8, deserves additional attention. Here, it would be interesting to compute measures based on the time a CTMC spends in non-absorbing states before an absorbing state is ultimately reached. Let the state space S = A U N be partitioned into the set A of absorbing and the set N of non-absorbing (or transient) states. Then, the time spent before absorption can be calculated by taking the limit limt+oo LN(t) restricted to the states of the set N. Note that, in general, unless very special cases for the initial probability vector are considered, the limit does not exist for the states in A. Therefore, to calculate LN(oo), the initially given infinitesimal generator matrix Q is restricted to those states in N, so that matrix QN of size IN] x IN] results. Note that QN is not an infinitesimal generator matrix. Restricting also the initial probability vector ~(0) to the non-absorbing states N results in TN(O) and allows the
66
MARKOV CHAINS
computation of lim t+oo on both side of differential linear Eq. (2.68) results [STP96]:
LN(~)QN
Eq. (2.65) so that following
= -n-v(O).
(2.68)
To give an example, consider the variant three in Fig. 2.8. Here the state space S is partitioned into the set of absorbing states A = {RB, 0) and nonabsorbing states N = (2, RC, 1) so that Q reduces to:
With
LN(oo),
the mean time to absorption
MTTA
= c
(MTTA)
L&o).
can then be written
as:
(2.69)
iEN
MRMs have long been used in Markov decision theory to assign cost and reward structures to states of Markov processes for an optimization [Howa71]. Meyer [Meye80] adopted MRMs to provide a framework for an integrated approach to performance and dependability characteristics. He coined the term performability to refer to measures characterizing the ability of faulttolerant systems, that is, systems that are subject to component failures and that can perform certain tasks in the presence of failures. With MRMs, rewards can be assigned to states or to transitions between states of a CTMC. In the former case, these rewards are referred to as reward rates and in the latt’er as impulse rewards. In this text we consider state-based rewards only. The reward rates are defined based on the system requirements, be it availability, reliability, or task oriented. Let the reward rate ri be assigned to state i E S. Then, a reward ri7-i is accrued during a sojourn of time 7; in state i. Let {X(t),t 2 0) d enote a homogeneous finite-state CTMC with state space S. Then, the random variable:
w> = TX(t)
(2.70)
refers to the instantaneous reward rate of the MRM at time t. Note the difference between reward rates ri assigned to individual states i and the overall reward rate Z(t) of the MRM characterizing the stochastic process as a whole. With the instantaneous reward rate of the CTMC defined as in Eq. (2.70), the accumulated reward Y(t) in the finite time horizon [0, t) is given by:
t
t
Y(t) = 1 Z(r)&- = S,\.(,i'k. 0
0
(2.71)
THE MODELING PROCESS
67
State A Markov Reward Model 2 --
1 --
0'. t1
t2 t3
t4
Reward Rate t
0 Q =
-,“r”
-(A
0
Xl) P2
x1 --CL2
Total Reward A .
Y4 --
. . . l
y2 =y3--
.a’
yw
Yl = rot1
= 3t1
Yz=Yl+rl(t2-tl)=yl+t2-tl r
Yl --
Y3 = y2 +r2(t3
-
t2)
= y2
Y4=Y3+7$t4-t3)=y3+t4-t3
,; l
I I
I I
I I
I I
t1
t2
t3
t4
Fig. 2.9 A three-state and Y(t) processes.
,t
Markov reward model with sample paths of the X(t),
Z(t),
68
MARKOV CHAINS
For example, consider the sample paths of X(t) , Z(t) , and Y(t) processes in Fig. 2.9 adapted from [SmTrSO]. A simple three-state MRM is presented, consisting of a CTMC with infinitesimal generator matrix Q, and the reward rate vector r = (3,1,0) assigning reward rates to the states 0, 1, and 2, respectively. Assuming an initial probability vector ~(0) = (l,O, 0), the process is initiated in state 0, that is, X(0) = 0. Since the sojourn time of the first visit to state 0 is given by tl, the reward y1 = 3tl is accumulated during this period. After transition to state 1, additional reward y2 - y1 = l(t2 - tl) is earned, and so forth. While process X(t) is assumed to follow the state transition pattern as shown in Fig. 2.9, the processes Z(t) and Y(t) necessarily show the behavior as indicated, because they are depending on X(t) through the given reward vector r. While X(t) and Z(t) are discrete valued and nonmonotonic functions, Y(t), in contrast, is a continuous-valued, monotonically non-decreasing function. Based on the definitions of X(t), Z(t), and Y(t) , which are non-independent random variables, various measures can be defined. The most general measure is referred to as the performability [Meye80]: (2.72) where q(y, t) is the distribution of the accumulated reward over a finite time [0, t). Unfortunately, the performability is difficult to compute for unrestricted models and reward structures. Smaller models can be analyzed via double Laplace transform [SmTr90], while references to other algorithms are summarized in Table 2.17. The same mathematical difficulties arise if the distribution Q(y, t) of the time-average accumulated reward is to be computed:
(2.73) The problem is considerably simplified if the system requirements are limited to expectations and other moments of random variables rather t>han distribution functions of cumulative rewards. Alternatively, efficient solution algorithms are available if the rewards can be limited to a binary structure or if the model structure is acyclic. The expected instantaneous reward rate can be computed from the solution of differential Eq. (2.53): E[Z(t)]
= Cri~i(t).
(2.74)
iES
If the underlying model is ergodic, the solution of linear Eqs. (2.58) and (2.59) can be used to calculate the expected reward rate in the limit as t -+ 00:
(2.75)
THE MODELING PROCESS
69
To compute the first moment of the performability, or the expected accumulated reward, advantage can be taken of the solution of differential Eq. (2.65): E[Y(t)]
(2.76)
= Cr&(t). iES
For models with absorbing states, the limit as t --+ 00 of the expected accumulated reward exists and is called the expected accumulated reward until absorption. It follows from the solution of linear Eq. (2.68) that: (2.77)
E[Y(oo)] = Cril,(m). iEN
Also, higher moments of the random variables can be derived. As an example, the variance of the instantaneous reward rate Z(t) is obtained from: 2
var[Z(t)]
= C rf7ri(t) -
(
C ri7ri(t) iES
iES
)
.
(2.78)
Probabilities of 2(-L) and 2 can also be easily written [TMWH92]:
P[z(t)= x]= c 7ri(t),
(2.79)
iES,ri=x
P[Z=x]= c 7Ti.
(2.80)
iES,ri=X
Two different distribution functions can also be easily derived. With the transient state probabilities gained from the solution of Eq. (2.53), the probability that the instantaneous reward rate Z(t) does not exceed a certain level x is given as: P[Z(t)
5
X] =
C
ri(t).
(2.81)
The distribution P[Y(oo) 5 y] of the accumulated reward until absorption can be derived by applying a substitution according to Beaudry [Beau781 and Ciardo et al. [CMSTSO]. Th e essential idea is to generate a (computational) model with a binary reward structure and infinitesimal generator matrix Q from the given (high-level) CTMC with a general reward structure and infinitesimal generator matrix Q. This transformation is achieved by applying a normalization in the time domain according to the reward rates so that the distribution of the time until absorption of the new CTMC can be used as a substitute measure: (2.82) icN
MARKOV CHAINS
Returning to the example model in Fig. 2.7, we apply the MRM framework with respect t,o different system requirements to demonstrate how the model can be instantiated in several ways to yield meaningful computational models. The following case study [HMTSI] serves as a guideline as to how to describe models in light of different system requirements, such as availability and reliability or task-oriented and system performance. 2.2.2.3 A Case Study example in Fig. 2.7.
We now illustrate the use of MRMs by means of the
2.2.2.3.1 System Availability Availability measures are based on a binary reward structure. Assuming for the model in Fig. 2.7 that one processor is sufficient for the system to be “up” (set of states U = (2,l)) and that otherwise it is considered as being down (set of states D = {RC, RI?, 0}, where S = U U D), a reward rate 1 is attached to the states in tJ and a reward rate 0 to those in D. The resulting reward function r is summarized in Table 2.1. The instantaneous availability is given by:
A(t) = Jwwl = &R(t) iES
= -&i(t) = 7r@)+ 7rz(t), iEU
Unavailability can be calculated with a reverse reward assignment to that for availability. The instantaneous unavailability, therefore, is given by:
UA(t)= E[Z(t)]= xriri(t) = C ri(t) = ran + rRB(t)+ TO(~). iES
iED
Steady-state availability, on the other hand, is given by: A=E[Z]=Cri~i=~~i=~2+-irl. iES
iEU
The interval availability provides a time average value: A(t) = iE[Y(b)]
= Jj / Ads 0
= i E/*i(s)dr iEU 0
= i[Lz(t)
+ Ll(t)].
Note that a different classification of the model states would lead to other values of the availability measures. Instantaneous, interval, and steady-state availabilities are the fundamental measures to be applied in this context, but there are related measures that do not rely on the binary reward structure.
Other measuresrelated to availability can be considered. To compute the mean transient uptime T(l) (t) and the mean steady-state uptime Tc2)(t) in a
THE MODELING PROCESS
Tab/e 2.2 Reward for mean uptimes
Table 2.1 Reward assignment for computation of availability State i
assignment
Reward Rate r,
State i
Reward Rate rz
71
2
1
2
t
RC RB
0 0
RC RB
0 0
1 0
1 0
1 0
0
t
given time horizon [0, t), reward rates 1 in Table 2.1 are replaced by reward rates t in Table 2.2:
W(t) = +[Y(t)]
= it c
L&)
iEV
= c
L&)
= tA(t),
iEV
2d2)(t) = E[Z] = tA. Very important measures relate to the frequency of certain events of interest. Any event represented in the model can be captured here. For example, we could be interested in the average number of repair calls made in a given period of time [0, t). With assumed repair rate 6 and the fact that repair calls are made from states 0 and 1 of the model of Fig. 2.7, the transient average number of repair calls N(l)(t) and steady-state average number of repair calls M2) (t) can again be determined, with the reward function as defined in Table 2.3:
Nyt) = +[Y(t)] lm(t)
Tab/e 2.3 Reward assignment for number of repair calls in [0, t) State i
Reward Rate ri
= 6(L)(t) + L&)),
= E[Z] = tS(;lr() + 7rl).
Tab/e 2.4 Reward assignment for computation of mean number of outages exceeding certain state-dependent thresholds in finite time [O,t) with the CTMC from Fig 2.7
2
0
RC RB
0 0
State i
1 0
t6 t6
RC RB
Reward Rate r,
2
1 0
0 t&+RC tae-OrRB
72
MARKOV CHAINS
Sometimes, local tolerance levels Ti, i E S are considered such that certain events, such as outages, can be tolerated as long as they do not last longer than a specified threshold ri. In our context it would be natural to assume a tolerance with respect to duration of reconfiguration and possibly also with respect to reboot. Based on these assumptions, the reward function as shown in Table 2.4 is defined to compute the mean number of severe interruptions I(l)(t), 1c2)(t) in finite time [0, t), i.e., interruptions lasting longer than certain thresholds for the transient and the steady-state based cases, respectively: I(l)(t)
= iE[Y(t)]
Ic2) (t) = E[Z]
= PemaTRCLRC(t)
+ CliemaTRBLRB(t) + 6eehToLo(t),
= tpemPTRC7rRC + t&e-“rRB
rTTRB+ t&e-b’TOTo.
The reward rates in Table 2.4 are specified based on the observation that event duration times are exponentially distributed in CTMCs, such as the one depicted in Fig. 2.7. Therefore, if the parameter were given by X, the probability of event duration lasting longer than Ti units of time would be e--x7X. This probability is used to weight the average number of events 0. But note that the reward rates are by no means limited to exponential distributions, any known distributions could be applied here. It is useful to specify events related to timeouts or events that have a (deterministic) deadline attached to them. Again, a binary reward function T is defined that assigns reward rates 1 to up states and reward rates 0 to (absorbing) down states, as given in Table 2.5. Recalling that reliability is the likelihood that an unwanted event has not yet occured since the beginning of the system operation, reliability can be expressed as: 2.2.2.3.2
System Reliability
R(t) = P[T > t] = qqt>
= I] = 1 - P[z(t)
< o] = qqt)],
where the random variable T characterizes the time to the next occurrence of such an unwanted (failure) event. Referring to the scenarios depicted in Fig. 2.8, three different reliability functions can be identified: h(t)
= 7r2(t),
Rz(t)
= r2(t)
+ rRC(t)
+ rRB(t)
R3(t)
= x2(t)
+ rRC(t)
+ rl(t).
+ nl(t),
Remember that Ri (t) , R2 (t), and R3 (t) are computed on the basis of three different computational models. The function to be used depends on the application requirements. With a known reliability function R(t), the mean time to the occurrence of an unwanted (failure) event is given by: E[T] = MTTF = MTTA =
s
0
R(t)dt.
THE MODELING PROCESS
73
MTTF and MTTA are acronyms for mean time to failure and mean time to absorption, respectively. With the formula of Eq. (2.69), the MTTF (MTTA) can be efficiently computed. Table 2.5 Reward assignment for computation of reliability with variant-2 CTMC from Fig 2.8
Table 2.6 Reward assignment for predicting the number of catastrophic incidents with variant-2 CTMC from Fig 2.8
State i
Reward Rate pi
State i
2
1 1
RC RB 1 0
With the reliability complement: cm(t)
= 1-
2
0
1
RC RB
0 0
1 0
1 0
t-l 0
R(t) = E[Z(t)]
qqt)]
Reward Rate r,
= 1 -
given, the unreliability
23(t)
= 1 - P[T
> t] = P[T
follows as the
5
t].
It is an important observation that the unreliability is given ZLSa distribution function of a time variable. Note that we have gained this result simply by computing E[Z(t)]. The trick was to modify the computational model in the right way such that absorbing states were introduced and the appropriate reward structure was chosen, as the one depicted in Table 2.5. Rather general time distribution functions can be effectively computed in this way. This method allows the calculation of percentiles and other powerful performance measures like response-time distribution functions, which are not necessarily of exponential type [MMKT94]. It is also worth mentioning that the unreliability could be calculated based on a reward assignment complementing the one in Table 2.5. In this case, we would get UR(t) = E[Z(t)] = To(t). Related to reliability measures, we would often be interested in knowing the expected number of catastrophic events C(t) to occur in a given time interval [0, t). To this end: c(t) = ;E[Y(t)]
= y&(t)
needs to be calculated for some interesting initial up state, i E U with the reward assignment, as given in Table 2.6. In the preceding two subsections, where availability and reliability measures were considered, binary reward functions were of predominant importance. However, other reward functions have already been used to model availability- and reliability-oriented measures 2.2.2.3.3
System and Task Performance
74
MARKOV CHAlNS
such as frequencies of outages or other related incidents of interest. A more general view is presented in this section. First, we look at a particular task with some assumed task execution time x. Under the given circumstances, we would like to know the probability of successful completion of this task. Two different aspects come into the picture here: the dynamics of the system and the susceptibility of the task (or its user) to these dynamic changes in the configuration. In the case of the example in Fig 2.7, just out ages and their impact on the task completion are considered. But this situation can be generalized to different degrees of the task’s susceptibility to loss of data or other disturbing events as well. In the case of a user requiring uninterrupted service for some time x, the probabilities that no interruption occurs in the respective states are assigned as reward rates. We allow task requirement to be state dependent as well so that in state 2 the task requires x2 time units to complete and likewise for state 1. The probabilities can be easily concluded from Fig. 2.7. In state 2, for example, the probability P[O > 221 that the system will be operational for more than 22 units of time is given by e-‘YZ2. Therefore, the task interruption probability is:
conditioned on the assumption the system is running in state 2 and on the hypothesis that the first failure causes a task’s interruption. These probabilities can be assigned as reward rates as shown in Table 2.7, so that the interruption probability IP(‘) (xl, x2) of a task of length xl, x2, respectively is computed with E[Z], w hen assuming that probability of a user finding the system initially being up is given by the steady-state probabilities: IP(1)(x1,x2)
= E[Z] = (1 - e-2Y22)iTTa+ (1 - emyIc1)7rr.
Table 2.7 Reward assignment for computing a task’s interruption probability with duration x for the CTMC from Fig 2.7 State
2 RC RB 1 0
i
Reward
Rate
1 -e--2Y~2 0 0 1 _ e-Yzl
0
Table 2.8 Reward assignment for computing a task’s interruption probability in case of uncovered error or loss of required processor for the CTMC from Fig 2.7
ri State
2 RC RB 1 0
i
Reward l-
Rate
r(i)
e-(YP--c)+Y)s2
0 0 l-ee-Yzl
0
Another scenario would be that a running task is constrained to using a single processing element, even if two were available. In this case, a task would
THE MODELING PROCESS
75
only be susceptible to a failure of its own required processor or if an uncovered failure occurs that also has severe impacts on the currently running one. With the reward assignment as shown in Table 2.8 the desired interruption probability IPc2) is obtained:
,TP(2)(~l, x2) = E[Z] = (1 - e-(y(1-c)+y)z2)7r2 + (1 - e-yz1)7ri. Recall also the reward function as defined in Table 2.4. Based on these reward rates, it is also possible to compute the mean number of severe interruptions in a finite time duration. These measures can also be applied as success measures for running a task with finite execution time x and with certain interruption tolerance patterns attached to it. In particular, continuous media exhibit such a kind of susceptibility profile. Interruptions or losses can be tolerated up to a certain threshold. Beyond that threshold, a severe degradation in quality is perceived by users. Referring back to the example in Fig.2.7, it could be of interest to take into account different levels of performance that the modeled system may exhibit in various states. One simple method is to use the processing power (proportional to the number of processing elements actually working). This would lead to the reward function depicted in Table 2.9. With this reward assignment, the expected available computing capacity in steady state E[Z], the expected instantaneous capacity at time t, E[Z(t)], or the cumulative capacity deliverable in finite time [O,t), E[Y(t)], or its time average iE[Y(t)] could be computed. These availability measures are known as capacity-oriented availability. But this reward assignment is too coarse to capture performance effects accurately. First, it is overly optimistic with respect to the actual performance and, second, effects due to outages tend to be masked by the coarse reward structure and tend to become insignificant as a consequence. From this point of view, either the reward function or the model structure are ill-posed with respect to the system requirements. Table 2.9 Reward assignment for computation of capacity oriented availability measures for the CTMC from Fig 2.7
Table 2.10 Reward assignment for computing the total loss probability for the CTMC from Fig 2.7
State i
Reward Rate ri
State i
2 RC RB 1
2 0 0
2 RC RB
12
1 0
1 0
11
0
Reward Rate r(i) 1
1 1
More sophisticated approaches either take into account the user’s requirements related to the tasks to be processed or incorporate more information about the investigated system into the model. A hierarchical method is pos-
76
MARKOV CHAINS
sible, where every working configuration is modeled separately with some (queueing) models so that more interesting parameters can be derived. Trivedi et al. [TMWH92], f or example, use a finite queueing system that is subject to losses due to buffer overflow and to response time deadline violation [TSIHSO], while de M eer et al. [MTD94] apply infinite queueing systems with limits imposed on the system’s response time for processing the tasks, so that losses occur if a deadline is missed. In both cases, closed-form solutions are available to calculate the loss probabilities Zi for all working configurations or up states i E U [GrHa85]. Of course, knowledge of the processing power of the working components and of the traffic characteristics are needed to parameterize the underlying performance models. Many of the details are given in later chapters when appropriate. Table 2.11 Reward assignment for computing the normalized throughput for the CTMC from Fig 2.7
Table 2.12 Reward assignment for computing of the expected number of tasks lost in finite time for the CTMC from Fig 2.7 -
State i
State i
Reward Rate r,
2 RC RB
12x
1 0
11X x
Reward Rate r(i)
1 - 12 0 0
2 RC RB 1 0
1 -I1 0
x x
The values Zi are used to characterize the percentage loss of tasks arriving at the system in state i E U. Consequently, tasks arriving in down states are lost with a probability of 1.0, that is, Zi = 1, i E D. Furthermore, it is assumed that (quasi) steady state in the performance model is maintained in all configuration states, such as those depicted in Fig. 2.7. In other words, tasks arriving at the system in a certain state are assumed to leave the system in the same state. This assumption is admissible because the events related to traffic effects usually occur in orders of magnitude faster than events related to changes in configurations (due to failure-repair events), The resulting reward assignment is summarized in Table 2.10. Of course, a complementary reward structure as in Table 2.11 could also be chosen, so that the normalized throughput would be calculated as has been done by Meyer [MeyeBO]. The expected total loss probability, TLP, in steady state and the transient expected total loss probability TLP(t) are then given by:
7wyt) = E[Z(t)l = C liTi iEU
= /2n2(t)
+ C ITi iED + Zm(t)
+ rdt)
+ ma(t)
+ 7@(t),
THE MODELING PROCESS
77
TLP = E[Z]
= c li7ri+c hi iEU iED
= /2x2
+ llrl
+ TRC
+ TRB
+ TO.
To compute the expected number of tasks lost in finite time [0, t), two choices are available: the interval-based measure TL(r)(t) or the steady-statebased measure TLc2) (t) can be applied. In the first case, the reward rates in Table 2.10 are multiplied by the tasks’ arrival rate X to yield Table 2.12 and to compute E[Y(t)]. In the second case, E[Z] is calculated on the basis of a modified reward assignment. All reward rates from Table 2.12 need to be multiplied by the length of the investigated time horizon [O,t) to get Table 2.13. Of course, the same reward assignment could be used for the interval-based case and iE[Y(t)] could be evaluated instead:
To(t)
= E[Y(t)] =
ifU = A
iED
(z2L&>
= tX (l2r2
+ hLl@)
+ llrl
+ TRc
+ LRC(t)
+ TRB
+ LRB(t)
+ b-j(t)),
+ To).
Table 2.13 Reward assignment for computing the expected total number of tasks lost in finite time for the CTMC from Fig 2.7
Table 2.14 Reward assignment for throughput computation with the CTMC of Fig 2.10
State i
Reward Rate r,
State i
2 RC RB 1 0
t12x tX tX tllX tX
3 2 1 0
Reward Rate r, P CL I-L 0
Response time distributions and percentiles of more complicated systems can rarely be computed by simple elementary queueing systems, and closed-
78
MARKOV CHAiNS
form solutions are rarely available. Muppala et al. have shown how CTMCs with absorbing states (or their equivalent SRNs) can be exploited to compute these measures numerically in rather general cases [MuTrSl, MMKT94]. Once the probabilities that the response time exceeds a certain threshold have been derived with this method, the reward assignments can be made corresponding to Tables 2.10, 2.12, or 2.13 and the appropriate measures can be derived. The use of reward rates is not restricted to reliability, availability, and performability models. This concept can also be used in pure (failure-free) performance models to conveniently describe performance measures of interest. In many computer performance studies, expected throughput, mean response time, or utilization are the most important measures. These measures can easily be specified by means of an appropriate reward function. Throughput characterization, for example, can be achieved by assigning the state transition rate corresponding to departure from a queue (service completion) as reward rates to the state where the transition originates. Mean response time of queueing systems can be represented indirectly and computed in two steps. First, by assigning the number of customers present in a state as the reward rate to that state and to compute the measures as required. Second, it can be proven that there exists a linear correspondence between mean response time and mean number of customers present so that Little’s theorem [LittGl, King901 can be used for the computation of the mean response times from the mean number of customers. Finally, it is worth mentioning that the utilization is also based on a binary reward assignment. If a particular resource is occupied in a given state, reward rate 1 is assigned, otherwise reward rate 0 indicates the idleness of the resources. To illustrate these measures, reward definitions for a simple CTMC are considered. Imagine customers arriving at a system with exponentially distributed interarrival times with mean l/X. In the system they compete for service from a single server station. Since service is exclusively received by each customer, if more than one customer is in the system at the same time, the others have to wait in line until their turn comes. Service times are independent exponentially distributed with mean l/p. To keep the example simple, we limit the maximum number of customers in the system to three. The system is described by the MRM shown in Fig. 2.10.
Fig. 2.10 A simple CTMC with arrivals and services.
Every state in S = {3,2,1,0} tomers in the system. A state completed by the service station arrives (arc annotated with X).
of the MRM represents the number of custransition occurs if a customer’s service is (arc annotated with p) or if a new customer Tables 2.14, 2.15, and 2.16 summarize the
THE MODELING PROCESS
Table 2.15 Reward assignment for computing the mean number of customers with the CTMC of Fig 2.10 State i
Tab/e 2.16 Reward assignment for computing utilization measureswith the CTMC of Fig 2.10
Reward Rate pi
3 2 1 0
79
State i
Reward Rate rz
3 2
1 1
1
1 0
3 2 1 0
0
reward assignments for throughput, mean number of customers, and utilization measure computations, respectively. The formulae for the computation of throughput measures A(l)(t), Xc21(t), Xc3); the mean number of customers K(1)(t), K(2)(t), K(3); mean response time measures T(r) (t), Z’(2) (t), Z’(s) ; and utilization measures p(‘)(t), pc2)(t), pc3) are also given. Note that response time measures Tci) can be calculated with the help of Little’s theorem [LittGl]:
T(i) = @I.
X(‘)(t) = E[Y(t)]
= j
(c i
= CL&3(t) XC2)(Q = W(t)]
KO(t)
= E[Z(t)l
To(t)
= +(l)(t)
+ L2(t)
(2.83)
T~TQ(T)) dr = &Li(t) i
+ h(i)),
= -&n(t)
= p (r3(t)
= C
= 3n3(t)
Tini
= ; (37r3(t)
+ 7T2(t) + n1(t))
+ 27T2(t) + lTl(t>,
+ 27Q(t) + 17rl@)) )
)
80
MARKOV CHAINS
To(t)
= $w(t)
= ;
(3&(i)
h’(3) = E[Z] = &ri
T(3)
PW)
=
1&w x
=
~Lw)l
;
=
= 37r3+ 27Q + hl,
(37r3
pC3) =
5 &3(t)
E[Z]
27r2
= ; j
+ J52(t)
= c
+ 1x1) *
= x3(t)
0 =
+
= pi%(t)
,d2)(t) = +[Y(t)]
+ 2&?(t) + L&(t)),
( xri7ri(~)) i
+ n2(t)
+ 7r&),
dr = 5 C&(t) 2
+ b(t)),
rilTi=T3+7T2+7Tl.
More MRM examples with all these types of reward assignments can be found in [MTBH93], and MuTr92, and in [TMWH92] where multiprocessor architectures with different interconnection techniques are studied by defining corresponding reward functions. As another example, queues with breakdown are investigated. Tables 2.17, 2.18, 2.19, and 2.20 summarize the different reward assignments discussed in this section. 2.2.3
Generation
Methods
We have already pointed out the importance of model generation, There are many reasons for the separation of high-level model description and lower-level computational model. For one thing, it will allow the modeler to focus more on the system being modeled rather than on low-level modeling details. Recall that CTMCs and similar other state-space based models tend to become very large when real-world problems are studied. Therefore, it is attractive to be able to specify such large models in a compact way and to avoid the errorprone and tedious creation of models manually. The compact description is based on the identification of regularities or repetitive structures in the system model. Besides reducing the size of the description, such abstractions provide visual and conceptual clarity. It is worth repeating that regularities must be present in order to take advantage of automatic model generation techniques. If repetitive structures cannot be identified, the “higher-level” representation would hardly be smaller
THE MODELING PROCESS
Table 2.17
Overview
of important
Measure
Formula
Em
ci,s
ra*a
Jww1
c 2ES 7’24)
Jw WI
CiES
Jwwl
P[z(t) = Lz]
raL (t>
c aE~ rib(m) CT,=z,iES 4) c ?-$<2,iES 4) 1 - I&r %(Y> (al + uR - &)a*(~, a) = e (for small models)
81
MRM measures Literature [GaKe79], [Kuba86], [MuTrSl], [MTBH93], [TSIHSO] [BRT88], [DaBh85], [HMTSI], [MuTrSl], [NaGa88] [BRT88], [NaGa88], [MTD94], [MTBH93] [TMWH92] [Husl81], [LeWi89], [Wu82] [SoGa89] [Beau78], [CMSTSO] [STR88], [GoTa87], [CiGr87], [SoGa89], [DoIy87], [Meye82]
and may even be more cumbersome. Furthermore, since software tools, hardware resources, human labor, and time are necessary to install and run such an automatic generation facility, its application is recommended only beyond a certain model size. Finally, the structure of the high-level description language generally makes it more suited for a specific domain of applications. As an example, effects of resource sharing and queueing are naturally represented by means of queueing systems, while synchronization can be easily specified with some Petri net variant. Many high-level description languages exist, each with its own merits and limitations. A comprehensive review of specification techniques for MRMs can be found in [HaTr93]. In the remainder of this chapter, we discuss two types of stochastic Petri nets: generalized stochastic Petri nets (GSPNs) and stochastic reward nets (SRNs). We also discuss algorithms to convert a GSPN into a CTMC and an SRN into an MRM. 2.2.3.1 Petri Nets Petri nets (PNs) were originally introduced by C.A. Petri in 1962 [Petr62]. A PN is a bipartite directed graph, consisting of two types of nodes, namely places, P, and transition, 7’. Arcs between the nodes fall in two categories: input arcs lead from an input place to a transition and output arcs connect a transition to an output place. Arcs always have to lead from one type of nodes to the complementary one. A finite number of tokens can be distributed among the places and every place may contain an arbitrary (natural) number of tokens. A marking m E M is defined as a possible assignment of tokens to all places in the PN. Markings are sometimes referred to as states of the PN. If P denotes the set of places, then a marking m is amultiset,mEM ClN IpI , describing the number of tokens in each place.
MARKOV CHAINS
82
Table 2.18
Use of reward rates to compute
Requirement
Rewsr
Avazlabzlity [HMTSI],
r
[Table
1
ifiEU
0
else
measures - part I
Measure
d Assignments
L-
different
E[Z]
(steady = (instantaneous) +E[Y(t)](interval)
E[Z(t)]
2. l]
state)
P[Z(t)
= l]
Jw(~)l* 1 P[Y(=J) I T/1* Pl+w I Yl I-
Unavazlabzlaty [HMTSl], [MTD94], [MCT94]
L-
1
ifiED
0
else
E[Z](steady E[Z(t)] = (instantaneous) $E[Y(t)](interval)
state)
P[Z(t)
= l]
P[SW) 5 Yl uptzme in
Mean
I’, =
WI t) [Table
Mean
uptame in
rz =
[O,t) [Table
Approz. quency
fre-
of
calls in repair [0, t) with mean duration $ [HMTSl]
of
Frequency repair
calls
t)
[0, with duration 6 [HMTS~]
Mean age interruptzons
[HMTSl]
of
I-, =
ifiEU
0
else
r
= z
(SR:
cd)
7-t =
EL-4 (ww-ox.1 $aY(t)l
2.21 1
ifiEU
0
else
ED’-(t)1 EP’(~)I*> WY(~) 5 YI*
2. l]
tS
if i E SR
0
else
(SR: states where cd) [Table 2.31
in rnean
percentsevere
t
6
if i E SR
0
else
states where [Table 2.31
P[T 0
> pi]
EL? repair
is need-
WY(t)1 repair
is need-
if z E U W’(t)1
else
(rZ 2 0: threshold for of outage) [Table 2.41
state
i,
T: duration
*The measure is valid for models with absorbing states. N: non-absorbing states; A: absorbing states; U: up states; D: down states.
THE MODELING PROCESS
Tab/e 2.19
Use of reward rates to compute different measures - part II
Requirement
Reward
Reliabilzty
ri =
Measure
Assignments 1 0
ifiEN else
E[Z(t)], P[z(t) = 11
[Table 2.51
r2 =
Unreliability [STP96] System MTTF [HMTSl]
ri =
1 0
ifiEA else
E[Z(t)l, P[z(t) = 11
1 0
ifiEN else
Jw(=J)l*
[Table 2.51 Expected ber of trophic in [0, t) [HMTSl]
numcatasevents
r, =
ifiESc else
W’(t)1
(SC: states where catastrophe may occur, X: rate of catastrophe) [Table 2.51
Task interruption probabilzty [HMTSl], [TSIHSO]
r, =
oriCapacity ented availabilztY [HMTSl], [MTD94], [CiGr87]
r, =
Total loss probabality [MuTrSl], [TMWH92], [TSIHSO]
7-i =
Normalzzed throughput
r, =
[MeyeW, [MeyeW,
X 0
P[Oi t 0
5 xi]
if i E U else
task execution time (xi: state i, Oi: sojourn time in state i) [Table 2.7, 2.81 cap(i) 0
in
if i E U else
(cap(i): capacity [Table 2.91
1 1,
E[Zl
of state i)
ifiED else
(li: percentage [Table 2.101 l-li 0
+ W(t)1
P[z(t) = x:1,P[z(t) 5x1, PLY(t)
5 Yl
E[Zl, wZ(t)l, w(t)11 loss in state i)
$WWl, P[z(t) = xc], P[z(t) W(t) I Yl
2 xl,
ifiEU else
[Table 2.111
[GaKe79],
*The measure
E[Zl>E[Z(t)l, -W(t)l,
is valid for models with absorbing states.
P[z(t) = xl, p[z(t) 5 xl, W(t) 5 Yl
83
84
MA RKOV CHAINS
Table 2.20
Use of reward rates to compute different measures - part III
Requirement
Reward Assignments
Approx.
7-i =
total
tX1,
ifiEU else
tX
ZOSS
Measure
-w-l
[MTD94], [MeSe97]
[Table 2.131
Loss [HMTSl]
r, =
Mean number of customers [MTBH93]
ri = c(i) (c(i): number of customers in state i)
Expected throughput [MeSe97], [DoIy871, [MTBH93]
r, =
Utilization [Husl81], [GoTa87], [MTBH93]
7-i =
i
61, 6
pi 0
if i E U else
= x]
if i E ST else
service (w state i) 0 1
WI Y-wYtx T
qz(t)
rate
while
in
if resource is idle else
ww
$w(t)l~
WI, p[$W I ~1
*The measure is valid for models with absorbing states
To refer to the number of tokens at a particular place Pi, 1 5 i 5 IPI, in a marking m, the notation #(Pi,m) is used. A transition is enabled if all of its input places contain at least one token. Note that a transition having no input arc is always enabled. An enable transition can fire by removing from each input place one token and by adding to each output place one token. The firing of a transition may transform a PN from one marking into another. With respect to a given initial marking, the reachability set, RS, is defined as the set of all markings reachable through any possible firing sequences of transitions, starting from the initial marking. The 72% could be infinite if no further restrictions are imposed. Even PNs appearing as being relatively simple can give rise to large or infinite RSs. If more than one transition is simultaneously enabled but cannot fire in parallel, a conflict between transitions arises that needs to be resolved through some selection policy, for example, based on a priority scheme or according to a given pmf. Markings in which no transition is enabled are called absorbing markings.
THE MODELING PROCESS
85
For a graphical presentation, places are usually depicted as circles and transitions as bars. A simple PN with an initial marking: m0
= C&O,O,O>
y
and subsequent marking:
ml = (Lb LO>, reachable through firing of transition ti, is shown in Fig. 2.11.
p3
t. 0
t 0
p4
Fig, 2.11 A simple PN before and after the firing of transition tl S
Transition ti is enabled because its single input place Pi contains two tokens. Since P2 is empty, transition t2 is disabled. When firing takes place, one token is removed from input place PI and one token is deposited in both output places P2 and P3. In the new marking, both transitions ti and t2 are enabled. The apparent conflict between ti and t2 must be solved to decide which of the alternative (absorbing) markings will be reached: m2 = (0,2,2,0)
or
m3 = (O,O, 1,l). The reachability set in this example is given by (me, ml, m2, m3), where the markings are given as defined earlier. Definition with:
2.16
A PN is given by a 5-tuple PN = {P,T, D-, D+,mo}
l
A finite set of places P = {PI, P2, . . . , PIPI}, where each place contains a non-negative number of tokens.
l
A finite set of transitions T = {ti, . . . , +I},
l
The set of input arcs D- E (0, l}IPxTl and the set of output arcs D+ E (0, l}lPxTl. If D-(Pi,tk) = 1, then there is an input arc from place Pi to transition tk, and if D+(Pj,tl) = 1, then there is an output arc from transition tl to place Pj.
l
The initial marking
mo
E M.
such that 7’ n P = 8.
86
MARKOV CHAlNS
2.2.3.2 Generalized Stochastic Petri /Vets GSPNs generalize PNs in such a way that each transition has a jiring time assigned to it, which may be exponentially distributed or constant zero. Transitions with exponentially distributed firing times are called timed transitions, while the others are referred to as immediate transitions. The new type of transitions require enabling and firing rules to be adapted accordingly. The markings M = V U ‘T in the reachability set RS of a GSPN are partitioned into two sets, the vanishing markings v and the tangible markings 7. Vanishing markings comprise those in which at least one immediate transition is enabled. If no immediate transition is enabled, that is, only timed transitions or no transitions are enabled, a tangible marking results. Vanishing markings are not resided in for any finite non-zero time and firings are performed instantaneously in this case. Therefore, immediate transitions always have priority over timed transitions to fire. If several immediate transitions compete for firing, a specified pmf is used to break the tie. If timed transitions compete, a race model is applied so that the transition whose firing time elapses first is the next one to fire. From a given GSPN, an extended reachability graph (ERG) is generated containing the markings of the reachability set as nodes and some stochastic information attached to the arcs, thereby relating the markings to each other. Absorbing loops of vanishing markings are a priori excluded from the ERG, because a stochastic discontinuity would result otherwise. The GSPNs we are interested in are bounded, i.e., the underlying reachability set is finite. Marsan, Balbo, and Conte proved that exactly one CTMC corresponds to a given GSPN under condition that only a finite number of transitions can fire in finite time with non-zero probability [ABC84].
Fig. 2.12
A simple GSPN before and after the firing of transition tl .
Thick bars are used to represent timed transitions graphically, while thin bars are reserved for immediate transitions. An example of a GSPN is shown in Fig. 2.12. In contrast to the PN from Fig. 2.11, an exponentially distributed firing time with rate Xi has been assigned to transitions tl. While in the PN of the right part in Fig. 2.11 both transitions tl and tz were enabled, only
THE MODELING PROCESS
87
immediate transition t2 is enabled in GSPN of the right part in Fig. 2.12, because immediate transitions have priority over timed transitions. 2.2.3.3 Stochastic Reward Nets Many proposals have been launched to provide extensions to GSPNs (PNs). Some of the most prominent ones are revisited in this section: arc multiplicity, inhibitor arcs, priorities, guards, markingdependent arc multiplicity, marking-dependent firing rates, and reward rates defined at the net level. With these extensions we obtain the formalism of stochastic reward nets. We note that the first three of these extensions are already present in the GSPN formalism. Arc Multiplicity. Frequently more than one token needs to be removed from a place or deposited into a place, which can be easily represented with arc multiplicities. It could be awkward, otherwise, to draw an arc for each token to be moved. Arc multiplicity is a syntactical extension that does not increase the principal modeling power but makes the use of GSPNs more convenient. Arcs with multiplicity are represented by a small number attached to the arc or by a small line cutting through the arc. By default, a multiplicity of 1 is assumed. An example of arcs with multiplicities is depicted in Fig. 2.13.
t2
fig 2.13
Simple example of multiple arcs.
Since there are initially three tokens in place Pi, transition ti is enabled, and after elapsing of the firing time, two tokens are removed from place Pi (multiplicity 2) an d one token is deposited into place P2. No other transition is enabled in the initial marking. Since in the new marking there are two tokens in Pz and one token in PI, only t2 is enabled and fires immediately, thereby removing two tokens from P2 (multiplicity 2) and adding one token to PI. After some exponentially distributed firing time has elapsed, transition ti fires for the last time, because an absorbing marking will then be reached. Finally, there will be zero tokens in PI and one token in Pz. Inhibitor Arcs. An inhibitor arc from a place to a transition disables the transition in any marking where the number of tokens in the place is equal to or greater than the multiplicity of the inhibitor arc. An inhibitor arc exhibits a complementary semantic to the one of a corresponding ordinary arc
88
MARKOV CHAINS
(with the same multiplicity). If a transition were enabled under the conditions imposed by an ordinary arc it would never be enabled under the conditions imposed by its counterpart, an inhibitor arc, and vice versa. Graphically, an inhibitor arc is indicated by a small circle instead of an arrowhead. The use of an inhibitor arc is dernonstrated in Fig. 2.14.
Fig. 2.14
A GSPN with an inhibitor
arc having a multiplicity
assigned.
The figure entails a queue with finite capacity, which customers enter with a mean interarrival time of 1 /X as long as the queue is not full, that is, fewer than k customers are already in the system. Arrivals to a full system may not enter and are considered as lost. The arrival transition is disabled in this case. The effective arrival rate, therefore, turns out to be less than X. Note that we have made use of these kind of scenarios in earlier sections when specifying reward assignments capturing throughput losses or related performance requirements. Customers leave the system after receiving service with exponentially distributed length of mean duration l/cl. Arrival t1
41 A
Case
t2 l-
Queue1
(Priority 1)
t4
Fig. 2.15
Priorities, tions,
92
t3 -r
Queue2
ts
(Priorityz)
A GSPN with priorities.
Although inhibitor arcs can be used to specify priority relait is easier if priority assignments are explicitly introduced in the pa-
THE MODELING PROCESS
89
radigm. Priorities are specified by integer numbers assigned to transitions. A transition is enabled only if no other transition with a higher priority is enabled, In Fig. 2.15 a simple example of a GSPN with priorities is shown. An idle server is assigned with different priority to each type of waiting customers. Arriving customers belong to priority class i with probability qi, xi pi = 1. Recall that immediate transitions must always have priority
Fig. 2.16
An SRN with guards.
Guards. Guards are general predicates that determine when transitions are to be enabled. This feature provides a powerful means to simplify the graphical representation and to make GSPNs easier to understand, for example, by identifying modular substructures. With these extensions, we have moved out from GSPN class to SRNs [CBC+92]. An example for an SRN with guards is given in Fig. 2.16. A system with two types of resources being subject to failure and repair is represented. The scenario modeled captures the situation of a finite capacity queueing system with finite number of servers (net 1) and additional buffer space (net 2) usable if the active servers are temporarily occupied. The total number of customers in the system is limited by the total of failure-free buffer spots plus the failure-free server units. Hence, arriving customers, which enter with a mean interarrival time l/X (net 3), can only be admitted if fewer customers are already present than given by the current capacity. This condition is described by the guard attached to the “arrival” transition. The guard characterizing the enabling conditions of the “service” transition explicates the fact that service can only be delivered if at least one server is failure free (besides the apparent condition of the presence of at least one customer).
90
MARKOV CHAINS
1
fig. 2.17
Arrival
Place flushing modeled with marking-dependent
arc multiplicity.
Marking-Dependent Arc Multiplicity. This feature can be applied when the number of tokens to be transferred depends on the current marking. A useful application is given when places need to be “flushed” at the occurrence of a certain event, such as the appearance of a token at a particular place. Such a case is shown in Fig. 2.17. Customers arrive from outside at place Pr and are immediately forwarded and deposited at places Pz, P3, or PJ, with probabilities 41, q2, and 43, respectively. Whenever there is at least one token at place PA, transition t5 is enabled, regardless of the number of tokens in the other input places P2 and Pa. When t5 fires, one token is removed from place P4, all tokens are removed from place P2 and P3, and a single token appears in place P5. Marking-Dependent Firing Rates. A common extension of GSPNs is to allow marking-dependent firing rates, where a firing rate can be specified as a function of the number of tokens in any place of the Petri net. Restricted variants do exist, where the firing rate of a transition may depend only on the number of tokens in the input places. As an example, the firing rate of transition service in net 3 of Fig. 2.16 is a function of the number of servers currently up, that is, the number of tokens in place serv-up. Reward Rate Specification. Traditional output measures obtained from a GSPN are the throughput of a transition and the mean number of tokens in a place. Very often it is useful to have more general information such as: l
The probability enabled.
that a place Pi is empty while the transition
tj is
THE MODELING PROCESS l
The probability ously enabled.
91
that two different transitions tl and t2 are simultane-
Very general reward rates can be specified on an SRN so as to allow the user to compute such custom measures. Examples of the use of reward rates can be found in [CFMT94, MCT94, MMKT94, MaTr93, MuTr91, MuTr92, WLTV96]. 2.2.3.4 GSP/V,/SRN Analysis In this section we introduce the techniques of automated generation of stochastic processes underlying a GSPN/SRN. But it is worthwhile remembering that GSPNs can also be evaluated with discreteevent simulation techniques, a non-state-space based method, and that each technique has its merits. In general, the analysis of a GSPN/SRN can be decomposed into four subtasks:
1. Generation of the UZ&J, thereby defining the underlying stochastic process. A semi-Markov process (SMP) results, with zero or exponentially distributed sojourn times. In the case of an SRN, a reward rate for each tangible marking is also generated in this step. 2. Transformation of the SMP into a CTMC by elimination of the vanishing markings with zero sojourn times and the corresponding state transitions. 3. Steady-state, transient, or cumulative transient analyses of the CTMC with methods presented in Chapters 3, 4, and 5. 4. Computation of measures. In the the case of a GSPN, standard measures such as the average number of tokens in each place and the throughput of each timed transition are computed. For the SRN, the specification of reward rates at the net level enables the computation of very general custom measures. In the ERG, which reflects the properties of the underlying stochastic process, arcs representing the firing of timed transitions are labeled with the corresponding rates, and arcs representing the firing of immediate transitions are marked with probabilities. The ERG can be transformed into a reduced reachability graph (RG) by eliminating the vanishing markings and the corresponding transitions. Finally, a CTMC will result. In the case of an SRN, a reward vector is also generated and, hence, an MRM is produced. 2.2.3.4.1 CR&T Generation An U?$ is a directed graph with nodes corresponding to the markings (states) of a GSPN/SRN and weighted arcs representing the probabilities or rates with which marking changes occur. The basic ER&7 generation algorithm is summarized in Fig. 2.18. Starting with an initial marking mo, subsequent markings are generated by systematically following all possible firing patterns. If a new marking is
92
MARKOV CHAINS
fig. 2.18 Basic ERG generation algorithm. generated, the arcs labeled with the corresponding rates and probabilities are added to the E’JZS. The algorithm terminates if all possible markings have been generated. Three main data structures are used in this algorithm: a stack (S), the ERG, and a search structure (D). l
Stack (S): Newly generated markings are buffered in a stack until they are processed.
l
ERG: This data structure is used to represent the generated extended reachability graph.
l
Search structure (D): This data structure contains all earlier generated markings so that a current marking can be checked as to whether it is new or already included in the set D. For efficiency reasons, memory requirements and access time to the data structure should be of concern in an implementation. If the data structure is stored in hash tables, fast access results but memory requirements could be excessive. In contrast, tree structures are more flexible and can be stored more compactly, but access time could be intolerably high for unbalanced trees. Balanced trees such as AVL trees and B-trees are used for this reason [AKH97].
Elimination of Vanishing Markings The elimination of vanishing markings is an important step to be accomplished for generation of a CTMC from a given GSPN/SRN. Two different techniques can be distinguished,
2.2.3.4.2
THE MODELING PROCESS
“elimination on the fly” and “post-elimination” consider the GSPN in Fig. 2.19a. t3
p2
c
[CMTSl].
9.3
To give an example,
p4
Q t-0 1-q t&-4
Fig. 2.19 The elimination of vanishing markings demonstrated by: (a) a GSPN, (b) the underlying ER&7, (c) the resulting R&Y, and (d) the corresponding CTMC.
The underlying ERG in Fig. 2.19b contains five markings. Tangible markings are represented by oval boxes, while rectangular boxes are reserved for vanishing markings. The elimination of the vanishing markings (0, 1, 0, 0,O) and (O,O, l,O,O) leads to an RG, shown in Fig. 2.19c. The RG is a CTMC (redrawn
in Fig 2.19d).
The elimination
of vanishing
markings
can either be
integrated into the generation of the &RS (e1imination on the fly) or performed afterwards (post-elimination). The merits of both methods are discussed in the subsequent subsect ions. Consider the snapshot in Fig. 2.19b. Generation of vanishing markings such as (0, l,O, 0,O) is avoided by splitting the arc and its attached rate, which would be leading to this vanishing marking, and redirecting the resulting new arcs to subsequent markings. If at least one of these subsequent markings is also of vanishing type, the procedure must be iterated accordingly. Such a situation is indicated in Fig. 2.19b. The splitting is accomplished by weighing the rates in relation to the firing probabilities of the immediate transitions (in our example, t2 and ta). As a result, only tan2.2.3.4.3
Elimination
on the f/y
94
MARKOV CHAINS
gible markings and the arcs with adjusted rates need to be stored. Vanishing markings and the associated firing probabilities are discarded immediately. The principle of rate splitting and elimination of vanishing markings is illustrated in Fig. 2.20. rl
” T G
T
<
r2
G T
7-1 = r - q, 7-2 = r . (1 - q)
Fig. 2.20
On-the-fly elimination
of vanishing markings.
Elimination on the fly is efficient with respect to memory requirements, because vanishing markings need not be stored at all. But these savings in space have to be traded with additional cost in time. The same vanishing markings may be hit several times through the &%&generation process. The elimination step would be repeated although it had been perfectly executed before. This repetition could amount to a significant wast of execution time. 2.2.3.4.4 Post-Elimination In post-elimination, the complete &RG is stored during generation. With this technique, it is easier to recognize and resolve (transient) loops consisting of vanishing markings only [CMT90]. Furthermore, no work is duplicated during ERG generation, and, as an additional advantage, the complete information included in the ERG is kept for further use. Once the &RS has been generated, it is transformed into a CTMC by simply applying the same rate splitting method as has been described in Section 2.2.3.4.3. The success of the post-elimination method largely depends on the use of efficient matrix algorithms. Let:
pv = [PVV ] PV7] denote a matrix, which is split into transition probabilities between vanishing markings only (P”‘) and between vanishing markings and tangible markings (Pv7), where Pv is ’ of dimension ]VJ x IM 1 and the set of markings is partitioned such that M = I/ U 7. Furthermore, let:
u7 = [P
1CT]
denote a matrix, which is split into transition rates from tangible markings to vanishing markings (UT’) and between tangible markings only (UT7), where U7 is of dimension (71 x IM I. P recisely the same information as contained in the ERG is provided by U7 together with Pv. With representation in matrix form, it is easy to complete the set of transition rates by formally applying the rate-splitting method as indicated earlier.
THE MODELING PROCESS
95
The complete set of transition rates, gained after the rate-splitting method has been applied, is summarized in the rate matrix U, which is of dimension 171 x 171 [CMTSl]:
u&J77+urv
(I - PVV) --I PV?
(2.84)
Rate matrix U = [uij], which contains all initially present rates between tangible markings plus the derived rates between tangible markings, obtained by applying the rate-splitting method, needs to be completed to derive the infinitesimal generator matrix Q = [aij], the entries of which are given by: 4ij
=
ifi#j,
uij
-c
kcT,k#i
uik
(2.85)
if i = j,
where 7 denotes the set of tangible markings. Note that U may contain non-zero entries on its diagonal, which can be simply ignored as redundant information when creating the infinitesimal generator matrix Q as shown in Eq. (2.85). Once the generator matrix has been derived with the described method, numerical solution methods can be applied to yield performance measures as required. Finally, it may be of interest to realize that instead of eliminating the vanishing markings, they can be preserved and the underlying semi-Markov process can be solved numerically [CMTSO, MTD94].
fig. 2.21
Simple queueing
network.
Example 2.13 A simple example is given to demonstrate the generation of an ERG by applying the algorithm from Fig. 2.18. Consider the GSPN in Fig. 2.21a that models a simple network of queues. An equivalent queueing network representation is shown in Fig. 2.21b (f or a detailed introduction to queueing networks see Chapter 6). The network contains K = 2 customers, which are assumed to be initially in node 1. Let (i, j) denote a state of the network, where i represents the number of customers at node 1 and j those at node 2.
96
MARKOV CHAINS
The CTMC can be directly derived from the resulting &ES since there are no vanishing markings in this ERG. Example 2.14 With the example GSPN from Fig. 2.22a, the intermediate steps to create the rate matrix U and the infinitesimal generator matrix Q are demonstrated. Post-elimination is assumed so that the complete ERG is available, as indicated in Fig. 2.22b. The four matrices Pvv, Pv7, UT’, and UT7 are given by: 0> pvv _- ( 00 cl2 h-4
un-
-
I 0
pVT=
'
Pa\
0 I
'
\o 0)
( 00
0 Ql
/o
0
0
0000 0
0
UT7 = I
q4 0
!I3 0> ' o\ 0 I * CL3
\o 0 0 0)
97
THE MODELING PROCESS
/7
t
Fig. 2.22 CTMC.
P3
p3
I
n
p5
p6
(a) Simple GSPN, (b) the underlying
ERG, and (c) the corresponding
Then we get:
(I-py-1
=
Pl
( > :,
“1”
u~yI-pvv)-l=
)
uTv(I~pvv)-lpv7=
i
0
q1cL1
0 0
0 0
0
0
q2rcL1+
;
;
0
0
Q3(Q2Pl
+ P2)
t
Q4672Pl
+ P-12)
0 0 0
P2
0 0 0
)
1 i
9
98
MARKOV CHAINS
!I3k2Pl
+ P2) P3
0 0 -(w1
+ q4(q2111 +q3(q2pl
Q=
+ p2)
QPl
> 1
c?4(cIZPl
+ pa)
q3(q2/!L1 + /L2)‘
+ p2))
0 0 0
-P3
0 0
0 0 0
P3
0 0
I
With Eq. (2.85), the entries of the generator matrix Q can be determined from U. To demonstrate the automated generation of 2.2.3.5 A larger Example CTMCs from high-level GSPN description, an example of a polling system adapted from [IbTrSO] is now presented. (For an introduction to polling systems see [Taka93].) Consider a two-station single-buffer polling system as modeled by the GSPN shown in Fig. 2.23. The places in the GSPN are interpreted as follows. A token in place Pi represents the case that station 1 is idle and a token in P2 correspondingly indicates the idleness of station 2. Tokens in places P, (Ps) indicate that the server is serving station 1 (station 2). Tokens appearing in Places P3 (P4) indicate that station 1 (station 2) has generated a message. If a token appears in place P7, the server is polling station 2, while a token in place Ps indicates polling of station 1. In the situation illustrated in Fig. 2.23, both stations are idle and the server has finished polling station 1 (the server is ready to serve station 1). We assume that the time until station i generates a message is exponentially distributed with mean l/Xi. Service and polling times at station i are assumed to be exponentially distributed with mean l/pi and l/yi, respectively (i = 1,2). As soon as station 1 (station 2) generates a message, the timed transition ti (tz) fires. Consider initial marking me shown in Fig. 2.23, where Ps contains a token and P3 contains no token. In this situation the immediate transition si is enabled. It fires immediately and a token is deposited in place PT. This represents the condition that the server, when arriving at station 1 and finding no message there, immediately proceeds to poll station 2. In the case that a token is found in place P3, the timed transition t3 is enabled and its firing causes a token to be deposited in place Pi and another token to be deposited in place PT. This represents the condition that station 1 returns to the idle state while the server proceeds to poll station 2. Let Zi denote the number of tokens in place P; and rnj = (Zr, . . . , 1s) denote the jth marking of the GSPN. The ERG, obtained from the initial marking mo, is shown in Fig. 2.24. Labels attached to directed arcs represents those transitions whose firing generates the successor markings. If a previ-
THE MODELING PROCESS
Fig. 2.23
99
GSPN Model for the two-station single-buffer polling system.
ously found marking is obtained again (like ms), further generations from such a marking are not required. All sixteen possible and unique markings are included in Fig. 2.24. To reduce the clutter, several markings appear more than once in the figure. The &RG consists of four vanishing markings {mo,m4,m6,m13}, represented in the graph by rectangles and tangible markings represented by ovals. We can eliminate the vanishing markings by applying the algorithm presented in Section 2.2.3.4.2. From Fig. 2.24 we can see that all the tangible markings containing a token in place Pi indicate the condition that station 1 is idle. The set of markings in which station 1 is idle is S’(l) = { ml, m3, m7, rnlo, rnls}. The set of markings Sc2) = {ml, m2, m7, mg, rn12) gives the condition that station 2 is idle. In Fig. 2.25, the finite and irreducible CTMC corresponding to the GSPN of Fig. 2.23 is shown, where each state j corresponds to the tangible marking rnj of the &RG in Fig. 2.24. With the ergodic CTMC in Fig. 2.25, the steadystate probability vector 7r can be derived according to Eq. (2.58). Letting 7rk denote the steady-state probability of the CTMC being in state Ic and letting pci) denote the steady-state probability that station i, i = 1,2 is idle,
100
MARKOV CHAINS
Fig. 2.24
Extended reachability
graph for the GSPN Model in Fig. 2.23.
we conclude: p(l) =
Tl
p
~1+~2+~7+~9+~12.
=
+ 7r3 + r7
+ TlO + r15,
(2.86)
As can be seen in Fig. 2.23, the model is structurally symmetric relative to each station and therefore it can easily be extended to more than two stations. With Eq. (2.86), the mean number of customers E[K(i)] = Kci) at singlebuffer station i is determined as: KG) = 1 - #).
Also with Eq. (2.86), the effective arrival rate at single-buffer station Z follows aS:
101
THE MODELING PROCESS
Fig. 2.25
CTMC for the GSPN in Fig. 2.23.
From Little’s theorem [LittGl], f ormulae can be derived for the computation of mean response times of simple queueing models such as the polling system we are investigating in this section. The mean response time JY[T(~)] = T(i) at station i can be computed as:
T(i) = Idi) A;’ *
(2.87)
As an extension of the two-station model in Fig. 2.23, let us consider a symmetric polling system with n = 3,5,7,10 stations, with ,ui = ,u = 1, l/y = l/yi = 0.005, and Xi = X for all stations 1 5 i 5 n. The offered load is defined by p = x7=1 X/p = r-A/p. For a numerical computation of steady-state probabilities of CTMCs such as the one depicted in Fig. 2.25, algorithms introduced in Sections 3.3 and 3.4 can be used. In Table 2.21, the mean response times Tci) = T of single buffer schemes are given for different number of stations. For many other versions of polling system models, see [IbTr90]. With Definition 2.64, prove differential Eq. (2.65) by inteProblem 2.1 gration of Eq. (2.53) on both sides. Problem
2.2
With Definition 2.66, prove differential Eq. (2.67).
Problem
2.3
Prove linear Eq. (2.68) by observing $L~(co)
= ON.
Classify the measure mean time to absorption (MTTA) as Problem 2.4 being of transient, steady-state, or stationary type. Refer back to Eqs. (2.65) and (2.68) and give reasons for your decision. exist if the underlying model has at Does lim t+coJw(t)] Problem 2.5 least one absorbing state? If so, what would this measure mean?
102
MARKOV CHAINS
Tab/e 2.21
Mean response times T at a station
P
3 Stations
5 Stations
7 Stations
10 Stations
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
1.07702 1.14326 1.20813 1.27122 1.33222 1.39096 1.44732 1.50126 1.55279
1.09916 1.18925 1.28456 1.38418 1.48713 1.59237 1.69890 1.80573 1.91201
1.11229 1.21501 1.32837 1.45216 1.58582 1.72836 1.87850 2.03464 2.19507
1.12651 1.24026 1.37051 1.51879 1.68618 1.87310 2.07914 2.30293 2.54220
Problem 2.6 Specify the reward assignments to model variants one and three in Fig. 2.8 for reliability computations. Compare reliability (unreliability) R(t) (UR(t)) functions Problem 2.7 from Section 2.2.2.3.2 with the computation formula of the distribution of the accumulated reward until absorption, P[Y(oo) 5 y] in Eq. (2.82) and comment on it.
3 Steady-State Solutions of Markov Chains In this chapter, we restrict ourselves to the computation of the steady-state probability vector’ of ergo&c Markov chains. Most of the literature on solution techniques of Markov chains assumes ergodicity of the underlying model. A comprehensive source on algorithms for steady-state solution techniques is the book by Stewart [Stew94]. From Eq. (2.15) and Eq. (2.58), we have v = VP and 0 = nQ, respectively, as points of departure for the study of steady-state solution techniques. Eq. (2.15) can be transformed so that: 0 = Y(P -1).
(3.1)
Therefore, both for CTMC and DTMC, a linear system of the form: O=xA
(3.2)
needs to be solved. Due to its type of entries representing the parameters of a Markov chain, matrix A is singular and it can be shown that A is of rank n - 1 for any Markov chain of size IS’] = n. It follows immediately that the resulting set of equations is not linearly independent and that one of the equations is redundant. To yield a unique, positive solution, we must impose a normalization condition on the solution x of equation 0 = xA. One way to approach the solution of Eq. (3.2) is to directly incorporate the normalization condition xl=
’ For the sake of convenience hand notation.
sometimes
1
use the term
(3.3) ‘steady-state
analysis’
as a short-
103
104
STEADY-STATE SOLUTIONS OF MARKOV CHAINS
into the Eq. (3.2). This can be regarded as substituting one of the columns (say, the last column) of matrix A by the unit vector 1 = [I, 1,. . . , llT. With a slight abuse of notation, we denote the new matrix also by A. The resulting linear system of non-homogeneous equations is: b=xA,
b=
[O,O,..., OJ].
(3.4)
An alternative to solving Eq. (3.2) is to separately consider normalization Eq. (3.3) as an additional step in numerical computations. We demonstrate both ways when example studies are presented. It is worthwhile pointing out that for any given ergodic CTMC, a DTMC can be constructed that yields an identical steady-state probability vector as the CTMC, and vice versa. Given the generator matrix Q = [qij] of a CTMC, we can define:
P = Q/q + I,
(3.5)
where Q is chosen such that Q > max;,jcs )qij]. Setting q = maxi,jeS lqijJ should be avoided in order to assure uperiodicity of the resulting DTMC [GLT87]. Th e resulting matrix P can be used to determine the steady-state probability vector x = u, by solving u = VP and vl = 1. This method, used to reduce a CTMC to a DTMC, is called randomization or sometimes uniformixation in the literature. If, on the other hand, a transition probability matrix P of an ergodic DTMC were given, a generator matrix Q of a CTMC can be defined by: Q=P-I.
(3.6)
By solving 0 = 7rQ under the condition ~1 = 1, the desired steady-state probability vector 7r = v can be obtained. To determine the steady-state probabilities of finite Markov chains, three different approaches for the solution of a linear system of the form 0 = xA are commonly used: direct or iterative numerical methods and techniques that yield closed-form results. Both types of numerical methods have merits of their own. Whereas direct methods yield exact results2 iterative methods are generally more efficient, both in time and space. Disadvantages of iterative methods are that for some of these met hods no guarantee of convergence can be given in general and that determination of suitable error bounds for termination of the iterations is not always easy. Since iterative methods are considerably more efficient in solving Markov chains, they are commonly used for larger models. For smaller models with fewer than a few thousand states, direct methods are reliable and accurate. Though closed-form results are highly desirable, they can be obtained for only a small class of models that have some structure in their matrix.
2Modulo round-off errors resulting from finite precision arithmetic.
SYMBOLIC SOLUTION: BIRTH-DEATH PROCESS
Fig. 3.1
Birth-death
105
process.
Problem matrix.
3.1
Show that P - I has the properties of a CTMC generator
Problem
3.2
Show that Q/q+1
has the properties of a stochastic matrix.
Problem 3.3 Define a CTMC and its generator matrix Q so that the corresponding DTMC would be periodic if randomization were applied with q = rnaxi,jeS 1qijl in Eq. (3.5).
3.1
SYMBOLIC
SOLUTION:
BIRTH-DEATH
PROCESS
Birth-death processes are Markov chains where transitions are allowed only between neighboring states. We treat the continuous time case here, but analogous results for the discrete-time case are easily obtained. A one-dimensional birth-death process is shown in Fig. 3.1 and its generator matrix is shown as Eq. (3.7): --X0 Pl
-(k
+ CLl)
. ..
x2
et*
-@3+p3)
*** ..
Xl -(Xz+p2)
“=i
ii
“0”
I93
. .
!
*
...
0 0
0
X0
:
.
(3.7)
* 1
The transition rates Xk, Ic 2 0 are state dependent birth rates and ~11,L 2 1, are referred to as state dependent death rates. Assuming ergodicity, the steadystate probabilities of CTMCs of the form depicted in Fig. 3.1 can be uniquely determined from the solution of Eq. (2.58):
Solving Eq. (3.8) f or ~1, and then using this result for substitution lc = 1 in Eq. (3.9), and solving it for 7r2 yields: 7Tl =
X0 -7r0, Pl
IT2=
AoX1 plP2ro*
with
(3.10)
106
STEADY-STATE SOLUTIONS OF MARKOV CHAINS
Eq. (3.10) together with Eq. (3.9) suggest a general solution of the following form: (3.11) Indeed, Eq. (3.11) provides the unique solution of a one-dimensional birthdeath process. Since it is not difficult to prove this hypothesis by induction, we leave this as an exercise to the reader. From the law of total probability, xi ri = 1, we get for the probability ~0 of the CTMC being in State 0: 1 7ro = l+
,F;,
go
(3.12)
=
co k-l ik
The condition for convergence of the series in the denominator of Eq. (3.12)) which is also the condition for the ergodicity of the birth-death CTMC, is: gko,
Vk > k. :
AI,
-
< 1.
Eq. (3.11) and Eq. (3.12) are used extensively in Chapter 6 to determine the probabilities n!, for many different queueing systems. These probabilities are then used to calculate performance measures such as mean queue length, or mean waiting time for these queueing systems. We deal with multidimensional birth-death processes in Chapter 7. Problem 3.4 Consider a discrete-time birth-death process with birth probability ‘bi, the death probability di, and no state change probability 1 - bi - di in state i. Derive expressions for the steady-state probabilities and conditions for convergence [Triv82].
3.2
HESSENBERG
MATRIX:
NON-MARKOV’IAN
QUEUES
Section 3.1 shows that an infinite state CTMC (or DTMC) with a tridiagonal matrix structure can be solved to obtain a closed-form result. In this section we consider two other infinite state DTMCs. However, the structure is more complex so as to preclude the solution by “inspection” that we adopted in the previous section. Here we use the method of generating functions (or z-transform) to obtain a solution. The problems we tackle originate from non-Markovian queueing systems where the underlying stochastic process is Markov regenerative [Kulk96]. One popular method for the steady-state analysis of Markov regenerative processes is to apply embedding technique so as to produce an embedded DTMC from the given Markov regenerative process. An alternative approach is to use the method of supplementary variables [Hend72]. We follow the embedded DTMC approach.
HESSENBERG MATRIX: NON-MARKOVIAN
QUEUES
107
We are interested in the analysis of a queueing system, where customers arrive according to a Poisson process, so that successive interarrival times are independent, exponentially distributed random variables with parameter X. The customers experience service at a single server with the only restriction on the service time distribution being that its first two moments are finite. Order of service is first-come-first-served and there is no restriction on the size of the waiting room. We shall see later in Chapter 6 that this is the M/G/l queueing system. Characterizing the system in such a way that the whole history is summarized in a state, we need to specify the number of customers in the system plus the elapsed service time received by the current customer in service. This description results in a continuous state stochastic process that is difficult to analyze. But it is possible to identify time instants where the elapsed time is always known so they need not be explicitly represented. A prominent set of these time instants is given by the departure instants, i.e., when a customer has just completed receiving service and before the turn of the next customer has come. In this case, elapsed time is always zero. As a result, a state description given by the number of customers is sufficient. Furthermore, because the service time distribution is known and arrivals are Poissonian, the state transition probabilities can be easily computed. It is not difficult to prove that the stochastic process defined in the indicated way constitutes a DTMC. This DTMC is referred to as embedded into the more general continuous state stochastic process. Conditions can be identified under which the embedded DTMC is ergodic, i.e., a unique steady-state pmf does exist. In Section 3.2.1 we show how the steady-state probability vector of this DTMC can be computed under given constraints. Fortunately, it can be proven that the steady-state pmf of the embedded DTMC is the same as the limiting probability vector of the original non-Markovian stochastic process we started with. The proof of this fact relies on the so-called PASTA theorem, stating that “Poisson arrivals see time averages” [Wolf82]. The more difficult analysis of a stochastic process of non-Markovian type can thus be reduced to the analysis of a related DTMC, yielding the same steady-state probability vector as of the original process. 3.2.1
Non-Exponential
Service
Times
The service times are given by independent, identically distributed (i.i.d.) random variables and they are independent of the arrival process. Also, the first moment E[S] = 9 and the second moment E [S2] = s2 of the service time S must be finite. Upon completion of service, the customers leave the system. To define the embedded DTMC X = {X,; n = 0,l.. .}, the state space is chosen as the number of customers in the system (0, 1, . . .}. As time instants where the DTMC is defined, we select the departure epochs, that is, the points
108
STEADY-STATE SOLUTIONS OF MARKOV CHAINS
in time when service is just completed and the corresponding customer leaves the system. Consider the number of customers X = {X,; n = 0, 1, . . .} left behind by a departing customer, labeled n. Then the state evolution until the next epoch n + 1, where the next customer, labeled n + 1, completes service, is probabilistically governed by the Poisson arrivals. Let the random variable Y describe the number of arrivals during a service epoch. The following one-step state transitions are then possible: X n+1
if X, > 0, if X, = 0.
X,+Y-1, =
1 Y,
(3.14)
Let c&k= P[Y = Ic] denote the probability of k arrivals in a service period with given distribution B(t). If we fix the service time at t, then Y is Poisson distributed with parameter At. To obtain the unconditional probability ak, we uncondition using the service time distribution:
cm ak =
(3.15)
.I0
Now the transition probabilities are given as: i>O, j>i--1, i = 0, j > 0.
P[X,,l = j 1x, = i] =
(3.16)
The transition probability matrix P of the embedded stochastic process, which is a Hessenberg matrix, is then given by: a0 al
a2
... (3.17)
P can be effectively created, because the arrivals are Poisson and the service time distribution B(t) is known. With P given as defined in Eq. (3.17), it is not difficult to prove that a DTMC has been defined, i.e., the Markov property from Eq. (2.2) holds. The reader should also verify that this DTMC is aperiodic, and irreducible. It is intuitively clear that State 0 is positive recurrent, if the server can keep up with arrivals, i.e., the mean service time 3 is smaller than than mean interarrival time l/X. Equivalently, the mean number of arrivals E[N] = N during mean service period ?? should be less than one [Triv82]: N=A;T
(3.18)
HESSENBERG MATRIX: NON-MARKOVIAN
QUEUES
109
We know from Section 2.1.2.1 that the states of an irreducible DTMC are all of the same type. Therefore, positive recurrence of state 0 implies positive recurrence of the DTMC constructed in the indicated way. A formal proof of the embedded DTMC being positive recurrent if and only if Relation (3.18) holds has been given, for example, by Cinlar [Cin175]. We also know from Section 2.1.2.1 that irreducible, aperiodic, and positive recurrent DTMCs are ergodic. Therefore, the embedded DTMC is ergodic if Relation (3.18) holds. Assuming the DTMC to be ergodic, the corresponding infinite set of global balance equations is given as: uo = vOa0+ vm, k+l
uk = uoak + c
(3.19)
via]c-i+l,
k > 1.
i=l
To compute the steady-state probability vector u of the DTMC, we use the method of generating functions wherein we also need to use the LST of the service time random variable. Let the service times distribution be denoted by B(t) and its LST B”(s) be given by: 00 B'(s)
(3.20)
eestdB(t).
= s 0
Defining generating functions of the state probabilities: G(x) = 2
VkZk,
(3.21)
k=O
and of the { aj } sequence: (3.22)
GA(X) = Fajzj, j=o from Eq. (3.19), we have:
G(x) = euk,zk = u()Eakz' + ~A~Yiak-i+lZk, k=O
k=O
k=Oi=l
or: G(x)
= uoGA(~)
+ c i=l
= Us&
c
wk-i+l~IC
k=i-1
+ CC uiajzjsiel ix1 j=O
110
STEADY-STATE SOLUTIONS OF MARKOV CHAINS
= ‘OGA(X) + --(G(z) - VO)GA(Z), or:
G(4
x -
GA(X)
= u. bG&)
-
GA(d)
x
z
7
or: G(x)
= u.
ZGA(X)
-
GA(~)
X-GA(X)
(3.23) ’
Now, because the arrival process is Poisson, we can obtain the generating function of the {aj} sequence by first conditioning on a fixed service time and then unconditioning with the service time distribution:
GA(Z) = C ajzj= C J P[Y j=O
= j ) service = t)dB(t)z”
j=O o
coo0 = cs 0 j=o co
e~“t~&?(t)zi
O” ow c j!e
=
j=O
I(0
&qt)
-At )
(3.24)
cc
e -xt+xtzqq
= .I
0 co e-wl-")&qt)
= .I
= i-(x(1
-
2)).
Thus, the generating function for the {aj} sequence is given by the LST of the service time distribution at X(1 - z). Hence, we get an expression for the generating function of the DTMC steady-state probabilities:
G(z) = u. zB-(X(1 - ‘d> - B-(X(1 - ‘>> 2 - &(X(1 - 2)) B-(X(1 - z))(z - 1) = u” x -X(X(1 - 2)) *
(3.25)
HESSENBERG MATRIX: NON-MARKOVIAN
QUEUES
111
The generating function in Eq. (3.25) allows the computation of the infinite steady-state probability vector u of the DTMC embedded into a continuoustime stochastic process at the departure epochs. These epochs are given by the time instants when customers have just completed their generally distributed service period. The steady-state probabilities are obtained by repeated differentiations of the probability generating function G(x), evaluated at x = 1: (3.26) If we set z = 1 on the right hand side of Eq. (3.25) and since GA(~) = ET=, aj = 1, we have O/O. Differentiating the numerator and the denominator, as per L’Hospital’s rule, we obtain: G(1) = 1 = uo
B-(X(1 - z)) + z(-l)XK’(X(l 1+ XK’(X(1
=uow~ -
2)) + XK’(X(l1+ XB-‘(X(l-
- 2)) + XB”‘(X(1 - x))
x))[l-
- x))
z]
z))
B”(O) = ” 1 + XB*‘(O) ’ From Eq. (3.24) it can be easily shown B”(O) = 1 and -B-‘(O) we have:
= 3, so that
1 l=ve------l-X?? or: uo = 1 - xz7.
(3.27)
Taking the derivative of G(z) and setting z = 1, we get the expected number of customers in the system, E[X] = x, in steady state:
x=x9+
x2s2 2(1 - XS)’
(3.28)
Equation (3.28) is known as the Pollacxek-Khintchine formula. Remember that this formula has been derived by restricting observation to departure epochs. It is remarkable that this formula holds at random observation points in steady state, if the arrival process is Poisson. The proof is based on the PASTA theorem by Wolff, stating that “Poisson arrivals see time averages” [Wolf82]. We refrain from presenting numerical examples at this point, but refer the reader to Chapter 6 where many examples are given related to the stochastic process introduced in this section.
112
STEADY-STATE SOLUTIONS OF MARKOV CHAINS
Problem
3.5
Specialize Eq. (3.28) above to:
(a) Exponential service time distribution (b) Deterministic
with parameter 1-1
service time with value l/p
(c) Erlang service time distribution
with mean service time l/p and Ic phases
In the notation of Chapter 6, these three cases correspond to M/M/l, and M/EI, / 1 queueing systems, respectively.
M/D/l,
Problem 3.6 Given a mean interarrival time of 1 second and a mean service time of 2 seconds, compute the steady-state probabilities for the three queueing systems M/M/l, M/D/l, and M/Ek/l to be idle. Compare the results for the three systems and comment. Show that the embedded discrete-time stochastic process Problem 3.7 X = {X,;n = O,l,. . .} defined at the departure time instants (with transition probability matrix P given by Eq. (3.17)), forms a DTMC, i.e., it satisfies the Markov property Eq. (2.2). Give a graphical representation of the DTMC defined by Problem 3.8 transition probability matrix P in Eq. (3.17). Show that the embedded DTMC X = {X,;n Problem 3.9 defined in this section is aperiodic and irreducible.
= O,l, . . .}
Consider a single-server queueing system with indepenProblem 3.10 dent, exponentially distributed service times. Furthermore, assume an arrival process with independent, identically distributed interarrival times; a general service time distribution is allowed. Service times and interarrival times are also independent. In the notation of Chapter 6, this is the GI/M/l queueing system. 1. Select a suitable state space and identify appropriate time instants where a DTMC X* should be embedded into this non-Markovian continuous state process. 2. Define a DTMC
X* = {X:; n = 0, 1, . . .} to be embedded into the non-Markovian continuous state process. In particular, define a DTMC X* by specifying state transitions by taking Eq. (3.14) as a model. Specify the transition probabilities of the DTMC X* and its transition probability matrix P*.
3.2.2
Server with
Vacations
Server vacations can be modeled as an extension of the approach presented in Section 3.2.1. In particular, the impact of vacations on the investigated
HESSENBERG MATRIX: NON-MARKOVIAN
QUEUES
113
performance parameters is of additive nature and can be derived with decomposition techniques. Non-Markovian queues with server vacations have proven to be a very useful class of models. They have been applied in a great variety of contexts [DoshSO], some of which are: l
Analysis of server brealcdowns, which may occur randomly and preempt a customer (if any) in service. Since breakdown (vacation) has priority over customer service, it is interesting to find out how the overall service capacity is affected by such breakdowns. Such insights can be provided through the analysis of queueing systems with vacations.
0 Investigation of maintenance strategies of computer, communication, or manufacturing systems. In contrast to breakdowns, which occur randomly, maintenance is usually scheduled at certain fixed intervals in order to optimize system dependability. l
Application of polling systems or cyclic server queues. Different types of polling systems that have been used include systems with exhaustive service, limited service, gated service, or some combinations thereof.
3.2.2.1 Polling Systems Because polling systems are often counted as one of the most important applications of queueing systems with server vacations, some remarks are in order here. While closed-form expressions are derived later in this section, numerical methods for the analysis of polling systems on the basis of GSPNs is covered in Section 2.2.3. The term polling comes from the polling data link control scheme in which a central computer interrogates each terminal on a multidrop communication line to find out whether it has data to transmit. The addressed terminal transmits data, and the computer examines the next terminal. Here, the server represents the computer, and a queue corresponds to a terminal. Basic polling models have recently been applied to analyze the performance of a variety of systems. In the late 195Os, a polling model with a single buffer for each queue was first used in an analysis of a problem in the British cotton industry involving a patrolling machine repairman [MMW57]. In the 196Os, polling models with two queues were investigated for the analysis of vehicleactuated traffic signal control [Newe69, NeOs69]. There were also some early studies from the viewpoint of queueing theory, apparently independent of traffic analysis [AMM65]. In the 197Os, with the advent of computer communication networks, extensive research was carried out on a polling scheme for data transfer from terminals on multidrop lines to a central computer. Since the early 198Os, the same model has been revived by [Bux81] and others for token passing schemes (e.g., token ring and token bus) in local area networks (LANs). In the current investigation of asynchronous transfer mode (ATM) for broadband ISDN (integrated services data network), cyclic scheduling is often proposed. Polling models have been applied for scheduling moving arms in secondary storage devices [CoHo86] and for resource arbitration and load
114
STEADY-STATE SOLUTIONS OF MARKOV CHAINS
sharing in multiprocessor computers. A great number of applications exist in manufacturing systems, in transportation including moving passengers on circular and on back-and-forth routes, internal mail delivery, and shipyard loading, to mention a few. A major reason for the ubiquity of these applications is that the cyclic allocation of the server (resource) is natural and fair (since no station has to wait arbitrarily long) in many fields of engineering. The main aim of analyzing polling models is to find the message waiting time, defined as the time from the arrival of a randomly chosen message to the beginning of its service. The mean waiting time plus the mean service time is the mean message response time, which is the single most important performance measure in most computer communication systems. Another interesting characteristic is the polling cycle time, which is the time between the server’s visit to the same queue in successive cycles. Many variants and related models exist and have been studied. Due to their importance, the following list includes some polling systems of interest: 0 Single-service polling systems, in which the server serves only one message and continues to poll the next queue. l
Exhaustive-service polling systems, in which the server serves all the messages at a queue until it is empty before polling the next queue.
l
Gated-service polling systems, in which the server serves only those messages that are in the queue at the polling instant before moving to the next queue. In particular, message requests arriving after the server starts serving the queue will wait at the queue until the next time the server visits this queue.
l
Mixed exhaustive- and single-service polling systems.
l
Symmetric and asymmetric limited-service polling systems, in which the server serves at most Z(i) customers in in each service cycle at station i, with Z(i) = I for all stations i in the symmetric case.
3.2.2.2 Analysis As in Section 3.2.1, we embed a DTMC into a more general continuous-time stochastic process. Besides non-exponential service times S with its distribution given by B(t) and its LST given by B * (s), we consider vacations of duration V with distribution function C(t) and LST C”(s), and rest periods of the server of duration R, with distribution D(t) and LST D’(s). Rest periods and vacations are both given by i.i.d. random variables. Finally, the arrivals are again assumed to be Poisson. Our treatment here is based on that given in [KingSO]. If the queue is inspected and a customer is found to be present in inspection in epoch n, that is X, > 0, the customer is served according to its required service time, followed by a rest period of the server (see Fig. 3.2). Thus, after each service completion the server takes a rest before returning for inspection of the queue. If the queue is found to be empty on inspection, the server takes a
HESSENBERG MATRIX: NON-MARKOVIAN
Fig. 3.2
Phases of an M/G/l
QUEUES
115
queue with vacations.
vacation before it returns for another inspection. We embed a DTMC into this process at the inspection time instants. With E denoting the arrivals during service period including following rest period and F denoting the arrivals during a vacation, Eq. (3.14) d escribing possible state transitions from state X, at embedding epoch n to state X n+r at embedding epoch n + 1 can be restated:
if X, > 0, if X, = 0.
X,+=-l,
X n-t1
= F7
(3.29)
With ek = P[E = k] and fi = P[F = Z] d enoting the probabilities of Ic or I arrivals during the respective time periods, the transition probabilities of the embedded DTMC can be specified in analogy to Eq. (3.16):
p[&+l
= j 1x,
= i] =
The resulting transition by:
probability
fo
P=
Note that this transition the M/G/l case.
i>O, j>i-120, i = 0, j 2 0.
ej-i+17 1 fj7
matrix of the embedded DMTC is given
fl
fz
. ..’
eo el e2 . . . 0 eo el . . . 0 0 eo . . . * .. . ..
probability
(3.30)
(3.31)
matrix is also in Hessenberg form like
116
STEADY-STATE SOLUTIONS OF MARKOV CHAINS
Let p denote the state probability vector of the embedded DTMC at inspection instants. Then its generating function is given by:
G(z) = epkzk
(3.32)
k=O
k=O
= PoG&)
+ 2
2
i=l =
po
~Gdd
-
k=O
i=l
(3.33)
p&4+12”
k&-l
(3.34)
GE(Z)
x-G&)
*
In analogy to Eq. (3.24)) the generating functions GE(Z) and GF (z) can be derived: GE(X) = ge.jzj j=o
P [E = j ) service = t]d[B(t) + D(t)],2 = 27 j=o 0
moo =cs eFxt y
j=o 0
.
d[B(t) + D(t)]2
s
XC me-xt(‘-z)d[B(t)
= ii-(x(1
+ D(t)]
- z>>o-(X(l-
x)),
(3.35) (3.36)
G&z) = C-(X(1 - z)).
With generating functions G(x), GE(Z), and GF(x) defined, and the transition probability matrix as given in Eq. (3.31), an expression for the probability generation function at inspection time instants can be derived for the server with vacation:
+) =pozc(‘(’ - ‘>>- B-(X(1 - x))D”(x(l - ‘d> x - B”(X(1 - x))D”(X(l
- 2))
*
(3.37)
Remembering G( 1) = 1 and by evaluating Eq. (3.37) at x = 1, again by differentiating the denominator and the numerator, we have: 1 - X(S + R) po = 1 - X(S + R) + XV’
(3.38)
So far, we have derived an expression for the probabilities pi at inspection time instants. But to arrive at the steady-state probabilities, we infer the state probabilities un at departure instants. Then an inspection at a time
instant immediatelyprecedingthe departure must find at least one customer present, i > 0, with probability
pi, because a departure occurred.
With
HESSENBERG MATRIX: NON-MARKOVIAN
QUEUES
117
ak = P[Y = k] d enoting again the probability of Ic arrivals in a service period, the probabilities vn, pi, and ak can be related as follows: nfl
v, =
Pi%+l-i
c
i.l
1 -Po
*
Then, the generating function of the steady-state queue length X(x)
= 2 VjXj j=o
is given by:
x(x>=fJ Xnn$r&%+1--i=fJ 2 In&an+l-i n=O =
i=ln=i--1
i-J-&
= 41
g
2
2=1
j=o
l
-Po>
,Zi+'-lpiaj
=
z(l
t
po)
gpiPi
2
2=1
j=o
ajxi
(3*3g)
(G(~-Po)GA(~.
Recalling GA(Z) = B-(X(1 - x)) and the formula for G(Z), we have: G(x)
= PO
xC”(X(1 - x)) - X(X(1 - Z))D”(X(lx - B-(X(1 - Z))D”(X(lx))
x - X(X(1 -z-B-(X(1
- x))D”(X(l-
Z))
- x))D-(X(l-
x))
1
xC”(X(1 - x)) - x x - B-(X(1x))D”(X(l - z))
x))
1
(3.40)
’
and: x(z)
= Z(l~Po)
=
Z[c-(X(1 - 2)) - l]B”(X(l Z - B-(X(1 - Z))W(X(l-
- x)) Z))
1 - XE[S + JL] [c-(x(1 - Z)) - l]B”(X(l z - B-(X(1x))D”(X(lWV1
- 2)) x)) *
(3.41)
Taking the derivative and setting x = 1, we have:
x=x9+
X2(S + R)2 ____
2(1-XS+R)
xv2 -
+ 277’
(3.42)
If there is no rest period, that is, R = 0, and if the server takes no vacation, that is, V = 0, then Eq. (3.42) re d uces to the Pollaczek-Khintchine formula
118
STEADY-STATE SOLUTIONS OF MARKOV CHAINS
of Eq. (3.28). Furthermore, the average number of arrivals during a vacation, given by the term:
is simply added to the average number of customers that would be in the system without vacation. With respect to accumulating arrivals, the rest period can simply be considered as an extension of the service time. Problem 3.11 Give an interpretation of servers with vacation in terms of polling systems as discussed in Section 3.2.2.1. Give this interpretation for all types of polling systems enumerated in Section 3.2.2.1. Problem 3.12 How can servers with breakdown and maintenance strategies be modeled by servers with vacation as sketched in Fig. 3.2? What are the differences with polling systems? Problem 3.13 Give the ergodicity condition for an M/G/l vacation defined according to Fig. 3.2.
queue with
Problem 3.14 Derive the steady-state probabilities ve and ~1 of an M/G/l queue with vacation. Assume that the following parameters are given: Rest period is constant at 1 second, vacation is a constant 2 seconds, arrival rate X = 1, and service time distribution is Erlang Ic with k = 2 and mean l/p = 0.2 seconds. Check for ergodicity first. Use the same assumptions as specified in Problem 3.14 Problem 3.15 and, if steady state exists, compute the mean number of customers in system for the following cases: 1. M/G/l
queue with vacation according to Fig. 3.2.
2. Parameters same as in Problem 3.14 above but with rest period being constant at 0 seconds. 3. Parameters same as in Problem 3.15 part 2 above with vacation being also constant at 0 seconds.
3.3
NUMERICAL
SOLUTION:
DIRECT METHODS
The closed-form solution methods explored in Sections 3.1 and 3.2 exploited special structures of the Markov chain (or, equivalently, of its parameter matrix). For Markov chains with a more general structure, we need to resort to numerical methods. There are two broad classes of numerical methods to
solvethe linear systemsof equationsthat we are interestedin: direct methods and iterative methods.
Direct methods operate and modify the parameter
NUMERICAL SOLUTION: DIRECT METHODS
119
matrix. They use a fixed amount of computation time independent of the parameter values and there is no issue of convergence. But they are subject to fill-in of matrix entries, that is, original zero entries can become non-zeros. This makes the use of sparse storage difficult. Direct methods are also subject the accumulation of round-off errors. There are many direct methods for the solution of a system of linear equations. Some of these are restricted to certain regular structures of the parameter matrix that are of less importance for Markov chains, since these structures generally cannot be assumed in the case of a Markov chain. Among the techniques most commonly applied are the well known Gaussian elimination (GE) algorithm and, a variant thereof, Grassmann’s algorithm. The original version of the algorithm, which was published by Grassmann, Taksar, and Heyman, is usually referred to as the GTH algorithm [GTH85] and is based on a renewal argument. We introduce a newer variant by Kumar, Grassmann, and Billington [KGB871 w h ere interpretation gives rise to a simple relation to the GE algorithm. The GE algorithm suffers sometimes from numerical difficulties created by subtractions of nearly equal numbers. It is precisely this property that is avoided by the GTH algorithm and its variant through reformulations relying on regenerative properties of Markov chains. Cancellation errors are nicely circumvented in this way. 3.3.1
Gaussian
Elimination
As the point of departure for a discussion of Gaussian elimination, we refer back to Eq. (3.4). The idea of the algorithm is to transform the system of Eq. (3.43) which corresponds in matrix notation to Eq. (3.4), into an equivalent one by applying elementary operations on the parameter matrix that preserve the rank of the matrix: ao,oxo + alp1
+ . . . + an-l,oxn-1
= bo,
qm
+ . . . + an--1,1x,-l
= bl,
+ al,lxl
(3.43) aO,n-lxO + u~,~--~xI + . . . + ~~--l,~-lxn--l
=
bn-1.
As a result, an equivalent system of linear equations specified by Eq. (3.44) with a triangular matrix structure is derived, from which the desired solution x, which is identical to the solution of the original system given by Eq. (3.43), can be obtained:
(n-1)xo= b,(n-1), ao,o apl-2) ,
(0)
aO,n-lxO
(0)
+ ‘l,n-
I+...
(n-2)x1
= by-2) 7
x:o+al,l
+ af)Al n-lxn-l
(3.44) = b”? n 1’
120
STEADY-STATE SOLUTIONS OF MARKOV CHAINS
If the system of linear equations has been transformed into a triangular structure, as indicated in Eq. (3.44), the final results can be obtained by means of a straightforward substitution process. Solving the first equation for 20, substituting the result in the second equation and solving it for ~1, and so on, finally leads to the calculation of CC,-~. Hence, the xi are recursively computed according to Eq. (3.45):
j-1
(3.45)
(n-j)
aki
k=O c aj~-j)“k,
.i =
172,***9n-
1.
,
To arrive at Eq. (3.44), an elimination procedure first needs to be performed on the original system of Eq. (3.43). Informally, the algorithm can be described as follows: first the nth equation of Eq. (3.43) is solved for x,-i, and then x,-r is eliminated from all other n- 1 equations. Next, the (n- 1)th equation is used to solve for x,-2, and, again, x,-2 is eliminated from the remaining n - 2 equations, and so forth. Finally, Eq. (3.44) results, where ai:' denotes the coefficient of xi in the (j + 1)th equation, obtained after the /Ah elimination step .3 For the sake of completeness it is pointed out that a!‘) a.i = ai >j * ‘-More formally, for the lath elimination step, i.e, the elimination of x,-k from equations j, j = n - k, n - lc - 1,. . . , 1, the (n - k + 1)th equation is to be multiplied on both sides by: (k-1)
-
an-k,j-l (k-l) an-k,n-k
(3.46)
’
and the result is added to both sides of the jth equation. The computation of the coefficients shown in the system of Eq. (3.47) and Eq. (3.48) for the lath elimination step is:
I 0, a..(k) _ZJ
,yy-l)
-
23
(k-1)
ai,n-k
j = n - k - 1, n - k - 2, , . . , 0, i=n-l,n-2 ?“‘T n - li,
aW)
n-k,j a(k-l) n-k,n-k
7
(3.47)
otherwise,
a(W b(') j
=
b("-l) j
_
bF:t)
n-kJ'
,
j
=
n
-
k -
1, n
-
k,.
aW)
n-k,n-k
3Note that i n t he system of Eq. (3.44) relation
k = n - j >_ 0 holds.
. . ,O.
(3.48)
NUMERICAL SOLUTION: DIRECT METHODS
121
In matrix notation we begin with the system of equations:
an-i,0
an-l,1
- - - an-l,n-1
After the elimination procedure, a modified system of equations results, which is equivalent to the original one. The resulting parameter matrix is in upper triangular form, where the parameters of the matrix are defined according to Eq. (3.47) and the vector representing the right-hand side of the equations according to Eq. (3.48):
(n-1) ao,o
0 9 xn-1)
0
arl-2)
ac2-3)
(A-2) %,l
* -* ***
al,,-i
0
(A-3) %,2 ap2-3) )
*..
a2,n-1
0
...
0
ao,n-i
an-l,n-1 /
(3.49)
=xu
The Gaussian elimination procedure takes advantage of elementary matrix operations that preserve the rank of the matrix. Such elementary operations correspond to interchanging of equations, multiplication of equations by a realvalued constant, and addition of a multiple of an equation to another equation. In matrix terms, the essential part of Gaussian elimination is provided by the factorization of the parameter matrix A into the components of an upper triangular matrix U and a lower triangular matrix L. The elements of matrix U resulted from the elimination procedure while the entries of L are the terms from Eq. (3.46) by which the columns of the original matrix A were multiplied
122
STEADY-STATE SOLUTIONS OF MARKOV CHAINS
during the elimination process:
1 ape-l) > 0
=
($72)
(.$-3)
ap
aj;“,-3)
0
‘0
0
0
ao,n--1 **.. .
a$$2-3). . . .**
0
0'\
... 0
..*
0
1
0
i
1
0
al,,-i
ah
0 an-i,n-i.
(1) an-2,0 (1) an-2,n-2 an-l,0 an--l,n--l
= UL.
(1)
(1)
an-2,n-3 **
an-2,n-2 . . ,
an--l,n-2
J
an--l,+-l 1 (3.50)
As a result of the factorization of the parameter matrix A, the computation of the result vector x can split into two simpler steps: b=xA=xUL=yL. The solutions of both equations, first of: yL = b
(3.51)
for the vector of unknowns y and then finally of: xu=y
(3.52) (3.53)
for the vector of unknowns x is required. Note that the intermediate result y from Eq. (3.51), being necessary to compute the final results in Eq. (3.52), is identical to Eq. (3.49) and can readily be calculated with the formulae presented in Eq. (3.48). Since only the coefficients of matrix U are used in this computation, it is not necessary to compute and to represent explicitly the lower triangular matrix L. It is finally worth mentioning that pivoting is not necessary due to the structure of the underlying generator matrix, which is weakly diagonal dominant, since )q+ 1> qi,j, V’i, j. This property is inherited by the parameter matrices. Now the Gaussian elimination algorithm can be summarized as follows: Construct the parameter matrix A and the right-side vector b according to Eq. (3.4) as discussed in Chapter 3.
NUMERICAL SOLUTION: DIRECT METHODS
1.23
Carry out elimination steps or, equivalently, apply the standard algorithm to split the parameter matrix A into upper triangular matrix U and lower triangular matrix L such that Eq. (3.50) holds. Note that the parameters of U can be computed with the recursive formulae in Eq. (3.47) and the computation of L can be deliberately avoided. Compute the intermediate results y according to Eq. (3.51) or, equivalently, compute the intermediate results with the result from Eq. (3.53) according to Eq. (3.48). Perform the substitution to yield the final result x according to Eq. (3.52) by recursively applying the formulae shown in Eq. (3.45). Example 3.1 Consider the CTMC depicted in Fig. 3.3. This simple finite birth-death process is ergodic for any finite X and ,Q so that their unique steady-state probabilities can be computed. Since closed-form formulae have
Fig. 3.3
A simple finite birth-death
process.
been derived for this case, we can easily compute the steady-state probabilities ri as summarized in Table 3.1 with X = 1 and ,Q= 2 and with:
7ro =
1 c(> 3 i=O
x CL
2
3
lk = 1,2,3.
Table 3.1 Steady-state probabilities computed using closed-form expressions.
s
15
Alternatively, the state probabilities can be computed applying the Gaussian elimination method introduced in this section. The results from Table 3.1 can then be used to verify the correctness of the results.
124
STEADY-STATE SOLUTIONS OF MARKOV CHAINS
First, the generator matrix Q is derived from Fig. 3.3:
0
0
2
-2
In order to include the normalization condition and to derive the parameter matrix A of the linear system to be solved, the last column of Q is replaced by the unit vector: -1
1
0
1
0
0
2
1
The resulting system of linear equations is fully specified according to Eq. (3.4), when vector b = (O,O, 0,l) is given. -;-w-7 ;.:‘;gw;,<# ~~~~~~~ With Eq. (3.47) and Eq. (3.48) in mind, matrix A is transformed into upper-triangular matrix U via intermediate steps A(l), Ac2), Ac3) = U. In parallel, vector b is transformed via b(l) and bc2) into bc3), resulting in the following sequence of matrices:
b(l) = (0, o, -2, I), 0
0
0
1
bc2) = (o,+-,,,), 0
01
0
bc3) = 0
0
+,-f,-a,l
.
01
For the sake of completeness we also present the lower triangular matrix L containing the factors from Eq. (3.46) used in the elimination steps:
It can be easily verified that: y = bc3) = (4
-;,
-2,l)
NUMERICAL SOLUTION: DIRECT METHODS
125
is the intermediate solution vector y of Eq. (3.51) with given lower triangular matrix L. The last step is the substitution according to the recursive Eq. (3.45) to yield the final results x as a solution of Eqs. (3.52) and (3.53): -- 8 17
b(“) yo x0=--&=-----&= > >
II
-- l5
=-
17
8 15’ UO,l --x0 Ul,l
-- 4 5
;8
=-3-------
--
5
l7 5
4 15’
15
1 =
-2 -2 8 ---------=-5 -5 15 x3 =
b3
a3,3
=A!?-
-14 -515
Y2
u2,2
a0,3 -X0 a3,3
-
al,3 -X1 a3,3
-
a2,3 -22 a3,3
-
uo,3 -x0 u3,3
-
u1,3 -X1 u3,3
-
u2,3 -22
=l-x()-xi-x2=1-
uo,2 -20 u2,2
-
u1,2 -21 u2,2
2 15’
-
u3,3
-
u3,3
8+4+2 15
1 = 15’
The computational complexity of the Gaussian elimination algorithm can be characterized by O(n3/3) multiplications or divisions and a storage requirement of O(n2), where n is the number of states, and hence the number of equations. Note that cancellation and rounding errors possibly induced by Gaussian elimination can adversely affect the results. This difficulty is specially relevant true if small parameters have to be dealt with, as is often the case with the analysis of large Markov chains, where relatively small state probabilities may result. For some analyses though, such as in dependability evaluation studies, we may be particularly interested in the probabilities of states that are relatively rarely entered, e.g., system down states or unsafe states. In this case, the accuracy of the smaller numbers can be of predominant interest. 3.3.2
The Grassmann
Algorithm
Grassmann’s algorithm constitutes a numerically stable variant of the Gaussian elimination procedure. The algorithm completely avoids subtractions
126
STEADY-STATE SOLUTIONS OF MARKOV CHAINS
and it is therefore less sensitive to rounding and cancellation errors caused by the subtraction of nearly equal numbers. Grassmann’s algorithm was originally introduced for the analysis of ergodic, discrete-time Markov chains X = {X,; n = 0, 1, . . .} and was based on arguments from the theory of regenerative processes [GTH85]. A modification of the GTH algorithm has been suggested by Marsan, Meo, and de Souza e Silva [MSA96]. We follow the variant presented by Kumar et al., which allows a straightforward interpretation in terms of continuous-time Markov chains [KGB87]. We know from Eq. (2.41) that the following relation holds:
-qi,i =c Qi,j-
(3.54)
j,i#i
Furthermore, Eq. (2.57) can be suitably rearranged: n-l
--;TTiQi,i = c rjqj,i. j=O,j#i
(3.55)
Letting i = n - 1 and dividing Eq. (3.55) on both sides by qn-l,+l
yields:
n-2 -7rn-1
=
~. c j=o
%,n--1 ’ 4n-l,n-1’
This result can be used to eliminate rn-1 on the right side of Eq. (3.55): n-2
--7riqi,i =
c j=O,j#i
n-2
rjqj,i
-
iTT, qj,n-lqn-1,i c j=o
3
Qn-l,n-1
(3.56)
n-2 qj,n-lqn-1,i
=
rj
Qj,i
= j=O,j#i
_ ~ Qi,n-lQn-1,i
4n-l,n-1
(
>
’
*
Qn-l,n-1
Adding the last term of Eq. (3.56) on both sides of that equation results in an equation that can be interpreted similarly as Eq. (3.55): n-2 %,n-lqn-1,i -Ti
Qi,i (
4n-l,n-1
qj,n-lqn-1,i
ZZ >
rj = j=O,j#i
qj,i
,
4n-l,n-1
(
O
>
(3.57) With:
qj,i = qj,i _ qj7n-lqn-lTi = qj i + qf+;lqn--l,i, qn-l,n-1
’ C I=0
h-l,1
the transition rates of a new Markov chain, having one state less than the original one, are defined. Note that this elimination step, i.e., the computation
NUMERICAL SOLUTION: DIRECT METHODS
127
of x7j.i 7 is
achieved merely by adding non-negative quantities to originally nonnegative values Qj,i, j # i. Only the diagonal elements pi i and ai i are negative. It should be noted that the computation of the rates ii i on the diagonal of the reduced Markov chain can be completely avoided due to the property of Markov chains reflected in Eq. (3.54). I n order to assure the [i!jj,i] properly define a CTMC, it remains to be proven that the following equation: n-2 ‘j,i
c
=
0,
i=O
holds for all j : n-2 C
n-2 qj,i
=
C
+
i=O
i=O
C
q~~~lqn-l’i
i=”
C
=
C
qj,i
+
i=O
C qn-1,l I=0
qj,n-1
C i=O
n_Q2n-1’i
C qn-1,l l=O
n-l
n-2 =
n-2
n-2
n-2 qj,i
%,i
+
Qj,n-1
=
C
qj,i
=
0.
i=o
i=O
The new transition rates ~j,i can be interpreted as being composed of rate qj,i, describing direct transitions from state j to state i, and of rate qj,n-r describing transitions from state j via state n - 1 to state i with conditional branching probability: Qn-1,i n-2
C 4n-1,l I=0
This interpretation implies that no time is spent in (the otherwise eliminated) state n - 1. The elimination procedure is iteratively applied to the generator matrix with entries 41:’ of stepwise reduced state spaces until an upper triangular matrix results, where qi:’ denotes the matrix entries after having (n-1) on the applied elimination step Ic, 1 5 k 5 n - 1. Finally, each element qi,i main diagonal is equal to - 1. As usual, the elimination is followed by a substitution process to express the relations of the state probabilities to each other. Since the normalization condition has not been included initially, it must still be applied to yield the final state probability vector. Grassmann’s algorithm has been presented in terms of a CTMC generator matrix and, therefore, the parameter matrix must initially be properly defined:
A=
Q,
for a CTMC
P - I,
for a DTMC
128
STEADY-STATE SOLUTIONS OF MARKOV CHAINS
j < 1,i = 1,
j
# i, 1 < j, i < 1 - 1,
j=i=l
j = l,i <'1.4 For 1 = 1,2, . . . , n - 1 do: l-1 ,,(n-1) Xl = c x ai1 * i=O
For i = 0, 1, . . . , n - 1 do:
In matrix notation, the parameter matrix A is decomposed into factors of an upper triangular matrix U and a lower triangular matrix L such that the following equations hold: O=xA=xUL. Of course, any (non-trivial) solution of 0 = XU is also a solution of the original equation 0 = xA. Therefore, there is no need to represent L explicitly. Example 3.2 Consider the CTMC presented in Fig. 3.4. It represents a finite queueing system with capacity 3, where customers arrive according to independent, exponentially distributed interarrival times with rate X as long as buffer space is available. Service is received from a single server in two consecutive phases, both phases having mean duration 1/(2~). The pair of integers (k, Z) assigned to a state i represents the number of customers k(i) in the system and the phase Z(i) of the customer currently in service. To construct the generator matrix Q, the states are ordered as indicated in Table 3.2. 4Settingthe elementson the diagonalequalto
-1 and below the diagonal equal to 0 is only included here for the sake of completeness. These assignments can be skipped in an actual implementation for efficiency reasons.
NUMERICAL SOLUTION: DIRECT METHODS
Fig. 3.4
A CTMC for a system with Erlang-2 service time distribution.
Table 3.2
(h 1) #
w9
A possible state ordering for Fig. 3.4
(1, 1)
0
(al)
1
(3,1>
(1,2)
(2,2)
(3,2)
3
4
5
6
2
Grassmann’s algorithm is applied with X = p = 1:
f
-1 0 0
1 -3 0
0 1 -3
0 0 1
0
0
0
2 0 0
0 2 0
0 0 2
-2 0 0 0
A=
\
0 2
0 0
0 0
0
2
0
0 -3 0 0
0
2
1
0
-3 0
1 -2
1 -3
0
0
0
0
0
1
0
2
0
0
0
0
1 -2 0
2
0
0 0 2 0
-3 2 0
0
0 2 0 0 -1
0
0
1
-3
1
0
-1 0 0 0 2 0
0
I
,o
1
0
0
-3
0
0
0
0
-1 1
1 -3 4 0 2 Fi
0 1 -3 2 1 ?I
0 0 1 -2 0 0
0 2 0
0 0 $
0 0 0
0
0
1
-3 0
1 21
0 ;
0
0
0
0
0
-1 I
,
$
,
129
130
STEADY-STATE SOLUTIONS OF MARKOV CHAINS
I A(3)
0 0 0 0
y0 -3 2 0 0 0
0 1 -2 0 0 0
0g 0 0 -1 0 0
0 23 0 ; -1 0
0 0 1 0 $ -1
1 -3 ! 0 0 0 0
0 II -“3 0 0 0 0
0 01
0 i 0 0 -1 0 0
0 0 ; 0 l “1 0
0 0 0 1 0 1 -“l
-1 z
0
-3 0 0
0 0 0
0 0 0
-143.
-31
0
$
0 0 0 0 -1 63
=
0 0 0 0
-11 84
0 0 0 0 o\
A(“) =
(0
0
0 + -1
$ 0 0
0 0 0
0 0 0
-1 0 0
0 $ 0
0’ 0 1
)
1 0 21 3 0 -1 J
0 0 0 0 0
fo ; 0 -1 0 0 0 0
-“l 0 0 0
fi -1 0 0 0 0
0 0 0 0 0
0 $ -1 0 0 0
$ 0 0 -1 0 0
0 $ 0 l “1 0
0 0 1 0 1 -“l
Since Ac6) is in upper triangular form the solution of xAc6) = 0 can be easily obtained: 3 Xl = -x0 4 2 x4 = -21 3
11 1 3
x5 = -x4+
1 2 1 -25 2
x3 = -x2
x2 = 5x1
2 3
-22
26 =
+x3.
Because only six equations with seven unknowns were derived, one equation is deliberately added. For convenience, we let x0 = 2, thereby yielding the intermediate solution: x=
+--,--,-?-,4
11 11 2 5 21 12 24 3 6 24
NUMERICAL SOLUTION: DIRECT METHODS
131
With: -=- 1 n-l c i=o
12 xi
73’
and with: xi = -
Xi
n-l c i=o
the state probability
’ xi
vector 7r results:
T = (0.2192,0.1644,0.1507,0.0753,0.1096,0.1370,0.1438)
.
Measures of interest could be the average number of customers in the system E[N] = N: n-l N
=
c
k(i)7ri
= r1+
r4
+ 2(7b
+ WI)
+
3(7ra + n-6) = 1.507,
i=o the effective throughput: A, = X(x0 + nl + 7r2+ 7~~+ q,) = 0.7809, or the mean response time E[T] = ?;: N T = x = 1.9298. e
Comparing these results to the measures obtained from an analysis of the birth-death process for this queueing system assuming exponentially (rat her than Erlang) distributed service times with same parameters X = p = 1 shows that response time has decreased and the effective throughput has increased simply by reducing the variance of the service process: cEXP) = f. = 0 25 T/c 4 *’
k = 0 71) 2 73 >
&fCEXP)= (1 + 2 + 3) * a = ; = 1.5, pw e
= 3. ; = 0.75,
i$EW
= 3 4 _ 2 2’3*
The computational complexity of Grassmann’s algorithm is, similar to Gaussian elimination, O(n3/3) multiplications or divisions, and the storage
132
STEADY-STATE SOLUTIONS OF MARKOV CHAINS
requirement is O(n2). Sparse storage techniques can be applied under some circumstances, when regular matrix structures, like those given with diagonal matrices, are preserved during computation. Although cancellation errors are being avoided with Grassmann’s algorithm, rounding errors can still occur, propagate, and accumulate during the computation. Therefore, applicability of the algorithm is also limited to medium size (around 500 states) Markov models.
3.4
NUMERICAL
SOLUTION:
ITERATIVE
METHODS
The main advantage of iterative methods over direct methods is that they preserve the sparsity the parameter matrix. Efficient sparse storage schemes and efficient sparsity-preserving algorithms can thus be used. Further advantages of iterative methods are based on the property of successive convergence to the desired solution. A good initial estimate can speed up the computation considerably. The evaluation can be terminated if the iterates are sufficiently close to the exact value, i.e., a prespecified tolerance level is not exceeded. Finally, because the parameter matrix is not altered in the iteration process, iterative methods do not suffer from the accumulation of round-off errors. The main disadvantage of iterative methods is that convergence is not always guaranteed and depending on the method, the rate of convergence is highly sensitive to the values of entries in the parameter matrix. 3.4.1
Convergence
of Iterative
Methods
Convergence is a very important issue for iterative methods that must be dealt with consciously. There are some heuristics that can be applied for choosing appropriate techniques for decisions on convergence, but no general algorithm for the selection of such a technique exists. In what follows we mention a few of the many issues to be considered with respect to convergence. A complete picture is beyond the scope of this text. A tolerance level E must be specified to provide a measure of how close the current iteration vector xck) is to the desired solution vector x. Because the desired solution vector is not known, an estimate of the error must be used to determine convergence. Some distance measures are commonly used to evaluate the current iteration vector xc’) in relation to some earlier iteration vectors x(‘), 1 < k. If the current iteration vector is “close enough” to earlier ones with respect to E, then this condition is taken as an indicator of convergence to the final result. If E were too small, convergence could become very slow or not take place at all. If E:were too large, accuracy requirements could be violated or, worse, convergence could be wrongly assumed. Some appropriate norm functions have to be applied in order to compare different iteration vectors. Many such norm functions exist, all having a different impact on speed and pattern of convergence. Size and type of the parameter
NUMERICAL SOLUTION: ITERATIVE METHODS
133
matrix should be taken into consideration for the right choice of such a norm function. Concerning the right choice of E and the norm function, it should be further noted that the components Q of the solution vector can differ by many orders of magnitude from each other in their values. An appropriate definition must take these differences into account and relate them to the accuracy requirements derived from the modeling context. 3.4.2
Power
Method
Referring to Eq. (2.15) immediately leads to a first, reliable, though sometimes slowly converging, iterative method for the computation of the steady-state probability vector of finite ergo&c Markov chains. For the sake of completeness we may mention here that for the power method to converge the transition probability, matrix P only needs to be aperiodic; irreducibility is not necessary. The power method mimics the transient behavior of the underlying DTMC until some stationary, not necessarily steady-state, convergence is reached. Therefore, it can also be exploited as a method for computing the transient state probability vector v(n) of a DTMC. Equation v = VP suggests starting with an initial guess of some probability vector Y(O) and repeatedly multiplying it by the transition probability matrix P until convergence to v is assured, with lim+, vci) = u. Due to the assumed ergodicity, or at least aperiodicity, of the underlying Markov chain, this procedure is guaranteed to converge to the desired fixed point of the unique steady-state probability vector. A single iteration step is as follows:
i -> 0,
yG+l) = &)p,
(3.58)
Hence, the iteration vector at step i is related to the initial probability via the Pi, the ith power of the transition probability matrix P:
y(i) = JO)pi 7 i 2 0.
vector
(3.59)
The power method is justified by the theory of eigenvalues and eigenvectors. The vector u can be interpreted as the left eigenvector y1 corresponding to the unit eigenvalue Xr = 1, where X1 is the dominant eigenvalue, which is the largest of all n eigenvalues, IX11 > IX21 2 . . - 2 IX,1 . Since the eigenvectors are mutually orthogonal, Y(O) can be expressed as a linear combination of all left eigenvectors of P so that the following relation holds: Y(O) = clyl
+ c2y2 + * * * + c,y,,
cjEII%,
ILjln.
(3.60)
Accordingly, after the first iteration step, the estimate Y(‘) = v(O)P can again be expressed in terms of the eigenvectors yi and eigenvalues Xi: Y(l) = CiAiyi
+ Q&y2
+ -. . + cnXnyn,
cj E R, 1 5 j 5 n.
(3.61)
cj E R,
(3.62)
Repeatedly applying this procedure yields: Jlci) = ClXZ;Yl
+ czAay2
+ * * * + CnAkyn,
1 < j 5 n.
134
STEADY-STATE SOLUTIONS OF MARKOV CHAINS
Since all eigenvalues 1Xj ) < 1 for j 2 2, the ith power of the eigenvalues Xj, j > 2, approaches 0 in the limit, lim;,, l&Ii = 0. With respect to the unit eigenvalue [Xrl = 1 we obtain lXrli = 1, for all i 2 1. Therefore, Relation (3.63) holds and proves our assumption of the correct convergence of the power method: lim VCi) = lim {C&Q i-bee i-boo = h& CJ”,Yl
+ cZ&7Z + . . . + c,$Y,j (3.63)
= ClYl.
Only a renormalization remains to be performed to yield the final result of the steady-state probability vector u. An immediate consequence of the nature of the power method is that speed of convergence depends on the relative sizes of the eigenvalues. The closer the non-dominant eigenvalues are to 1, the slower the convergence, as can be concluded from Eq. (3.63). The second largest eigenvalue, 1x2I, is the dominating factor in this process, i.e., ,if it is close to 1, convergence will be slow. The algorithm of the power method is outlined as follows: Select Q appropriately: A=
P,
Q/q + 1; (0) * JO) = v(y),ViO),“‘7 Vn-1 > ( Select convergence criterion E, and let n = 0. Define some vector norm function f (Ilvcn), y(‘)J/), n > 1. Set convergence = falseRepeat until convergence:
-2.1
dn+l) = &)A;
I-1
IF f (Il~(~+l), ~(‘+l)li) < E, 1 -< n THEN convergence = true;
STEP 2 lYIIYl
8
n=n+l,l=li-1.
Example 3.3 Assume three customers circulating indefinitely between two service stations, alternately receiving service from both stations having exponentially distributed service times with parameters ,~r and ~2, respectively. Let (i, j) denote a possible state of the system, where i + j = 3 and i, j refer
NUMERICAL SOLUTION: ITERATIVE METHODS
Fig. 3.5
A birth-death
135
model of a two station cyclic queue with three customers.
to the number of customers at the first and the second stations, respectively. The CTMC shown in Fig. 3.5 models this two station cyclic queueing network. With ~1 = 1 and ~2 = 2 the following generator matrix Q results: -1 2
Together with an arbitrarily n(o>
1 -3
0 1
0 0
given initial probability
vector:
= (~3,0(~),~2,1(~>,~1,2(0),~0,3(0))
= (0.25,0.25,0.25,0.25), and, given Q = 3.0000003, a transition probability p =
0.6666667 0.6666666 0 0
0.3333333 0.0000001 0.6666666 0
matrix:
0 0.3333333 0.0000001 0.6666666
0 0 0.3333333 0.3333334
’
the power method can be applied, recalling Y(O) = 7r(0). Since we are dealing here with a small ergodic model, Gaussian elimination or Grassmann’s algorithm could also be applied to yield the following exact steady-state probability vector: n = (0.533,0.26~,0.133,0.6~). Using this result, the iteration vector can be compared to the exact probability vector T at intermediate steps for the purpose of demonstrating convergence as shown in Table 3.3. The results are gained by choosing E = 10s7 and by applying the following test of convergence, presuming n 2 1:
The number of iteration steps necessary to yield convergence to the exact results depends strongly on the uniformization factor q. Some examples demonstrating this dependency are summarized in Table 3.4.
136
STEADY-STATE SOLUTIONS OF MARKOV CHAINS
Tab/e 3.3 y(O) y(l)
y(2) y(3) y(4) y(5)
yw
yW) y(30) y(3f3)
Intermediate steps and convergence of the power method 0.25 0.25 0.2777777722 0.2592592648 0.2716049333 0.2633744897 0.2671002038 0.2666741848 0.2666667970 0.2666666718
0.25 0.333333325 0.3888888778 0.4444444278 0.4691357926 0.4938271481 0.5276973339 0.5332355964 0.5333316384 0.5333332672
0.25 0.25 0.1944444556 0.1851851880 0.1604938370 0.1563786029 0.1357177959 0.1333746836 0.1333340504 0.1333333613
0.25 0.166666675 0.1388888944 0.1111111194 0.09876543704 0.08641975926 0.06948466636 0.06671553511 0.06666751412 0.06666669973
Table 3.4 Convergence of the power method as a function uniformization factor q
3.4.3
Jacobi’s
4
Iterations
3.000003 3.003 3.03 3.3 6 30
54 55 55 61 115 566
Method
The system of linear equations of interest to us is: b=xA.
(3.64)
The normalization condition may or may not be incorporated in Eq. (3.64). The parameters of both DTMC and CTMC are given by the entries of the matrix A = [aij]. The solution vector x will contain the unconditional state probabilities, or, if not yet normalized, a real-valued multiple thereof. If the normalization is incorporated, we have b = [0, 0, . . . , 0, 11, and b = 0 otherwise. Consider the jth equation from the system of Eqs. (3.64): (3.65)
bj = C aijxi. iES
Solving Eq. (3.65) for xj immediately results in: bj Xj
=
C aijxi i,i#j
.
(3.66)
ajj Any given approximate solution ji: = [go, 21, . . . , &-r] can be inserted for the variables xi, i # j, on the right side of Eq. (3.66). From these intermediate
NUMERICAL SOLUTION: ITERATIVE METHODS
137
values, better estimates of the xj on the left side of the equation may be obtained. Applying this procedure repeatedly and in parallel for all n equations results in our first iterative method. The values xc’) of the lath iteration step are computed from the values obtained from the (Ic - 1)st step for each equation independently: bj x(,“) 9
C a+~“-‘) i,i#j 7
-
by
E
s.
(3.67)
aj.i
The iteration may be started with an arbitrary initial vector x(O). The closer the initial vector is to the solution, the faster the algorithm will generally converge. Note that the equations can be evaluated in parallel, a fact that can be used as a means for computational speed-up. The method is therefore called the method of simultaneous displacement or, simply, the Jacobi method. Although the method is strikingly simple, it suffers from poor convergence and hence is rarely applied in its raw form. Splitting the matrix A = D - L - U into its constituents of the diagonal matrix D, the strictly lower-triangular matrix -L, and the strictly uppertriangular matrix -U provides a way to present the main computation step of the Jacobi method in matrix notation: xc’) - (b + x(‘-‘)
(U + L)) D-l.
(3.68)
Before the algorithm is presented, we may recall that we can deliberately incorporate the normalization condition into the parameter matrix A. Note that repeated normalization could have an adverse effect on convergence if large models are investigated, since relatively small numbers could result. Therefore, we leave it open whether to repeat normalization in each iteration step or not. Define parameter matrix A and b properly from generator matrix Q or transition probability matrix P. Choose initial vector x(O). Choose convergence criterion E. Choose some norm function f ([lx(“), x(“)/l), Ic > Z. Split parameter matrix A = D - L - U. convergence = false, and k = I = 1. Repeat until convergence: [I
[l
xck) = (b + x(lc-l) (U + L)) D-l;
. IF f (\\x’“’ - x(~-I)))) < E - THEN convergence = true
138
STEADY-STATE SOLUTIONS OF MARKOV CHAINS
-ELSEk=k+landZ~{l,...,Ic}.
Consider two customers circulating among three service staExample 3.4 tions according to the following pattern. When a customer has received service of mean duration l/p1 at the first station, it then queues with probability ~12 at station two for service of mean duration 1/,~2, or with ~13 at station three with a mean service duration of l/ps. After completion of service at stations two or three, customers return with probability one back to station one. This scenario is captured by a CTMC with the state diagram in Fig. 3.6, where the notation (Ic, 1,m) indicates that there are Ic customers in station one, 1 customers in station two, and m customers in station three. We number the states of the CTMC of Fig. 3.6 as per Table 3.5.
Table 3.5
#
A possible state ordering according to Fig. 3.6
2
1
4
3
5
6
With PI = 1, ~2 = 2, ~3 = 3, ~12 = 0.4, and ~13 = 0.6, the generator matrix Q can be obtained:
Q=
-1 2 3 0 0 0
0.4 -3 0 2 3 0
0.6 0 -4 0 2 3
0 0.4 0 -2 0 0
0 0.6 0.4 0 -5 0
0 0 0.6 0 0 -3
To provide a controlled experiment with respect to the iteration process, we first compute the exact state probabilities in Table 3.6 with Grassmann’s algorithm introduced in Section 3.3.2. With Table 3.6 we know the exact state probabilities x = 7r and can use them for comparison with the iteration vector xc’). With the following constraints concerning equation vector b, initial probability vector x(O), normalization after each iteration step, convergence criterion
NUMERICAL SOLUTION: ITERATIVE METHODS
Fig. 3.6
A birth-death
model of two customers traversing three stations.
Tab/e 3.6 State probabilities for Fig. 3.6 computed using Grassmann’s algorithm 0.6578947368 0.1315789474 0.1315789474 0.02631578947 0.02631578947 0.02631578947
TO Tl r2 r3 x4 r5
139
Table 3.7 State probabilities for Fig. 3.6 computed using Jacobi’s method 0.6578947398 0.1315789172 0.1315789478 0.02631578619 0.02631578846 0.02631582059
TO Tl 772 R3 r4 r5
E, and the norm for test of convergence, Jacobi’s method is applied:
I I 1I 1 b = (O,O, (4%(JO>, x(o)= I6’6’6’6’6’6 xU4 x@) = IIx(lc)lll?
Ip E = lo-7,
- X(k--1q2 11x@-1) 112
> ’ < E *
The results of using Jacobi algorithm, needing 324 iterations are given in Table 3.7. The method of Jacobi is of less practical importance due to its slow pattern of convergence. But techniques have been derived to speed up its convergence, resulting in well-known algorithms such as Gauss-Seidel iteration or the successive over-relaxation (SOR) method. 3.4.4
Gauss-Seidel
Method
To improve convergence, a given method often needs to be changed only slightly. Instead of deploying full parallelism by evaluating Eq. (3.67) for each state j E S independently, we can serialize the procedure and take advantage of the
already updated new estimates in each step. Assuming the computations to be arranged in the order 0, 1, . . . , n - 1, where ISI = n, it immediately follows,
140
that
STEADY-STATE SOLUTIONS OF MARKOV CHAINS
for the calculation
of the estimates
xj (k) , all j previously
computed
esti-
mates xik), i < j, can be used in the computation. Taking advantage of the more up-to-date information, we can significantly speed up the convergence. The resulting method is called the Gauss-Seidel iteration: b. _ 3
‘2
a.
.x(k) 23 2
+
Xi(kc) _-
ncl
a.
i=j+1
i=o
.x(k-l) 2.7 i
vj
7
E s.
(3.69)
aii
Note that the order in which the estimates xjk) are calculated in each iteration step can have a decisive impact on the speed of convergence. In particular, it is generally a matter of fact that matrices of Markov chains are sparse. The interdependencies between the equations are therefore limited to a certain degree, and parallel evaluation might still be possible, even if the most up-to-date information is incorporated in each computation step. The equations can deliberately be arranged so that the interdependencies become more or less effective for the convergence process. Equivalently, the possible degree of parallelism can be maximized or minimized by a thoughtful choice of that order. Apparently, a trade-off exists between the pattern of convergence and possible speedup due to parallelism. In matrix notation, the Gauss-Seidel iteration step is written as: xc’) =
b + x(‘-l)L
>
(D - u)-’
Equivalently, we could rewrite Eq. (3.70), thereby step from Eq. (3.69) more obviously: b + x(‘“)U
3.4.5
The Method
of Successive
+ x(‘“-l)L
>
,
k 2 1.
reflecting
D-l,
(3.70) the Gauss-Seidel
k > 1.
(3.71)
Over-Relaxation
Though Gauss-Seidel is widely applied, further variants of the iteration scheme do exist. A promising approach for an increased rate of convergence is to apply extrapolations via the so-called method of successive over-relaxation, SOR for short. The SOR method provides means to weaken or to enhance the impact of a Gauss-Seidel iteration step. A new estimate is calculated by weighting (using a relaxation parameter w) a previous estimate with the newly computed Gauss-Seidel estimate. The iteration step is presented in matrix notation in Eq. (3.72)) from which it can be concluded that the SOR method coincides with Gauss-Seidel if w is set equal to one: xc’) =
wb + x (‘--l)
(wL + (1 -w)
D))
(D - wU)-’
,
k 2 1.
(3.72)
the splitting of matrix Equation (3.72) can be easily verified. Remembering A = D - L - U and introducing a scalar w, we can immediately derive the
NUMERICAL SOLUTION: ITERATIVE METHODS
141
equation wb = xw (D - L - U) f rom Eq. (3.64) as a starting point. Adding xD to both sides of the equation yields: wb+xD=xw(D-L-U)+xD. By applying simple arithmetic
manipulations
wb+x(wL+(l-w)D)=x(D-wU),
(3.73) we get: (3.74)
and, finally, by multiplying both sides of Eq. (3.74) with (D - wU)-‘, we get Eq. (3.72). T o complete the description of the SOR algorithm, we need to focus on the impact of the numerical properties of the iterative method on the convergence process. First of all, a sensible choice of the relaxation parameter w is crucial for an accelerated convergence. Then we need to carefully specify appropriate criteria to test convergence in each iteration step. It is wise to choose these criteria as a function of the problem domain, i.e., in a model-specific manner. Finally, overflows and underflows have to be dealt with, especially if the investigated models contain states with very small probabilities associated with them. This situation commonly occurs if very large or so-called stiff models are investigated, such that some states have a very small probability mass. 3.4.5.1 The Relaxation Parameter The relaxation parameter w was introduced to gain an increased speed of convergence. It was mentioned that the incorporation of the normalization condition xl = 1 in each iteration step can have a negative impact on the rate of convergence. From this point of view, normalization should, for practical reasons, be postponed until convergence has been assured. Henceforth, we continue discussion of the iterative met hod on the basis of a homogeneous linear system 0 = xA according to Eq. (3.64). With this variant in mind, the solution of Eq. (3.72) can be identified as the largest left eigenvector yi of the matrix (wL + (1 - w) D) (D - wU)-‘. For the corresponding eigenvalue, we have IX,] = 1. (Note that the normalized eigenvector equals the desired solution vector x.) It is known from [StGo85] that the speed of convergence is a function of the second largest eigenvalue X2, with 1x11 2 IX:!), of the iteration matrix, which can be influenced by an accurate choice of w. To guarantee convergence, all eigenvalues should be real valued, the second largest eigenvalue [X2( should be smaller than one in magnitude because the smaller JX2I is, the faster the convergence would be. Conditioned on the assumption that 1x2) < 1, which is not always true, the following relation holds for the number of iteration steps i, the second largest eigenvalue (X2I, and the required accuracy E:
w4 i z log(]X2])’
(3.75)
If all the eigenvalues were known, an optimal we that minimizes IX2] of the iteration matrix could be derived. But this computation is generally
142
STEADY-STATE SOLUTIONS OF MARKOV CHAINS
more expensive than the gain due to the implied acceleration of the convergence. Therefore, heuristics are often applied and instead of an exact value we, an approximation ~0” is derived. Sometimes advantage is taken of certain regularities of the parameter matrix for a simplified estimation wr of we. Equivalently, the parameter matrix can sometimes be rearranged in a problem-dependent way so that a more favorable structure results. For the SOR iteration method to converge, it has been shown in [HaYo81] that the relaxation parameter must obey the relation 0 < w < 2. It is usually a promising approach to choose c3 adaptively by successive reestimations of LJ:. In particular, we follow [StGo85] for such a technique. With the subsequently introduced approach, w should be chosen such that w 5 WE, otherwise convergence is not guaranteed because of possible oscillation effects. Furthermore, we assume w.M > _ 1, i.e., underrelaxation is not considered. The computation would be initiated with tic11 = 1, which is fixed for some, say 10, steps of iterations. Using the intermediate estimate ]Xr(“)I of the second largest eigenvalue, which is also called the subdominant eigenvalue, in the iteration process an intermediate estimate WE(~) of the optimal relaxation parameter can be derived by evaluating Eq. (3.76):
Wr(k) =l+&y* The estimated subdominant eigenvalue ]A?(‘) ] is approximated as a function of the recent relaxation parameter w(‘-‘) and the relative change in values S(‘-r) of the successive iterates x(‘-~), x(lcm2), x(‘-‘) as shown in Eq. (3.77):
+ (p-1) - 1 p(k) _- ($(‘“-1) 2 I I w(“-l)j/jFG Finally, the relative change in values &‘-l) diate results as presented in Eq. (3.78): SC’“-1) =
(3.77)
is calculated from the interme-
IIX(k--l) - X(k-2) 11~ llx(“-2)
- x(k--3) Iloo *
(3.78)
Note that the maximumvector norm ]]x]]~ is defined as ]]x]]~ = maxr
NUMERICAL SOLUTION: ITERATIVE METHODS
143
3.4.5.2 To terminate an iterative computation proThe Test of Convergence cess, appropriate criteria need to be provided by means of which it can be assured that the iteration process converges to the desired solution x. Generally, some vector norm 1Ix - xc’) 11would be used to estimate the error that should be smaller than some threshold E. Since x is not known, it must either be estimated as shown in [StGo85], or some different heuristics have to be applied. Intuitively, convergence is reached if successive iterates do not differ substantially. But what “substantial difference” means can only be answered in the context of the model under consideration. Convergence could be relatively slow, that is, the vectors x(‘-‘), x(‘) would not be much apart, even though still being far from the final solution x. Furthermore, if the underlying model is large, the elements xi(k) of x(‘) could easily become smaller than E and convergence could be mistakenly assumed. To overcome these two problems, it is often a good choice to use the criterion from Eq. (3.79) of the relative change in values of the most recent iterate x(‘) and the vector obtained m > 1 steps earlier, x(lcWm), namely:
IIX(k)- X(k-m)lloo< E Wk) II00 *
(3.79)
According to [StGo85], m should be chosen as a function of the number of necessary iterations, 5 < m < 50. If SOR converges quickly, the much simpler and computationally more efficient criterion as given in Eq. (3.80) might be applicable:
11x(“)- X(k-l))) < E.
(3.80)
Commonly applied vector norms are:
lb41= 2 14, i=l
Il4I2
=
g- Ix?. \
(3.82)
i=l
3.4.5.3 Overflow and Underflow If the normalization condition is incorporated in the transition matrix, as discussed in the context of Eq. (3.64), overflow and underflow are less likely to occur, but convergence is much slower and computational overhead is higher. Therefore, some heuristics are applied to circumvent these numerical problems. To avoid underflow, elements xi(k) shrinking below a lower threshold el are simply set equal to zero. Conversely, if there exist elements xi(ICIthat exceed an upper threshold Ed, a normalization is adaptively performed and the values are scaled back to a more convenient range.
144
STEADY-STATE SOLUTIONS OF MARKOV CHAlNS
3.4.5.4 The Algorithm The algorithm presented in Fig. 3.7 is in a skeletal form. As we have discussed at length, it needs to be carefully adapted for the peculiarities of specific cases to which it is applied. Since for w = 1 SOR coincides with Gauss-Seidel, we do not separately present the Gauss-Seidel algorithm. Furthermore, for the sake of simplicity, we do not consider scaling and resealing of the parameter matrix, which is a technique that is commonly performed in order to avoid divisions by diagonal elements ajj (cf. Eq. (3.69)). It should be noted that a sum indexed from 0 to -1 is interpreted to be zero,
3.5
COMPARISON
OF NUMERICAL
SOLUTION
METHODS
The solution algorithms discussed in earlier sections are herein compared by the means of the model introduced in Fig. 2.7 (Section 2.2.2). Patterns of convergence and numerical accuracy are compared. According to Eq. (2.58), the following system of steady-state equations results: 0 = -2yn2 + 67r1, 0 = -prRC + 2c77@, 0 = -OTRB + 2(1- C)y;lrz, 0 = -(Y+S)q 0 =
-&p-J
+
QKRB
+ @RC
+ SrO,
+ y7r1.
With the normalization condition:
and the parameters from Table 3.8, three computational experiments were performed using the Gaussian elimination algorithm. The results are summaTable3.8 Experimental parameters for the model with two processorelements Parameter
Experiment 24 hr 54 set 4 hr 10 min 10 min 30 set 0.99
1
Experiment 1 yr 12 hr 4 hr 10 min 10 min 30 set 0.99
2
Experiment
3
100 yr 262 hr 48 min 4 hr 10 min ’ 10 min 30 set 0.99
rized in Table 3.9. From the data of the first experiment it is concluded that, for example, with probability 7r2+ ~1 w .71+ .25 = .96, at least one processor is up in steady state. This quantity could provide an important measure for
the availability of a modeledsystem. Other measuresof interest can be similarly derived. By comparing the results of different experiments in Table 3.9
COMPARISON OF NUMERICAL SOLUTION METHODS
I-
84lve
xl=
1 Fig. 3.7
The SOR iteration algorithm.
145
146
STEADY-STATE SOLUTIONS OF MARKOV CHAINS
Table 3.9
Steady-state
Probability r2 TRB TRC
m TO
probabilities Experiment
1
0.7100758490525 0.0000986153339 0.0004881459029 0.2465383347910 0.0427990549197
of the model with two processor elements Experiment
2
0.9990481945467 0.0000003796383 0.0000018792097 0.0009490957848 0.0000004508205
Experiment
3
0.9999904674119 0.0000000038040 0.0000000188296 0.0000095099093 0.0000000000452
it can be seen how the state probabilities depend on the MTTF, being varied from approximately 1 day to a little less than 100 years and 11 days. Although the probability mass is captured to a larger extent by ~2, the other probabilities are of no less interest to designers of dependable systems because the effectiveness of redundancy and recovery techniques is measured by the values of these small probabilities. Note, in the case of the third experiment in particular, that the numbers differ by many orders of magnitude. This property could impose challenges on the numerical algorithms used to analyze the underlying CTMC since an appropriate accuracy has to be assured. For instance, the results in Table 3.9 are exact values up to 13 digits. While such an accuracy is most likely not very significant for the probability ~2, it seems to be a minimum requirement for the computation of OTTO, since the 10 leading digits of ~0 are zeroes in the third experiment. The more the values of the model parameters diverge from each other and the larger the investigated model is, the more this property tends to manifest itself. Note that for the analysis of larger models, iterative methods are usually preferred. The convergence criteria have to be chosen very carefully in cases where the resulting values differ by orders of magnitude. On the one hand, the convergence criteria should be stringent if the significant measures result in very small values. One the other hand, computational complexity increases substantially with a more stringent convergence criteria. Even worse, the probability of convergence might significantly decrease. More details are presented in the following sections. 3.5.1
Case Studies
In the following, we refer back to the model in Fig. 2.7 as a starting point. The model can easily be extended to the case of n processors, while the basic structure is preserved. Furthermore, the same parameters as in Table 3.8 are used in three corresponding experiments. The computations are performed with Gauss elimination, the power method, the Gauss-Seidel algorithm, and different variants of the SOR method. The intention is to study the accuracy and pattern of convergence of the iterative methods in a comparative manner. Concerning the SOR method, for example, no sufficiently efficient algorithm
COMPARISON OF NUMERICAL SOLUTION METHODS
147
is known for a computation of the optimum w. With the help of these case studies some useful insights can be obtained. There are many possibilities to specify convergence criteria, as has been discussed earlier. In our example, convergence is assumed if both a maximum norm variant: (3.83) of the difference between the most recent iterates xcn) and x(+l) square root norm:
and the
In-1
lb% =
C(d2, 11i=o
(3.84)
of the residues: i-l ri
= C
n-l qj,ixy)
j=o
+ C
qj,ixi”-”
-
bi,
(3.85)
j=i
are less than the specified E:
Note that the pattern of convergence of iterative methods can depend on the ordering of the states as well, and thus on the iteration matrix. This dependence is particularly important for the SOR-type methods where in each iteration step the most up-to-date information is used in the computation of the components of the intermediate solution vector. Therefore, we provide in our comparative study the results of two different state ordering strategies. The resulting number of iterations are summarized in Table 3.10 for the cases of E = 10-l’, 10-15, 10-l”, 10-17, and 10m20as some examples. Comparing corresponding experiments, it can be seen from Table 3.10 how the number of iteration steps of the power method increases as the convergence criterion is tightened from E = 10-l’ to E = 10e2’. Furthermore, the power method tends to converge faster, the more the model parameters differ quantitatively from each other. In these examples, convergence is relatively slow in experiment one, medium in experiment two, and fast in the third experiment for fixed E. A similar pattern of convergence using the Gauss-Seidel method and the SOR variants can be observed. The relative best results are achieved by SOR with w = 0.8. Gauss-Seidel and SOR (w = 1.2) do not converge if Another view is provided in Fig. 3.8 where the number of iterations is depicted as a function of w that is systematically varied, 0 < w < 2, and the criterion of convergence E = 10-15 is chosen. Entries on the graph indicate that convergence was observed within 20,000 iterations for a particular w.
148
STEADY-STATE SOLUTIONS OF MARKOV CHAINS
Table 3.10
Number of iterations required to analyze the model with ten processors e= 1.e-10
e= l.e-15
Methods
Exp. 1
Exp. 2
Exp. 3
Exp. 1
Exp. 2
Exp. 3
Power Gauss-Seidel SOR (w = 1.2) SOR (w = 0.8)
32119 653
16803 23 49 41
16703 17 41
61013 *
24305 25 55 47
24083 17 41
2;9
3:7
e= lee-l6
e = 1. e-l7
Methods
Exp. 1
Exp. 2
Exp. 3
Exp. 1
Exp. 2
Exp. 3
Power Gauss-Seidel SOR (w = 1.2) SOR (w = 0.8)
66547 * * 369
26487 25 57 47
25309 17
70221 * * *
26883 *
26625 *
4;
4:
4:
f~1.e~~~ Methods
Exp. 1
Exp. 2
Exp. 3
Power Gauss-Seidel SOR (w = 1.2) SOR (w = 0.8)
71573 * * *
30039 * * *
30703 * 4*7
Note: An asterisk (*) indicates no convergence in 20,000 iterations
and a dash (-) indicates
convergence with negative results.
6000
4000
Iterations 3000
exp. 1
exp. 2 ,___r_____--
012
Fig. 3.8
014
0.6
0:s
i
"'
1:2
-e-e-
""
The number of SOR iterations as a function of w.
COMPARISON OF NUMERICAL SOLUTION METHODS
149
The results confirm that convergence is relatively slow in experiment one. Furthermore, in experiment one, SOR converges only if w < 1 while the number of iterations are quite sensitive to variations in ‘w. Fewer iteration steps are needed in experiments two and three. Relatively fewer iterations, 279, are observed in experiment one with w = 0.95, in experiment two, 25, with w = 1.0; and in experiment three, 21, with w = 1.01. It is interesting to note that in experiments one and three, even if w is chosen only slightly larger than 0.95 and 1.01, respectively, no convergence results. Similar outcomes are observed for different criteria of convergence E. Over-relaxation, that is w > 1, can only be successfully applied in experiment two.
Table3.11 Accuracy (A) in the number of digits and number of iterations (I) needed by different SOR variants with E = lo-l5 Exp. 2 (d)
Exp. 2 (r)
Exp. 3 (d)
Exp. 3 (r)
W
A
I
A
I
A
I
A
I
0.1 0.2 0.5 0.7 0.8 1.0 1.1 1.2 1.3
* 21 22 23 23 33 28 23 23
* 279 97 59 47 27 41 57 81
20 20 20 22 23 23 23 23 23
539 273 91 57 47 25 39 55 79
2: 23 23 24 49 45 * *
2:5 91 55 43 21 45 * *
20 20 21 21 23 30 -
507 247 83 51 41 17 -
Note: An asterisk (*) indicates no convergence in 20,000 iterations
and a dash (-) indicates
convergence with negative results.
Using different norms affects pattern of convergence and accuracy of the numerical results even for the same criterion of convergence E. In Fig. 3.9 the number of iterations is depicted as a function of w in experiment three, comparing the residue norm (T) to the difference norm (d) with E = 10-15. It can be seen that applying the difference norm consistently results in slightly more iterations for a given w. A similar pattern was observed in all the experiments. There are ranges of w where the application of the difference norm does not lead to convergence while the application of the residue norm does lead to convergence, and vice versa. The accuracy in terms of the number of correct digits of the numerical values resulting from the application of different norms in the SOR algorithm is indicated in Table 3.11. The numbers are compared to the results obtained using Gauss elimination. It can be seen that the difference norm, both in experiments two and three, consistently results in the same or higher accuracy as the residue norm for a given w. A trade-off is observed between the number of necessary iterations and the achieved accuracy. Both factors have to be taken into account when designing calculations.
150
STEADY-STATE SOLUTIONS OF MARKOV CHAINS
In experiment four, the data from Table 3.12 are applied computations. The number of iterations versus w is depicted The sensitivity of the convergence behavior to w is impressive While the minimum number of iterations are observed with number of iterations grows quickly as w approaches one.
in a series of in Fig. 3.10. in this study. w NN0.8, the
:, exp. 3(r) ,\ I ,
0
l.:::l::,:,,::,l.:::I:
::I::.:
0:2
0.4
0:6
0:8
W
Fig. 3.9
Comparison
of difference
(d) and residue
(T) norm in experiment
Table 3.12 Experimental parameters the model with ten processors Parameter
Experiment
three.
for
4
9512 yr 8243 hr 24 min
50 min 5 min 30 set 0.99
norm
10-20 residue
We do not claim to be complete in the discussion of factors that have an impact on the speed of convergence of numerical techniques. We only presented some issues of practical relevance that have to be taken into account while considering convergence. The results presented are conditioned on the fact that other convergence dominating issues such as state ordering were not
COMPARISON OF NUMERICAL SOLUTION METHODS
151
800--
600~-
Iterations 400~-
200~-
01
0:2
0.4
0.6
0.8
1
W
Fig. 3.10 Number of iterations applying the square root norm llrjlz < 10m2’ of the residue in experiment four.
taken into account. The methods could have been tuned further, of course, resulting in a different pattern of convergence.
Steady-State Aggregation/ D&aggregation Methods In this chapter we consider two main approximation methods: Courtois’s decomposition method and Takahashi’s iterative aggregation/disaggregation method.
4.1
COURTOIS’S
APPROXIMATE
METHOD
In this section we introduce an efficient method for the steady-state analysis of Markov chains. Whereas direct and iterative techniques can be used for the exact analysis of Markov chains as previously discussed, the method computations of Courtois [Cour75, Cour77] is mainly applied to approximate u NNof the desired state probability vector u. Courtois’s approach is based on decomposability properties of the models under consideration. Initially, substructures are identified that can separately be analyzed. Then, an aggregation procedure is performed that usesindependently computed subresults as constituent parts for composing the final results. The applicability of the method needsto be verified in each case. If the Markov chain has tightly coupled subsetsof states, where the states within each subset are tightly coupled to each other and weakly coupled to states outside the subset, it provides a strong intuitive indication of the applicability of the approach. Such a subset of states might then be aggregated to form a macro state as a basis for further analysis. The macro state probabilities, together with the conditional micro state probabilities from within the subsets, can be composed to yield the micro state probabilities of the initial model. Details of the approach are clarified through the following example. 153
154 4.1.1
STEADY-STATE AGGREGATION/ DLSAGGREGATION METHODS
Decomposition
Since the method of Courtois is usually expressed in terms of a DTMC, whereas we are emphasizing the use of a CTMC in our discussion of methodologies, we would like to take advantage of this example and bridge the gap by choosing a CTMC as a starting point for our analysis. With the following model in mind, we can explain the CTMC depicted in Fig. 4.1. We assume a system in which two customers are circulating among three stations according to some stochastic regularities. Each arbitrary pattern of distribution of the customers among the stations is represented by a state. In general, if there are N such stations over which K customers are arbitrarily distributed, then from simple combinatorial reasoning we know that (“$“F’) = (“‘,“-‘) such combinations exist. Hence, in our example with N = 3 and K = 2 we have 4 02 = 6 states.
Fig. 4.1
CTMC subject to decomposition.
In state 020, for example, two customers are in station two, while stations one and three are both empty. After a time period of exponentially distributed length, a customer travels from station two to station one or station three. The transition behavior is governed by the transition rates ,921or ~23 to states 110 or 011, respectively. The transition behavior between the other states can be explained similarly. The analysis of such a simple model could be easily carried out by using one of the standard direct or iterative methods. Indeed, we use an exact method to validate the accuracy of the decomposition/aggregation approach for our example. We use the simple example to illustrate Courtois’s method. To this end, the model needs to be explored further as to whether it is nearly completely decomposable, that is, whether we can find state subsets that represent tightly coupled structures. The application may suggest a state set partitioning along the lines of the customers’ circulation pat tern among the visited stations. This would be a promising approach if the customers are preferably staying within the
COURTOIS’S APPROXIMATE
’ , : ’ ‘\ :
METHOD
155
/’ /’ ,’ , 8 8’ ,’
1‘I,g 0 8002
Fig. 4.2 Decomposition of the CTMC with regard to station three.
Fig. 4.3
Decompositions
of the CTMC with regard to stations two and one.
bounds of a subset of two stations and only relatively rarely transfer to the third station, i.e., the most isolated one. All such possibilities are depicted in Fig. 4.2 and 4.3. In Fig. 4.2, station three is assumed to be the most isolated one, i.e., the one with the least interactions with the others. Solid arcs emphasize the tightly coupled states, whereas dotted arcs are used to represent the “loose coupling.” When customers are in the first station, they are much more likely to transit to the second station, then return to the first station, before visiting the third station. Hence, we would come up with three subsets {020,110,200}, {011, lOl}, and {002} in each of which the number of customers in the third station is fixed at 0, 1, and 2, respectively. Alternatively, in Fig. 4.3, we have shown the scenario where stations two and three are isolated. Now we proceed to discuss Courtois’s method on the basis of the set of parameters in Table 4.1. It suggests a decomposition according to Fig. 4.2. Clearly, the parameter values indicate strong interactions between stations one and two, whereas the third station seems to interact somewhat less with
156
STEADY-STATE
AGGREGATION/
DISAGGREGATION
Tab/e
Transition
4.1
rates = 0.20 jai32 = 0.20
1-121= 2.40 /A.23= 0.15
p12 = 4.50 p13 = 0.30
METHODS
p31
This level of interaction should be reflected at the CTMC level representation. The corresponding infinitesimal generator matrix Q is given
the others. ES:
-4.80 2.40
Q=
0 0.20
0 0
0
4.50 -7.35 2.40 0.20 0.20
0
0.30 0.15
4.50 -2.55
0.30 0.15 4.50
0
0 -5.20 0.20
0
2.40 0.20
0
-2.95 0.20
0 0 0 0.30 0.15 -0.40
and is depicted symbolically in Table 4.2.
Tab/e 4.2 Generator matrix structured for decomposition
200
110
200
-c
l-412
110
P21
-c
P12
020
0
1121
-c
101
P31
P32
0
011
0
P31
002
0
0
I
020 0
I
Q of the CTMC 101 P13
CL23 0
011
002
0 Pl3
P.23
-c
i-412
P32
P21
-c
0
P31
l-W.2
W3
It is known from Eq. (3.5) that we can transform any CTMC to a DTMC by defining P = Q/q + I, with Q > maxi,jES I~ijl. Next we solve v = VP instead of 7rQ = 0 and we can assert that v = 7r. Of course, we have to fulfill the normalization condition vl = 1. For our example, the transformation results in the transition probability matrix as shown in Table 4.3, where q is appropriately chosen. Given the condition g > max;,j )g;jJ = 7.35, we conveniently fix 4 = 10. Substituting the parameter values from Table 4.1 in Table 4.3 yields the tran-
COURTOIS’S
Tab/e 4.3 Transition-probability structured for decomposition
200 200 110 020
l-5
l-5
011
002
0
Lui4
0
0
y
1-F
so11
fz!a2 Q .&u 4
?!siz 4
002
0
0
0
0
0 1-F &u4 l!A3.l 4
157
P of the DTMC
101
y
&LL 4
METHOD
020
y
l!Lzl 4 0
matrix
l!Ql 4 0
101
sition
110
APPROXIMATE
y
0
l!Lzi Q
0
y 1-F t&XL 4
l!aa Q y 1-F
probability matrix P:
P=
0.52 0.45 0 0.03 0 0 0.24 0.265 0.45 0.015 0.03 0 0 0.24 0.745 0 0.02 0.02 0 0 0.02 0.02 0.24 0.705 0.015 0 0 0
(4.1)
As indicated, P is partitioned into I&! x M) number of submatrices PIJ, with A4 = 3 and 0 5 I, J 5 2 in our example. The submatrix with macro row index I and column index J is denoted by PIJ, while the elements of each submatrix can be addressed via double subscription PI Jij. The M diagonal submatrices PII represent for I = 0, 1,2 the interesting casesof zero, one, and two customers, respectively, staying in the third station. For example, the second diagonal submatrix Prr is:
The elements of the submatrix can be addressedby the schemajust introduced: pll 1,-,= 0.24. Since the indices I, 0 5 I 5 M - 1 are used to uniquely refer to subsetsas elementsof a partition of the state spaceS, we conveniently denote the corresponding subset of states by SI; for example, Sr = { 101,011). Of course, each diagonal submatrix could possibly be further partitioned into substructures according to this schema. The depth to which such multilayer decomposition/aggregation technique is carried out depends on the number of stations N and on the degree of coupling between the stations. We further elaborate on criteria for the decomposability in the following.
158
STEADY-STATE
AGGREGATION/DlSAGGREGATlON
METHODS
A convenient way to structure the transition probability example is to number the states using radix K notation: N c i=l
matrix
in our
N
kiKi,
where
c
ki = K.
i=l
where /Q denotes the number of customers in station i and N is the index of the least coupled station, i.e., the one to be isolated. For instance, in our example a value of 4 would be assigned to state 200 and a value of 12 to state 011. Note that the states in Table 4.2 and 4.3 have been numbered according to this rule. The transition probability matrix in block form is given as: /’
P=
PO0
PO1
PlO
Pll
...
. . .
PI0
PO(M-1)
. . .
PIJ
Pl(M-1)
* **
PI(M-1)
*
(4.3)
.
.
. . *
P(M-2)O
. . *
\ ,P(M-1)O
P(M-P)(M-1) q&f-1)(&l)
Generally, the partitioning of matrix P in submatrices PIJ is done as a function of the number of customers in the least, coupled station N. Given that K customers are in the system, there will be M = K + 1 diagonal submatrices. Each diagonal submatrix PII, 0 5 I 5 K, is then considered to be a basic building block for defining transition probability matrices P;I of a reduced system of N-l stations and K-I customers. For example, PT, would be used to describe a system with K - 1 customers circulating among N - 1 stations, while one customer stays in station N all the time. The transitions from station N to all other stations would be temporarily ign0red.l In terms of our example, we would only consider transitions between states 101 and 011. To derive stochastic submatrices, the transition probability matrix P is decomposed into two matrices A and B that add up to the original one. Matrix A comprises the diagonal submatrices PII, and matrix B the complementary off-diagonal submatrices PIJ, I # J: P=A+B.
(4.4)
In our example, Eq. (4.4) has the following form: Ot ‘Note that P11 is not a stochastic matrix; matrix
PT1.
it has to be modified to obtain a stochastic
COURTOIS’S
P=
/0.52 0.45 0 0.24 0.265 0.45 0 0.24 0.745
0
\
0.96
METHOD
0.03 0 0.015 0.03 0 0.015
0 0 0 0.02 0.02 0 0.03 0 0.02 0.02 0 0.015 0 0 0 0.02 0.02 0 0
0.48 0.45 0.24 0.705
0
APPROXIMATE
A
B
159
I
Since we wish to transform each diagonal submatrix PII into a stochastic matrix PT,, we need to define a matrix X, such that Eq. (4.5) holds and matrices P*, P&, . . . PT,-,)(,-,) are all stochastic:
P*=(A+X)=
There are multiple ways to define matrix X. Usually, accuracy and comput ational complexity of the method depend on the way X is defined. It is most important, though, to ensure that the matrices P;,,O 5 I 2 M - 1, are all ergodic, i.e., they are aperiodic and irreducible. Incorporating matrix X in matrix P, we get: P=(A+X)+(B-X) (4.6)
=p*+c.
In our example, we suggest the use of following matrix X in order to create a stochastic matrix P**. 0
0
0.03
0
0
0 \
x=
This gives us:
P=
'0.52 0.45 0.03 0.24 0.31 0.45 0.015 0.24 0.745 0
C
160
STEADY-STATE
4.1.2
Applicability
AGGREGATION/
DISAGGREGATION
METHODS
The applicability of Courtois’s method needs to be checked in each case. In general, partitioning of the state space and subsequent aggregation of the resulting subsets into macro states can be exactly performed if the DTMC under consideration has a lumpable transition probability matrix. In such a case, the application of Courtois’s method will not be an approximation. A transition probability matrix P = bij] is lumpable with respect to a partition of S in subsets SI, 0 5 1 5 n/r - 1, if for each submatrix PIJ,V’I, J,O 5 I # J 5 111- 1 real-valued numbers 0 < ~IJ 5 1 exist such that Eq. (4.7) holds [KeSn78]:
c
=
PIJij
vi
“+IJ,
E SI.
(4*7)
jESJ
Note that the diagonal submatrices PII need not be completely decoupled from the rest of the system, but rather the matrix has to exhibit regularities imposed by the lumpability condition in order to allow an exact aggregation. In fact, if the PII are completely decoupled from the rest of the system, i.e., the r~J = 0, I # J, then this can be regarded as a special caseof lumpability of P. More details on and an application example of state lumping techniques are given in Section 4.2, particularly in Section 4.2.2. From our example in Eq. (4.1), P is not lumpable with respect to the chosen partition. Hence Courtois’s method will be an approximation. A measureof accuracy can be derived according to Courtois from Eqs. (4.4) and (4.6). The degree Eof coupling between macro states can be computed from matrix B = [bij] in Eq. (4.4). If E is sufficiently small it can be shown that the error induced by Courtois’s method is bounded by O(E). Let Ebe defined as follows:
(44
e=max iES
Eq. (4.6) can be rewritten as: P=P*+&.
(4-g)
In our example, Ecan be computed to be: 0.03 0.015 0.03 0.015 0.02 0.02 0.03
+
E=max i
0.02
+ + 1=0-07* t 0.02 t 0.015 0.02 +0.02
COURTOIS’S
APPROXIMATE
161
METHOD
To prove that P is nearly completely decomposable, it is sufficient to show that Relation (4.10) holds between E and the maximum of the second largest eigenvalues A;(2) of P;, for all I, 0 5 I 5 M - 1 [Cour77]:
E < 1 - maw IAl;@) 2
(4.10)
-
For each PT,, the eigenvalues A; (k) can be arranged according to their decreasing absolute values: (4.11)
IW)I > IJw>l > *-* > Im-wl~
Since P;,, 0 5 I 5 2, are all stochastic, we can immediately conclude Ix;(l)( = 1 for all I. In our example, the eigenvalues of the three diagonal submatrices:
need to be computed. equation: det(P&
The eigenvalues of P&, are the roots of the following
- X:1) =
(0.52 - A;) 0.24 0.015
0.45 (0.31 - Xi) 0.24
= (XT - 1)(x; - 0.595)(x; = 0,
+ 0.02)
The resulting where I denotes the identity matrix. arranged according to their absolute value as: Ix;(l)j
= 1,
Ix;(a)/
= 0.595,
The eigenvalues of PT 1 are determined det(PT, - XTI) =
eigenvalues of P&
IX;;(3)1 = 0.02.
by the solution of:
(0.48 - A;) 0.295
0.52 (0.705 - A;)
= (A; - 1)(X; - 0.185) = 0,
0.03 0.45 (0.745 - A;)
are
162
STEADY-STATE
AGGREGATiON/DlSAGGREGATlON
METHODS
which, in turn, gives: IXi(l)I
= 1 and
IXi(2)l = 0.185.
The eigenvalue of P&, immediately evaluates to: Ix;(l)1 = 1. With the second largest eigenvalues computed, we can examine whether decomposability Condition (4.10) holds: m;xjJI;(2)/
= max
L
od5ps8\ = 0.595 1
0.07 = E< 1 - maxl IX”;(Z)1 2 1 - 0.595 2 = 0.2025.
o.o7=e<
Because E < 0.2025, Condition (4.10) holds and the transition probability matrix P is nearly completely decomposable with respect to the chosen decomposition strategy depicted in Fig. 4.2. 4.1.3
Analysis
of the Substructures
As a first step toward the computation of the approximate state probability vector v” w u, we analyze each submatrix P;I separately and compute the conditional state probability vector Y;, 0 5 I 5 M - 1: qp;,
up=
- I) = 0,
1.
(4.12)
Thus V; is the left eigenvector of Pf, corresponding to the eigenvalue X;(l) = 1. Substituting the parameters from P& of our example, the solution of
(yg*cvvo*P42) -
(0.52 - 1) 0.24 0.015
0.45 (0.31 - 1) 0.24
0.03 0.45 (0.745 - 1)
= 0,
YZl = 1
yields the conditional steady-state probabilities of the micro states aggregated to the corresponding macro state 0: z&, = 0.164,
u& = 0.295,
l& = 0.54.
Similarly, PT, is used in
(
40’4,)
-
(0.48 - 1) 0.295
(
0.52
(0.705 - 1)>
=o,
VT1 = 1
COURTOIS’S
APPROXIMATE
METHOD
163
to obtain the conditional state probabilities of the micro states aggregated to the corresponding macro state 1: l&
= 0.362,
z& = 0.638.
Finally:
Nearly completely decomposablesystems can be characterized with respect to “long-term” and “short-term” behavior: l
From a “short-term” perspective, systems described by their probability transition matrix P can be decomposedinto M independent subsystems, each of whose dynamics is governed by a stochastic processapproximately described by the matrices PF, , O
l
In the “long run,” the impact of the interdependencies between the subsystems cannot be neglected. The interdependencies between subsystems, or macro states, I and J, for all I, J, are described by the transition probabilities IIJ that can be - approximately - derived from the transition probability matrix P and the conditional state probability vector UT. Solving the transition matrix P = [I’,J] for the macro states yields the macro-state probability vector y. The macro-state probability vector y can, in turn, be used for unconditioning of UT, 0 5 I 5 Ad- 1, to yield the final result, the approximate state probability vector V” * V.
4.1.4
Aggregation
and Unconditioning
Having obtained the steady-state probability vector for each subset, we are now ready for the next step in Courtois’s method. The transition probability matrix over the macro states, T’ = [I’ 1~] is approximately computed as: (4.13)
Comparing Eq. (4.13) with the lumpability Condition (4.7), it is clear that the I’IJ can be exactly determined if the model under consideration is lumpable, that is: hJ
holds, independent of v;.
= rIJT
(4.14)
164
STEADY-STATE
AGGREGATION/
DISAGGREGATION
In our example, the macro-state according to Eq. (4.13):
METHODS
transition
probabilities
I~J are derived
roe= 40.(PO000 +PO001 +poo()J +z&.(pool0 +pool1 +POO12) +42.(PO020 +PO021+ poo22) = 0.9737, l-b1
=
uto* +
(PO100 42.
r02
=
40.
(P0200)
ho
=
40.
ho00
‘ll
=
‘TO*
hloo
I721
h3
+
+
vile
(poll0
(p02~~) +
+ Pllol)
- rll
V&b
+ POlll)
= 0.0263,
+ ~01~~)
+ moo1
r12 = 1 -ho r20
+Po101)
(Po120
+
+
uT3-
PlOo2)
+
&*
(JIlllo
(pozzo)
z&.
=
(plolo
1 -
roe
+ PIOll
+ pllll)
=
_ rol
=
+ plo12)
0, =
0.04,
0.9396,
= 0.0204,
=
40’
(P2000
+ p2001
=
G
(Pa100
+ p2101)
+ p2002) =
=
0,
0.04,
= 1- r20- r21= 0.96.
Accordingly, the macro-state transition probability matrix r is:
I?=
0.9737 0.04 0
0.0263 0.9396 0.04
0 0.0204 0.96
The next step is to compute the macro steady-state probability vector y using:
Yr=Y,
yl=
1.
(4.15)
In our example, Eq. (4.15) results in: (0.9737 - 1)
0.0263
0.04
(0.9396 - 1)
0
0.04
(YO/YlrY2)
0.0:04 (0.96 - 1)
= 0,
yl = 1,
from which the macro steady-state probabilities are calculated as:
YO = 0.502,
yl = 0.330,
y2 = 0.168.
The final step is to obtain the approximate steady-state probabilities for the original model. For convenience, the elements of the state probability vector vz E u are partitioned along the lines of the decomposition strategy. Hence, we refer to the elements of V” = [v?J in compliance with the usual notation in this context. The unconditioning in Eq. (4.16) of the VT is to be performed for all macro states I: qi
=
YI’TiT
0 5 I 5 M - 1 and V’i E SI
(4.16)
COURTOIS’S
Tab/e 4.4
State probabilities
State
Probability
200 110 020 101 011 002
6J Gl G2 +I U1”l UF
computed Formula Yo& You;1 70 u; 2 Yl$o
Yl& Y2$
APPROXIMATE
using Courtois’s Approx. 0.0828 0.1480 0.2710 0.1194 0.2108 0.1680
165
METHOD
method and exact ones Exact 0.0787 0.1478 0.2777 0.1144 0.2150 0.1664
% Error + + + +
5.2 0.1 2.4 4.4 2.0 0.1
The use of Eq. (4.16) in our example gives the results summarized in Table 4.4. For the sake of completeness, the exact state probabilities are also shown for comparison. For models with large state spaces, Courtois’s method can be very efficient if the underlying model is nearly completely decomposable. If the transition probability matrix obeys certain regularity conditions the results can be exact. The error induced by the method can, in principle, be bounded [CoSe84]. But since the computational complexity of this operation is considerable, a formal error bounding is often omitted. The efficiency of Courtois’s method is due to the fact that instead of solving one linear system of equations of the size of the state space S, several much smaller linear systems are solved independently, one system for each subset SJ of the partitioned state space S, and one for the aggregated chain. We conclude this section by summarizing the entire algorithm. For the sake of simplicity, only one level of decomposition is considered. Of course, the method can be iteratively applied on each diagonal submatrix P;,. 4.1.5
The
Algorithm
Create the state space and organize it appropriately according to of decomposition. Build the transition probability matrix P (by use of randomization P = Q/q+1 if the starting point is a CTMC), and partition P into AL! x M number of submatrices PIJ, 0 2 I, J, 5 M - 1, appropriately. Verify the nearly complete decomposability of P according to Relawith the chosen value of E. Decompose P such that P = P* + EC according to Eq. (4.9). Matrix P* contains only stochastic diagonal submatrices P;,, and E is a measure of the accuracy of Courtois’s method. It is defined as the maximum sum of the entries of the non-diagonal submatrices PIJ, I # J, of P. For each I, 0 5 I 5 M - 1, solve equation $PT, vT1 = 1 to obtain the conditional state probability vectors UT.
= UT with
166
STEADY-STATE AGGREGATION/ DISAGGREGATION METHODS
Compute the coupling between the decomposed macro states: -1 Generate the transition to Eq. (4.13). -1 vector y.
probability
matrix I’ = [l?~,] according
Solve Eq. (4.15) to obtain the macro steady-state probability
Compute the approximate steady-state probability vector zP of the micro states by unconditioning of the conditional state probability vectors vT,O 5 I < &! - 1, according to Eq. (4.16). From v” compute steady-state performance and dependability measures along the lines of a specified reward structure. Problem 4.1 Verify that the transition probability is not lumpable according to the chosen partition.
matrix from Eq. (4.1)
Problem 4.2 Recall Eq. (4.7) and modify the parameters in Table 4.1 of the example in Fig. 4.1 such that the resulting model is lumpable and nearly completely decomposable. Derive the generator matrix according to Table 4.2 and the transition probability matrix according to Table 4.3 for the resulting model. Problem 4.3 Discuss the relationship between lumpability and nearly complete decomposability. Under what condition does Courtois’s method yield the exact state probabilities if the model is both lumpable and nearly completely decomposable ? As a hint, compare Eq. (4.13) with lumpability Condition (4.7) and relate it to Eq. (4.16). Problem 4.4 Modify Courtois’s method such that it becomes directly applicable for an analysis of CTMCs. Apply the new algorithm directly, that is, without using uniformization, to the original model in Fig. 4.1 and solve for the steady-state probability vector.
4.2
TAKAHASHI’S
ITERATIVE
METHOD
Although closely related to and under some circumstances even coinciding with Courtois’ approach, Takahashi’s method [Taka75] differs substantially both with respect to the methodology used and the applicability conditions. While Courtois’s method is non-iterative and is applied for the approximate computation of the steady-state probability vector v” for a given ergodic DTMC (or 7rTxin the case of a CTMC), Takahashi’s iterative method allows a computation of the exact state probability vector. (Note that numerical errors are still encountered even in such “exact” methods. The method is exact in that there are no modeling approximations.) To allow a straightforward comparison of the two methods, we prefer a continued discussion in terms of
TAKAHASHI’S
ITERATIVE
METHOD
167
DTMCs. Recall that any given ergodic CTMC can easily be transformed into an ergodic DTMC. Much like Courtois’s method, Takahashi’s approach is to partition the state space S into M disjoint subsets of states Sr c S, 0 5 I 5 iVl - 1 such that state I. The criteria, however, each subset Sr is aggregated into a macro used to cluster states differ in the two approaches. The calculation of transition probabilities IIJ among the macro states I and J is performed on the basis of conditional micro-state probability vector V; and the originally given transition probabilities pij, or PI Jij according to Eq. (4.13). With Courtois’s method, the conditional probability vectors UT, given partition element I, can be separately computed for each subset of micro states SI, 0 5 I 5 M - 1, if the original model is nearly completely decomposable. By contrast, in Takahashi’s method, the partitioning of the state space is performed on the basis of approximate lumpabiEity.2 Note that if the state space is approximately lumpable with respect to a partition, it does not necessarily imply that the subsets are nearly decoupled as needed in Courtois’s approach. As an immediate consequence, the conditional micro-state probabilities cannot be calculated independently for each subset of states. Instead, the complementary subset of states and the interactions with this complement is taken into account when approximating conditional micro-state probabilities of a partition element. The whole complementary set is aggregated into a single state representing all external interactions from the states within the particular subset. Since the micro-state probabilities as well as the macro-state probabilities are not known in the beginning, initial estimates are needed. The macro- and micro-state probabilities are iteratively calculated with Takahashi’s method. Aggregation and disaggregation phases are repeated alternately until some convergence criterion is satisfied. By adding an extra computational step, Schweitzer [Schw84] shows that geometric convergence can be guaranteed. Usually, the extra computation takes the form of a power or a Gauss-Seidel iteration step. 4.2.1
The
Fundamental
Equations
Our discussion of Takahashi’s method is based on that given by Schweitzer [Schw84]. Further details can also be found in [Stew94]. Let a given state space S be partitioned into M subsets: M-l s=
u
SI
and
&nsJ=&
VI#J,
O
(4.17)
I=0
2While lumpability has been later in Section 4.2.2.
introduced
in Eq.
(4.7),
approximate
lumpability
is defined
168
STEADY-STATE AGGREGATION/ DISAGGREGATION METHODS
Formally, the macro-state probability vector y could be calculated from the micro-state probability vector v for each component I of the vector: O
YI= -&
(4.18)
iESI
Since u is not known in advance, Eq. (4.18) cannot be directly used. The macro-state probabilities can be derived, however, if the macro-state transition probability matrix I? = [I’IJ] were known via the solution of Eq. (4.19): y = yr,
yl = 1.
(4.19)
The implied aggregation step is based on iterative approximations of the probabilities uTi of states i E SI given an estimate vx of the micro-state probability vector: (4.20) Therefore, from Eq. (4.20), and Eq. (4.13), th e macro-state transition probabilities ~IJ can be approximated as:
(4.21)
For the disaggregation step, the probabilities rli of transitions from macro state I to micro state i are pairwise needed for all 0 5 I 5 M - 1 and for all i E S. The transition probabilities rli can be derived from Eq. (4.21) and are given by Eq. (4.22): .
(4.22)
To complete the disaggregation step, the micro-state probabilities vi are expressed in terms of macro-state probabilities ye and transition probabilities l?r; for each subset of states Sr separately. To accomplish this task, [SI] linear equations need to be solved for each subset SI : M-l vi =
c jESI
UjPji +
c
YKrKi, vi E sI.
(4.23)
K=O,K#I
The second term in Eq. (4.23) represents the aggregation of the complementary subset of macro states into a single external “super” state and the interactions in terms of transition probabilities with this state.
TAKAHASHI’S
ITERATIVE
METHOD
169
Macro-state probability vector Eq. (4.19), together with transition probability aggregation steps as given in Eq. (4.21), constitute one major part of Takahashi’s method, the aggregation step. Micro-state probability vector Eq. (4.23), together with transition probability disaggregations, as indicated in Eq. (4.22), f orm the other major part of Takahashi’s method, the disaggregation step. 4.2.2
Applicability
Given partition: M-l s=
u
SJ,
J=O
a subset Sr is lumpable if and only if real-valued constants condition (4.24) holds Vj E S - SI:
rrj exist such that
ViESI.
pii=rIi
(4.24)
Condition (4.24) is necessary and sufficient for the lumpability of SI with respect to S - SI. The lumpability condition may be restated by referring to Eq. (4.7) if pairwise lumpability is required such that real-valued constants rI J exist and Condition (4.25) holds V’I, J, 0 5 I # J 5 A4 - 1: Vi E SI.
(4.25)
If the state space can be partitioned such that the lumpability condition holds for all pairs of subsets SI, SJ c S, then Takahashi’s algorithm terminates in one iteration with the exact state probability vector V. Unfortunately, lumpability implies very strict structural limitations on the underlying Markov model. More often, however, Markov chains exhibit approximate lumpability such that there exists a sufficiently small lumpability error E: M-l
(4.26)
E= J=O,J#I
iESI
~ESJ
Lumpability error E is a measure for the speed of convergence of Takahashi’s method. The problem remains, however, to find a good partitioning among the states of a given DMTC. Some hints concerning convergence and state space partitioning can be found in Stewart’s book [Stew94]. The underlying Markov models of some so-called product-form queueing networks, which are discussed in Chapter 7, exhibit the exact lumpability property. F’urthermore, the lumpability equations can be exploited theoretically to investigate accuracy of approximate product-form queueing network algorithms.
170
STEADY-STATE AGGREGATION/ DISAGGREGATION METHODS
In general, the efficiency of Takahashi’s method can benefit from an exploitable structure of the underlying transition probability matrix. This fact, however, imposes the burden on the user of finding a good clustering strategy, and no generally applicable rule is known for a good clustering strategy. Finally, it is worth mentioning that approximate lumpability is different from weak lumpability, which can be applied for an aggregation-disaggregation based transient analysis of CTMCs. A comprehensive study on lumpability and weak lumpability is presented by Nicola [NicoSO]. 4.2.3
The Algorithm
Since convergence of Takahashi’s method is a non-trivial problem, some care is needed here. We follow an approach that is often applied to enforce geometric convergence, namely, to incorporate an intermediate power, Gauss-Seidel, or SOR step between successive aggregations and disaggregations. If the decrease in residual error is not sufficient from iteration to iteration, such a step is suggested to be included in the computation. Usually, the corresponding condition is stated in terms of a real-valued lower improvement bound c, 0 < c < 1 on the error ratio from iteration step n - 1 to iteration step n: residualerror (~(“1) residualerror (~(~-l))
(4.27)
’ ‘*
The complete algorithm is presented as Fig. 4.4. 4.2.4
Application
We demonstrate Takahashi’s method with the same example used in Section 4.1 to illustrate Courtois’s method to allow an easy comparison of the numerical behavior of the two algorithms. Recall that we considered a network in which two customers were circulating among three stations according to some stochastic regularities. The resulting state transition diagram of the set S of six states was depicted in Fig. 4.1. Again, we will apply the decomposition strategy indicated in Fig. 4.2 and partition S into three disjoint subsets: So = {020,110,200}, Si = (011, lOl}, and S:! = (002). Arranging the states in the same order as defined in Table 4.3 and using the same numbers in our example results in the following transition probability matrix:
P=
0.52 0.24 0 0.02 0 0 L
0.45 0.265 0.24 0.02 0.02 0
0 0.45 0.745 0 0.02 0
0.03 0.015 0 0.48 0.24 0.02
0 0.03 0.45 0.705
0 (4.28) 0.015
TAKAHASHI’S
ITERATIVE
Create the state space 9 and organize it appropriately according 09 decomposition along the lines of approximate fumpability into probabiJ.ity matrix P = [PJJ~~]. fnitiaLioation: a) r1.:=0; b) estimate Y(O); c> choose c andO
f (w”’
,(nf
function
f([i,
METHOD
to a pattern transition
. .I/);
2 E
= v’P;
n:=n+l;
b)
according to Eq. (4.211; solve yfle) = yfn)I’(nf accordjag
Disa&gregation a)
calculate
for I?$’
all
0 5 15
= &g,
Normalizat With v(n)
to Eq. (4.22); VP’ = furi] with fpP$i
kf -1:
,a, kES1
according b) calculate equations : (n) = & yi
to Eq. (4.19);
tt’j E S Vk
Eq. (4,23)
Fig.
4.4
the system
vi E SIi
+ K=~p?ifYl;)rg. I
foa : = ~(;l’ from the previous E 3
by solving
step solve;
Takahashi’s
v(“)l=l,
algorithm.
of
171
172
STEADY-STATE
AGGREGATiON/DlSAGGREGATlON
METHODS
It has been shown that the transition probability matrix P in Eq. (4.28) is nearly completely decomposable so that Courtois’s method could be applied. We now investigate the lumpability error induced for the same partition with respect to approximate lumpability as defined in Eq. (4.26). To calculate the lumpability error, we first let I = 0, J = 1 and denote with EIJ the inner sum:
601
=
c c xi iESo jESl
- &
c c Pij iESo jESl
= o o3 _ 0.03 + 0.015 + 0.03 + 0.015 3 + 0.015 0.03 - 0.03 + 0.015 + 0.03 + 0.015 + 3 + o 015 _ 0.03 + 0.015 + 0.03 + 0.015 3 = IO.03- 0.031+ 10.045- 0.031+ 10.015- 0.031 = 0 + 0.015 + 0.015 = 0.03. Similarly, we get: E()g= 0. Therefore, by letting I = 0, J # I, the total lumpability error is given by: ql = EOl+ 602= 0.03 + 0 = 0.03. For the other elements of the partition, we get: El = 0.015,
E2= 0.
With: E=
o
= 0.03,
measurescan be derived for the speed of convergence if Takahashi’s method is applied with respect to the given partition of S = U”,=, SI. 4.2.4.1
Aggregation
With the initial probability vector: y(O) = y(0) =
(
111111 $$g’ g’s’ g
the first aggregation step, i.e., the computation of
r(l) = ri:’ [ I
>
7
TAKAHASHI’S
can be accomplished of r$) is shown:
ITERATIVE
METHOD
173
according to Eq. (4.21). As an example, the calculation
00= $ *lxo~ij r(l) = 5 + (0.52 + 0.45 + 0.24 + 0.265 + 0.45 + 0.24 + 0.745) = ; * 2.91 = 0.97. Thus, in the first aggregation sition probability matrix:
I’(‘)
=
step, we get the following
0.97 0.04 0
0.03 0.9375 0.04
0 0.0225 0.96
macro-state
.
tran-
(4.29)
Then the macro-state probability vector y(l) is given by the solution of the following system of linear equations:
#
7p,
#
-0.03 0.04 0
From Eq. (4.30) we obtain decimal digit:
0.03 -0.0625 0.04
0 0.0225 -0.04
the macro-state
= 0,
y(l)1
probabilities
y(l) = (0.4604,0.3453,0.1943)
= 1.
(4.30)
up to the fourth
.
(4.31)
4.2.4.2 Disaggregation In the first part of the disaggregation step, transition probabilities I’if’, are computed:
IT\:) = Jjo.02 + io
= 0.01, r&) = 0.
From there we compute:
(1) (1)+-iil)ral,) (1)-- c vJ(%jo +71 r10 Vo j=o,1,2
= (z@ 0.52 + vi’). 0.24 + v;l’ * 0) + (0.3453.0.01 + 0.1943.0) = (@ - 0.52 + vi’)- 0.24 + z@ 0) + 0.003453.
174
STEADY-STATE
AGGREGATION/
The remaining transition summarized as follows:
DlSAGGREGATlON
probabilities
METHODS
from macro states to micro states are
To complete disaggregation in the first iteration step, the following three sets of equations, for { vir), vi’), Vil)}, {Vi”, Vi” }, and { $)) need to be solved separately for a calculation of micro-state probabilities: VO('I
= (
[email protected]
+ $)-0.24)
+ 0.003453,
*1 ('I
= (~$0.45
+ i+O.265
+
[email protected])
V2(l)
= (zp.0
V3 (')
= (z&O.48
+ ~$0.24)
v4 (l)
= (zp.O.45
+ zp.O.705)
V5
(1) _ -
With the solutions dition:
(u5
+ 0.006906,
+ up. 0.45 + up. 0.745) + 0.003453,
(1) ~0.96)
+
+ 0.010792, + 0.010792,
0.00776925.
of these systems
of equations and the normalization
con-
vQ)l = 1, the probability vector Y(‘) at the end of the first iteration to the fourth digit is:
with a precision up
Y(‘) = (0.0812,0.1486,0.2753,0.1221,0.1864,0.1864)
.
(4.32)
In Table 4.5, the micro-state probabilities obtained after one iteration of Takahashi’s method are compared to the results gained if Courtois’s method is applied. The percentage error in the results obtained with Courtois’s approach are also included in the table. It can be seen that even after the first iteration step, the results are already relatively close to the exact values. After four iterations, the results gained with Takahashi’s method resemble the exact values up to the fourth decimal digit. 4.2.5
Final
Remarks
Takahashi’s method was presented in terms of DMTCs.
However this choice
implies no limitation since it is a well-known fact that ergodic DTMCs and
TAKAHASHI’S
Table 4.5
State probabilities:
State
y(O)
y(l)
200 110 020 101 011 002
0.166 0.16g 0.166 0.166 0.166 0.16G
0.0785 0.1436 0.2660 0.1179 0.2138 0.1801
Takahashi’s,
y(2)
y(3)
0.0787 0.1476 0.2770 0.1146 0.2150 0.1672
State
Courtois
Exact
200 110 020 101 011 002
0.0828 0.1480 0.2710 0.1194 0.2108 0.1680
0.0787 0.1478 0.2777 0.1144 0.2150 0.1664
y(4)
0.0787 0.1478 0.2777 0.1144 0.2150 0.1665
0.0787 0.1478 0.2777 0.1144 0.2150 0.1664
ITERATIVE
Courtois’s
175
METHOD
and exact methods
Exact
% Error(‘)
0.0787 0.1478 0.2777 0.1144 0.2150 0.1664
+ +
0.3 2.8 4.2 3.1 0.6 8.2
% Errod4) 0.0 0.0 0.0 0.0 0.0 0.0
% Error + + + +
5.2 0.1 2.4 4.4 2.0 0.1
CTMCs are equivalent with respect to their steady-state probability computations, when applying the transformations given in Eq. (3.6) and Eq. (3.5). But Takahashi’s method can be more conveniently applied directly on the generator matrix Q = [qij] of a CTMC. As usual, we refer to the steady-state probability vector of the given micro-state CTMC through z or VP, respectively. In the continuous-time case,the resulting equations correspond directly to their discrete-time counterparts. Instead of using Eq. (4.19), the macrostate probability vector o is now calculated based on infinitesimal generator matrix Z of the continuous-time macro state process:
o=aq
al=
(4.33)
1.
The matrix entries CIJ of X are calculated with an interpretation corresponding to the one applied to Eq. (4.21):
I # J. The disaggregation steps are analogous Igvi-1: CIj
=
c
iESI
jES,j#i
jESr,j#i
eqij c
kESI
-
to Eq. (4.22)
)
and Eq. (4.23)
j E SJ, J # I,
“g
J=O, J#I
(4.34) VI, 0 5
(4.35)
176
STEADY-STATE
AGGREGATlON/DlSAGGREGATlON
METHODS
Problem 4.5 Compare Courtois’s method to Takahashi’s method for numerical accuracy on the example studies performed in Sections 4.1 and 4.2.4. In particular, explain the observed numerical differences in the corresponding macro-state transition probability matrices r, in the corresponding macrostate probability vectors y, and in the corresponding micro-state probability vectors u. Apply decomposition with regard to station two, as indiProblem 4.6 cated in Fig. 4.3, to the DTMC underlying the discussions in Sections 4.1 and 4.2.4. Build up and organize a transition probability matrix that properly reflects the chosen strategy of decomposition. Calculate the approximate lumpability error E according to Eq. (4.26)) compare it to the results obtained in Section 4.2.4, and interpret the differences. Apply Takahashi’s algorithm on the basis of the decomposiProblem 4.7 tion strategy with regard to station two, as discussed in the previous problem. Compare the resulting macro-state transition probability matrices, the macrostate probability vectors, and the micro-state probability vectors obtained in the iteration step to those obtained under the original decomposition strategy in Section 4.2.4. Comment on and interpret the results. Problem 4.8 Use the modified algorithm of Takahashi and apply it for a numerical analysis of the CTMC specified by its generator matrix Q in Table 4.2 and the parameters in Table 4.1. Apply the same partition as indicated in Table 4.2.
5 Transient Solution of Markov Chains Transient solution is more meaningful than steady-state solution when the system under investigation needs to be evaluated with respect to its shortterm behavior, Using steady-state measures instead of transient measures could lead to substantial errors in this case. Furthermore, applying transient analysis is the onl y choice if non-ergodic models are investigated, Transient analysis of Markov chains has been attracting increasing attention and is of particular importance in dependability modeling. Unlike steady-state analysis, CTMCs and DTMCs have to be treated differently while performing transient analysis. Surprisingly, not many algorithms exist for the transient analysis of DTMCs. Therefore, we primarily focus on methods for computing the transient state probability vector r(t) for CTMCs as defined in Eq. (2.53). Furthermore, additional attention is given to the computation of quantities related to transient probabilities such as cumulative measures. Recall from Eq. (2.53) that for the computation of transient state probability vector n(t), the following linear differential equation has to be solved, given infinitesimal generator matrix Q and initial probability vector m(O):
d7r(t>= m(t)Q, dt
~(0) = (Mom,.
. .> .
(54
Measures that can be immediately derived from transient state probabilities are often referred to as instantaneous measures. However, sometimes measures based on cumulative accomplishments during a given period of time 177
178
TRANSIENT SOLUTION OF MARKOV CHAINS
Fig. 5.1
A pure birth
process.
[0, t) could be more relevant. Let: t
L(t) =
.I
7r(U)dU
0
denote the vector of the total expected times spent in the states of the CTMC during the indicated period of time. By integrating Eq. (2.53) on both sides, we obtain a new differential equation for L(t): -dW
dt
Cumulative
measures
= L(t)Q + r(O),
L(0) = 0.
can be directly computed from the transient solution
of Eq. (5.2).
5.1
TRANSIENT
ANALYSIS
USING EXACT METHODS
We introduce transient analysis with a simple example of a pure birth process. 5.1.1
A Pure Birth Process
As for birth-death processes in steady-state case, we derive a transient closedform solution in this special case. Consider the infinite state CTMC depicted in Fig. 5.1 representing a pure birth process with constant birth rate X. The number of births, N(t), at time t is defined to be the state of the system. The only transitions possible are from state Ic to state k + 1 with rate X. Note that this is a non-irreducible Markov chain for any finite value of X, so the steady-state solution does not exist. From Fig. 5.1, the infinitesimal generator matrix Q is given by:
Due to the special structure of the generator matrix, it is possible to obtain a closed-form transient solution of this process. With generator matrix Q and
TRANSIENT ANALYSIS USING EXACT METHODS
179
Eq. (2.53), we derive a system of linear differential equations for our example:
(5.3) (54 With the initial state probabilities:
‘Irk(O) =
1 Ic=o, 0 k>l, {
(5.5)
the solution of the differential equations can be obtained. From differentiation and integration theory the unique solution of Eq. (5.3) is: 7ro(t) = emAt.
(5.6)
By letting k = 1 in Eq. (5.4) and subsequently substituting for TO(~) into the differential equation, we get:
the solution
$m(t) = -k(t) + xTo(t) = -hrl(t)
+ Xesxt.
Applying again elementary differentiation and integration rules to this different i al equation results in another simple unique solution: nl(t) = AtemAt.
(5.7)
If this process is repeated, we get a closed-form solution for each transient state probability i~~k(t): --Xt,
k > 0 *
The proof of Eq. (5.8) is straightforward using the principle of induction and we leave it as an exercise to the reader. We recognize Eq. (5.8) as the Poisson pmf. Poisson probabilities are plotted as functions of t in Fig. 5.2 for several values of X. Thus, the random variable N(t) at time t is Poisson distributed with parameter X-L,while the stochastic process {N(t)It 2 0) is the Poisson process with rate X. So the transient analysis of a very simple birth process provided as a by-product one of the most important results of Markov chain and queueing theory, i.e., the stochastic process under consideration is the Poisson process. One important measure of the Poisson process is its mean value function m(t), defined as the expected number of births in the
180
TRANSIENT SOLUTION OF MARKOV CHAINS
1
0.8
fig. 5.2
Selected Poisson probabilities
with parameters X = 0.5 and 1.0.
TRANSIENT ANALYSIS USING EXACT METHODS
181
interval [0, t). This measure is computed as: m(t) = 2 i7r$) = 2 i y+ i=o * i=o
= e-xt 2 -(At) izl (i(W1 - v
= (Xt)cAt 2 y i=o *
(5.9)
= (Xt)emxtex” = At. Eq. (5.8) together with Eq. (5.9) characterize the Poisson pmf and its mean At. Problem 5.1 Using the principle of mathematical induction, show that the Poisson pmf as given in Eq. (5.8) is the solution to Eq. (5.4). Problem 5.2 Derive the solution of Eq. (5.2) of the pure birth process of Fig. 5.1, assuming no(O) = 1 and ~(0) = 0, V’k > 1. 5.1.2
A Two-State
CTMC
While in the previous section transient analysis based on closed-form expressions was possible due to very simple and regular model structure, in this section we carry out closed-form transient analysis of a CTMC with two states. Our treatment closely follows that in [STP96]. Fig. 5.3 shows a homogeneous, continuous time Markov chain model for this system with state space S = (0, 1). A typical application of this model is to a system subject to failure and repair.
Fig. 5.3
A simple two-state CTMC.
The generator matrix of the CTMC of Fig. 5.3 is:
Q=[
;’
:A].
From the system of Eqs. (2.51), we get:
&o(t) = -P-o(t) + h(t), d p(t)
= wo(t>
-
h(t).
(5.10)
182
TRANSIENT SOLUTION OF MARKOV CHAINS
Applying the law of total probability, To@>
we conclude:
=
1 -
w(t),
and rewrite Eq. (5.10) as: $1(t)
= p(1 - 7r1(t))
- h(t)
or: d g1
(t> + (P + +h(q
= p.
A standard method of analysis for linear differential Eq. (5.11) is to use the method of integrating factors [RaBe74]. Both sides of the (rearranged) equation are multiplied by the integrating factor:
to get:
p&+W = &+W --gl(t) d + (p + X)e(p+x)trl(t)
(5.12) (5.13)
By inspection of Eq. (5.12) it can be seen that the sum expression on the right side of the equation equals the derivative of the product of the subterms so that Eq. (5.13) results. Integrating Eq. (5.13) on both sides gives us as an intermediate step: I-L -e(CL+‘Jt PL+X Multiplying
+ c = e(P+‘b
71‘1(t).
(5.14)
Eq. (5.14) with: ,++w
provides the desired result: w(t)
=
5
+ ce-(P+X)t
(5.15)
Integration constant c reflects the dependency of the transient state probabilities on the initial probability vector m(O). Assuming, for example: 7r(l)(O) = (0,l) , results in: 1 = r(l)1 (0) =
5
=pL+c*
+ ce-(P+x)o
TRANSIENT ANALYSIS USING EXACT METHODS
Thus, unconditioning integration constant:
with an initial
c=l-
183
pmf 7r(l)(O) yields the corresponding
-I- P Pu+X
x p+X’
The final expressions of the transient state probabilities
in this case results
in:
x (,++X)t) ) @(t)=Lp+x+p+x 7&t> =1- 7p(t)
(5.16)
p + X (e--(p+X)t) =-- p + X I-L+X ,w+X = X - X (e--(p+X)t) PL+X
= - x - X (e-(P+h)t) . p+x p+x With a closed-form expression of the transient state probabilities formance measure can be easily derived.
(5.17) given, per-
Problem 5.3 First, derive expressions for the steady-state probabilities of the CTMC shown in Fig. 5.3 by referring to Eq. (2.58). Then, take the limits as t -+ 00 of the transient pmf z(‘)(t) and compare the results to the steady-state case 7r. Problem 5.4 Derive the transient state probabilities of the CMTC shown in Fig. 5.3 assuming initial pmf rrc2)(0) = (0.5,0.5). Take the limit as t + 00 of the resulting transient pmf 7rc2)(t) and compare the results to those gained with 7r(‘)()) when taking the limit. Problem 5.5 Assume the system modeled by the CTMC in Fig. 5.3 to be “up” in state 1 and to be “down” in state 0. (a) Define the reward assignments for a computation of availability measures based on Table 2.1. (b) Derive formulas for the instantaneous availability A(t), the interval unavailability UA( t) , and the steady-state availability A for this model, based both on 7r(‘)(t) and 7rc2)(t). Refer to Section 2.2.2.3.1 for details on how these availability measures are defined. (c) Let p = 1, X = 0.2 and compute all measures that have been specified in this problem. Evaluate the transient measures at time instants t E {1,2,3,5,10>.
186
TRANSIENT SOLUTION OF MARKOV CHAINS
computations with small numbers, a left truncation point 1 can be introduced. To this end an overall error tolerance E = ~1+E, is partitioned to cover both left and right truncation errors. Applying again a vector norm, a left truncation point 1 can be determined similarly to the right truncation point r:
1-l c
e-qt -(qtY
i=O
i!
5 El.
(5.24)
With appropriately defined truncation points I and r as in Eqs. (5.23) and (5.24), the original uniformization Eq. (5.21) can be approximated for an implement at ion: 7r(t) 73 2 v(i)emqtq, i=l
.
Y(0) = 7T(O).
(5.25)
In particular, O(a) terms are needed between the left and the right truncation points and the transient DTMC state probability vector Y(Z) must be computed at the left truncation point 1. The latter operation according to Eq. (5.22) requires O(qt) computation ‘time. Thus the overall complexity of uniformization is 0 (qqt) [ReTr88]. (Here q denotes the number of nonzero entries in Q.) To avoid underflow, the method of Fox and Glynn can be applied in the computation of Z and r [FoG188]. Matrix squaring can be exploited for a reduction of computational complexity [ReTr88, AbMa93]. But because matrix fill-ins might result, this approach is often limited to models of medium size (around 500 states). 5.1.4.2 Stiffness Tolerant Uniformiza tion As noted, uniformization is plagued by the stiffness index qt [ReTr88]. Muppala and Trivedi observed that the most time consuming part of uniformization is the iteration to compute y(i) in Eq. (5.22). But this iteration is the main step in the power method for computing the steady-state probabilities of a DTMC as discussed in Chapter 2. Now if and when the values of y(i) converge to a stationary value ti, the iteration Eq. (5.22) can be terminated resulting in considerable savings in computation time [MuTr92]. We have also seen in Chapter 2 that speed of convergence to a stationary probability vector is governed by the second largest eigenvalue of the DTMC, but not by qt. However, since the probability vectors v(i) are computed iteratively according to the power method, convergence still remains to be effectively determined. Because an a priori determination of time of convergence is not feasible, three cases have to be differentiated for a computation of the transient state probability vector z(t) [MuTr92, MMT94] :
1. Convergence occurs beyond the right truncation point. In this case, computation of a stationary probability vector is not effective and the transient state probability vector ?r(t) is calculated according to Eq. (5.25) without modification.
188
TRANSIENT SOLUTION OF MARKOV CHAINS
1. Convergence occurs beyond the truncation point, that is, c > r. As a consequence, Eq. (5.29) remains unaffected and must be evaluated as is. 2. If convergence occurs before truncation point, Eq. (5.29) is adjusted to: L(t) = k fy v(i)L’(t) 2=0 = f 5 v(i)L’(t) 2=0
+ TV
= ; $ v(i)L’(t) a=0
+ fv(c)
= ig
2
L’(t)
kc+1
2 L’(i) - 2 L’(t) i=O ( i=o
u (38 L ‘( t >+$44 a=0
)
(91-@f(t))
= ; 2 u(i) (l-ge-qq i=o iqt-g
+$44
(1-$0tQ$)),
(5.30)
with
In case truncation needs to be performed, a time-dependent error-bound estimate d’)(t) related to Eq. (5.29) can be given as a function of the truncation point r. Assuming an error tolerance allowance, the corresponding number of required terms r can be determined:
&9(t) 5 I 2
2
e-Gy
q i=r+1j=i+1 5;
,g
’
(i-(r+l))e-qty
z=?-+1 i
< -
l
cc
ie-ql(;) --f
c G i=r+1
&cqtll--
i==T
,e -
z=r+1 r+l
(d
.
(r+l)e-d$
O3 q
c i=r+l
(?-qt Glv i!
*
(5.31)
TRANSIENT ANALYSIS USING EXACT METHODS
5.1.5
Other
Numerical
189
Methods
We briefly discuss numerical methods based on ordinary differential equation approach followed by methods exploiting weak lumpability. 5.1.5.1 Ordinary Differential Equations Standard techniques for the solution of ordinary differential equations (ODE) can be utilized for the numerical solution of the Kolmogorov differential equations of a CTMC. Such ODE solution methods discretize the solution interval into a finite number of time intervals {ti, 62,. . *, t’Z?“‘? t 7% }. The difference between successive time points, called the step size h, can vary from step to step. There are two basic types of ODE solution methods: explicit and implicit. In an explicit method, the solution rr(ti) is approximated based on values 7r(tj) for j < i. The computational complexity of explicit methods such as Runge-Kutta is O(qqt) [ReTr88]. Although explicit ODE-based methods generally provide good results for non-stiff models, they are inadequate if stiff models need to be studied. Note that stiff CTMCs are commonly encountered in dependability modeling. In an implicit method, r(ti) is approximated based on values n;TT( tj) for j 5 i. Examples of implicit methods are TR-BDF2 [ReTr88] and implicit Runge-Kutta [MMT94]. At each time step a linear system solution is required in an implicit method. The increased overhead is compensated for by better stability properties and lower computational complexity on stiff models [MMT94, ReTr88]. For non-stiff models, Uniformization is the method of choice, while for moderately stiff models, uniformization with steady-state detection is recommended [MMT94]. For extremely stiff models, TR-BDF2 works well if the accuracy required is low (eight decimal digits). For high accuracy on extremely stiff models, implicit Runge-Kutta is recommended [MMT94].
5.1.5.2 Weak Lumpability An alternative method for transient analysis based on weak lumpability has been introduced by Nicola [NicoSO]. Nicola’s method is the transient counterpart of Takahashi’s steady-state method, which was introduced in Section 4.2. Recall that state lumping is an exact approach for an analysis of a CMTC with reduced state space and, therefore, with reduced computational requirements. State lumping can be a very efficient method for a computation of transient state probabilities, if the lumpability conditions apply. Note that state lumping can also be orthogonally combined with other computational methods of choice, such as stiff uniformization, to yield an overall highly accurate and efficient method. For the sake of completeness, we present the basic definitions of weak lumpability without going into further detail here. Given conditional proba-
192
TRANSIENT SOLUTION OF MARKOV CHAINS
A00 Aol ...
previously described state partition:
41
Al0
Q=
.
;
L
A(F-1)0 AFO
. . .
e-S ...
QF-1 AF(F-1)
AOF &F ..
(5.35)
.
A(F-l)F AFF*i
Let Qr = Q~r,l 5 I 5 F - 1, denote submatrices containing transition rates between grouped states within fast recurrent subset Sr only. Submatrices AIJ, I # J, contain entries of transitions rates from subset SI to SJ. Finally, Ace and AFF collect intra slow states and intra fast transient states transition rates, respectively. Matrices Qr , 1 5 I 5 F - 1, contain at least one fast entry in each row. Furthermore, there must be at least one fast entry in one of the matrices AFI, 0 5 I 5 F - 1 if SF # 8. Matrix AFF may contain zero or more fast entries. Finally, all other matrices, i.e., Aoo,&J,-~ # J,O L I L F-LO 5 J < F, cant ain only slow entries. By definition, only slow transitions are possible among these subsets SI and SJ. An approximation to r(t) is derived, where n(t) is the solution to the differential Eq. (5.36):
The reorganized matrix Q forms the basis to create the macro-state generator matrix X. Three steps have to be carried out for this purpose: first, aggregation of the fast recurrent subsets into macro states and the corresponding adaptation of the transition rates among macro states and remaining fast transient states, resulting in intermediate generator matrix %. In a second step, the fast transient states are eliminated and the transition rates between the remaining slow states are adjusted, yielding the final generator matrix X. Finally, the initial state probability vector ~(0) is condensed into a(O) as per the aggregation pattern. Transient solution of differential Eq. (5.37) describing the long-term interactions between macro states is carried out:
$7(t) = tT(t)lx, g(o)
=
(5.37)
(~Oo(~),~~~,~O,,_,(~),~l(~),~~~,~F-l(~))
a
Once the macro state probability vector o(t) has been computed, disaggregations can be performed to yield an approximation r”(t) of the complete probability vector n(t). 5.2.2
Aggregation
of Fast Recurrent
Subsets
Each subset of fast recurrent states is analyzed in isolation from the rest of the system by cutting off all slow transitions leading out of the aggregate. Collecting all states from such a subset S’I and arranging them together with the
AGGREGATION OF STIFF MARKOV CHAINS
193
corresponding entries of the originally given infinitesimal generator matrix Q into a submatrix &I gives rise to the possibility of computing the conditional steady-state probability vectors XT, 1 < I 5 F - 1. Since &I does not, in general, satisfy the properties of an infinitesimal generator matrix, it needs to be modified to matrix QT:
Q;=QI+DI.
(5.38)
Matrix DI is a diagonal matrix whose entries are the sum of all cut-off transition rates for the corresponding state. Note that these rates are, by definition, orders of magnitude smaller than the entries on the diagonal of Qr. The inverse of this quantity is used as a measure of coupling between the subset Sr and the rest of the system. Since Q: is the infinitesimal generator matrix of an ergodic CMTC, the solution of: $Q;
7rp = 1
= 0,
(5.39)
yields the desired conditional steady-state probability vector X; for the subset of fast states SI. For each such subset SI, a macro state I is defined. The set of aggregated states {I, 1 5 I 5 F - I}, constructed in this way, together with the initially given slow states Se form the set of macro states niir = SoU{I, 1 5 I 5 F - 1) for which a transient analysis is performed in order to account for long-term effects among the set of all macro states. To create the intermediate generator matrix 2 of size (]1M( + no) x (]A!!] + no), unconditionings and aggregations of the transition rates have to be performed. The dimensions [no x 7251of submatrices 21~ are added to the corresponding equations in what follows. Let us define matrices of appropriate size: 1 ... 1 ...
1
E=
containing 1s only as entries. The following cases are distinguished: l
Transitions remain unchanged among slow states in Se:
zoo= Aoo, [nox no]*
(5.40)
0 nanSitiOnS involving fast transient states from SF: - Transitions between slow states and fast transient states, and vice versa, remain unchanged: EFO = AFO,
[nF x 72017
(5.41)
EOF =AoF,
[no x nF].
(5.42)
195
AGGREGATION OF STIFF MARKOV CHAINS
a way that it reflects the partitioning states:
into macro states and fast transient
(5.49) If no fast transient states are included, then X = Z already constitutes the final generator matrix. In any case, a transient analysis can be carried out based on a model with reduced state space. Although the intermediate model may still be stiff because fast and slow states are still simultaneously included, the computational savings can be considerable due to smaller state space. But more aggregation is possible, as is highlighted in the following section. With 2 given, a differential equation can be derived for the computation of the transient state probability vector 3(t), conditioned on initial probability vector 8(O):
$=(t)= 5(t)%, w> 5.2.3
(5.50)
= (km..
. ,~o,,~Jo>,~l(o),
. . * ,&?-1(0),aF,(0),
Aggregation
of Fast Transient
Subsets
. . . ,i?F,F-l(o))
.
Recall that a fast transient subset of states is connected to states in other subsets by at least one fast transition. Thus if such a set is not nearly completely decomposable from the rest of the system, it is tightly coupled to at least one other non-empty subset of states. Now consider an intermediate state space consisting of a non-empty fast transient subset of states SF c S, and the set of macro states n/r = SoU{I, 1 5 I 5 F - l} resulting from aggregation of fast recurrent subsets and collecting them together with the slow states. Furthermore, assume the intermediate transition rates among the states in MU SF to be already calculated as shown in the previous section. From Eq. (5.50) and Eq. (5.49)) the following two equations can be derived: (5.51) (5.52) It is reasonable to assume that for the fast transient states, the derivative on the left-hand side of Eq. (5.52) approaches zero (i.e., d/d-G-~(t) = 0), with respect to the time scale of slow states. Hence from Eq. (5.52) we obtain an approximation of 0~ :
196
TRANSIENT SOLUTION OF MARKOV CHAINS
Using the result from Eq. (5.53) in Eq. (5.51) provides us with the following approximation:
(5.54) = c+gf(t) (2
MM
-
~M&%M
= +z(t)k;,.
>
(5.55) (5.56)
Hence, E x %EM is taken as the infinitesimal generator matrix of the final aggregated Markov chain. Bobbio and Trivedi [BoTr86] have shown that:
(5.57) is the asymptotic exit probability matrix from the subset of fast transient states SF to macro states M. The matrix entries of PFM = bi~(O, oo)] ,i E SF, I E AI are the conditional transition probabilities that the stochastic process, once initiated in state i at time t = 0 will ultimately exit SF and hit macro state I E 44 as time t -+ co. These probabilities are used to adjust the transition rates among macro states during elimination of fast transient states. The interpretation is that the fast transient states form a probabilistic switch in the limit, and the time spent in these sets of states can be neglected in the long run. Furthermore, related to this assumption, the exit stationary probability will be reached in a period of time much smaller than the time scale characterizing the dynamics among the macro states. The error induced by this approximation is inversely proportional to the stiffness ratio. For the sake of completeness, it is worth mentioning that the entries in matrix I!ZMF from Eq. (5.55) denote exit rates, leading from macro states in AL!via the probabilistic switch, represented by matrix PFM Eq. (5.57),_back to macro states. Hence, the rate matrix YZMFPFM is simply added to XMM in Eq. (5.55) to obtain the rate adjustment. 5.2.4
Aggregation
of Initial
State
Probabilities
Since the transient analysis is performed only on the set of macro states M, the initial probability vector ~(0) must be adjusted accordingly. This adjustment is achieved in two steps. First, the probability mass z1(0) assigned to micro states i E ,571,which are condensed into a single macro state I, is accumulated for all I, 1 5 I 5 F - 1:
61(O) = 7rI(O)l.
(5.58)
Second, if SF # 8, then the probability mass ?TF(O) assigned to the fast transient states SF is spread probabilistically across the macro states A4 according to rule implied by the probabilistic switch PFM given in Eq. (5.57):
5;(O) = &(O)+TF(O)PFM.
(5.59)
AGGREGATION OF STIFF MARKOV CHAINS
197
Depending on SF, the initial probability vector a(O) is given by 5; (0) according to Eq. (5.58) or by 5(O) according to Eq. (5.59). The aggregation part has now completely been described. With this technique, the transient analysis of large stiff Markov chains can be reduced to the analysis of smaller non-stiff chains. Transient analysis is accomplished only for the macro states in the model, be it either the initially present slow states or the macro states resulting from an aggregation of fast recurrent subsets of states. The accuracy of the method is inversely proportional to the stiffness ratio of the initial model. With the assumption that steady state is reached much quicker for the fast states than for the slow states, further approximations become possible. Applying certain disaggregation steps, approximations of the transient fast state probabilities can also be derived, thus resulting in a complete state probability vector. 5.2.5
Disaggregations
525.1 Fast Transient States With the approximate macro state probability vector &E(t), obtained as the solution of Eq. (5.56), the approximate fast transient state probability vector 8:(t) can be derived easily with a transient interpretation of Eq. (5.53):
*F(t) = -c&(t)&&;;.
(5.60)
With:
c = b;(t)l+
iqtp,
(5.61)
normalization can be taken into account for computation of a probability vector so that the first disaggregation step is accomplished, leading to the intermediate transient state probability vector u- =(c) (t) as an approximate solution of Eq. (5.50): b(t) z ec)
(t) = i (i?%(t), a;(t))
.
(5.62)
5.2.5.2 Fast Recurrent States In a second disaggregation step, the transient micro-state probability vector m”(t) is also calculated as an approximation of n(t), which is formally defined as the solution of Eq. (5.36), incorporating the original infinitesimal generator matrix Q from (5.35). For a computation of 7?(t) = @y(t)), we need to refer to the conditional micro-state probability vectors 7r; = (7rT0,.. . ,7r;,1P1 ), 1 5 I -< F - 1, from Eq. (5.39). The transient macro-state probabilities ar(t), 1 5 I 5 F - 1, are used to uncondition the conditional steady-state probability vectors $, 1 5 I 5 F - 1: 7rr(t) z 7ry (t) = (ar(t)-$)
.
(5.63)
198
TRANSIENT SOLUTION OF MARKOV CHAINS
If SF # 0 unconditioning has to be performed on the basis of b(t) &“)(t) according to Eq. (5.62):
7Q(t) x Try (4 = ( p
(4
$
) .
z
(5.64)
Collecting all the intermediate results from Eq. (5.62) and Eq. (5.64) yields the final approximate transient probability vector in its most general form:
=
(
p(t),Ty(t),
. . . ,7r~-l(t),ti~~c~(t))
.
(5.65)
If SF = 8 then Eq. (5.65) simplifies to the following:
7(t) E57P (t> = (ab&), . * * 7~o,,-l (4,
(5.66) 5.2.6
The Algorithm
Initialization: l
Specify r >> l/T, the transition rate threshold, as a function of the time horizon T for which the transient analysis is to be carried out.
l
Partition the state space S into slow states SO and fast states S - SO with respect to r.
l
Partition the fast states S - So further into fast recurrent subsets Sl, 1 5 I < F - 1 and possibly one subset of fast transient states SF.
l
Arrange the infinitesimal generator matrix Q according to Eq. (5.35). Steady-state analysis of fast recurrent subsets:
l
l
Prom submatrices Q1 (Eq. (5.35)), 0bt ain infinitesimal ces Q; according to Eq. (5.38). Compute the conditional micro state probability
generator matri-
vectors $ according to
Construction of the intermediate generator matrix 2 as in Eq. (5.49): l
Use Q from Eq. (5.35) and n;, 1 5 I 5 F - 1 from Eq. (5.39) and apply operations starting from Eq. (5.40) through Eq. (5.48).
AGGREGATION OF STIFF MARKOV CHAINS
199
Aggregation of fast transient states: IF SF # 8
l
- THEN perform aggregation of fast transient states and construct -E;M: * Compu_te the probabilistic switch PFM according to Eq. (5.57) and ELM according to Eq. (5.55) and Eq. (5.56) from 2. * x=%~&f. - ELSE X = 2:. Aggregation of the initial state probability
vector a(O):
l
Accumulate initial probabilities ~1, (0) 0f micro states i E SI into 61(O), 1 5 Is F - 1 according to Eq. (5.58).
l
IF SF # 8 - THEN calculate approximate initial macro-state probability o(0) = 5$(O) according to Eq. (5.59).
vector
- ELSE a(O) = k(O) according to Eq. (5.58). Computation of the transient macro-state probability l
vector a(t):
Solve Eq. (5.37). Disaggregations:
- THEN * Compute intermediate transient state probability according to Eq. (5.62).
vector 6(t)
* Compute the approximate micro-state probability subvectors n?(t) by unconditioning of $ according to Eq. (5.64) for all 1 < I 5 F - 1. - ELSE compute ?ry(t) by unconditioning of 7rT according to Eq. (5.63) for all 1 5 I 5 F - 1. Final result: l
Compose the approximate transient probability vector r(t) according to Eq. (5.65) if SF # 0 or Eq. (5.66) if SF = 8.
w n”(t)
200
TRANSIENT SOLUTION OF MARKOV CHAINS
5.2.7
An Example:
Server
Breakdown
and Repair
Assume a queueing system with a single server station, where a maximum of m customers can wait for service. Customers arrive at the station with arrival rate X. Service time is exponentially distributed with mean I/,X Furthermore, the single server is subject to failure and repair, both times to failure and repair times being exponentially distributed with parameters y and 6, respectively. It is reasonable to assume that X and /J differ by orders of magnitude relative to y and 6. Usually, repair of a failed unit takes much longer than traffic-related events in computer systems. This condition is even more relevant for failure events that are relatively infrequent. While traffic-related events typically take place in the order of micro seconds, repair durations are in the order of minutes, hours, or days, and failure events in the order of months, years, or multiple thereof. Thus, the transition rates X and p can be classified as being fast, and y and S as slow. Note that we are interested in computing performance measures related to traffic events so that for the rate threshold, the relation r < S < y generally holds and the application of the aggregation/disaggregation approach is admissible. The described scenario is captured by a CTMC with a state space S = {(M),O L 1 I m,k E {OJ}}, w h ere Z denotes the number of customers in the queue and Ic the number of non-failed servers, as depicted in Fig. 5.4a. The state space S is partitioned according to the classification scheme applied to the rates, into a set of slow states Se, a set of fast recurrent states Si, and a set of fast transient states SF = S2: = {(m,W, Sl = {(O, I>, (1,1>, -. . , Cm, I)),
SO
S2 = {(0,0),(1,0),-,(m-
LO)}.
For the case of m = 2, the following state ordering is used in compliance with the partitioning:
&
lo
11
(071)
(171)
20
12 c&l>
W)
21 (LO)
*
To complete initialization, the infinitesimal generator matrix Q, m = 2, is restructured accordingly to reflect the partitioning:
Q=
-6 0 0 Y 0 x
-(x0+?) P 0 6 0
0 x -(X+Y+P) P 0 s
s 0 -(cw 0 0
0 Y
0 0
t
Y 0
-(A + S) 0
-(AA+
6)
. 1
202
TRANSIENT SOLUTION OF MARKOV CHAINS
birth-death process as described by QT : lq =
0-x i CL
l-; l-
3’
0
i = 0, 1, . . . ) 2.
;
To aggregate the set of fast recurrent states Sr into macro state 1, $ is used to derive the intermediate infinitesimal generator matrix 2 for the reduced state space {(2,0), 1, (O,O), (1,O)). By applying Eq. (5.40) through Eq. (5.48):
g=
-4 G,r 0 x
s -Y s 6
0 G,Y -(X+6) 0
0 xi,? x -(A + S)
Matrix 2 represents the CTMC as depicted in Fig. 5.4b for the case of and the m = 2. For an aggregation of the fast transient states, first 2,; probabilistic switch P FM are calculated from 9 according to Eq. (5.57):
With 2 and PFM given, I= = 2”MM can be easily derived using some algebraic manipulation according to Eq. (5.55) and Eq. (5.56):
where X/(X + S) d enotes the probability that a new customer arrives while repair is being performed, and y (X/(X + s>)2-i is the probability that a failure occurs when there are i customers present and that m - i customers arrive during repair. An initial probability vector ~(0) = (0, 1, 0, . . . , 0) is assumed, i.e., ~1~ = 1. This condition implies a(0) = (0,l) in Eq. (5.58) and Eq. (5.59). From Eqs. (5.16) and (5.17) we know symbolic expressions for the transient state probabilities of a two-state CTMC, given an initial pmf r(O). We can therefore apply this result by substituting the rates from I% into these expressions. By letting: 2
cl=ypri i=o
(
x xs6
2-i
)
’
AGGREGATION OF STIFF MARKOV CHAINS
203
we get the following macro-St ate probabilities: ao(t> = 1 - a(t), s m(t)
=
G
+
&
(,++a)
.
Finally, disaggregations must be performed. Given &c(t) = a(t), Eq. (5.60) needs to be evaluated for a computation of the fast transient probability vector:
i m(t)~dm&j
=
(QA+6
+$J).
(
Next, normalization
is accomplished with help of c according to Eq. (5.61):
c= 1+01(t)&
(6
(1+&J
The intermediate transient state probability lows immediately : W) = ;
(
1 - ol(t),~l(t),~l(t)~,~l(t)~
vector a(t) from Eq. (5.62) fol-
i
The final step consists of unconditioning $: = -01 w (TTo’ C
7qt)
+c).
(Qx + 6 +r:l)). the micro-state probability
4,)
vector
7r*12 ) .
Collecting all the results provides an approximation m%(t) to the transient probability vector, which is represented as the transpose:
7r(t)
73 n”(t)
=
204
TRANSIENT SOLUTION OF MARKOV CHAlNS
From the transient state probabilities, all the performance measures of interest can be calculated. Table5.2 Parameter set for Table 5.2. 6
Y
x
I-1
0.01
0.0001
0.5
1
The exact transient state probability vector n(t) is compared to the approximate state probability vector m”(t) at time instants t = 10,100 and 1000 in Table 5.2, with the parameter set from Table 5.1. (Note that the initial probability vectors are chosen as has been specified earlier.) The results of the approximate method are generally very close to the exact values for most states. Usually, the accuracy depends on the length of the investigated time horizon for a transient analysis. Short-term results are particularly dependent on the initial probability vector so that some more significant differences can arise as in the case of state (2,O) in our example, where the absolute state probabilities are rather small, and hence the percentage difference is high. Furthermore, whether the error is positive or negative may change over time.
0.008
0.006
0.004
0.002 8 A Q 5 2 2
0
-0.002
-0.004
-0.006
-0.008
Fig. 5.5
I I
I
0.0001
0.001
Errors of the approximate method as a function y.
#I
0.01
Y
AGGREGATION
OF STIFF
MARKOV
205
CHAINS
State probabilities as a function of time: exact vs. approximate
Table 5.2
Exact r(t) No.
State
0 lo
ii>;;
11 12
(1:1) Gw
20
(ON
21
(170)
t=o 0 1 0 0 0 0
t = 10
t = 100
t = 1000
0.00067567 0.57099292 0.28540995 0.14264597 0.00011195 0.00016354
0.00601869 0.56777268 0.28392083 0.14201127 0.00011134 0.00016484
0.00962544 0.56567586 0.28289339 0.14152880 0.00011092 0.00016421
Approx zTM(t) No.
State
0 lo
cw>
t=o 0 0.571429 0.285714 0.142857 0 0
11 12
20 21
t = 10
t = 100
t = 1000
0.00092448 0.57074200 0.28537100 0.14268500 0.00011191 0.00016567
0.00611894 0.56777400 0.28388700 0.14194400 0.00011133 0.00016481
0.00962543 0.56577100 0.28288600 0.14144300 0.00011094 0.00016423
Absolute error No. 0 lo 11 12
20 21
State
t
= 10
-2.488e-04 2.509e-04 3.895e-05 -3.903e-05 4e-08 -2.13e-06
t = 100 -l.O03e-04 -1.32e-06 3.383e-05 6.727e-05 le-08 3e-08
Error %
t
= 1000
le-08 -9.514e-05 7.39e-06 8.58e-05 -2e-08 -2e-08
t
= 10
-36.8242 0.04394 0.01365 -0.02736 0.03573 -1.30243
t
= 100
-1.66564 -0.00023 0.01192 0.04737 0.00898 0.01820
t
= 1000 0.00010 -0.01682 0.00261 0.06062 -0.01803 -0.01218
The impact of different degrees of stiffness on the accuracy of the results are indicated in Table 5.3 and in Fig. 5.5. In particular, the empirical results depicted in Fig. 5.5 support the assumption of the approximation being better for stiffer models. Almost no error is observed if y 5 10v4, while the error substantially increases for the less stiff models as y gets larger. The results depend on the three different parameter sets given in Table 5.4. A closer look at Fig. 5.5 reveals a non-monotonic ordering of the degree of accuracy. Let y = 0.01, then the worst result is obtained for the largest 6, that is, S = 0.1 in state (2, l), as anticipated. But this behavior does not occur for y < 0.001, where the approximate probability is closer to the exact value for S = 0.1 than for 6 = 0.01. Similar patterns can be observed in other cases, as well. Problem 5.6 Verify the construction of the intermediate infinitesimal generator matrix & in the example study by applying Eq. (5.40) through Eq. (5.48) for each submatrix of Q in the indicated way.
AGGREGATION OF STIFF MARKOV CHAINS
Table 5.4 Experiment ;i; (3)
207
Parameter set for Table 5.3 6
Y
x
I-L
0.001 0.01 0.1
0.0001 0.001 0.01
0.5
1
full, in which case they must be rejected. Use this probability to calculate the eflective arrival rate X,, defined as X minus the rejected portion of arrivals. Use Little’s law to calculate the average response time. Apply the aggregation technique for the transient analProblem 5.10 ysis of stiff Markov chains of this section to the example in Section 4.1 and derive the approximate transient probability vector. Assume ~(0) = 111111 transient state probability vector g7 g ), evaluate the approximate ( g7 g’g’g, at time instances t E {1,2,3,5, lo}, and compare the results to those obtained by applying Courtois’s method and to the exact results.
6 Single Station
Queueing Systems
A single station queueing system, as shown in Fig. 6.1, consists of a queueing buffer of finite or infinite size and one or more identical servers. Such an elementary queueing system is also referred to as a service station or, simply, as a node. A server can only serve one customer at a time and hence, it is
Arriving Jobs
Departing Jobs
Servers
Fig. 6.1
Service station with m servers (a multiple server station).
either in a “busy” or an “idle” state. If all servers are busy upon the arrival of a customer, the newly arriving customer is buffered, assuming that buffer space is available, and waits for its turn. When the customer currently in service departs, one of the waiting customers is selected for service according to a queueing (or scheduling) discipline. An elementary queueing system is further described by an arrival process, which can be characterized by its sequence of interarrival time random variables {Al, As, . . s}. It is common to assume that the sequence of interarrival times is independent and identically distributed, leading to an arrival process that is known as a renewal process [Triv82]. Recent work on correlated interarrival times can be found in [LucaSl] and [Luca93]. The distribution function of interarrival times can be continuous or discrete. We deal only with the former case in this book. 209
210
SINGLE STATION QUEUEING SYSTEMS
For information related to discrete interarrival time distributions, the reader may consult recent work on discrete queueing networks [Dadu96]. The average interarrival time is denoted by E[A] = TA and its reciprocal by the average arrival rate X:
The most common interarrival time distribution is the exponential, in which case the arrival process is Poisson. The sequence {Bi, &, +. +} of service times of successive jobs also needs to be specified. We assume that this sequence is also a set of independent random variables with a common distribution function. The mean service time E[B] is ’ d enoted by TB and its reciprocal by the service rate ,X
(6.2) 6.1
KENDALL’S
NOTATION
The following notation, known as Kendall’s notation, is widely used to describe elementary queueing systems: A/B/m
- queueing discipline,
where A indicates the distribution of the interarrival times, B denotes the distribution of the service times, and m is the number of servers (m 2 1). The following symbols are normally used for A and B: M EI, Hk Ck D G GI
Exponential distribution (memoryless property) Erlang distribution with Ic phases Hyperexponential distribution with k phases Cox distribution with k phases Deterministic distribution, i.e., the interarrival time or service time is constant General distribution General distribution with independent interarrival times
Due to proliferation of high-speed networks, there is considerable interest in traffic arrival processes where successive arrivals are correlated. Such non-G1 arrival processes include Markov modulated Poisson process (MMPP) [FiMe93] or batch Markovian arrival process (BMAP) [LucaSl]. Queueing systems with such complex arrival processes have also been analyzed [Luca93]. Except for an example of the use of MMPP, we do not consider such queueing systems in this book. The queueing discipline or service strategy determines which job is selected from the queue for processing when a server becomes available. Some commonly used queueing disciplines are:
KENDALL’S NOTATION
211
FCFS (First-Come-First-Served): If no queueing discipline is given in the Kendall notation, then the default is assumed to be the FCFS discipline. The jobs are served in the order of their arrival. LCFS (Last-Come-First-Served): SIR0 (Service-In-Random-Order): random.
The job that arrived last is served next. The job to be served next is selected at
RR (Round Robin): If the servicing of a job is not completed at the end of a time slice of specified length, the job is preempted and returns to the queue, which is served according to FCFS. This action is repeated until the job service is completed. PS (Processor Sharing): This strategy corresponds to round robin with infinitesimally small time slices. It is as if all jobs are served simultaneously and the service time is increased correspondingly. IS (Infinite Server): There is an ample number of servers so that no queue ever forms. Static Priorities: The selection depends on priorities that are permanently assigned to the job. Within a class of jobs with the same priority, FCFS is used to select the next job to be processed. Dynamic Priorities: The selection depends on dynamic priorities that alter with the passing of time. Preemption: If priority or LCFS discipline is used, then the job currently being processed is interrupted and preempted if there is a job in the queue with a higher priority. As an example of Kendall’s notation, the expression M/G/l
- LCFS preemptive resume (PR)
describes an elementary queueing system with exponentially distributed interarrival times, arbitrarily distributed service times, and a single server. The queueing discipline is LCFS where a newly arriving job interrupts the job currently being processed and replaces it in the server. The servicing of the job that was interrupted is resumed only after all jobs that arrived after it have completed service. Kendall’s notation can be extended in various ways. An additional parameter is often introduced to represent the number of places in the queue (if the queue is finite) and we get the extended notation: A/B/m/K
- queueing discipline,
where K is the capacity of the station (queue + server). This means that if the number of jobs at server and queue is K, a newly arriving job is lost.
212 6.2
SINGLE STATION QUElJElNG SYSTEMS
PERFORMANCE
MEASURES
The different types of queueing systems are analyzed mathematically to determine performance measures from the description of the system. Because a queueing model represents a dynamic system, the values of the performance measures vary with time. Normally, however, we are content with the results in the steady-state. The system is said to be in steady state when all transient behavior has ended, the system has settled down, and the values of the performance measures are independent of time. The system is then said to be in statistical equilibrium, i.e., the rate at which jobs enter the system is equal to the rate at which jobs leave the system. Such a system is also called a stable system. Transient solutions of simple queueing systems are available in closed-form, but for more general cases, we need to resort to Markov chain techniques as described in Chapter 5. Recall that the generation and the solution of large Markov chains can be automated via stochastic reward nets [MuTr92]. The most important performance measures are: Probability of the Number of Jobs in the System rk: It is often possible to describe the behavior of a queueing system by means of the probability vector of the number of jobs in the system rk. The mean values of most of the other interesting performance measures can be deduced from 7’rk: xk = P[there are Ic jobs in the system]. Utilization p: If the queueing system consists of a single server, then the utilization p is the fraction of the time in which the server is busy, i.e., occupied. In case there is no limit on the number of jobs in the single server queue, the server utilization is given by: P=
mean service time arrival rate X = I-* mean interarrival time service rate p
(6.3)
The utilization of a service station with multiple servers is the mean fraction of active servers. Since mp is the overall service rate:
p=X v’ and p can be used to formulate the condition for stationary mentioned previously. The condition for stability is:
PC 1,
(64 behavior
(6.5)
i.e., on average the number of jobs that arrive in a unit of time must be less than the number of jobs that can be processed. All the results given in Chapters 6-10 apply only to stable systems.
PERFORMANCE MEASURES
213
Throughput A: The throughput of an elementary queueing system is defined as the mean number of jobs whose processing is completed in a single unit of time, i.e., the departure rate. Since the departure rate is equal to the arrival rate X for a queueing system in statistical equilibrium, the throughput is given by:
in accordance with Eq. (6.4). W e note that in the case of finite buffer queueing system, throughput can be different from the external arrival rate. Response Time T: The response time, also known as the sojourn time, is the total time that a job spends in the queueing system. Waiting Time W: The waiting time is the time that a job spends in a queue waiting to be serviced. Therefore we have: Response time = waiting time + service time. Since W and T are usually random numbers, their mean is calculated. Then: (6.7)
The distribution functions of the waiting time, Fw(x), time, FT(~), are also sometimes required. Queue Length
and the response
Q: The queue length, Q, is the number of jobs in the queue.
Number of Jobs in the System K: The number of jobs in the queueing system is represented by K. Then:
k=l
The mean number of jobs in the queueing system K and the mean queue length Q can be calculated using one of the most important theorems of queueing theory, Little’s theorem:
K=AT, and
G = Xw.
Little’s theorem is valid for all queueing disciplines and arbitrary systems. The proof is given in [LittGl].
(6.9) (6.10) GI/G/m
214
6.3
SINGLE STATION QUEUEING SYSTEMS
THE M/M/l
SYSTEM
Recall that in this case, the arrival process is Poisson, the service times are exponentially distributed, and there is a single server. The system can be modeled as a birth-death process with birth rate (arrival rate) X and a constant death rate (service rate) CL.We assume that X < ,Qso the underlying CTMC is ergodic and hence the queueing system is stable. Then using Eq. (3.12)) we obtain the steady-state probability of the system being empty: 1 7ro
=
00
1 k-l =
1-tm-I;
1+
g(qky
k=l
k=li=O
p
which can be simplified to: 1
x1-x. I-L
?ro=l+*
From Eq. (3.11) we get for the steady-state probability jobs in the system: n,+ = TO
0
that there are k
k
-
k 2 0.
,
CL x
rk=(l-‘)
(
A’” )
/J-CL
or with the utilization
*
p = X/,X no-l-p
(6.11)
and: rk
=
(1 -
P)Pk,
(6.12)
the probability mass function (pmf) of the modified geometric random variable. In Fig. 6.2, we plot this pmf for p = l/2. The mean number of jobs is obtained using Eqs. (6.12) and (6.8):
x=21-p.
(6.13)
In Fig. 6.3, the mean number of jobs is plotted as a function of the utilization p. This is the typical behavior of all queueing systems. From Eq. (1.10) we obtain the variance of the number of jobs in the system: 2 -P
OK - (1 - p)2
(6.14)
THE M/M/l
SYSTEM
215
k
fig. 6.2 The solution for ~TTI, in a M/M/l
system.
and the coefficient of variation:
With Little’s theorem (Eqs. (6.9) and (6.10)) we get for the mean response time:
T=
lh l-p’
(6.15)
with Eq. (6.7) for the mean waiting time: (6.16) and with Little’s theorem again for the mean queue length:
&=p2 l-p’
(6.17)
The same formulae are valid for M/G/l-PS and M/G/l-LCFS preemptive resume (see [Lave83]). For M/M/l-FCFS we can get a relation for the response time distribution if we consider the response time as the sum of k + 1 independent exponentially distributed random variables [Tkiv82]:
x+x1+&+“‘+&,
216
SINGLE STATION QUEUEING SYSTEMS zo-
15--
K
lo--
5--
1
0.8
P Fig. 6.3
The mean number of jobs ?? in a M/M/l
system.
where X is the service time of the tagged job, Xr is the remaining service time of the job in service when the tagged job arrives, and X2,. . . , XI, are the service times of the jobs in the queue; each of these is exponentially distributed with parameter CL.Note that Xr is also exponentially distributed with rate p due to the memoryless property of the exponential distribution. Noting that the Laplace-Stieltjes transform (LST) of the exponentially distributed service time (see Table 1.5) is:
I-L Lx(s) = ___ p+s’ we get for the conditional LST of the response time:
Unconditioning using the steady-state probability the response time is:
Eq. (6.12), the LST of
LT(S) =co(&)Ic+l *(1-P)Pk LT(S)
=
I41
-
P)
s + PC1 -
P> *
(6.18)
THE M/M/m
Thus the response time 7’ is exponentially CL0 - 4:
distributed
SYSTEM
217
with the parameter
FT(x) = 1 -e -P(l--Pb
(6.19)
and the variance: var(T) = Similarly we get the distribution
1
of the waiting time: x = 0,
1 - P, ev(x)
(6.20)
P2(l - d2’
(6.21)
=
1 - p . e--cLC1-P12, x > 0. Thus Fw(O) = P(W = 0) = 1 - p is the mass at origin, corresponding to the probability that an arriving customer does not have to wait in the queue.
6.4
THE M/M/co
SYSTEM
In an M/M/co queueing system we have a Poisson arrival process with arrival rate X and an infinite number of servers with service rate ,LLeach. If there are k jobs in the system, then the overall service rate is kp because each arriving job immediately gets a server and does not have to wait. Once again, the underlying CTMC is a birth-death process. Prom Eq. (3.11) we obtain the steady-state probability of k jobs in the system:
with Eq. (3.12), we obtain the steady-state probability tem:
of no jobs in the sys-
1 A’“1 k=l
=
e-t,
0 i-J
‘Ic!
(6.22)
and finally:
(6.23)
218
SINGLE STATION QUEUEING SYSTEMS
This is the Poisson lmf, and the expected number of jobs in the system is:
K=
x.
(6.24)
P With Little’s theorem the mean response time as expected:
(6.25)
6.5
THE M/M/m
SYSTEM
An M/M/m queueing system with arrival rate X and service rate p for each server can also be modeled as a birth-death process with:
Jc2 0,
XI, = 4 O
b%
Pk =
i *t%
The condition for the queueing system to be stable (underlying CTMC to be ergodic) is X < mp. The steady-state probabilities are given by (from Eq. (3.11)):
rk=
k-l
m-l TO rI
rI i=m
With an individual server utilization,
rk
=
x *IQ’
p = A/(mp),
we obtain:
(6.26)
(
I
pkmm TO*! ’
JC2 *,
and from Eq. (3.12) -’
(6.27)
THE M/M/l/K
FINITE CAPACITY SYSTEM
219
that an arriving customer has to wait in the
The steady-state probability queue is given by:
Pm = P(K 2 m) = 5
rk
k=m
(6.28)
b-v)” = m!(l - p) * 7ro Using Eqs. (6.26) and (6.8) we obtain for the mean number of jobs in the system: E=mpfL-Pm,
(6.29)
1-P
and for the mean queue length: Q=P.p,. 1-P
(6.30)
From this the mean response time ?? and mean waiting time w by Little’s theorem (Eqs. (6.9) and (6.10)) can be easily derived. A formula for the distribution of the waiting time is given in [GrHa85]: &v(x)
= {
l-pm, 1 _ pm . e-mP(l-Pb,
x = 0, x > 0.
(6.31)
Figure 6.4 shows the interesting and important property of an M/M/m system that the mean number of jobs in the system K increases with the number of servers m if the server utilization is constant but the mean queue length s decreases.
6.6
THE M/M/l/K
FINITE CAPACITY
SYSTEM
In an M/M/l/K queueing system, the maximum number of jobs in the system is K, which implies a maximum queue length of K - 1. An arriving job enters the queue if it finds fewer than K jobs in the system and is lost otherwise. This behavior can be modeled by a birth-death process with: x = k
A, 09-K { 0,
/& =p
k 2 K,
k= l,...,K.
Using Eqs. (3.12), (3.11), and a = X/p, we obtain the steady-state probability of k jobs in the system: OLk
K.
(6.32)
220
SINGLE STATION QUEUEING SYSTEMS
m Fig. 6.4 Mean queue length 8 and mean number of jobs in the system K as functions of the number of servers m.
Since there are no more than K customers in the system, the system is stable for all values of X and CL.That is, we need not assume that X < p for the system to be stable. In other words, since the CTMC is irreducible and finite, it is ergodic. If X = CL,then a = 1 and: 1 TO =
K+1
= TIC,
k = 1,2, . . . , K.
(6.33)
The mean number of jobs in the system is given by (using Eq. (6.8)): ,
~a l-a
K ,2’
-
K+l 1 - (JK+l
. $+I
7
a# 1, (6.34) a = 1.
Note that the utilization p = 1 - ~0 # X/p in this case; it is for this reason that we labeled the quantity a = X/,Q in the preceding equations. The throughput in this case is not equal to X, but is equal to X(1 - no). Similar results have been derived for an M/M/m/K system (see [Klei75], [AllegO], [GrHa85], or [Triv82]).
221
MACHINE REPAIRMAN MODEL
6.7
MACHINE
REPAIRMAN
MODEL
Another special case of the birth-death process is obtained when the birth rate Xj is of the form (M - j)X, j = 0, 1, . . . , A4, and the death rate, ,~j = ,Q. This structure can be useful in the modeling of interactive computer systems where a an individual terminal user issues a request at the rate X whenever it is in the “thinking state.” If j out of the total of m terminals are currently waiting for a response to a pending request, the effective request rate is then (A4 - j)X. The request completion rate is denoted by ,X Similarly, when M machines share a repair facility, the failure rate of each machine is X and the repair rate is ,Y (see Fig. 6.5). Since the underlying CTMC is irreducible and
Fig. 6.5
Machine repairman model.
finite, it is ergodic implying that the queueing system is always stable. The expressions for steady-state probabilities are obtained using Eqs. (3.12) and (3.11) as: rk = TQ
k-1 X(A4 - i) rI CL i=o
or: (6.35)
Hence: 7ro =
1 M
(6.36)
k x
w k=O
’
The utilization of the computer (or the repairman) is given by p = (1 - ~0) (see Eq. (6.11)) and the throughput by ~(1 - ~0). With the average thinking time l/X, we get for the mean response time of the computer: TX
AL? &-no)
-- 1 J-l
and, using Little’s theorem, the average number of jobs in the computer:
-I( = j’,,f _ id1 ; To).
(6.38)
222
6.8
SINGLE STATION QUEUEING SYSTEMS
CLOSED TANDEM
NETWORK
Consider the closed tandem network in Fig. 6.6 with a total number of K L
fig. 6.6
Closed tandem network.
customers. This network is known as the cyclic queueing network. Assume that node 1 has service rate p, and node 2 has service rate ~2. The state space in this case is {(ICI, Jcz) 1 Icr 2 0, Ic2 2 0, Icr + lc:! = K} where Ici (i = 1,2) is the number of jobs at node i. Such multidimensional state spaces, we modify the earlier notation for steady-state probabilities. The joint probability of the state (IQ, Ic2) will be denoted by r(Icr, Icz) while the marginal pmf at node i will be denoted by ni(lci). Although, the underlying CTMC appears to have a two-dimensional state space, in reality it is one-dimensional birth-death process because of the condition Icr + i& = K. Thus the marginal pmf of node 2 can be derived from Eqs. (3.12) and (3.11):
k2
r2(k2) = 0, k2 > K.
k2 I K;
9
1 r2(0)
= 1+5
i=l
(g
and with u = I_L~/,x~: fl2@2)
(6.39)
=
l-u 1 _ @+l ’
u# 1, (6.40)
1 K+l’
u = 1.
Similarly, we obtain for node 1:
n(h) =m(O) ---&, 1-i 1 v(O)
u# 1,
(i)““’
(6.42)
=
I The utilizations
(6.41)
1 K+l’
u= 1.
are given by: P11 -
Tl (O),
P2 =
1 -
7r2(0),
(6.43)
THE M/G/l
223
SYSTEM
and the throughput: x = Xl
= x2 = p1p1
(6.44)
= p2p2,
and the mean number of customers in node 2 with equation: 772 = Q(O)
-5
ii3
*uk2
k2=0
a - cK Uk2 = 7r2(0) *?A& k2=0 8
=
7r2(0) *ux
=--
U
l-u K1=K-K2.
*
1 -
ukz+l
l-u (K+l)*uK+l 1 -UK+1
uz1 ’
(6.45)
7
(6.46)
6.9 THE M/G/lSYSTEM The mean waiting time w of an arriving job in an M/G/l components:
system has two
1. The mean remaining service time we of the job in service (if any), 2. The sum of the mean service times of the jobs in the queue. We can sum these components to:
W=Wo+Q-TB,
(6.47)
where the mean remaining service time we is given by: we = P(server is busy) . z + P(server is idle) . 0,
(6.48)
with the main remaining service time x of a busy server (the remaining service time of an idle server is obviously zero). When a job arrives, the job in service needs R time units on the average to be finished. This quantity is also called the mean residual life and is given by [Klei75]: T; TB R = --$- = -+1+
c”,).
B
For an M/M/l
system (c”, = l), we obtain:
72M/M/l
=TB
=
;,
(6.49)
224
SINGLE STATION QUEUEING SYSTEMS
which is related to the memoryless property of the exponential distribution. The probability P(server is busy) is t,he utilization p by definition. From Eq. (6.47) and Little’s theorem s = X . w, we obtain:
wo WE--1-P
(6.50)
and with Eq. (6.49) finally: 2
(1+
n=&*
2
4)
'
(6.51)
the well-known Pollaczek-Khintchine formula (see Eq. (3.28)) for the mean queue length. For exponentially distributed service time we have ci = 1 and for deterministic service time we have c& = 0. Thus: (see Eq. (6.17)),
The mean queue length is divided by two if we have deterministic service time instead of exponentially distributed service times. Using the theory of embedded DTMC (see Section 3.2.1), the Pollaczek-Khintchine transform equation can be derived (Eq. (3.25)):
G@) = B-(X - AZ) (’ - p)(l - ‘1 B”(X - AZ) - x’
(6.52)
where G(x) is the z-transform of steady-state probabilities of the number of jobs in the system rk and B”(s) is the LST of the service time. As an example we consider the M/M/l system with B” (s) = p/ (s + p) (see Table 1.5). From Eq. (6.52) it follows that:
(+) = l-p
1 - pz
G(x) = F(l
- p)pk - 2’.
k=O
Hence we find the steady-state probability
of Ic jobs in the system:
TI, = (1 - P)Pk.
THE GI/M/l
SYSTEM
225
With Eqs. (6.52) and (1.57) we obtain for the mean number in the system [Klei75] (see also Eq. (3.28)):
1+c; &p+p2.l-p 2
(6.53)
or, using Eq. (6.7) and Little’s theorem:
K=p$ifJ.
(6.54)
Hence:
&-
p2 1-P
y3,
which is the same as Eq. (6.51). Figure 6.7a shows that for an M/G/l system the mean number of jobs K increases linearly with increasing squared coefficient of variation, and that the rate of increase is very sensitive to the utilization. From Fig. 6.7b we see how dramatically the mean number of jobs K increases if the utilization approaches 1 and that this is especially the case if the squared coefficient of variation cg is large.
6.10 THE GI/M/lSYSTEM We state the results for GI/M/l
systems, using the parameter o given by:
CT= A-(/L - /LO) where A”(s) is the LST of the interarrival mean number of jobs in the system: j&-L-
(6.55)
time [Tane95]. For example, the
(6.56)
1-O’
the variance of the number of jobs in the system: p(l+*
2
OK=
(l-42
- P) ’
(6.57)
the mean response time: T,L.-
1 /!A l-a’
(6.58)
Pea 1-O’
(6.59)
the mean queue length: Q=
226
SINGLE STATION QUEUEING SYSTEMS
= 0.9
= 0.8
= 0.7 = 0.5
=3
=
1
= 0
Fig. 6.7 (a) Mean number of jobs ?? in an M/G/l system as a function of & with parameter p. (b) M ean number of jobs I? in a M/G/l system as a function of p with parameter ci .
THE GI/M/l
SYSTEM
227
the variance of the queue length:
0; = PO + 4 - P>> (l-a)2 ’
(6.60)
the mean waiting time:
and the waiting time distribution: &v(x)
= { 1 - 0, x = 0, 1 _ 0. e-P(l--a)z 7 x > 0.
(6.63)
In the case of an M/M/l system, we have A-(s) = X/(s + X) (see Table 1.5). Together with Eq. (6.55) we obtain:
and with the Eqs. (6.56)-(6.63), the well-known M/M/l formulae. A more interesting example is the Ez/M/l system. From Table 1.5 we have: 2
A-(s)= ( & > ) and with Eq. (6.55) we have
Using Eqs. (6.56)-(6.63), we obtain explicit formulae for Ez/M/l performance measures. system is very similar, espeThe behavior of an M/G/l and of a GI/M/l cially if c$ S-1. This is shown in Fig. 6.8 where we compare the mean number of jobs K for M/G/l and GI/M/l systems having the same coefficient of variation cg. Note that c$ in the case of GI/M/l denotes the coefficient of variation of the interarrival times, while in the M/G/l case it denotes the -coefficient of variation of the service times. Note also that in the M/G/l case, K depends only on the first two moments of the service time distribution, while in the GI/M/l case, the dependence is more extensive. In Fig. 6.8, for the GI/M/l system, the gamma distribution is used. For increasing value of c$ the deviation increases.
228
SINGLE STATION QUEUEING SYSTEMS c”x = 10/i
p
= 4
20
t
GI/M/l __-______ M/G/l
=
1
= 0.1
0:2
Fig. 6.8
6.11
0:4
0:3
”
0:5 ‘.’ P
0.6
:‘:I’:““‘: 0:7
Mean number of jobs K in an M/G/l
THE GI/M/m
0:8
and a GI/M/l
019
system.
SYSTEM
Exact results for GI/M/m queueing systems are also available, See [Alle90], [Klei75], or [GrHa85]. The mean waiting time is given by: jYj%
J m/!L(1 - 0)2 ’
(6.64)
where: a=A”(mp--mpo)
(6.65)
and: J= For Rk, see [Klei75] (page 408). We introduce only a heavy traffic approximation: (6.66) which is an upper bound [Klei75] and can be used for GI/G/l and GI/G/m system as well (see subsequent sections). Here X is the reciprocal of the mean interarrival time, ai is the variance of interarrival time, and a$ is the variance of service time.
THE G/,/G/l SYSTEM
229
6.12 THE GI/G/lSYSTEM In the GI/G/l case only approximation formulae and bounds exist. We can use M/G/l and GI/M/l results as upper or lower bounds, depending on the value of the coefficient of variation (see Table 6.1). Table 6.1 Upper bounds (UB) and lower bounds (LB) for the GI/G/l mean waiting time
4
43
>l >l
>l l
M/W
GI/M/l
LB LB UB UB
LB UB LB UB
Another upper bound is given by Eq. (6.66) with m = 1 [Klei75]: (6.67)
A modification of this upper bound is [Marc78]: (6.68)
This formula is exact for M/G/l and is a good approximation for GI/M/l and GI/G/l systems if p is not too small and ci or ci are not too big. A lower bound is also known [Marc78]: w
> P2 * c2B + P(P - 2)
2X(1-p)
'
(6.69)
but more complex and better lower bounds are given in [Klei76]. Many approximation formulae for the mean waiting time are mentioned in the literature. Four of them that are either very simple and straightforward or good approximations are introduced here. First, the well-known AllenCunneen approximation formula for GI/G/m systems is [AllegO]: (6.70)
This formula is exact for M/G/l (Pollaczek-Khintchine formula) and a fair approximation elsewhere and is the basis for many other better approximations. A very good approximation is the Kramer/Langenbach-Belz formula, a direct extension of Eq. (6.70) via a correction factor:
(6.71)
230
SINGLE STATION QUEUEING SYSTEMS
with the correction factor: exP --*-* 3 G KLB
P ( 2 1-P
(1-c?J2 ) 05 cA5 19 c; + ci >
=
(6.72)
CA > 1.
(=P
Another extension of the Allen-Cunneen formula is the approximation Kulbatzki [Kulb89]:
of
(6.73) with:
f 1,
CA E {%I},
[p(14.1cA - 5.9) + (-13.7cA + 4.1)] c; + [p(-59.7ca I + [&A
+ 21.1) + (54&A - 16.3)] cg
- 4.5) + (-1.5cA + 6.55)],
o
; -0.75~ + 2.775,
CA > 1,
(6.74 It is interesting to note that Eq. (6.74) was obtained using simulation experiments. A good approximation for the case ci < 1 is the Kimura approximation [Kimu85]:
c2,+c;wz -wM/M/rn 2
6.13
THE M/G/m
((I-&)exp(v)
+&)-l.
(6.75)
SYSTEM
We obtain the Martin’s approximation formula for M/G/m systems [Mart721 by an extension of the Eq. (6.47) for the mean waiting time for an M/G/l system: w=w
0 +”
m
**T
(6.76)
Because of the m servers, an arriving customer has to wait, on the average, only for the service of Q/m customers. The remaining service time in this case is: Tvo=Pm*R.
(6.77)
THE M/G/m SYSTEM
231
With Eq. (6.49) we obtain as an approximation: (6.78) and with Eqs. (6.76) and (6.77) finally: w
c
pmlP
. C1 +43)
2m
l-p
(6.79)
*
This is a special case of the Allen-Cunneen formula for GI/G/m systems (see next section) and is exact for M/M/m and M/G/l systems. For the waiting probability Pm we can use Eq. (6.28) for M/M/m systems or a simple yet good approximation from [Bolc83]:
(6.80)
As an example, we compare the exact waiting probability mation for m = 5 in Table 6.2.
with this approxi-
Tab/e 6.2 Exact (Eq. (6.28)) and approximate (Eq. (6.80)) values of the probability of waiting P
0.2
0.4
0.6
0.7
0.8
0.9
0.95
0.99
Pme, PmwP
0 0
0.06 0.06
0.23 0.21
0.38 0.34
0.55 0.56
0.76 0.75
0.88 0.86
0.97 0.97
From Fig. 6.9 we see that the deviation for the mean number of jobs K in the system is very small if we use the approximation for Pm instead of the exact value. A good approximation for the mean waiting time in M/G/m systems is due to Cosmetatos [Cosm76]: &c/m
% CB 2w
M/M/m
+ (I-
In Eq. (6.81), we can use Eq. (6.30) for FM/M/m WM/D/m
=
1 2
-
(6.81)
cf?)~M/D/m*
and use:
1 ’ nCDm
’ WM/M/m
where: ncDm =
I t (1- p)(m- 1)&TY5ii-2 16pm
-l >
’
(6.82)
232
SINGLE STATION QUEUEING SYSTEMS
--
20
= 10
= 5 = 2
Fig. 6.9 Mean number of jobs in an M/M/m formulae for P,.
system with exact and approximative
For WM,Dlrn we can also use the Crommelin approximation formula [Crom34]: (ypdUrn
(vm)! (6.83)
Boxma, Cohen, and Huffels [BCH79] a1so use the preceding formulae for w M/D/m as a basis for their approximation: (6.84)
wiV/G/m
where:
a=
1 (‘i+l)71 i 1, m- 1 --
m+l
(
and: 1 - c; 71 X-+---$
m+l
G?
1 , m m>l= 1,7
233
THE GI/G/m SYSTEM
Tijms [Tijm86] uses yi from the BCH-formula tion:
Fv,G,m= ((1
-
P)Ylm
+ $(c:,
(Eq. (6.84)) in his approxima-
+ 1))
(6.85)
WIv,M,m.
------mm_ DES 8--
ac: Allen-Cunneen cro: Crommelin co: Cosmetatos tij: Tijms bch: Boxma, Cohen,
tij bch
Huffles
l--
4 0
Fig. 6.10
Comparison of different M/G/5
approximations with DES (p = 0.7).
In Fig. 6.10 we compare five approximations introduced above with the results obtained from discrete-event simulation (DES). From the figure we see that for p = 0.7 all approximations are good for ci < 2, and that for higher values of ci the approximation due to Cosmetatos is very good and the others are fair.
6.14
THE GI/G/m
SYSTEM
For GI/G/m systems only bounds and approximation formulae are available. These are extensions of M/G/m or GI/G/l formulae. We begin with the well-known upper bound due to Kingman [King’i’O]: (6.86) and the lower bound of Brumelle [Brum71] and Marchal [Marc74]:
- PC2 - P>--.-m-l wz P2& m Nl - P>
cg+l (6.87)
2P
-
234
SINGLE STATION QUEUEING SYSTEMS
As a heavy traffic approximation duced for GI/M/m systems:
we have Eq. (6.66), which we already intro-
(6.88) and the Kingman-Kiillerstrijm distribution:
approximation
Fw(x) M 1 - exp
(
-
[Kii1174] for the waiting time
W-P) 1 0; +ai/m2
(6.89)
’ Xx > *
The most known approximation formula for GI/G/m systems is the AllenCunneen (A-C) formula [AllegO]. We already introduced it for the special case GI/G/l (Eq. (6.70)). Note that the A-C formula is an extension of Martin’s formula (Eq. (6.79)) w h ere we replace the 1 in the term (1 + c”,) by ci to consider approximately the influence of the distribution of interarrival times:
KnlP c; + c2B was:.l-p
2m
(6.90)
*
For the probability of waiting we can either use Eq. (6.28) or the good approximation provided by Eq. (6.80). As in the GI/G/l case, the AllenCunneen approximation was improved by Kramer/Langenbach-Belz [KrLa76] using a correction factor: Pm//L c; + c2 *B * GKLB, l-p 2m
w&z -
exp GKLB
=
21-P(1-4)2 ) ()
9
cl + c; >
exP
and by Kulbatzki CA in Eq. (6.90):
(6.91)
(6.92)
CA > 1,
[Kulb89] using the exponent f (CA, cB, p) in place of 2 for p,lp
$A
,CB4) + cg
w=:. 1-P
2m
*
(6.93)
For the definition of f(cA, cg, p), see Eq. (6.74). The Kulbatzki formula was further improved by Jaekel [JaekSl]. We start with the Kulbatzki GI/G/lformula and use a heuristic correction factor to consider the number of servers m:
(6.94)
235
THE G//G/m SYSTEM
This formula is applicable even if the values of m and the coefficients of variation are large. In order to extend the Cosmetatos approximation [Cosm76] from M/G/m to GI/G/m systems, FM/M/m and WM,D,~ need to be replaced by WGI,M,~ and m~r,n,~, respectively:
w GI/G/m where
2wGI/M/m
!-z CB
+ (l
-
C$i)~GI/D/rn,
is given by Eq. (6.64) or by the following approximation:
WGI/M/m
$cZ, + l)exp (-i.
A$.
‘:i$“)
m WGI/M/m
forO
5 I,
A
x
for CA > 1,
CA’CB “) + 1 WM,M,~ >
(6.96) and
WGI/D/m
by:
wGI/D/m
=
1 2
1
(6.97)
’ WGI/M/m> ncDm
- * -
with ncDm from Eq. (6.82) or: WGIIDIm z Cy,m)f(cA,o@) . jqYMIDlm (6.98) h(p, m) = 4J(m
- l)/(m
+ 4) f (1 - p) + 1
with WM,Dirn from Eq. (6.82) or Eq. (6.83). A good approximation case ci 2 l’is given by Kimura [Kimu85]:
-1
1 - ci 1 - 4c(m, p) exp c(m, p) = (1 - P>(m - 1)
for the
+
dG-2 16pm
l- 43 l+c(m,p)
+c2
+ c; - 1 A >
7
’ (6.99)
Finally the Boxma, Cohen, and Huffels formula GI/G/m systems: 2WGI/D/m WGI/G/m
=
2 I(’
+ “)
2&&I,D,m
[BCH79] can be extended to
w
+ (1 -:($;I,M,
(6.100) m
236
SINGLE STATION QUEUEING SYSTEMS
(b)
sac
Fig. 6.11 (a) Mean queue length g of a GI/G/lO 6 of a GI/G/m system with p = 0.8.
system, and (b) mean queue length
237
PRIORITY QUEUEING
1 =
0.95
= 0.9
= 0.7
Fig. 6.12
Mean queue length of a GI/G/m
as well as the Tijms
system with r-n = 1,5,10.
formula [Tijm86]:
~GI/G/m=
((1
-
P)Ylm
+
$c;
+
1,)
~GI,M/rn,
(6.101)
with a and yi from Eq. (6.84). In Figs. 6.11 and 6.12 the mean queue lengths as functions of the coefficients of variation ci, ci, the utilization p, and the number of servers m are shown in a very compact manner using the approximate formula of Allen-Cunneen. Fig. 6.13 shows that the approximations of Allen-Cunneen, Kramer/Langenbath-Belz, and Kulbatzki are close together, especially for higher values of the coefficients of variation c$ with a large number of servers m.
6.15
PRIORITY
QUEUEING
In a priority queueing system we assume that an arriving customer belongs to a priority class T (T = 1,2,. . . , R). The next customer to be served is the customer with the highest priority number r. Inside a priority class the queueing discipline is FCFS.
238
SINGLE STATION QUEUEING SYSTEMS
=
1
= 0
I
O'O.6
0.65
0.7
0.75
0.8
0.85
019
P
Fig. 6.13 (a) Mean queue length 0 for a GI/G/lO system with cl = 0.5. (b) Mean queue length Q for a GI/G/lO system with cl = 2.0.
239
PRIORITY QUEUEING
6.15.1
System
without
Preemption
Here we consider the case where a customer already in service is not preempted by an arriving customer with higher priority. The mean waiting time w, of an arriving customer of priority class T has three components: 1. The mean remaining service time we of the job in service (if any). 2. Mean service time of customers in the queue that are served before the tagged customer. These are the customers in the queue of the same and higher priority as the tagged customer. 3. Mean service time of customers that arrive at the system while the tagged customer is in the queue and are served before him. These are customers with higher priority than the tagged customers. Define: Ni,:
Mean number of customers of class i found in the queue by the tagged (priority r) customer and receiving service before him,
a+:
Mean number of customers of class i who arrive during the waiting time of the tagged customer and receive service before him.
Then the mean waiting time of class r customers can be written as the sum of three components: (6.102) For multiple server systems (m > 1): R N. 1 w,=w,+~“T-+~m.-, i=l
RYiTi,
m GUI i=l
1 Pi
(6.103)
where Ni, and YVi, are given by: 7Vir = 0 i < r, (6.104) and, with Little’s theorem: T&. = xgi
i 2 r, (6.105)
We can solve Eqs. (6.102) and (6.103) to obtain: TV,=
WO
(1 - G)(l
-
(6.106) Or+l)
240
SINGLE STATION QUEUEING SYSTEMS
where: R 0,
=
(6.107)
pi*
c i=r
The overall mean waiting time is:
R xi w=cx.wi.
(6.108)
i=l Values for wo are given by Eqs. (6.48), (6.49), (6.77), and (6.78) and are the weighted sum of the woi of all the classes: R WO,M/M/l
(6.109)
= Cpid7 i=l R
WO,M/G/l
=
C i=l
pi
1+ * 2
c;
(6.110)
2t%
’
R
(6.111) R WO,GI/G/l,KLB
R5 C i=l R
WO,GI/G/l,KUL
C3 c i=l
c$
+ c&
(6.112)
* GKLB,
Pi ’
=ZPi f(CA
z lcB,
pi ’ cAz
di)
+
3-G
cii
7
(6.113)
(6.114)
(6.115)
(6.116) cl, mO,GI/G/m,KLB
=
-
+ cg. ’ GKLB,
(6.117)
Pi
(6.118) see Eq. (6.74); for GKLB,GI/G/l, see Eq. (6.72); and for see Eq. (6.92).
FOT ~(cA~, cgi, pi), GKLB,GI/G/~,
PRIORITY QUEUEING
Also, the GI/G/m-FCFS to priority queues:
Approximation
241
of Cosmetatos can be extended
with: WGI/M/m,
WO,GI/M/m
= (l-
a,)(1
-G-+1)’
WO,GI/D/m &,D,mr
= (1
WO,GI/D/m
Fig. 6.14
e
-
G4l
PTnCR pi
2mP
i=l
Mean queue lengths g’, for an M/M/l
-
G-+1)
’
$Ai,O,Pi)
*
’ Pi
priority system without preemption.
The M/G/m Cosmetatos approximation can similarly be extended. All GI/G/m approximations yield good results for the M/G/m-priority queues as well. Figure 6.14 shows the mean queue length G, for different priority classes of a priority system without preemption and R = 3 priority classes together
242
SINGLE STATION QUEUEING SYSTEMS
with the mean queue length for the same system without priorities. It can be seen that the higher-priority jobs have a much lower queue length, and the lower-priority jobs have a much higher mean queue length than the FCFS system. 6.15.2
Conservation
Laws
In priority queueing systems, the mean waiting time of jobs is dependent on their priority class. It is relatively short for jobs with high priority and considerably longer for jobs with low priority because there exists a fundamental conservation law ([Klei65, Schr70, HeSo841): R c i=l
pJvi
=
pwo
=
p * WFCFS
(6.120)
1-P
or: PiTTi = 1-P
FCFS-
(6.121)
The conservation law to apply the following restrictions must be satisfied: 1. No service facility is idle as long as there are jobs in the queue, i.e., scheduling is work conserving [ReKo75]. 2. No job leaves the system before its service is completed. 3. The distributions of the interarrival times and the service times are arbitrary with the restriction that the first moments of both the distributions and the second moment of the service time distribution exist. 4. The service times of the jobs are independent of the queueing discipline. 5. Preemption is allowed only when all jobs have the same exponential service time distribution and preemption is of the type preemptive resume. 6. For GI/G/ m systems all classes have the same service times.
restriction is not necessary for GI/G/l
This
systems [HeSo84].
If for GI/G/ m systems the service times of the classes differ, then the conservation law is an approximation [Jaek9 11. 6.15.3
System
with
Preemption
Here we consider the case that an arriving customer can preempt a customer of lower priority from service. The preempted customer will be continued later at the point where he was preempted (preemptive resume). Thus a customer
PRIORITY QUEUEING
243
- 1. Hence of class T is not influenced by customers of the classes 1,2,...,r we need to consider only a system with the classes T, r + 1 9- - * , R to calculate
wr
=
-r - WO
(6.122)
1 -Or7
where wb is the mean remaining service time for such a system and we obtain it by substitution of p by or in the formulae. For example: P;
-r
W OAC
=
2ma,
-
R c
c; + cg pi
(6.123)
i=r
We sum only from r and R rather than from 1 to R. For P&, we replace p by or in the exact or approximative formula for Pm. To obtain the mean waiting time of a customer of class r we apply the conservation law twice:
i=r
(6.124) -r+l
Or+l’W
R
= c
pi ‘Wis
i=r+l
By substitution
we obtain:
W, = A- (OrWr- CTr+l~r+l)*
(6.125)
Pr
Equation (6.125) is exact for M/,&I/l and M/G/l systems. It is also exact for M/M/m systems if the mean service time of all classes are the same. For other systems, Eq. (6.125) is a good approximation. From Fig. 6.15 it can be seen that even for heavy utilization the mean queue length for the highest priority class is negligible in the case of preemption, at the expense of poorer service for lower-priority classes. 6.15.4
System
with
Time-Dependent
Priorities
Customers with low priorities have long waiting times especially when the utilization of the system is very high. Sometimes it is advantageous for a customer priority to increase with the time, so that he does not have to wait too long. This possibility can be considered if we do not have a fixed priority function: ar(t) = Priority
of class r at time t.
244
SING1 E STATION QUEUEING SYSTEMS
=
2
= 3
Fig. 6.15 emption.
Mean queue lengths for an M/M/l
priority
system with and without pre-
Such systems are more flexible but need more expense for the administration. A popular and easy to handle priority function is the following:
%@)= (t - to) *b,
(6.126)
(see Fig. 6.16a), with 0 < br 5 b2 5 . . . 5 b,. The customer enters the system at time to and then increases the priority at the rate b,. Customers of the same priority class have the same rate of increase b, but different values of the arrival time to. We only consider systems without preemption and provide the following recursive formula [BoBr84] for the mean waiting time of priority class-r customer:
TV, =
(6.127)
with the same mean remaining service times wa as for the priority systems with fixed priorities. If these are exact, then Eq. (6.127) is also exact; otherwise it is an approximation formula. Another important priority function is:
q&) = r’, + t - to,
(6.128)
PRIORITY QUEUEING
(4
245
Priority A
Fig. 6.16 (a) Priority function with slope b,. (b) Mean queuelength 0, for an M/M/l priority system with time-dependent priorities having slope b,.
with 0 5 ~1 5 ~2 5 . . . 5 T,. Each priority class has a different starting priority rr and a constant slope (see Fig. 6.17a). In [BoBr84] a heavy traffic approximation (as p approaches 1) is given: (6.129) as well as a more accurate recursive formula: (6.130) Note that for m = 1, we have P, = p. From Figs. 6.16b and 6.17b it can be seen that systems with time-dependent priorities are more flexible than systems with static priorities. If the values of
246
SINGLE STATION QUEUEING SYSTEMS
b, or rT, respectively, are close together, then the systems behave like FCFS systems, and if they are very different, then they behave more like systems with static priorities. In many real time systems, a job has to be serviced within an upper time limit. To achieve such a behavior, it is advantageous to use a priority function that increases from 0 to 00 between the arrival time and the upper time limit ul-. A convenient priority function is:
G-(t) =
(4
(t-to)&-t+to) { co
to cKu,+to, UT+ to 5 t.
(6.131)
Priority c
rr i to
4
t”0
-t =l = 5
= 10
Fig. 6.17 (a) Priority function with starting priority Tr. (b) Mean queue lengths &, for an M/M/l priority system with time-dependent priorities having starting priorities rr.
PRIORITY QUEUEING
to 4
tb t:, + G, GP Gi
247
to+&b
(b)
WV
Fig. 6.18 (a) P riority function with upper time limit ur. (b) Mean waiting time w, for an M/M/l priority system with static priorities and with upper time limits ur = 2,4,6.
Again we get a heavy traffic
approximation
(as p approaches
1):
(6.132)
248
SINGLE STATION QUEUEING SYSTEMS
and a recursive formula for more accurate results: TV, x
6 --fJ-FVi(l-~) 1-P (
(I-Pmexp(--C2)))
-(l-iglpi(l-z)
(l-P.,,~exp(-!Zk)))~l.
(6’133)
Also see [BoBr84] and [Jaekgl] for other priority functions. In Fig. 6.18b, the mean waiting times for a system with static priorities are compared to those of a priority system with upper time limits. The figure also contains the upper time limits. For the two higher priorities, the static priority system is better, but the upper time limit system is better over all priorities.
6.16
THE
ASYMMETRIC
SYSTEM
The calculation of the performance measures is fairly simple if there is only one server or several servers with identical service times arranged in parallel, and the interarrival time and service time are exponentially distributed (M/M/m queueing systems). However, heterogeneous multiple servers often occur in practice and here the servers have different service times. This situation occurs, for example, when machines of different age or manufacturer are running in parallel. It is therefore useful to be able to calculate the performance measures of such systems. One method for solving this problem is given in [BoScSl] in which the formulae derived provide results whose relative deviation from values obtained by simulation are quite acceptable. The heterogeneous multiple server is treated in more detail by [BFH88] and [Baer85] who achieve exact results. Also see [Triv82] and [GeTr83] for exact results. We will assume throughout this section that the arrival process is Poisson with rate X and that service times are exponentially distributed with rate ,Q; on the ith server. 6.16.1
Approximate
Analysis
In homogeneous (symmetric) queueing systems, the response time depends only on whether an arriving job finds some free server. A heterogeneous (asymmetric) multiple server system differs since which of the m servers processes the job is now important. The analysis in [BoScSl] is based on the following consideration: Let p,+ (k = 1, . . . , m) be the probability that a job is assigned to the &h server. Because it can be assumed that a faster server processes more jobs than a slower server, it follows that the ratios of the probabilities p!, to each other are the same as the ratios of the service rates to each other. Here no distinction is made between different strategies for selecting a
THE ASYMMETRIC SYSTEM
249
free server. This approximation is justified by the fact that in a system with heavy traffic there is usually one free server and therefore a server selection strategy is not needed. Then pk (1 5 /C5 m) is approximately given by: Pk
&!L-
(6.134) Et%’ i=l
In addition, pk can be used to determine the utilization
pk of the lath server: (6.135)
i=l
i=l
Thus the utilization is given by:
of all the servers is the same and the overall utilization (6.136)
p = pk.
In [BoScSl] it is shown that the known formulae for symmetric M/M/m and GI/G/m systems can be applied to asymmetric M/M/m and GI/G/m systems if Eqs. (6.135) and (6.136) are used to calculate the utilization p. Good approximations to the performance measures are obtained provided the values of the service rates do not differ too much, i.e, pmin/pmax 5 10 (see Table 6.3).
Table 6.3 Approximative queueing systems
x (4
(b)
and simulative
results
for (a) M/M/5
and (b) M/E2/5
P@ >
WV
a%)
7x92)
0.799 0.807 0.808 0.826 0.807 0.801 0.799
0.819 0.812 0.844 0.858 0.839 0.830 0.824
2.472 2.472 2.472 2.472 2.472 2.472 2.472
2.367 2.476 2.443 2.549 2.456 2.442 2.425
2.495 2.500 2.626 2.713 2.606 2.569 2.523
0.800
0.825
1.854
1.854
1.962
I-lk
P(A)
PW
73
73 73 81.11 81.11 81.11 81.11
10,15,20,20,25 16,17,18,19,20 5,10,18,22,35 4,8,16,32,40 8,9,20,31,32 8,14,20,26,32 10,15,20,25,30
0.811 0.811 0.811 0.811 0.811 0.811 0.811
81.11
10,15,20,25,30
0.811
>
Notes: A = approximation, S1 = discrete-event simulation with random selection of a free server, and 5’2 = discrete-event simulation with selection of the fastest free server.
6.16.2
Exact
Analysis
This section describes the methods used for calculating the characteristics of asymmetric queueing systems described in [BFH88] and [Baer85]. These systems are treated as a CTMC, and the performance measures are calculated
250
SINGLE STATION QUEUEING SYSTEMS
by solving the underlying CTMCs. In this way exact results are obtained for elementary asymmetric M/M/m queueing systems. This method is used as follows: First, the steady-state probabilities are calculated for systems without queues, known as loss systems. Then the formulae given in [BFH88] and [Baer85] can be used to extend the results to asymmetric, non-lossy systems. Two strategies for selecting a free server are investigated: 1. The job is assigned randomly to a free server as in [BFH88].
2. The fastest free server is selected as in [Baer85]. 6.16.2.1 Analysis of M/M/m 1 oss Systems A loss system contains no queues. The name indicates that jobs that cannot find a free server are rejected and disappear from the system. A CTMC is constructed and solved to obtain a closed-form formula for its steady-state probabilities. It should be noted that asymmetric queueing systems differ from symmetric queueing systems in that it is not sufficient when producing the CTMC to describe the states by the number of servers occupied. Due to the different service rates, the state classification depends on which of the m servers are currently occupied. Thus each state is characterized by a set g = (ICI, Ic2, . . . , k,), where k, belongs to g if the Zth server is currently occupied. Here i = (g( is the number of servers currently occupied (1 5 i < m) and I is an index (1 5 I 5 m).
Fig. 6.19 CTMC for an asymmetric M/M/3 free server.
An M/M/3 system is used as an example. diagram of the loss system is shown in Fig. 6.19. The
Random Selection of a free Server
The state transition
loss system with random selection of a
THE ASYMMETRIC SYSTEM
251
steady-state probabilities are given by [BFHM]:
9-7r - (m -‘g’Y rIx m!
*kEgG
(6.137)
*=”
for all g C G with g # 8 and G = {1,2,. . . , m}. In order to determine ;TT~,we use the normalization condition:
c
7rg = 1
with
G = {1,2,. . . ,m).
(6.138)
gEG The probability
of loss r CL)is the probability dL) m = r{l
I2., ._,m}
that all servers are occupied:
(6.139)
= TG.
The utilization pk of the kth server is equal to the sum of the state probabilities of all states in which the kth server is occupied:
d = c =g*
(6.140)
g: kEg
Choice of the Fastest Free Server. Here it is always assumed that the individual servers are sorted in descending order of the service rates, i.e., the fastest server has the lowest index. Let pm = (~1, ~2, . . . , bum) and let (pk-‘, pm) = of loss n!$) = ~~l,..,,ml(~m) is given (P1,P2,*.* ~~k-l,~m)* The probability by [Baer85]:
7T(L) m = ~{l,.,.,m}(Pm) = Bm(Pm) = Bm-l(pm-l)
&(CL”-‘,/km)
*
Bk(pkwl,pk
+ b7L>
(6.141)
-’
1
’
with: w4
The utilization pk =
= “iT{l}(Pl)
=
&.
pk of the individual servers is given by:
-$Bk-l(pk-‘)
-
Bk(pk)]
with
Bo(pO) = 1.
(6.142)
6.16.2.2 Extending to Non-Lossy System According to [BFH88], the next step is to use the results obtained so far to draw conclusions about the behavior of the corresponding non-lossy system. Here the met hod used to select a free server of the corresponding loss system does not matter. It is only necessary
252
SINGLE STATION QUEUEING SYSTEMS
that the values appropriate to the selected strategy are used in the formulae finally obtained. In what follows, a superscript W means that the variable is a characteristic of a non-lossy system; a superscript L means that the variable is a characteristic of the corresponding loss system. The steady-state probabilities are given by [BFH88]: m-i . ,w m
for
i > m,
(6.143)
with:
,wm = -7&y
(6.144)
N’
where: 7&y
N=l+-
l-c
*c
x
c=-
with
f? k=l
(6.145) pk’
It is now possible to calculate all the state probabilities of the asymmetric non-lossy system and all interesting performance measures. It should be noted that c is not the same as the utilization p. The method for calculating p is described in the following. The probability Pm that an arriving job is queued is equal to the sum of the probabilities of all states in which there are more than m jobs in the system: (6.146) i=m
Analogous to the symmetric case, the mean queue length 8 is given by:
Q= 2
(i-m).rjW)=E.
(6.147)
-
i=m+l
The mean waiting time can be calculated using Little’s theorem:
Q TV=x.
(6.148)
To calculate the utilization p it is first necessary to calculate the specific utilization pk for each server: + 7&y * c/( 1 - c) N
.
(6.149)
i=m+l
Then p is given by: (6.150)
THE ASYMMETRIC SYSTEM
6.16.3
Exact
Analysis
of an Asymmetric
M/M/2
253
System
In [Triv82] an exact solution is given for an asymmetric M/M/2 system with the strategy of selecting the fastest free server. Figure 6.20 shows the CTMC state transition diagram for this system. The state of the system is defined
Fig. 6.20
CTMC for the M/M/2
asymmetric system.
to be the tuple (ICI, k 2) w h ere ICI 2 0 denotes the number of jobs in the queue including any at the faster server, and k2 E (0, 1) denotes the number of jobs at the slower server. If we use Eq. (2.58) for the underlying CTMC, we can calculate the steadystate probabilities easily with: c=
x
-
(6.151)
tQ1-k P2
r(k) 1) = cr(k - 1,1) = c’“%(l,
l),
for
k > 1.
(6.152)
Furthermore: n(O,l) = 5 %o, 1+ 2c p2
o>,
l+c x 7T(l,O) = --7q4 1+2cp1
o),
n(l,l)
= -
c
1+2c
Finally, using the normalization n-(0,0) = [l +
x(x+Il”)T(o Plj.42
(6.153)
70)*
condition, we get: X(X + Y2>
p&(1
+ 2c)(l - J1*
(6.154)
And with these, the utilizations of the two servers: p1 = 1 - T(O,O) - 7T(O,l), p2 = 1 - 7T(O,O)- 7+,0),
(6.155)
254
SINGLE STATION QUEUEING SYSTEMS
and the mean number of jobs in system: 1 A(1 - c)~’
j&
(6.156)
where: +-
’
l-c
1 ’
Example 6.1 To show how to use the introduced formulae for asymmetric M/M/m systems, we apply them to a simple M/M/2 example with X = 0.2, ~1 = 0.5, and ~2 = 0.25. Approximation
With Eq. (6.136) for the utilization:
[BoSc91]
0.2 = 0.267, 0.5 + 0.25
p’p1’p2’x= CL1+ P2
with Eq. (6.29) for symmetric M/M/2 -
K=2p+-r
systems for mean number in system: 2P3
1 - p2
0.575,
and with Little’s theorem (Eq. (6.9)):
T = 2.875. Exact [Triv82]
With Eq. (6.151): c = 0.267,
with Eqs. (6.153), (6.154), and (6.155) for the utilizations
O.a(O.2+ 0.25) 0.25 a0.5(1 + 2c)(l - c)
of the servers:
-I
1
= 0.6096,
7r(O,1) =
0.267
- 0.2 +0.6056 = 0.085, 1 + 0.534 0.25
7r(l,O) = -1.267 . -0.2 . 0.6056 = 0.2014, 1.534 0.5 l-0.6096 $0.085 = 0.305, p2 = 1 - 0.6096 + 0.2014 = 0.189,
p1 =
THE ASYMMETRIC SYSTEM
255
and with Eq. (6.156): A = 0.5 - 0.25(1 + 0.534) + 1 = 3.495, 0.2(0.2 + 0.25) 0.733 1 = 0.533, 3.495 + 0.7332
K=
K T = x = 2.663. Exact [BFH88] First we consider the asymmetric M/M/2 loss system with random selection of a free server: With Eq. (6.156) for the state probabilities: (2-l)! 5TT1=-. 2! (2-l)! 7r2 = - 2!
x 1 0.2 7ro= 0.2X(), j-jy’*o=y~’ x 1 ‘/12’ro=y~’
(2 -‘I! -.-.-.
7r1,2 =
2!
A
A
Pl
P2
ro
0.2
=
7r()= 0.47r(),
lfjT
0
*
O*
With the normalizing condition: (1 + 0.2 + 0.4 + 0.16)no = 1, Now the probability
x0 = 0.568.
of loss can be determined (Eq. (6.139)): ?p
and also the utilizations
= 7rl,2 = 0.16.0.568 = 0.091, of the servers (Eq. (6.140)):
pf = n1 + 7r1,2= (0.2 + 1.6) - 0.568 = 0.204, pi = r2 + r1,2 = (0.4 + 1.6) e0.568 = 0.319. This result means that the slower server has a higher utilization as expected. Now we can use these results of the loss system to obtain performance measures for the non-lossy system. For c and N we get with Eq. (6.145): ’ =
0.2 = 0.267, 0.5 + 0.25
and for the probability
N=l+
0.091 . 0.267 = 1 033 0.733 LY
of waiting (Eq. (6.146)): 1 0.091 p2=---.= 0.120. 0.733 1.033
The mean queue length and the mean waiting time (Eqs. (6.147), (6.148)): Q=
0.120.0.267 = 0.0437, 0.733
w = 0.219.
256
SINGLE STATION QUEUEING SYSTEMS
Utilization
of the servers (Eq. (6.149)): (w) _ 0.204 + 0.091 .0.267/0.733 = o 230 b? 1.033
Pl
(w) _ 0.319 + 0.091 s0.267/0.733 = o 341 L. 1.033
P2
Utilization
of the system (Eq. (6.150)): P=
2
= 0.275.
I-+&i 0.230
.
Mean number of jobs in system: E = PI + ~2
+ 62 = 0.230 + 0.341 + 0.0437 = 0.6147.
Mean response time: K T = x = 3.074. Exact [Baer85] The last case we consider is the asymmetric M/M/2 system with selection of the fastest free server using the formulae in [Baer85]. First we again consider the loss system and determine the probability of loss (Eq. (6.141)): r2
CL) = B2(p2) = B&2)
=-. x x+/-Q
1+ E.
Bl(P2)
x
R(Pl-tP2)
[
1+!% x---1
1 -l
-1
x x:‘“2 I
x+cL1+I.L2
0.2 + 0.25 + 0.5 -’ 0.2 + 0.25
1
= 0.0785, and the utilizations of the servers (Eq. (6.142)): pf = +30(po) Pi = g
[&W)
- B&d)]
= g
-B2(P2)]
= g
[l-
1
0.2Oe2 + 0.5 = 0.286,
.
[020;20
1 A*
5 - 0.0785 = 0 166
Again we use the results of the loss system to obtain performance measures for the queueing system with selection of the fastest server. From Eq. (6.145)): c = 0.267,
N=l+
0.0785 mO.267= l 0286 +* 0.733
THE ASYMMETRIC SYSTEM
Probability
257
of waiting (Eq. (6.146)): 1 0.0785 p,=-.-= 0.733 1.0280
0.104.
Mean queue length (Eq. (6.147)): 0.104 ’ 0.267 = 0.0379. 0.733 Mean waiting time: WC
Utilizations
g = L0 190 x of the servers (Eq. (6.149)): py
=
0.286 + 0.0785 e0.267/0.733 = o 306 L, 1.0286
(w) _ 0.166 + 0.0785 .0.267/0.733 1.0286
P2
Utilization
=
0.189.
of the system (Eq. (6.150)): 2 P=
1
= 0.234.
0.306 +1 0.189
Mean number of jobs in the system: x
= ~1 + p2
+ s = 0.306 + 0.189 + 0.0379 = 0.533.
Mean system time: K T = x = 2.665. In Table 6.4, results of this example are summarized.
In Table 6.5, the
Table 6.4 Results for several performance measures with different solution techniques and strategies for a asymmetric M/M/Z system (A = 0.2, ~1 = 0.5, ~2 = 0.25) Strategy Reference
Awr . [BoScS l]
FFS [Triv82]
FFS [Baer85]
Random [BFH88]
P Pl P2 E T
0.267 0.267 0.267 0.575 2.875
(0.267) 0.305 0.189 0.533 2.663
0.234 0.306 0.189 0.533 2.665
0.275 0.230 0.341 0.615 3.074
Notes: FFS = selection of the fastest free server and Random = random selection of the free server.
mean response time T for an asymmetric M/M/2
system is compared with
258
SINGLE STATION QUEUEING SYSTEMS
-Table 6.5 Mean response times T, TI, and Ta for an asymmetric M/M/2 system with strategy FFS and two M/M/l systems (A = 0.2, ~2 = 0.25, ~1 = cv e~2) a
Tl 572
T
1
2
3
4
5
20 20 4.762
3.33 20 2.662
1.818 20 1.875
1.25 20 1.459
0.55 20 1.20
the mean response times Ti (for a M/M/l system with service rate pi) and ??z (for a M/M/l system with service rate ,CQ)[Triv82]. From Table 6.5 we see that sometimes it is better to disconnect the slower server if we wish to minimize the mean response time. This is the case only for low utilization of the servers and if the service rates differ considerably. This issue is further explored in [GeTr83].
6.17
SYSTEMS WITH
BATCH SERVICE
An interesting and important way to service customers in queueing systems is the batch service. In such batch systems the service of all the customers of a batch is started at the same time. There are many applications of this kind of service, especially in manufacturing systems. Two policies are in use. In the case of a full batch policy (FB), the service of the batch is started when all b customers of the batch have arrived. In the case of the minimum batch policy (MB), a minimum number of a customers is sufficient to start the service of the batch. If there are more then b waiting customers, only b customers are collected to a batch and serviced together. The extended Kendall notation for batch systems is: GI/G[“~b]/m, M u It’iserver system with MB policy GI/G[blb]/m, M u It’iserver system with FB policy A special case of the MB policy is the GREEDY policy where the service begins when a = 1 customer is in the queue. To obtain a formula for the mean waiting time and the mean queue length for the GI/G[b~bl/m queueing system, we use the already introduced formulae for GI/G/m queueing systems and calculate the mean queue length $ for the batches. From the result we can calculate the mean queue length abatch for the individual customers. If the arrival rate of the individual customers is X, then the arrival rate of the batches with batch size b is:
x A= b,
(6.157)
SYSTEMS WITH BATCH SERVICE
and the coefficient of variation for the batch interarrival
259
time is: (6.158)
using Eq. (1.52) with the coefficient of variation of the individual customers CA. Using the queue length 0 of the batches, the queue length of the individual customers is approximately given by [HaSp95, KirsSl]:
The first part of the formula is due to the customers in the dj batches of the queue, and the second part is the mean number of customers who are still not combined to a batch. The accuracy of Eq. (6.159) depends mainly on the accuracy of the applied approximation formula for calculating 0. A solution for GI/GI”>bl/ m queueing systems is given by a heuristic extension of Eq. (6.159) [KirsSI]: -b,bl
Q
batch
wbdj+P,
(6.160)
Eq. (6.160) includes Eq. (6.159) if a = b. For the probability of waiting Pm, we use the corresponding value for the M/M/m queueing system. In the special case of an M/M[b~61/l system, we obtain an Eb/M/l queueing system for the calculation of 0 since in this case the interarrival time of the batch is the sum of b exponentially distributed interarrival times. In the case of an M/MIb7’]/m system, we obtain an Eb/M/m queueing system for which exact formulae (see Eqs. (6.59) and (6.64) respectively) are known. From Fig. 6.21~ it can be seen how the mean queue length Qbatch for individual jobs depends on the batch size b in a full batch system. Figure 6.21b shows that for increasing values of p, the mean queue length of a minimum batch system approaches that of a full batch system with the same batch size b. Figure 6.22 shows that the queue length Qbatch of an individual job in a minimal batch system depends more on the batch size b than on the minimum batch size a, as also can be seen from Eq. (6.160). Moreover we realize that a has a noticeable influence only on Gk$Lh for small values of p. Problem 6.1 Consider two M/M/l systems with arrival rate Xi = 0.5/ set and service rate ~1 = l/set each and an M/M/2 system with arrival rate x2 = 2x1 = l/ set and service rate ,LQ= ,~i = l/set for each server. Compare utilization pi, mean number of jobs in the system ?7i, mean waiting time wi, and mean response time Fi for both cases (i = 1,2). Discuss the results! Problem 6.2 Consider an M/M/l/K system and determine the throughput and the mean number of jobs K as a function of the maximum number
260
SINGLE STATION QUEUEING SYSTEMS
c_____- -----
0-0 .1
012
0:3
Ol4
0:5 P
016
017
o:a
019
Fig. 6.21 (u) M ean queue length Gbatch for a full batch system M/M[‘?b]/l-FCFS and (b) ?#$~~ for a minimum batch system.
SYSTEMS WITH BATCH SERVICE
261
p = 0.8
p = 0.6 p = 0.3
Fig. 6.22
Mean queue length QL&
of a minimum batch system.
of jobs in the system K for the cases X = ,x/2, X = p, and X = 2,~ with ,Q= l/ sec. Discuss the results. Problem 6.3 Compare the mean number of jobs K of an M/D/l, an M/M/l, and an M/G/l system (cg2 - 2) with X = 0.8/set and p = l/set. Give a comprehensible explanation why the values are different although the arrival rate X and the service rate h are always the same. Problem 6.4 Compare the mean number of jobs E( of an M/G/l and a GI/M/l system with X = O.g/sec, p = l/set, ci = 0.5, and cg = 0.5, respectively. Discuss the results. Problem 6.5 Consider a GI/G/l system with p = 0.8, cl = 0.5, and c; = 2. Compute the mean queue length s using the Allen-Cunneen, the Also compute KrRmer/Langenbach-Belz, and the Kimura approximation. upper and lower bounds for the mean queue length (see Section 6.12). Problem 6.6 Consider an M/M/l priority system with 2 priority classes, and ~1 = ~2 = l/set and Xr = X2 = 0.4/ sec. Compute the mean waiting time w for the two classes for a system with and without preemption. Discuss the results and compare them to the corresponding M/M/l-FCFS system. Problem
6.7
Consider an asymmetric M/M/2
system with X = l/set,
~1 - l/set, and ~2 - l.S/sec. Cjompute the utilizations p1 and p2 of the two servers and the mean number of jobs r using the approximation
method and
262
SINGLE STATION QUEUEING SYSTEMS
the exact methods for the strategy random selection of the free server and the strategy selection of the fastest free server (see Section 6.16). Discuss the results. Problem 6.8 Consider an IvI/M[~J~]/~ batch system and an M/M[275]/1 batch system with p = 0.5. Compute the mean queue length 8 for both cases and discuss the results.
7 Queueing Networks
Queueing networks consisting of several service stations are more suitable for representing the structure of many systems with a large number of resources than models consisting of a single service station. In a queueing network at least two service stations are connected to each other. A station, i.e., a node, in the network represents a resource in the real system. Jobs in principle can be transferred between any two nodes of the network; in particular, a job can be directly returned to the node it has just left. A queueing network is called open when jobs can enter the network from outside and jobs can also leave the network. Jobs can arrive from outside the network at every node and depart from the network from any node. A queueing network is said to be closed when jobs can neither enter nor leave the network. The number of jobs in a closed network is constant. A network in which a new job enters whenever a job leaves the system can be considered as a closed one. In Fig. 7.1 an open queueing network model of a simple computer system is shown. An example of a closed queueing network model is shown in Fig. 7.2. This is the central-server model, a particular closed network that has been proposed by [Buze71] for the investigation of the behavior of multiprogramming system with a fixed degree of multiprogramming. The node with service rate ,ur is the central-server representing the central processing unit (CPU). The other nodes model the peripheral devices: disk drives, printers, magnetic tape units, etc. The number of jobs in this closed model is equal to the degree of multiprogramming. A closed tandem queueing network with two nodes is shown in Fig. 7.3. A very frequently occurring queueing network is the machine repairman model, shown in Fig. 7.4. 263
264
QUEUEING NETWORKS
s&yvb;f
Disk
L,,o
Ii: CPU
Printer
cl -‘T-p Stream of completed Magnetic Tape jobs
fig. 7.1 Computer system shown as an open queueing network. I
Fig, 7.2
Central-server model.
There are many systems that can be represented by the machine repairman model, for example, a simple terminal system where the M machines represent the M terminals and the repairman represents the computer. Another example is a system in which A4 machines operate independently of each other and are repaired by a single repairman if they fail. The M machines are modeled by a delay server or an infinite server node. A job does not have to wait; it is immediately accepted by one of the servers. When a machine fails, it sends a repair request to the repair facility. Depending on the service discipline, this request may have to wait until other requests have been dealt with. Since there are M machines, there can be at most .M repair requests.
fig. 7.3 Closed tandem network.
DEFINITIONS AND NOTATION
265
Fig. 7.4 Machine repairman model.
This is a closed two-node queueing network with h/i7jobs that are either being processed, waiting, or are in the working machines.
7.1
DEFINITIONS
AND NOTATION
We will consider both single class and multiclass networks. 7.1.1
Single Class Networks
The following symbols are used in the description of queueing networks: N
Number of nodes
K
The constant number of jobs in a closed network
(h,
k2,
. . .,lc~) The state of the network
k
The number of jobs at the ith node; for closed networks
mi
zki=K i=l The number of parallel servers at the ith node (mi > 1)
Pi
Service rate of the jobs at the ith node
lIPi
The mean service time of the jobs at the ith node
Pij
Routing probability, the probability that a job is transferred to the jth node after service completion at the ith node. (In open networks, the node with index 0 represents the external world to the network.)
POj
The probability that a job entering the network from outside first enters the jth node
Pi0
The probability
that a job leaves the network just after com-
pleting service at node i (pi0 = 1 - 5 pij). j=l XOi
The arrival rate of jobs from outside to the ith node
266
x
QUEUEING NETWORKS
The overall arrival rate from outside to an open network N
(A= c Xoi) i=l
the overall arrival rate of jobs at the ith node The arrival rate Xi for node i = 1,. . . , N of by adding the arrival rate from outside and the nodes. Note that in statistical equilibrium the is equal to the rate of arrival, and the overall written as:
an open network is calculated arrival rates from all the other rate of departure from a node arrival rate at node i can be
N
Xi = Xoi + C Xjpji 7 for i = 1, . . . , N
(74
j=l
for an open network. These are known as traffic equations. For closed networks these equations reduce to: Xi = c
Xjpji,
for i = 1,. . . , N,
(74
j=l
since no jobs enter the network from outside. Another important network parameter is the mean number of visits (ei) of a job to the ith node, also known as the visit ratio or relative arrival rate:
xi ei = -, x
for i=
l,...,N,
(7.3)
where X is the overall throughput of the network(see Eqs. (7.24) and (7.25)). The visit ratios can also be calculated directly from the routing probabilities using Eqs. (7.3) and (7.1), or (7.3) and (7.2). For open networks, since Xoi = X*pOi: N
ei =po; + Cejpji, j=l
for i = l,...,
N,
(7.4)
and for closed networks:
(7.5) Since there are only (N - 1) independent equations for the visit ratios in closed networks, the ei can only be determined up to a multiplicative constant. Usually we assume that er = 1, although other possibilities are used as well. Using ei, we can also compute the relative utilixation xi, which is given by:
(74
DEFINITIONS AND NOTATION
267
It is easy to show that [ChLa74] the ratio of server utilizations is given by:
7.1.2
Multiclass
Networks
The model type discussed in the previous section can be extended by including multiple job classes in the network. The job classes can differ in their service times and in their routing probabilities. It is also possible that a job changes its class when it moves from one node to another. If no jobs of a particular class enter or leave the network, i.e., the number of jobs of this class is constant, then the job class is said to be closed. A job class that is not closed is said to be open. If a queueing network contains both open and closed classes, then it is said to be a mixed network. Figure 7.5 shows a mixed network. The following additional symbols are needed to describe queueing networks that contain multiple job classes, namely: R
The number of job classes in a network
k.ar
The number of jobs of the rth class at the ith node; for a closed network: N
R
cc Kr
kir = K
(74
The number of jobs of the rth class in the network; not necessarily constant, even in a closed network: N
kir = Kr
c
i=l
(7.9)
K
The number of jobs in the various classes, known as the population vector (K = (KI, . . . , KR))
Si
The state of the ith node (Si = (ki,, . . . , kiR)): N c
Si = K
(7.10)
i=l
S
The overall state of the network with multiple classes (S = (S
Pir
The service rate of the ith node for jobs of the rth class
Pir,js
that a job of the rth class at the ith node is transferred to the sth class and the jth node (routing probability)
po,js The probability in an open network that a job from outside the network enters the jth node as a job of the sth class
268
QlJEUElNG NETWORKS
The probability in an open network that a job of the rth the network after having been serviced at the ith node, so:
Pir,o
Pir,O = 1 - 5
2
j=l
x
(7.11)
Pir,js
s=l
The overall arrival rate from outside to an open network
Xa,ir The arrival rate from outside to node i for class r jobs (Xe,ir = X. PO,+) A.al-
The arrival rate of jobs of the rth class at the ith node: (7.12) j=l
s=l
for closed networks, pg,i, = 0 (1 < i < N, 1 < r < R) and we obtain:
&r = 5 j=l
5
(7.13)
xjs * Pjs,ir.
s=l
The mean number of visits e;, of a job of the rth class at the ith node of an open network can be determined from the routing probabilities similarly to Eq. (7.4):
eir = PO,ir + 2 j=l
2
ejsPjs,i7-,
for i=l,,..,
N, r=l,...,
R.
(7.14)
R.
(7.15)
s=l
For closed networks, the corresponding equation is: for i=l,...,
N, r=l,...,
Usually we assume that er, = 1, for r = 1, . . . , R, although other settings are also possible.
7.2 7.2.1
PERFORMANCE
MEASURES
Single Class Networks
Analytic methods to calculate state probabilities and other performance measures of queueing networks are described in the following sections. The determination of the steady-stateprobabilities r(kl,. . . , k~) of all possible states of the network can be regarded as the central problem of queueing theory. The
PERFORMANCE MEASURES
269
mean values of all other important performance measures of the network can be calculated from these. There are, however, simpler methods for calculating these characteristics directly without using these probabilities. Note that we use a slightly different notation compared with that used in Chapters 2-5. In Chapters 2-5, xi(t) denoted the transient probability of the CTMC being in state i at time t and 7ri as the steady-state probability in state i. Since we now deal with multidimension state spaces, ~(ki, Icz, . . . , IAN) will denote the steady-state probability of state (ki, lc2, . . . , k~). The most important performance measures for queueing networks are: Marginal Probabilities ri(lc): For closed queueing networks, the marginal probabilities xi(k) that the ith node contains exactly ki = k jobs are calculated as follows: 7ri(k) =
c 5
~(kl,...,kN).
(7.16)
k,=K
j=1
& ki=k
Thus xi(k) is the sum of the probabilities of all possible states (kl, . . . , 5 kj = K where a fixed j=l number of jobs, k, is specified for the ith node. The normalization condition that the sum of the probabilities of all possible states (ICI, . . . , kN> kN), 0 5 ki 5 K that satisfy the condition
that satisfy the condition
5 kj = K with
(0 5 kj 5 K)
must be 1,
j=l
that is: c
7r(kl,...,kN)
= 1.
(7.17)
Correspondingly, for open networks we have:
ri(k)= c r(kl,...Jw), k,=k
with the normalization
condition:
C(OTTICI,...
,kN)=L
Now we can use the marginal probabilities to obtain other interesting performance measures for open and closed networks. For closed networks we have to take into consideration that xi(k) = 0 Vk > K. Utilization by:
pi: The utilization pi of a single server node with index i is given
pi
=
ITi
k=l
(7.18)
270
QUEUNNG NETWORKS
where pi is the probability
that the ith node is busy, that is: pi = 1 - 7ri(O).
(7.19)
For nodes with multiple servers we have: pi = k
Fmin(mi, 2 k=O
k);lri(lc) = 1 - mgl G.,,(i,), k=O
(7.20)
and if the service rate is independent of the load we get (see Eq. (6.4)): pi zz - Ai mini ’
(7.21)
Throughput Xi: The throughput Xi of an individual node with index i represents in general the rate at which jobs leave the node: (7.22) where the service rate pi(Jc) is, in general, dependent on the load, i.e., on the number of jobs at the node. For example, a node with multiple servers (m; > 1) can be regarded as a single server whose service rate depends on the load pi (Ic) = min(k,mi). pi, where ,!~i is the service rate of an individual server. It is also true for load-independent service rates that (see Eqs. (6.4), and (6.5)):
We note that for a node in equilibrium, arrival rate and throughput are equal. Also note that when we consider nodes with finite buffers, arriving customers can be lost when the buffer is full. In this case, node throughput will be less than the arrival rate to the node. Overall Throughput A: The overall throughput X of an open network is defined as the rate at which jobs leave the network. For a network in equilibrium this departure rate is equal to the rate at which jobs enter the network, that is:
x = 2 xoi.
(7.24)
i=l
The overall throughput of a closed network is defined as the throughput of a particular node with index i for which ei = 1. Then the overall throughput of jobs in closed networks is: (7.25)
PERFORMANCE MEASURES
Mean Number given by:
of Jobs xi:
271
The mean number of jobs at the ith node is
Ki = &i(k).
(7.26)
k=l
From Little’s theorem (see Eq. (6.9)) it follows:
Ki = A&
(7.27)
where Fi denotes the mean response time. Mean Queue Length mined by:
&i: The mean queue length at the ith node is deter-
(7.28) or, using Little’s theorem:
6.i = XiWi,
(7.29)
where Wi is the mean waiting time. Mean Response Time Ti: The mean response time of jobs at the ith node can be calculated using Little’s theorem (see Eq. (6.9)) for a given mean number of jobs f;;i :
Ti=r.
Ki i
(7.30)
Mean Waiting Time wi: If the service rates are independent of the load, then the mean waiting time at node i is: wigi-l. Pi
7.2.2
Multiclass
(7.31)
Networks
The extension of queueing networks to include multiple job classes leads to a corresponding extension of the performance measures. The state probability of a network with multiple job classes is represented by r(Sr, . . . , S,). The normalization condition that the sum of the probabilities of all possible states (Sl,.. . , S,) is 1 must also be satisfied here.
272
QUEUEING NETWORKS
----
Source
-
Open Class Closed Class
7
Fig. 7.5
A mixed network.
Marginal Probability n;(k): For closed networks the marginal probability, i.e., the probability that the ith node is in the state Si = k, is given by:
n(k)= c +h,..vSv), -$
(7.32)
S,=K
3=1
&
S,=k
and for open networks: 7ri(k) = c
(7.33)
x(S1,. . . , S,).
s,=k
The following formulae for computing the performance measures can be applied to open and closed networks.
Utilization pir: The utilization of the ith node with respect to jobs of the rth class is: Pir --
?ri(k)?
c
-f, ’
all states
min(rni, Ici),
Ici = f
a
k
&,
(7.34)
r=l
with k, > 0
and if the service rates are independent on the load: Air Pir
(7.35)
= m&7-
Throughput Air: The throughput Air is the rate at which jobs of the rth class are serviced and leave the ith node [BrBa80] : (7.36) all states k with k, > 0
2
PRODUCT-FORM QUEUEING NETWORKS
273
or if the service rates are independent on the load: Ai, = mi.pirspir.
(7.37)
Overall Throughput A,: The overall throughput of jobs of the rth class in closed networks with multiclasses is:
+xiT,
(7.38)
ei,
and for open networks:
k-= 5 xo,ir.
(7.39)
i=l
Mean Number of Jobs %?i,: The mean number of jobs of the rth class at the ith node is: Ic,. q(k).
(7.40)
all states k with Ic,. > 0
Little’s theorem can also be used here: Ki, = Ai,* Fir.
(7.41)
Mean Queue Length gir: The mean queue length of class r jobs at the ith node can be calculated using Little’s theorem as:
Qir = Xi,wip*
(7.42)
Mean Response Time Ti,: The mean response time of jobs of the rth class at the ith node can also be determined using Little’s theorem (see Eq. (7.41)): (7.43) Mean Waiting Time Wi,: If the service rates are load-independent, the mean waiting time is given by:
wi, ZTi, - ‘. t%r 7.3
PRODUCT-FORM
QUEUEING
then
(7.44)
NETWORKS
In the rest of this chapter, we will discuss queueing networks that have a special structure such that their solutions can be obtained without generating their underlying state space. Such networks are known as product-form or separable networks.
274
QUEUEING NETWORKS
7.3.1
Global
Balance
The behavior of many queueing system models can be described using CTMCs. A CTMC is characterized by the transition rates between the states of the corresponding model (for more details on CTMC see Chapters 2-5). If the CTMC is ergodic, then a unique steady-state probability vector independent of the initial probability vector exists. The system of equations to determine the steady-state probability vector 7r is given by 7rQ = 0 (see Eq. (2.58)), where Q is the infinitesimal generator matrix of the CTMC. This equation says that for each state of a queueing network in equilibrium, the flux out of a state is equal to the flux into that state. This conservation of flow in the steady state can be written as: c j&s
rjqji
= 7ri
c
qij,
‘d&S,
(7.45)
jES
where qij is the transition rate from state i to state j. After subtracting ni eqii from both sides of Eq. (7.45) and noting that qii = - Cjfi qij, we obtain the global balance equation (see Eq. (2.61)): Vi E S :
c i#i
r’jqji
- 7ri
c
4ij = 0,
(7.46)
i#i
which corresponds to the matrix equation rrQ = 0 (Eq. (2.58)). In the following we use two simple examples to show how to write the global balance equations and use them to obtain performance measures. Example 7.1 Consider the closed queueing network given in Fig. 7.6. The network consists of two nodes (N = 2) and three jobs (K = 3). The service times are exponentially distributed with mean values l/pi = 5 set and l/p2 = 2.5 set, respectively. The service discipline at each node is FCFS. The state space of the CTMC consists of the following four states: ((3, O), c&l>, (1, a>, (0,3)}*
Fig. 7.6
A closed network.
The notation (ki, /&) says that there are /cl jobs at node 1 and k2 jobs at node 2, and r(lcl, Ic2) denotes the probability for that state in equilibrium. Consider state (1,2). A transition from state (1,2) to state (0,3) takes place if the job at node 1 completes service (with corresponding rate ~1). Therefore, ,LL~is the transition rate from state (1,2) to state (0,3) and, similarly, ~2 is
PRODUCT-FORM QUEUEING NETWORKS
Fig. 7.7
State transition
diagram
for Example
275
7.1.
the transition rate from state (1,2) to state (2,l). The flux into a state of the model is just given by all arcs into the corresponding state, and the flux out of that state is determined from the set of all outgoing arcs from the state. The corresponding state transition diagram is shown in Fig. 7.7. The global balance equations for this example are: 439
oh1
= 7-42, qp2,
427
w/Q1
+ P2> =
+
2)t/% + P2) = 7@, ~)CLZ=
7r(3,0)/% + 7r(l, 2)/L& 7+, r(
l)/%
+ T(o,
3),Q,,
1,Qh.
Rewriting this system of equations in the form rrQ = 0 we have:
and the steady-state probability vector n = (7r(3,0), ~(2, l), n(l,2), ~(0,3)). Once the steady-state probabilities are known, all other performance measures and marginal probabilities xi(k) can be computed. If we use ~1 = 0.2 and ~2 = 0.4, then th e generator matrix Q has the following values: -0.2
‘;0
0.2
iof0
0
0
$26 0.4
-0.4 oo2 *
Using one of the steady-state solution methods introduced in Chapter 3, the steady-state probabilities are computed to be: 7r(3,0) = 0.5333,
7r(2,1) = 0.2667,
n&2)
= 0.1333,
~(0,3) = 0.0667
and are used to determine all other performance measures of the network, as follows: l
Marginal probabilities
(see Eq. (7.16)):
q(O) = x2(3) = 7r(0,3) = 0.0667,
q(l)
= n,(2) = +,2)
= 0.133,
276
QUEUEING NETWORKS
~~(2) = ~(1) l
Utilizations
= 7r(2,1) = 0.2667,
(see Eq. (7.20)):
p1 =
a Throughput
7rr(3) = 7r2(0) = n(3,O) = 0.5333.
1 -
7rl(O)
=
0.9333,
p2 = 1 - 7r2(0) = 0.4667.
(see Eq. (7.23)): x = h = x2 = plpl = pap2 = 0.1867.
l
Mean number of jobs (see Eq. (7.26)): El = 2
knl (k) = 2.2667,
X2 = 2
k=l
l
h2(k)
= 0.7333.
k=l
Mean response time of the jobs (see Eq. (7.30)): %
Xl
= ~1
= 12.1429,
T2
x2
= xz
= 3.9286.
Example 7.2 Now we modify the closed network of Example 7.1 so that the service time at node 1 is Erlang-2 distributed, while the service time at node 2 remains exponentially distributed. The modified network is shown in Fig. 7.8. Service rates of the different phases at node 1 are given by ~11 = ~12 = 0.4sec-l and the service rate of node 2 is ~2 = 0.4. There are K = 2 jobs in the sysiem. A state (ICI, I; Icz), I = 0, 1,2, . . ., of the network is now
Fig. 7.8 Network with Erlang-2 distributed server.
not only given by the number Ici of jobs at the nodes, but also by the phase 1 in which a job is being served at node 1. This state definition leads to exactly five states in the CTMC underlying the network. Pll
CL12
Pll
Pl2
Fig. 7.9 State transition diagram for Example 7.2.
PRODUCT-FORM QUEUEING NETWORKS
277
The state transition diagram of the CTMC is shown in Fig. 7.9. The following global balance equations can be derived: 451;
Oh1
= n(L
4272;
oh12
= 74271;
qp11
+ n(l,
al;
Wll
+ pa)
= 742,2;
O)p12
+ n(O,O; 2)/&,
492;
l)(Pl2
+ P2)
= +,1;
q/w,
2)/92
= +2;
l)p12.
7440;
1; qp2, 2; l)p2,
The generator matrix is:
-(I-h1
0 0
0 0
0 Pl2
Pll
+ cL2)
0
-(I412
0
IQ2
pY2
+p2)
*
-P2
After inserting the values ~11 = p12 = p2 = 0.4, Q becomes: -0.4
0.4 -0.4
0 0.4 -0.8
0
0
0.4
0 0 0.4 -0.8 0
0 0 0 0.4 -0.4
Solving the system of equations 7rQ = 0, we get: x(2,1; 0) = 0.2219,
n(2,2; 0) = 0.3336,
7r(l, 1; 1) = 0.2219,
7r(l,2; 1) = 0.1102,
r(O,O; 2) = 0.1125. These state probabilities are used to determine the marginal probabilities: q(O) = 7r2(2) = ?r(O,O;2) = 0.1125, ~~(1) = ~~(1) = x(1,1; 1) + ~(1,2; 1) = 0.3321, 7rr(2) = n2(0) = n(2,l; 0) + 7r(2,2; 0) = 0.5555. The computation of other performance measures is done in the same way as in Example 7.1. 7.3.2
Local
Balance
Numerical techniques based on the solution of the global balance equations (seeEq. (2.61)) can in principle alwaysbe used,but for large networksthis technique is very expensive because the number of equations can be extremely
278
QlJEUElNG NETWORKS
large. For such large networks, we therefore look for alternative solution techniques. In this chapter we show that efficient and exact solution algorithms exist for a large class of queueing networks. These algorithms avoid the generation and solution of global balance equations. If all nodes of the network fulfill certain assumptions concerning the distributions of the interarrival and service times and the queueing discipline, then it is possible to derive local balance equations, which describe the system behavior in an unambiguous way. These local balance equations allow an essential simplification with respect to the global balance equations because each equation can be split into a number of single equations, each one related to each individual node. Queueing networks that have an unambiguous solution of the local balance equations are called product-form networks. The steady-state solution to such networks’ state probabilities consist of multiplicative factors, each factor relating to a single node. Before introducing the different solution methods for product-form networks, we explain the local balance concept in more detail. This concept is the theoretical basis for the applicability of analysis methods. Consider global balance equations (Eq. (2.61)) for a CTMC: vi
E s :
c jES
Tiqii
= 7Ti
c j&3
4ij
or: rQ=O, with the normalization
condition: c
iTTi = 1.
iES
Chandy [Chan72] noticed that under certain conditions the global balance equations can be split into simpler equations, known as local balance equations. Local balance property for a node means: The departure rate from a state of the queueing network due to the departure of a job from node i equals the arrival rate to this state due to an arrival of a job to this node. This can also be extended to queueing networks with several job classes in the following way: The departure rate from a state of the queueing network due to the departure of a class r-job from node i equals the arrival rate to this state due to an arrival of a class r-job to this node. In the case of non-exponentially distributed service times, arrivals and departures to phases,instead of nodes, have to be considered.
PRODUCT-FORM QUEUEING NETWORKS
279
Fig. 7.10 A closed queueing network.
Example 7.3 Consider a closed queueing network (see Fig. 7.10) consisting of N = 3 nodes with exponentially distributed service times and the following service rates: ,LJ~= 4sec-‘, ~2 = lsec-l, and ,93 = 2sec-l. There are K = 2 jobs in the network and the routing probabilities are given as: p12 = 0.4, p13 = 0.6, and p21 = p31 = 1. The following states are possible in the network: (2,0, O),
((4% O),
(O,O, 3,
(171, o>, (LO,117(07 L0
The state diagram of the underlying CTMC is shown in Fig. 7.11. We set the overall flux into a state equal to the overall flux out of the state for each state to get global balance equations:
(1)
42,0,
WcLlP12
+ I-LlPl3)
= n(l,
0, qp3p31
(a>
407
27 Q2P21
= r(l,
1, QlP12,
(3)
40?
07 %-43P31
= +,
0, QJlP13,
+ YlP12)
= @,
2, qp2p21
(4)
417
17 Ob2P21
+ /JlP13
+ 40,1, c5)
+,
0,1)(~3P31
+ plPl2
+ plP13)
= +, + 64
(6)
~(0~
19 WcL3P31
+ P2P21)
= +,
+ r(l,
l,O)p2p21,
+ 42,0,0)/4p12
1b3P317
0,2)jJ3p31
+ T(O, 1, +2p21
O,O)/aP13, 1, qp1p13
+ n(l,
0, qp1p12.
To determine the local balance equations we start, for example, with the state (1, 1,O). The departure rate out of this state, because of the departure of a job from node 2, is given by ~(1,1,0)+,~2~p2r. This rate is equated to the arrival rate into this state (I, 1,O) due to the arrival of a job at node 2; 7r(2,0,0). ,~l.p12. Therefore, we get the local balance equation:
(4’)
+,1,0>*
ruZ’P21
= 7@,
60).
/Ql'Pl2.
Correspondingly, the departure rate of a serviced job at node 1 from state (1, 1,O) equals the arrival rate of a job, arriving at node 1 into state (l,l,O): (4”)
al,
0). Pl.
(P13 + P12) = n(0,
1, 1). /J3’P31 + n(O,2,0).
/Q.p21.
280
QUEUEING NETWORKS
Fig. 7.11 The CTMC for Example 7.3.
By adding these two local balance equations, (4’) and (4”)) we get the global balance Eq. (4). Furthermore, we see that the global balance Eqs. (l), (2), and (3) are also local balance equations at the same time. The rest of the local balance equations are given by: c5’)
T(l,
o,lh
(Pl2 417
(5”)
+ P13)
07 G3P31
= ‘@, = 7+,0,
1, +2P21
+ r(o,
0,2)jJ31331,
Qw13,
(6’)
407
17 l)PwzPZl
= q,
0, Qw12,
(f-0
Q,
1, Q3P31
= q,
LO)pm3,
with (5’) + (5”) = (5) and (6’) + (6”) = (6), respectively. Noting that ~12 + ~13 = 1 and ~21 = ~31 = 1, the following relations can be derived from these local balance equations: 7r(l 70 7
1)
7r(l
P3
= 7r(270 7O)c"'p 13’
7r(O70 7a>= n(2 70 70) ( 7r(O, 1,1) = 7r(2,0,0)
fip13 P3
71 70) = 7r(2 70 ?0)&p P2 12’
> 2 7 n-(0 72 O)= 42 7
7
0 70) ( fip12 > 2 7
Pu:!
P2 -p12P13. p2p3
Imposing the normalization condition, that the sum of probabilities of all states in the network is 1, for 7r(2,0,0) we get the following expression:
After inserting the values, we get the following results:
7r(2,0,0) = 0.103, 7r(O,O,2) = 0.148, r(l, (41) = 0.123, ~(0,2,0)
= 0.263,
~(1, I$)
= 0.165,
~(0, I, I) = 0.198.
PRODUCT-FORM QUEUEING NETWORKS
281
As can be seen in this example, the structure of the local balance equations is much simpler than the global balance equations. However, not every network has a solution of the local balance equations but there always exists a solution of the global balance equations. Therefore, local balance can be considered as a sufficient (but not necessary) condition for the global balance. Furthermore, if there exists a solution for the local balance equations, the model is then said to have the local balance property. Then this solution is also the unique solution of the system of global balance equations. The computational effort for the numerical solution of the local balance equations of a queueing network is still very high but can be reduced considerably with the help of a characteristic property of local-balanced queueing networks: For the determination of the state probabilities it is not necessary to solve the local balance equations for the whole network. Instead, the state probabilities of the queueing network in these cases can be determined very easily from the state probabilities of individual single nodes of the network. If each node in the network has the local balance property, then the following two very important implications are true: l
The overall network also has the local balance property as proven in [CHT77].
l
There exists a product-form +%,
s2, * * * ) S,)
=
solution for the network, that is: ;
[7r(S1)* n(S2)*
. . . * 7T(S,)]
)
(7.47)
in the sense that the expression for the state probability of the network is given by the product of marginal state probabilities of each individual node. The proof of this fact can be found in [Munt73]. The normalization constant G is chosen in such a way that the sum of probabilities over all states in the network equals 1. Equation (7.47) says that in networks having the local-balance property, the nodes behave as if they were single queueing systems. This characteristic means that the nodes of the network can be examined in isolation from the rest of the network. Networks of the described type belong to the class of socalled separable network or product-form networks. Now we need to examine for which types of elementary queueing systems a solution of the local balance equation exists. If the network consists of only these types of nodes then we know, because of the preceding conclusion, that the whole network has the local-balance property and the network has a product-form solution. The local-balance equations for a single node can be presented in a simplified form as follows [SaCh81]:
~(S)%-(S) = T(S -1,)*X,.
(7.48)
282
QUEUEING NETWORKS
In this equation, Pi is the rate with which class-r jobs in state S are serviced at the node, A, is the rate at which class-r jobs arrive at the node, and (S - 1,) describes the state of the node after a single class-r job leaves it. It can be shown that for the following types of queueing systems the local balance property holds [Chan72]: Type-l : M/M/ m-FCFS. The service rates for different job classes must be equal. Examples of Type-l nodes are input/output (I/O) devices or disks. Type-Z?: M/G/l-PS. The CPU of a computer system can very often be modeled as a Type-2 node. Type-3: M/G/w nodes.
(infinite
server).
Terminals can be modeled as Type-3
Type-d: M/G/l-LCFS PR. There is no practical example for the application of Type-4 nodes in computer systems. For Type-2, Type-3, and Type-4 nodes, different job classes can have different general service time distributions, provided that these have rational Laplace transform. In practice, this requirement is not an essential limitation as any distribution can be approximated as accurately as necessary using a Cox distribution. In the next section we consider product-form solutions of separable networks in more detail. Problem 7.1 Consider the closed queueing network given in Fig. 7.12. The network consists of two nodes (N = 2) and three jobs (K = 3). The
Fig. 7.12
A simple queueing network example for Problem 7.1.
service times are exponentially distributed with mean values l/pi = 5 set and l/p2 = 2.5 sec. The service discipline at each node is FCFS, and the routing probability 4 = 0.1. (a) Determine the local balance equations. (b) From the local balance equations, derive the global balance equations.
(4
Determine the steady-state probabilities tions.
using the local balance equa-
PRODUCT-FORM QUEUEING NETWORKS
7.3.3
283
Product-Form
The term product-form was introduced by [Jack631 and [GoNe67a], who considered open and closed queueing networks with exponentially distributed interarrival and service times. The queueing discipline at all stations was assumed to be FCFS. As the most important result for the queueing theory, it is shown that for these networks the solution for the steady-state probabilities can be expressed as a product of factors describing the state of each node. This solution is called product-form solution. In [BCMP75] these results were extended to open, closed, and mixed networks with several job classes, nonexponentially distributed service times and different queueing disciplines. In this section we consider these results in more detail and give algorithms to compute performance measures of product-form queueing networks. A necessary and sufficient condition for the existence of product-form solutions is given in the previous section but repeated here in a slightly different way: Local Balance Property: Steady-state probabilities can be obtained by solving steady-state (global) balance equations. These equations balance the rate at which the CTMC leaves that state with the rate at which the CTMC enters it. The problem is that the number of equations increases exponentially in the number of states. Therefore, a new set of balance equations, the so-called local balance equations, is defined. With these, the rate at which jobs enter a single node of the network is equated to the rate at which they leave it. Thus local balance is concerned with a local situation and reduces the computational effort.
Moreover, there exist two other characteristics that apply to a queueing network with product-form solution: M + M-Property (Markov Implies Markov): A service station has the M + M-property if and only if the station transforms a Poisson arrival process into a Poisson departure process. In [Munt73] it is shown that a queueing network has a product-form solution if all nodes of the network have the M + M-property. Station-Balance Property: A service discipline is said to have stationbalance property if the service rates at which the jobs in a position of the queue are served are proportional to the probability that a job enters this position. In other words, the queue of a node is partitioned into positions and the rate at which a job enters this position is equal to the rate with which the job leaves this position. In [CHT77] it is shown that networks that have the station-balance property have a productform solution. The opposite does not hold.
284
QUEUEING NETWORKS
Fig. 7.13 Relation between SB, LB, PF, and M + M.
The relation between station balance (SB), local balance (LB), productform property (PF) , and Markov implies Markov property (n/r + A4) is shown in Fig. 7.13. 7.3.4
Jackson
Networks
The breakthrough in the analysis of queueing networks was achieved by the works of Jackson [Jack57, Jack63]. He examined open queueing networks and found product-form solutions. The networks examined fulfill the following assumptions: There is only one job class in the network. The overall number of jobs in the network is unlimited. Each of the N nodes in the network can have Poisson arrivals from outside. A job can leave the network from any node. All service times are exponentially
distributed.
The service discipline at all nodes is FCFS. The ith node consists of mi 2 1 identical service stations with the service rates pi, i = 1,. . . , N. The arrival rates Xoi, as well as the service rates, can depend on the number Ici of jobs at the node. In this case we have load-dependent service rates and load-dependent arrival rates. Note: A service station with more than one server and a constant service rate pi is equivalent to a service station with exactly one server and loaddependent service rates:
(7.49)
PRODUCT-FORM QUEUEING NETWORKS
Jackson’s Theorem: for all nodes i = Eq. (7.1)), th en expressed as the nodes, that is: ;?-(h,
285
If in an open network ergodicity (A; < pi. mi) holds 1,. . . , N (the arrival rates Xi can be computed using the steady-state probability of the network can be product of the state probabilities of the individual
k2, * * * , kN)
= 7b(h)‘~2(Jc2)’
(7.50)
* * * ‘TN(kN).
The nodes of the network can be considered as independent M/M/m queues with arrival rate Xi and service rate pi. To prove this theorem, [Jack631 has shown that Eq. (7.50) fulfills the global balance equations. Thus, the marginal probabilities ri(lci) can be computed with the well-known formulae for M/M/m systems (see Eqs. (6.26), (6.27)):
(7.51)
where 7ri(O) is given by the condition
E ri(ki)
= 1:
k,=O
-’ 7 pi= Proof: t ions:
= 5
-
Ai
< 1.
(7.52)
m&i
We verify that Eq. (7.50) fulfills the following global balance equa-
Xoiy(ki)n(kl,
. . . , ki - 1,. . . , kN)
i=l
(7.53) r(k1,. . . , ki + 1,. . . , kN) N
N
+ c c q ( kj + l)/.qpgr(kl, i=l j=l
. . . , kj + 1,. . . , ki - 1, . . . , kN).
The indicator function y(ki) is given by: y(h) =
0,
ki = 0,
1, ki > 0.
(7.54)
286
QUEUEING NETWORKS
The function: (7.55) gives the load-dependent service rate multiplier. For the proof we use the following relations:
(7.56)
Aj/LiCXi(lc;) T(kl, 66s7kj + 1, *. -7 Ici - 1,. ~67ICN) _ Xi/.hjCXj(kj + 1)’ ~(JGl,...,Icj,...,~i,...,~N) If we divide Eq. (7.53) by ~(lcr,. . . , /CN) and insert Eq. (7.56), then, by using the relation y(/~i) . CX~(/C~) E oi (Ici) 7we get:
flhoi i=l
+ eN(k)pi i=l
=e (
I-
gpij j=l
N
XOif%ai
i=l
+x
x
i=l
Ai )
(h)
i
(7.57) N N Pi44 + CC TPjiXji=l
j=l
2
The first term can be rewritten as:
and the last one as: 2
2
i=l j=l
PiaiFk)
2
pjixj
= f)iCri(ki) i=l
_ 2 i=l
AOipTi(ki). a
(7.58)
Substituting these results on the right side of Eq. (7.57), we get the same result as on the left side. q.e.d. The algorithm based on Jackson’s theorem for computing the steady-state probabilities can now be described in the following three steps:
For all nodes, i = 1, . . . , IV, compute the arrival rates Xi of the open network by solving the traffic equations, Eq. (7.1).
287
PRODUCT-FORM QUEUEING NETWORKS
Consider each node i as an M/M/m queueing system. Check the ergodicity, Eq. (6.5), and compute the state probabilities and performance measures of each node using the formulae given in Section 6.5. Using Eq. (7.50), compute the steady-state probabilities overall network.
of the
We note that for a Jackson network with feedback input, processes to the nodes will, in general, not be Poisson and yet the nodes behave like independent M/M/m nodes. Herein lies the importance of Jackson’s theorem [BeMe78]. We illustrate the procedure with the following example. Example 7.4 Consider the queueing network given in Fig. 7.14, which consists of N = 4 single server FCFS nodes. The service times of the jobs at
Fig. 7.14 Open queueing network model of a computer system.
each node are exponentially distributed with respective means: -
1
= o.odsec,
tQ1
-
1
= 0.03seq
t92
The interarrival
-
1
= O.O6sec, -
1
= 0.05sec.
P4
IQ3
time is also exponentially distributed
with the parameter:
X = X04 = 4 jobs/set. Furthermore, the routing probabilities are given as follows: P12 = ~13 = 0.5,
~41 = p21 =
1,
p31 = 0.6,
p30 = 0.4.
Assume that we wish to compute the steady-state probability of state (ICI, k2, k3, k4) = (3,2,4,1) with the help of the Jackson’s method. For this we follow the three steps given previously: Compute the arrival rates from the traffic equations, Eq. (7.1): h
= X2P21
x3 = hP13
+ x3p31+ = 10,
X4P41
= 20,
x2 = hp12 x4 =
= 10,
x()4 = 4.
288
QUEUEING NETWORKS
Compute the state probabilities and important performance measures for each node. For the utilization of a single server we use Eq. (6.3): Xl p1=--0.8, I-L1
x2 p2=-=0.3, P2
p3=x3&6, CL3
p&L~. I-L4
Thus, ergodicity (pi < 1) is fulfilled for all nodes. The mean number of jobs at the nodes is given by Eq. (6.13):
j+P1=
4,
K2 = 0.429,
K3=l.5,
17*=O.a5.
1 -P1
Mean response times, from Eq. (6.15):
gil -
llm 1 -
= 0.2,
T2 = 0.043,
T3 = 0.15,
T4 = 0.0625.
Pl
The mean overall response time of a job is given by using Little’s theorem in the following way: T=
K x = $Fi
= 1.545.
1=1
Mean waiting times, from Eq. (6.16): w1 = fd$
= 0.16,
w2 = 0.013,
w3 = 0.09,
w4 = 0.0125.
Mean queue lengths, from Eq. (6.17):
~~ = - Pi 1 -
= 3.2, Q2 = 0.129,
Q3
=0.9,
Q4
=0.05.
Pl
The necessary marginal probabilities
can be computed using Eq. (6.12):
nr(3) = (1 - pl)p; = 0.1024,
7r2(2) = (1 - p2)p; = 0.063,
~~(4) = (1 - p3)p; = 0.0518,
7r4(1) = (1 - p4)p4 = 0.16.
Computation
of the state probability
7r(3,2,4,1)
using Eq. (7.50):
7r(3,2,4,1) = 7rr(3)* 7r2(2)* 7r3(4)* n-4(1) = 0.0000534. 7.3.5
Gordon/Newell
Networks
Gordon and Newell [GoNe67a] considered closed queueing networks for which they made the same assumptions as in open queueing networks, except that
PRODUCT-FORM QUEUEING NETWORKS
289
no job can enter or leave the system (X0; = X;e = 0). This restriction means that the number K of jobs in the system is always constant:
Thus, the number of possible states is finite, and it is given by the binomial coefficient:
which describes the number of ways of distributing K jobs on N nodes. The theorem of Gordon/Newell says that the probability for each network state in equilibrium is given by the following product-form expression: (7.59) Here G(K) is the so-called normalixation constant. It is given by the condition that the sum of all network state probabilities equals 1:
G(K) =
C fiF,(ki)5 ki=K '=' a=1
The Fi(ki) are functions that correspond to the state probabilities the ith node and are given by:
(7.60)
ri(lci) of
(7.61) where the visit ratios ei are computed using Eq. (7.5). The function pi(Ici) is given by:
pi(ki) =
ki!, mi! * rnfamrni,
1,
ki I mi, Ici > mi, mi = 1.
(7.62)
For various applications a more general form of the function Fi (Ici) is advantageous. In this generalized function, the service rates depend on the number of jobs at the node. For this function we have:
ki e; Fi(ki) = Ai(Q’
(7.63)
290
QUEUEING NETWORKS
with
(7.64)
With relation (7.49) it can easily be seen that the case of constant service rates, Eq. (7.61), is a special case of Eq. (7.63). Proof: Gordon/Newell have shown in [GoNe67a] that Eq. (7.59) fulfills the global balance equations (see Eq. (7.53)) that have now the following form:
(7.65)
where the left side describes the departure rate out of state (ki, . . . , kN> and the right side describes the arrival rate from successor states into this state. The function y(ki) and cxi(ki) are given by Eqs. (7.54) and (7.55). We define now a variable transformation as follows: @l,.
. . , kN) =
Q(h,--.,k~) fi Pi(k)
i=l
(7.66) ’
and: +l,
. . . , ki - 1,. . . , kj + 1,. . . , kN) =
-$&Q(k, 3
,...,
ki-l,...,Ic,+l,...$N) . fi
Pi(k)
i=l
If we substitute these equations into Eq. (7.65) we get:
(7.67)
291
PRODUCT-FORM QUEUEiNG NETWORKS
This equation can be simplified using the relation r(lci)cri(ki)
E ai(lci):
N
c a@i)piQ(h, -a.7
kN)
=
i=l N
(7.68)
N
cc ai(k)pjpjiQ(h,. ..,h- 1,...,kj+1,...,h), j=l
i=l
and Q(rCr,. . . , kN) can be written in the form: (7.69) with the relative utilization xi = ei/pi (Eq. (7.6)). Here c is a constant. Substituting this into (7.68) we have:
i=l
= 5
2
cui(~i)pjpji(x~l
. . . xy
. . . x;j+l
. . . x$cJ) . c
This expression can be rewritten as:
gm (pi-gpjpji$ =0. Since at least one ai (ki) will be non-zero, it follows that the factor in the square brackets must be zero. Thus, we are led to consider the system of linear algebraic equations for xi: N /LiZi
=
pjxjPji,
c j=l
where xi = e;/pi (Eq. (7.6)) and: N ei =
c j=l
ejPji*
292
QUEUEING NETWORKS
This is the traffic equation for closed networks (Eq. (7.5)). That means that Eq. (7.69) is correct and with Eqs. (7.66) and (7.61) we obtain Eq. (7.59).q.e.d. Thus the Gordon/Newell theorem yields a product-form solution. In the general form it says that the state probabilities r(lcl, kz, - - . , kN) are given as the product of the functions Fi (ki), i = 1. . . , N, defined for single nodes. It is interesting to note that if we substitute in Eq. (7.59) Fi(lci) by Li. Fi(ki), 1 < i 5 N, then this has no influence on the solution of the state probabilities , kN> as long as Li is a positive real number. Furthermore, the +h,... use of X.ei, i = l,... , N, with an arbitrary constant X > 0, has no influence on the results because the visit ratios ei are relative values [ShBu77]. (Also see Problems 2 and 3 on page 444 of [Riv82].) The Gordon/Newell method for computing the state probabilities can be summarized in the following four steps: Compute the visit ratios ei for all nodes i = 1,. . . , N of the closed network using Eq. (7.5). For alli = l,... , N, compute the functions Fi(ki) using Eq. (7.61) or Eq. (7.63) (in th e case of load-dependent service rates). Compute the normalization
constant G(K) using Eq. (7.60).
Compute the state probabilities of the network using Eq. (7.59). From the marginal probabilities, which can be determined from the state probabilities using Eq. (7.16), all other required performance measures can be determined. An example of the application of the Gordon/Newell
theorem follows:
Example 7.5 Consider the closed queueing network shown in Fig. (7.15) with N = 3 nodes and K = 3 jobs. The queueing discipline at all nodes is FCFS. The routing probabilities are given as follows:
P12 = 0.3,
p21 = 0.2, P22 = 0.3,
P13 = 0.1,
P23 =
~11 = 0.6,
0.5,
The service time at each node is exponentially ~1 =
0.8 se?,
This network consists of
p2 = 0.6 set-l,
p32
0.4, = 0.1,
p33
= 0.5.
P31 =
distributed with the rates: p3 = 0.4 set-l .
(7.70)
PRODUCT-FORM QUEUEING NETWORKS
Fig. 7.15
293
Closed queueing network.
states, namely: (3,0, O),
(2,L o),
(270, I>,
(k2, O),
(Lb I>>
(l,O, 2),
(073, O),
(0,2,1>,
(0,1,2),
(O,O?3).
We wish to compute the state probabilities using the Gordon/Newell orem. For this we proceed in the following four steps: r-1
-1
Determine the visit ratios at each node using Eq. (7.5): el = elpll
+
e2p21
+
e3p31
=
I,
e2
=
elpl2
+
e2p22
+
e31332
=
0.533,
e3
=
elpl3
+e2p23
=
0.733.
+e3p33
Determine the functions Fi(ki) for i = 1,2,3 using Eq. (7.61):
WO = (el/pl)o = 1,
W) = (e1/d = 1.25,
Fr(2) = (el/p1)2 = 1.5625,
Fr(3) = (el/p1)3 = 1.953,
and correspondingly: F2(0) = 1, F?(O) = 1, m]
Fz(l) = 0.889, F3(1) = 1.833,
F2(2) = 0.790, Fs(2) = 3.361,
Determine the normalization G(3) =
Fr(3)F2(0)&(0) + ~~(~)~2(2)~3(0)
&(3) = 0.702, Fs(3) = 6.162.
constant using Eq. (7.60):
+ Fr(2)&(1)Fs(O) + h(l)F&)F&)
+ Fr(2)FZ?(O)F$) + Fl(1)h(o)F3(2)
the-
294
QUEUEING NETWORKS
+ pl(“)F2(3)~3(o)
+ Fl(o)F2(2)F3(1)
+ Fl(O)&(O)F3(3)
= 24.733.
Determine the state probabilities rem, Eq. (7.59):
+
using the Gordon/Newell
1
= -q3)*qo)* G(3)
n(3,0,0)
F3(0)
0) = L~r(2).~2(l)+3(0) G(3)
G,L
F1(0)&(1)F3(2)
=
theo-
0.079,
= 0.056.
In the same way we compute: 7r(2,0,1)
= 0.116,
~(0,3,0)
= 0.028,
n(1,2,0) x(0,2,1)
= 0.040, = 0.058,
?r(l, 1,l) = 0.082, ~(0, 1,2) = 0.121,
T&0,2) = 0.170, r(O,O, 3) = 0.249.
Using Eq. (7.16), all marginal probabilities can now be determined: q(o) = 7r(o, 3,0) + n(O,2,1) + 7r(O,1,2) + 7r(O,0,3) = 0.457, 7rr(l) = 7r(l, 2,O) + 7r(l, 1,l) + T(l,O, 2) = 0.292, ~(2)
= n(2,1,0)
+ n(2,0,1)
= 0.172,
7r1(3) = 7r(3,0,0)
= 0.079,
~~(0) = 7r(2,0,1)
+ r(l,O, 2) + r(O,O, 3) + n(3,0,0)
= 0.614,
T2(1) = n(2,1,0) + 7r(l, 1,l) + T(O,l, 2) = 0.259, 7r2(2) = x(1,2,0)
+ 7r(O,2,1) = 0.098,
T2(3) = x(0,3,0)
= 0.028,
~~(0) = 7r(3,0,0)
+ n(2,1,0)
+ r&2,0)
+ n(O,3,0) = 0.203,
7r3(1) = 7r(2,0,1) + ~(1, 1,l) + ~(0,2,1) = 0.257, ~~(2) = r(l,O, 2) + ~(0,1,2) = 0.291, 7r3(3) = 7r(O,
0,3) = 0.249.
For the other performance measures, we get: l
Utilization
at each node, Eq. (7.19): p1 = 1 - 7rr(O) = 0.543,
l
p2 = 0.386,
p3 = 0.797.
Mean number of jobs at each node, Eq. (7.26):
K1= &q(k) k=l
= 0.873 &, = 0.541, K3= 1.585.
295
PRODUCT-FORM QUEUEING NETWORKS
Throughputs at each node, Eq. (7.23):
l
Xl =mlplpl
= 0.435, X2 = 0.232, X3 = 0.319.
Mean response time at each node, Eq. (7.43):
l
T, = -Kl = 2.009, T2 = 2.337, T3 = 4.976. h
7.3.6
BCMP
Networks
The results of Jackson and Gordon/Newell were extended by Baskett , Chandy, Muntz, and Palacios in their classic article [BCMP75], to queueing networks with several job classes, different queueing strategies, and generally distributed service times. The considered networks can be open, closed, or mixed. Following [BrBa80], an allowed state in a queueing model without switching is characterized by four conditions:
class
1. The number of jobs in each class at each node is always nonnegative, i.e.:
2. For all jobs the following condition must hold: iii, > 0
if there exists a way for class-r jobs to node i with a non-zero probability.
3. For a closed network, the number of jobs in the network is
by:
4. The sum of class-r jobs in the network is constant at any time, i.e.:
K, = gki, given
= const.
l
i=l
If class switching is allowed, then Conditions 1-3 are fulfilled, but Condition 4 can not be satisfied because the number of jobs within a class is no longer constant but depends on the time when the system is looked at and can have the values k E {O, . . . , K}. In order to avoid this situation, the concept of chains is introduced.
296
QUEUEING NETWORKS
7.3.6.1 The Concept of Chains Consider the routing matrix P = bir,jS], i,j = l,..., N, and T, s = 1,. . . , R of a closed queueing network. The routing matrix P defines a finite-state DTMC whose states are pairs of form (i, r) (node number i, class index r). We call (i,r) a node-class pair. This state space can be partitioned into disjoint sets Ii, i = 1,. . . , U:
r = rl + r2 + . . . + ru, where Ii is a closed communicating
class of recurrent states of the DTMC.
Pl
M-l
fig. 7.16
p2
0
Modified routing matrix.
With a possible relabeling of nodes, we get the routing matrix of Fig. 7.16. Where submatrices Pi contains the transition probabilities in set I’i and each Pi is disjoint and assumed to be ergodic. Let chain Ci denote the set of job classes in ri. Because of the disjointness of the chains Ci, it is impossible for a job to switch from one chain to another. If a job starts in a chain, it will never leave this chain. With the help of this partitioning technique, the number of jobs in each chain is always constant in closed networks. Therefore, Condition 4 for queueing networks is fulfilled for the chains. Muntz [Munt73] proved that a closed queueing network with U chains is equivalent to a closed queueing network with U job classes. If there is no class switching allowed, then the number of chains equals the number of classes. But if class switching is allowed, the number of chains is smaller than the number of classes. This means that the dimension of the population vector is reduced. An extension of the chains concept to open networks is possible. The procedure to find the chains from a given solution matrix is the following:
PRODUCT-FORM QUEUEING NETWORKS
297
Construct R sets that satisfy the following condition:
node-class pair (j, s) can be reached from pair (i, r) in a finite number of steps
7
Eliminate all subsets and identical sets so that we have: U 5 R
sets to obtain the chains
Cl,. . . , Cu.
Compute the number of jobs in each chain
The set of all possible states in a closed multiclass network is given by the following binomial coefficient:
N+$[+K,*-I N. IC,l -
’
I
where IC, ( is the number of elements in C,, llq
r E Cq, 2%
= PO,ir +
C
ejsPjs,ir
for
i = l,...N,
(7.71)
lLqlU,
SEC,
j=l,...,N
and for closed networks:
-Sir =c
SEC, j=l,...,N
r E c,, ejsPjs,ir
for
i= l,...N, lSq
(7.72)
298
QUEUEING NETWORKS
Example
7.6
Let the following routing matrix P be given: (1.1)
=
(292)
(293)
(3.1)
(3,2)
(393)
0
0
0.3
0
0
0.3
0
0.3
0
0
0
0
0
0
0.7
0
(133)
0
0
0
0
0
0.3
0
0
0.7
(2,1)
0
0
0
0
1
0
0
0
0
(292)
0.3
0
0
0
0
0
0.7
0
0
(2>3)
0
0
0.3
0
0
0
0
0
0.7
(331)
1.0
0
0
0
0
0
0
0
0
(3.2)
0
0
0
1.0
0
0
0
0
0
(3,3)
, 0
0
0.4
0
0
0.6
0
0
0
(1?2)
p
(221)
0.4
(1.1)
’ 0
(1,2) (1>3)
By using the chainin!
;echnique we get:
EI =
{1,2},
E2
=
{Q},
E3
= {3),
meaning that R = 3 classes are reduced to U = 2 chains. By eliminating the subsets and identical sets we get two chains: Cl = {1,2},
c2 = (3).
Then the reorganized routing matrix P’ has the form: (19 1) (13 1)
p’ =
’ 0
(1?2)
(2,1)
(2.2)
(3,l)
(3,2)
(1.3)
(2.3)
(393)
0.4
0
0.3
0
0.3
0
0
0
(13 2)
0.3
0
0
0
0
0.7
0
0
0
(2>1)
0
0
0
1.0
0
0
0
0
0
(272)
0.3
0
0
0
0.7
0
0
0
0
(3,1)
1.0
0
0
0
0
0
0
0
0
(312)
0
0
1.0
0
0
0
0
0
0
(193)
0
0
0
0
0
0
0
(223)
0
0
0
0
0
0
0.3
0
0
0
0
0
0.4 0.6
(3.3)
,o
0.3 0.7 0
0.7 0
If a job starts in one of the chains Cl or C2, then it can not leave them. Because of the switch from classes to chains, it is necessary to transfer class measures into chain measures [BrBa80] (ch ain measures are marked with a *):
PRODUCT-FORM QUEUEING NETWORKS l
299
Number of visits: In a chain a job can reach different node-class pairs (i, j) (i: node index, j: class index) where: C eir r-4, * eiq = Gel,’
(7.73)
rd2,
l
The number of jobs per class is constant in a chain, but within the chain jobs can change their class. Therefore, if a job of chain q visits node i, it is impossible to know to which class it belongs because we consider a chain to be a single entity. If we make the transformation class
chain,
+
the information about the class of a job is lost. Because different job classes have different service times, this missing information is exchanged by the scale factor cv. For the service rate in a chain we get: 1 siq = * = c I-L. vl r&, ai,
=
-
gireis
Sir
(7.74)
’ air 9
(7.75)
’
SEC,
In Chapter 8 we introduce several algorithms to calculate the performance measures of single and multiclass queueing networks without class switching (such as mean value analysis or convolution). Using the concept of chains, it is possible to also use these algorithms for queueing networks with class switching. To do so, we proceed in the following steps: Calculate the number of visits ei, in the original network. Determine the chains Ci . . . Cu, and calculate the number of jobs Kg* in each chain. Compute the number of visits ezq for each chain, Eq. (7.73). Determine the scale factors air, Eq. (7.75). Calculate the service times szq for each chain, Eq. (7.74). Derive the performance measures per chain [BrBa80] with one of the algorithms introduced later (mean value analysis, convolution, etc.). Calculate the performance measures per class from the performance measures per chain: Tir(K*)
= sir * (1 + Ki(K*
- 1,)) ,
T E Cq
300
QUEUEING NETWORKS
Ai,
= aiT . A$,
pi&*)
= sir - kr(K*),
where: Population vector containing the number of jobs in each chain, K* - 1, : K” with one job less in chain q, TjT,(K* - lq) : Mean number of jobs at node i if the number of jobs in the chains is given by (K* - 14).
K* = (K;,...,K;):
All other performance measures can easily be obtained using the formulae of Section 7.2. In the following section we introduce the BCMP theorem, which is the basis of all analysis techniques to come. If class switching is not allowed in the network, then the BCMP theorem can be applied directly. In the case of class switching, it needs to be applied in combination with the concept of chains. 7.3.6.2 BClVlP Theorem The theorems of Jackson and Gordon/Newell have been extended by [BCMP75] to networks with several job classes and different service strategies and interarrival/service time distributions, and also to mixed networks that contain open and closed classes. The networks considered by BCMP must fulfill the following assumptions: l
Queueing discipline: The following disciplines are allowed at network nodes: FCFS, PS, LCFS-PR, IS (infinite server).
l
Distribution of the service times: The service times of an FCFS node must be exponentially distributed and class-independent (i.e. pi1 = pi2 = . . . = ,&R = pi), while PS, LCFS-PR and IS nodes can have any kind of service time distribution with a rational Laplace transform. For the latter three queueing disciplines, the mean service time for different job classes can be different.
l
Load-dependent service rates: The service rate of an FCFS node is only allowed to depend on the number of jobs at this node, whereas in a PS, LCFS-PR and IS node the service rate for a particular job class can also depend on the number of jobs of that class at the node but not on the number of jobs in another class.
l
Arrival processes: In open networks two kinds of arrival processes can be distinguished from each other. Case 1: The arrival process is Poisson where all jobs arrive at the network from one source with an overall arrival rate X, where X can depend
PRODUCT-FORM QUEUEING NETWORKS
301
on the number of jobs in the network. The arriving jobs are distributed over the nodes in the network in accordance to the probability PO,+. where:
E i=l
PO,ir = 1.
r=l
Case 2: The arrival process consists of U independent Poisson arrival streams where the U job sources are assigned to the U chains. The arrival rate X, from the uth source can be load dependent. A job arrives at the ith node with probability PO,+.so that:
c
PO,ir = 17
for all u= l,...,U.
rEC, i=l
I..., N
These assumptions lead to the four product-form node types and the local balance conditions for BCMP networks (see Section 7.3.2), that is: Type-l:
-/M/m
Type-3:
-/G/CO (IS)
- FCFS
Type-2: Type-4:
-/G/l -/G/l
- PS - LCFS PR
Note that we use -/M/ m notation since we know that, in general, the arrival process to a node in a BCMP network will not be Poisson. The BCMP Theorem says that networks with the characteristics just described have product-form solution: For open, closed, and mixed queueing networks whose nodes consist only of the four described node types, the steady-state probabilities have the following product-form: (7.76)
where, G(K) = the normalization constant, d(S) = a function of the number of jobs in the network, S = (Si, . . . , S,) and: K(S)-1 w
open network with arrival process 1,
i=o u K,(S)--1 n J-J WL
open network with arrival process 2,
r-J
d(S) =
I
u=l
1,
i=o
closed networks.
QUEUEING NETWORKS
302
fi(Si)
= a f uric t ion which depends on the type and state of each nodei
Type-l, Type-% (7.77)
Type-3, Type-4. Variables in for Eq. (7.77) have the following meanings: szj : Pir1: uir :
Class of the job that is at the jth position in the FCFS queue. Mean service rate in the Zth phase (I = 1, . . . , uir) in a Cox distribution (see Chapter 1). Maximum number of exponential phases. l-l
n airj, probability that a class-r job at the ith node reaches the Zth j=o service phase (A,,1 = 1 because of aiT0 = 1). CLirjZ Probability that a class-r job at the ith node moves to the (j + 1)th phase. b-1 1 Number of class-r jobs in the Zth phase of node i. Ail-1
:
For the load-dependent case, fi (Si) is of the form: fi(Si)
=
-A- Ici iifZiszj. ( Pi > j=l
Proof: The proof of this theorem is very complex and therefore only the basic idea is given here (for the complete proof see [Munt72]). In order to find a solution for the steady-state probabilities r(S), the following global balance equations have to be solved: state transition rate ] n(s) 1from state S
with the normalization
98
state transition rate from state S to state S
17 (7.78)
condition:
CT(S) = 1. S
(7.79)
PRODUCT-FORM QUEUEING NETWORKS
303
Now we insert Eq. (7.76) into Eq. (7.78) to verify that Eq. (7.78) can be written as a system of balance equations that [Chan72] calls local balance equations. All local balance equations can be transformed into a system of N - 1 independent equations. To get unambiguity for the solution the normalization condition, Eq. (7.79), has to be used. Now we wish to give two simplified versions of the BCMP theorem for open and closed networks. BCMP Version 1: For a closed queueing network fulfilling the assumptions of the BCMP theorem, the steady-state state probabilities have the form:
n(S1,. ..)S,)= & fi fiw
(7.80)
2=1
where the normalization
constant is defined as: (7.81)
and the function Fi(Si) is given by:
ki
1 . nR -efJT, Ici,! r=l
ZqSi)
=
Type-l,
Type-W,
(7.82)
Type-3.
The quantity Ici = 5 Icir gives the overall number of jobs of all classes at r=l
node i. The visit ratios ei, can be determined with Eq. (7.72), while the function ,&(lci) is given in Eq. (7.62). For FCFS-nodes (mi = 1) with load-dependent service rates pi(j) we get: (7.83)
BCMP Version 2: For an open queueing network fulfilling the assumptions of the BCMP theorem and load-independent arrival and service rates, we have:
304
QUEUEING NETWORKS
where:
(1-Pi)P;“,
Type-1,2,4 (mi = l),
7ri(ki) =
(7.85) e---pz& lci! ’
Type-3,
with: &-F,
i
Type-l
(mi = l), (7.86)
pir = A, E,
Type-2,3,4.
Furthermore we have:
ri, = pir
1 - pi*
(7.87)
For Type-l nodes with more than one service unit (mi > 1)) Eq. (6.26) can be used to compute the probabilities pi. For the existence of the steady-state probabilities ri(~?i), ergodicity condition (pi < 1) has to be fulfilled for all i, i = l,...,N. The algorithm to determine the performance measures using the BCMP theorem can now be given in the following five steps: -1 Compute the visit ratios ei, for all i = 1,. . . , N and r = 1,. . . , R using Eq. (7.71). m/
Compute the utilization
of each node using Eq. (7.86)
-1 Compute the other performance measures with the equations given in Section 7.2. w
Compute th e marginal probabilities of the network using Eq. (7.85).
-1
Compute th e st at e probabilities
using Eq. (7.84)
Example 7.7 Consider the open network given in Fig. 7.17 with N = 3 nodes and R = 2 job classes. The first node is of Type-2 and the second and
PRODUCT-FORM QUEUEING NETWORKS
third nodes are of Type-4. with the rates: ~11 =
The service times are exponentially
8sec-‘,
The interarrival
~22 =
distributed
~31 = 16 see-‘,
~21 = 12 set-‘,
~12 = 24sec-',
305
32secA1,
~32
= 36sec-‘.
times are also exponentially distributed with the rates: X1 = X2 = 1 job/set.
q---q----
CL3 cl Sink
Fig. 7.17 Open queueing network.
The routing probabilities are given by: PO,11 = 1, P11,21
=
P21,11
0.4, p 21,31
P31,11 = 0.5,
P11,31 = 0.3, Pll,O
=
= 0.6, = 0.4,
0.3,
p31,21 =
0.5,
PO,12 =
1,
P12,22 = 0.3, Pl2,32 =
0.6,
p12,O = 0.1,
P22,12
= 0.7,
p22,32 = 0.3, P32,12
= 0.4,
1)32,22 =
0.6,
which means that class switching is not allowed in the network. We wish to compute the probability for the state (Ici, Icz,Ics) = (3,2,1) by using the BCMP theorem, Eq. (7.84). ~~~~ Compute the visit ratios ei, for all i = 1,. . . , N and r = 1,. . . , R using Eq. (7.71). ell
= POJI
+ elml,ll
+ e2lp21,ll
+ e3lp31,ll
= 3.333,
e21 = P0,21 + fmm,21
+ e211321,21 + e31p31,21
= 2.292,
e31 = P0,31 + elml,31
+ e2lp21,31
= 1.917.
+ e311331,31
In the same way we get: el2 = 10,
e22 =
8.049,
es2 = 8.415.
306
QUEUEING NETWORKS
Compute the utilization p1 = J@L
+ he12
= ~11 + p12 = 0.833,
CL12
I-L11 P2 =
of each node using Eq. (7.86):
X1 e21 + X2% CL21 P22
= p21 + p22 = 0.442,
P3=xle31+x2~=pJl+p32=o.354. P31
I-L32
Compute the other performance measures of the network. In our case we use Eq. (7.87) to determine the mean number of jobs at each node:
El1 = *1 El2
= 2.5,
Pl
= 2.5, 7721 = *1 K22
= 03,
K32
P2
p3
= 0.362.
Determine the marginal probabilities
~~~
n1(3) = (I-
pl)p;
7r3(1) = (l-
p3)p3 = 0.2287.
= 0.0965,
using Eq. (7.85):
7r2(2) = (I-
Compute th e st,at,e probabilities 7r(3,2,1)
= 0.186, 7731 = -1 p31
= 0.342,
p2)p5 = 0.1093,
for the network using Eq. (7.84):
= 7r1(3). 7r2(2). n3(l) = 0.00241.
An example of the BCMP theorem for closed networks, Eq. (7.80), is not given in this chapter. As shown in Example 7.7, the direct use of the BCMP theorem will require that all states of the network have to be considered in order to compute the normalization constant. This is a very complex procedure and only suitable for small networks because for bigger networks the number of possible states in the network becomes prohibitively large. In Chapter 8 we provide efficient algorithms to analyze closed product-form queueing networks. In these algorithms the states of the queueing network are not explicitly involved in the computation and therefore these algorithms provide much shorter computation time. Several researchers have extended the class of product-form networks to nodes of the following types: l
l
In [Spir79] the SIR0 ( service in random order) is examined and it is shown that -/M/l-SIR0 nodes fulfill the local balance property and therefore have a product-form solution. In [Noet79] the LBPS (last batch processor sharing) strategy is introduced and it is show that -/M/l-LBPS node types have product-form solutions. In this strategy the processor is assigned between the two last
PRODUCT-FORM QUEUEING NETWORKS
307
batch jobs. If the last batch consists only of one job, then we get the LCFS-PR strategy and if the batch consists of all jobs, we get PS. l
In [ChMa83] the WEIRDP-strategy (weird, parameterized strategy) is considered, where the first job in the queue is assigned 100 . p % of the processor and the rest of the jobs are assigned 100 . (1 - p) % of the processor. It is shown that -/M/l-WEIRDP nodes have product-form solution.
l
In [CHT77] it is shown that -/G/l-PS, -/G/co-IS, and -/G/l-LCFSPR nodes with arbitrary differentiable service time distribution also have product-form solution.
Furthermore, in [Tows80] and [Krze87], the class of product-form networks is extended to networks where the probability to enter a particular node depends on the number of jobs at that node. In this case, we have so-called load-dependent routing probabilities. Problem 7.2 Consider an open queueing network with N = 3 nodes, FCFS queueing discipline, and exponentially distributed service times with the mean values: -
1
= 0.08 set,
-
CL1
1
= 0.06 set,
l-42
-
1
= 0.04sec.
CL3
Only at the first node do jobs arrive from outside with exponentially distributed interarrival times and the rate Xar = 4 jobs/set. Node 1 is a multiple server node with ml = 2 server, nodes 2 and 3 are single server nodes. The routing probabilities are given as follows: Pll
= 0.2,
p21
p12 = 0.4, p13 =
=
p30 =
1,
p31 =
0.5,
0.5,
0.4.
(a) Draw the queueing network. (b) Determine the steady-state probability
for the state k = (4,3,2).
(c) Determine all performance measures. Note: Use the Jackson’s theorem. Problem 7.3 Determine the CPU utilization and other performance measures of a central server model with N = 3 nodes and K = 4 jobs. Each node has only one server and the queueing discipline at each node is FCFS. The exponentially distributed service times have the respective means: -
1
Pl
= 2 msec,
-
1
P2
= 5 msec,
-
1
I_L3
= 5 msec,
308
QUEUEING NETWORKS
and the routing probabilities are: Pll
=
0.3,
0.5,
p12 =
p21 = p31 = 1.
0.2,
p13 =
Consider a closed queueing network with K = 3 nodes, N = Problem 7.4 3 jobs, and the service discipline FCFS. The first node has ml = 2 servers and the other two nodes have one server each. The routing probabilities are given by: PII = 0.6,
1321
= 0.5,
p31 =
P12 = O-3,
p22
=
0.0, = 0.5,
p32
p23
p12 = o-1,
The service times are exponentially
0.4,
= 0.6,
p33 = 0.0.
distributed with the rates
pl = 0.4sec-l 7 /A:! = 0.6sec-1 7
0.3sec-l.
/L3 =
(a) Draw the queueing network. (b) What is the steady-state probability
that there are two jobs at node 2?
(c) Determine all performance measures. (d) Solve this p ro bl em also with the local and global balance equations respectively and compare the solution complexity for the two different methods. Note: Use the Gordon/Newell
Theorem for parts (a), (b) and (c).
Problem 7.5 Consider a closed queueing network with N = 3 nodes, K = 3 jobs, and exponentially distributed service times with the rates: pl =
0.72 set-l,
,92 = 0.64sec-’ 7
j.~3
= lsec-l,
and the routing probabilities: 1331= 0.4,
p32
= 0.6,
~13 = p23 =
1.
Determine all performance measures. Problem
7.6
Consider the following routing matrix P:
P
(Ll)
(172)
(1,3)
(174)
(271)
(2c4
(2>3)
(24)
(1,l)
0.5
0
0
0
0.25
0
0.25
0
y; (114) (2,l) (2,2) (2,3) (2,4)
0.5 0
0.5 0 0 0::
0 0 0 0 1 0
0.5 0 0.5 0 0 0 1
0 0 0.5 0 0 0
0 0.25 005
0.5 0 0 0 0 0 0
0 0.25 0 0 0 0
OY5 0 0 0
0 0
0 0
PRODUCT-FORM QlJEUElNG NETWORKS
309
At the beginning, the jobs are distributed as follows over the four different classes:
(a) Determine the disjoint chains. (b) Determine the number of states in the CTMC underlying the network. (c) Determine the visit ratios. Consider an open network with N = 2 nodes and R = 2 job Problem 7.7 classes. Node 1 is of Type-2 and node 2 of Type-4. The service rates are pll
=
4sec-‘,
~12 = 5sec-‘,
~21 = Gsec-‘,
~22 = 2sec-‘,
and the arrival rates per class are Xr=&=lsec-l. The routing probabilities are given as: PO,11 = PO,12 =
p11,21 = 0.6,
1,
P21,ll
~12,22
= P22,12
=
With the BCMP theorem, Version 2, determine (a) All visit ratios. (b) The utilizations
of node 1 and node 2.
(c) The steady-state probability
1,
= 0.4 PII,O = 0.4,
for state (2,1).
p12,o = 0.6.
8 Algorithms for Product-Form Networks Although product-form solutions can be expressed very easily as formulae, the computation of state probabilities in a closed queueing network is very time consuming if a straightforward computation of the normalization constant using Eq. (7.3.5) is carried out. As seen in Example 7.7, considerable computation is needed to analyze even a single class network with a small number of jobs, primarily because the formula makes a pass through all the states of the underlying CTMC. Therefore we need to develop efficient algorithms to reduce the computation time [Buze71]. Many efficient algorithms for calculating performance measures of closed product-form queueing networks have been developed. The most important ones are the convolution algorithm and the mean value analysis (MVA) [ReLa80]. The convolution algorithm is an efficient iterative technique for calculating the normalization constant, which is needed when all performance measures are computed using a set of simple equations. In contrast, the mean value analysis is an iterative technique where the mean values of the performance measures can be computed directly without computing the normalization constant. We also introduce the RECAL algorithm [CoGe86] (recursion by chain algorithm), which is well suited for networks with many job classes. The fourth and last method for analyzing product-form networks presented in detail is the so-called flow-equivalent server method [CHW75b, ABP85]. This method is well suited when we are especially interested in computing the performance of a single station or a part of the network. There are several other algorithms that we do not cover in detail due to space limitations. Based on the mean value analysis, [ChSa80] and [SaCh81] developed an algorithm called LBANC (local balance algorithm for normal311
312
ALGORITHMS FOR PRODUCT-FORM NETWORKS
Fig, 8.1
Flowchart describing sections of Chapters 8 and 9.
izing constants). This algorithm iteratively computes the normalization constant and the performance measures. It is very well suited for networks with a small number of nodes but a large number of jobs. The CCNC algorithm (coalesce computation of normalizing constants) [ChSa80] is especially used when storage space is limited. That the convolution algorithm, the mean value analysis, and LBANC can be derived from each other has been shown by [Lam81]. The DAC algorithm (distribution analysis by chain) [SoLa89] is used to directly compute the state probabilities of a product-form queueing network. A generalization of the MVA to higher moments is introduced in [Stre86]. Other algorithms for determining the normalization constant are the algorithms of [Koba78] and [Koba79] (w h ic h are based on the enumeration of polynomials of Polya), and the partial fraction method of [Moor72]. For very large networks, execution time of the preceding algorithms is not acceptable. Many approximation algorithms have been developed for this purpose. Exact and approximate algorithms are discussed in this chapter and in Chapter 9 and are summarized in Fig. 8.1.
THE CONVOLUTION ALGORITHM
8.1
THE
CONVOLUTION
313
ALGORITHM
The convolution algorithm was one of the first efficient algorithms for analyzing closed product-form queueing networks and is still in use today. The name of this technique reflects the method of determining the normalization constant G(K) from the functions Fi(Ici), which is similar to the convolution of two probability mass functions. Once the normalization constant is computed, the system performance measures of interest can be easily derived [Buze71]. We recall that the convolution operator @ is defined in the following way: Let A, B, C be vectors of length K + 1. Then the convolution C = A @ B is defined as:
C(k) = &A(j) j=o
. B(k - j),
k = 0, . . . , K.
(84
First we consider the convolution algorithm for single class closed queueing networks. Then this technique is extended to multiclass closed queueing networks. 8.1.1
Single
Class Closed
Networks
According to the BCMP theorem, Eq. (7.80), the state probabilities of a closed single class product-form queueing network with N nodes can be expressed in the following way:
where the functions I?i (Ici) are defined in Eq. (7.61), and:
is the normalization constant of the network. The computation of G(K) is carried out by iterating over the number of nodes in the network and over the number of possible jobs at each node. For this purpose the following auxiliary functions G, (Ic) , n = 1, . . . , N and /k = 0, . . . ) K are defined as follows:
The desired normalization
constant is then:
G(K) = GN(K).
W)
314
ALGORITHMS FOR PRODUCT-FORM NETWORKS
From Eq. (8.2) for n > 1, it follows that:
For n = 1 we have: k = l,...,
Gl(k) = Fl(k),
K.
(8.5)
The initial condition is of the form: Gn(0) = 1,
n= l,...,
N.
(8.6)
The convolution method for computing the normalization constant G(K) is fully determined by Eqs. (8.4) to (8.6). The (K + 1)-dimensional vectors:
,
Gn =
n = 1,. . . , N,
are therefore determined by the convolution:
F,=
The computation of G,(k) is easily visualized as in Fig. 8.2. In this figure we show the values of the functions needed in the calculation of GN (K). The real objective of the algorithm is to determine the last value in the last column because this value is the normalization constant G(K) that we are looking for. The values GN (Ic) for k = 1, . . . , K - 1 in the last column are also useful in determining the system performance measures. Buzen [Buze71, Buze73] has developed algorithms that are based on the normalization constant G(K) and the functions Fi(ki) to compute all important performance measures of the queueing network without needing the underlying CTMC state probabilities. Computation of the performance measures using the normalization constant is now illustrated.
THE CONVOLUTION ALGORITHM 11
... **a
n-l
0
1
Gn-1(0)(%(k)) Gn-l(O)(~Fn(k))
1
Fl(1)
... ...
I
I
k-l
Fr.(k-1) Fr.(k-1)
k
Fl(k)
K
FI (K)
... ...
Gn-l(l)(+Fn(k Gn-l(l)(*Fn(k
Gn-l(k
-
...
n
I
1
-- 1)) 1))
N 1 = G(1)
Gn(l)
l)(*Fn(l))
c
Gn-l(k)(.Fn(O))
+
Gn(k)
*.*
Gn (W
Fig. 8.2
315
Computation
Gv(k) Gv(k)
&v(K)
= = G(k) G(k)
= G(K)
of G,(k).
(a) The marginal probability that there are exactly ki = !Cjobs at the ith node is given by Eq. (7.16): n$c) =
c E
7r(/kl,...,k~).
kj=K
3=1
&k,=k
If we substitute Eq. (7.59) we get:
(8.7) Then G$‘(Ic) can be interpreted as the normalization constant of the network with Ic jobs and node i removed from the network. G;‘(k)
=
c
fi
Fj(kj).
(8.8)
2 k.=K 27; i 3=1 3 &k,=K-k
By definition we have:
GN-~(k)=GE~(k)=GryN)(k)
for k=O,...,K.
Because of Eq. (8.7), for node N we have [Buze71, Buze73]: 7~(k) = #4iN-l(~
- IC).
(8.9)
316
ALGORITHMS FOR PRODUCT-FORM NETWORKS
In [BBS77], a simple algorithm for computing the G:‘(k) for k = , K is presented. Since the sum of all marginal probabilities is 1, 0 it’ follows from Eq. (8.7): &(j) j=o
= g
#Gf(K
- j) = 1,
and therefore we have: K G(K)
=
xF,(j)+G;)(K-j), j=o
j =
(8.10)
1,~..,N.
With the help of these equations we can derive an iterative formula for computing the G”)(k), for k = 0 7”‘? K** N
G;‘(k) = G(k) - &(j).G$(k j=l
- j),
(8.11)
with the initial condition: G$O)=G(O)=l
7 i=l
Y”‘? N *
(8.12)
In the case of rni = 1 the preceding formulae can be considerably simplified . Because of Eq. (8.11) we have: k-l
G:)(k)
= G(k) - c
Fi(j + l)G;‘(k
- 1 - j)
j=O = G(k)
k-l c
- ;
F,(j)G;‘(k
- 1 - j)
' j-0
= G(k) - ;G(k
- 1).
After inserting this result in Eq. (8.7), we get for the marginal probabilities:
7r#c) = (~)k.~.(G(K-~)-~,G(K-ii-l)),
(8.13)
where G(k) = 0 for k < 0. (b) The throughput of node i in the load-dependent or load-independent case is given by the formula:
X(K) = G(K - I) G(K)
and
Xi(K) = e;-
G(KG(K)
1) *
(8.14)
THE CONVOLUTION ALGORITHM
Proof:
By definition, the throughput
is given by Eq. (7.22):
= &&k
- /!Y)
- l)G$)(K
317
k=l K-l
= &
c
Fi(k)Gg’(K
- 1 - Ic) = ei ‘gi)l).
q.e.d,
k=O
(c) Th e u tl’z zxat ion of a node in the load-independent case can be determined by inserting Eq. (8.14) in the well known relation pi = Xi/(mipi): ei G(K-1) pi=------. mipui G(K)
*
(8.15)
(d) The mean number of jobs for a single server node can be computed from following simplified equation [Buze71]: (8.16)
(e) The mean response time of jobs at node i can be determined with the help of Little’s theorem. For a single server node we have: (8.17) The use of the convolution method for determining the normalization constant is now demonstrated by a simple example. We also show how to compute other performance measures from these results. Example 8.1 Consider the following closed queueing network (Fig. 8.3) with N = 3 nodes and K = 3 jobs. The first node has ml = 2 and the second node has m2 = 3 identical service stations. For the third node we have m3 = 1. The service time at each node is exponentially distributed with respective rates: ~1 = 0.8 set-‘,
p:! = 0.6 set-‘,
p3 = 0.4 set-1 ,
318
ALGORITHMS FOR PRODUCT-FORM NETWORKS
fig. 8.3
A closed queueing network.
The visit ratios are given as follows: el = 1,
e2 = 0.667,
e3 = 0.2.
From Eq. (7.61) it follows for the functions Fi(ki),
i = 1,2,3:
Fl(O)
= I,
F2(0)
F3(0)
= 1,
Fl(1)
= 1.25,
Fz(1) = 1.111,
F3(1)
= 0.5,
Fz(2) = 0.617, &(3) = 0.229,
F3(3)
Fi(2) = 0.781, Fi(3) = 0.488,
= 1,
Fa(2) = 0.25, = 0.125.
In Table 8.1 the intermediate results as well as the final result of computing G(K) are shown. Table 8.1 Computation ization constant
0 1 2 K=3
of the normal-
1
2
N=3
1 1.25 0.781 0.488
1 2.361 2.787 2.356
1 2.861 4.218 4.465
Thus the normalization constant G(K) = 4.465. The marginal probabilities for the single server node 3 can be computed using Eq. (8.13): =
= 0.528,
.iT3(1) =
= 0.312,
r3(0)
THE CONVOLUTION ALGORITHM
r,(2)
=
x3(3)
= (z)3.&.
(;)2&.
(G(1) - zG(O))
(G(O)- z.0)
319
=0.132, =0.028.
For the computation of the marginal probabilities of nodes 1 and 2, we need the values G$ (k) for k = 0, 1,2,3 and i = 1,2. With Eq. (8.11) we get: G(l)(O) = 1 G&l)
= G(1) - R’i(l)G$$)(O) = 1.611,
G:)(2)
= G(2)
- (Fi(l)Gg)(l)
+ &(2)G$(O))
G$)(3) = G(3) - (Fi(l)Gg)(2)
+ &‘i(2)Gg)(l)
= 1.423, + Fi(3)Gc)(O))
= 0.940.
In the same way we compute: GC2)(3) N = ** 1 316 GC2)(2) = -2.-.-T 175 N
G$)(O) = 1 7 Gg)( 1) = 1.656,
With Eq. (8.7) the marginal probabilities
Tl(“) = F(0)G(1)(3) G(3) N
are:
= L, 0 398
7rl(l) = -F1(1) @(2) G(3) N
0 211 -- i?
0 109 7ri(3) = -F1 (3) G(l) (0) = A, G(3) N
7rl(2) = -F1(2) G(l) (1) - 0 282 G(3) N - d’ and 740)
= 0.295,
The throughputs X1 = el g
n2(l) = 0.412,
x2.3) = 0.051.
can be computed using Eq. (8.14): = 0.945,
The utilizations
7r2(2) = 0.242,
X2 = e2g
= 0.630,
X3 = e,g
= 0.189.
are given by Eq. (7.21):
p1 = - x1 = 0.590,
x2
p2 = ___
ml/-J1
= 0.350,
p3
x3
= K = 0.473.
m2l-L2
The mean number of jobs at the multiserver nodes is given by Eq. (7.26): 771 = 7rl(l) + 27rr(2) + 37rr(3) = 1.290, 1T2 = 7r2(1) + 27r2(2) + 3n2(3) = 1.050, where we use Eq. (8.16) for the single server node 3: E3=(~)gg+(~)2~+(~)3~=o.660.
320
ALGORITHMS FOR PRODUCT-FORM NETWORKS
For the mean response time we use Eq. (7.43):
-
Kl = 1.366, T1 = x1 8.1.2
Multiclass
Closed
K2 = 1.667, T2 = x2
T3 = % x3 = 3.496.
Networks
The convolution method introduced in Section 81.1 can be extended to the case of multiclass closed product-form queueing networks. According to the BCMP theorem, Eq. (7.80), the state probabilities for closed product-form queueing networks are given by:
$31,. ..)S,) =& fpio, 2=1
where the functions Fi(Si) are defined in Eq. (7.82). The determination the normalization constant of the network:
of
G(K) = c fiE(si), is analogous to the single class case. For now, we assume that class switching is not allowed and thus the number of jobs in each class is constant. Corresponding to Eq. (8.2), we define for k = 0, . . . , K the auxiliary functions:
G(k)= c fi fi(si)
(8.18)
2 S,-ki’l i=l
and determine for n = 1, . . . , N the matrix G, by convolution: G, = F, @ Gn-1, with the initial condition Gr(.) = Fl(.).
For n > 1 we also have:
C&(k) = 5 . . . 2 F,(j).G+l(k j,=o j,=o
- j).
(8.19)
The computation of the normalization constant was simplified by [MuWo74b] and [Wong75] by introducing a much simpler iterative formula for G(K) = GN(K). For this purpose, we define the vector: kc”) =pSi i=l
= (Ici”‘,...,$)),
321
THE CONVOLUTION ALGORITHM
where kp) ,1 5 T < R, is th e overall number of jobs of class r at the nodes 1,2,..., n - 1, n. We then have: ky’ = 2
kir
and
kcN) = K.
i=l
With the help of these expressions, it is possible to rewrite Eq. (8.18) in the following manner:
G,(k(“))= C
fJFi(Si).
(8.20)
5 &=k(n) ‘=l i=l
Thus: n-l
G,,(k(“))
=
C
c k(‘+l)+S,=k(n)
n-1 C
n
Fi(Si)eFn(sn)
i=l
&=k(“-l)
i=l
= c
(8.21)
Gn-l(k(“-‘))*F,(S,).
k(n-1)+$&(n)
Equation (8.21) together with the initial conditions Gr(.) = J’r(.) completes the description of the algorithm of [MuWo74b] and [Wong75]. For the normalization constant we have: (8.22)
G(K) = G,v(kcN))e The computation of the performance measures is as follows:
(a) According to Eq. (7.32) th e marginal probability that there are exactly Si = k jobs at node i is given by:
r;(k)= c ~(%...&v) =& 2
c fiF,(S,) 5
Sj=K
Sj=K3=l
3=1
3=1
&S,=k
&Si=k
E(k) = Go
c ~ j=l
Sj=K~~~i
&Si=k
fi
Fj(Sj)
= $$Gg(K-k),
(8.23)
322
ALGORITHMS FOR PRODUCT-FORM NETWORKS
Then G:)(K) can again be interpreted as the normalization constant of the network without node i: (8.24)
In the same way as in the single class case (Eq. (8.11)), the iterative formula for computing the normalization constant G:‘(k) is given as follows: S(j). Fi(j). G$‘(k - j),
GE)(k) = G(k) - 5
(8.25)
j=O
with S(j) defined by: 0, ifj=O, 1, otherwise
Ki> =
(8.26)
and affecting the computation only for j = 0. The initial condition is: GE)(O) = G(0) = 17 i = l,...,N.
(8.27)
(b) The throughput Ai, of class-r jobs at the ith node can be expressed as follows [Wong75] :
where ei, is the solution of Eq. (7.72). With (K - lT) = (Kl, . . . , K, 17* * * , KR), we have the following simpler form: x
=e
GW-
ir
ir
14
G(K)
(8.28)
*
This formula holds for load-dependent and load-independent all four types. The proof can be found in [BrBa80].
nodes of
(c) The utilization pir is given by Eq. (7.34). If the service rates are constant, then we have the following simplification because of Xir/(mipir): pir
=
- Air miPui7-
=
-. eir miPir
G(K
G(K)
lr)
(8.29) *
THE CONVOLUTION ALGORiTHM
Fig. 8.4
323
An example of a two-class closed network.
Table 8.2 Computation F,(Si), i = 1,2,3,4,
W)
1
(LO) (W (171) (0,2>
1 4 4
(x4
12
of
1 1.6 2 6.4 4 19.2
‘2
the
functions
1 2.4 4.8
1 3.2 3 19.2 9 86.4
11.52 11.52 27.648
Example 8.2 Consider the network given in Fig. 8.4 consisting of N = 4 nodes and R = 2 job classes. In class 1 the number of jobs is Ki = 1 and in class 2, K2 = 2. It is assumed that class switching is not allowed. The first node is of Type-2, the second and third nodes are of Type-$, and the fourth node is of Type-3 with mean service times: 1
1 -=lsec,
1 -=4sec,
1 -=8sec,
-
Pll
CL21
I-L31
CL41
-
1
CL12
=2Sec,
-
1
= 5sec,
CL22
-
1
= lOsec,
CL32
-
1
= 12sec, = 16sec.
p42
The visit ratios are given by: ell = 1, e21 = 0.4, el2 = 1, e22 = 0.4,
e31 = 0.4, e32 = 0.3,
e41 = 0.2, e42 = 0.3.
First, with the help of Eq. (7.82), the functions Z?i(Si),i = 1,2,3,4, are computed [Table 8.2). Then for determining the normalization constant we compute the G,(kcn)) from Eq. (8.21) where Gr(.) = Fi(.). For n = 2 we
324
ALGORITHMS FOR PRODUCT-FORM NETWORKS
have:
Gz(O,0) = Gl(O,O)F2(07
=1 = iii,
0)
G2(l, 0) = G1(0,O)F2(1,0) + G~(l,o)Fz(o, 0) G2(0, 1) = Gi(0, O)F2(0, 1) + Gl(O, l)&(O, 0)
= 4,
Gz(1, 1) = G1(O,O)F2(Ll> + Gl(L l)F2(0,0> + G1(1,0)F2(0, 1) + Gi(O,l)F&O) G2(0,2) =
= 15.6,
+ Gl(O,2)J’2(07
G1(0,0)~2(0,2)
0) =l2,
+ Gl(O, l)J’2(0,1> G2(1,2) =
G1(0,0)5’2(17
+ Gl(l,
2)
2)F2(07
0)
+ G1(l, l)Fz(O, 1) + G1(0,1)F2(1,1) + Gl(l,O)F$I, 2) + Gl(0,2)F2(1,0)
= 62.4.
In the same way we can compute the values for Gs ( kc3)) and G4 (kc4)) as summarized in Table 8.3. So the normalization constant is G(K) = 854.424. Computation of G,(kci)) z
Table 8.3 ktn), 1 < n 5 N
G:!(kc2))
GS(kc3))
G4(kc4))
Kw)
1 2.6 4 15.6 12 62.4
1 5.8 7 55.4 33 334.2
1 8.2 11.8 111.56 78.12 854.424
(170) (071) (14
vv) W)
Now, with the help of Eq. (7.47), we can compute the marginal probability of 0 jobs of class 1 and 2 jobs of class 2 at node 4: 7r4(0,2)
F4(07‘) Gc4’(l 0) = L7 0 0782
= -
G(K)
N
’
where G$)(l , 0) can be obtained from Eq. (8.26): G;)(l,O)
= G(l,O) - F4(1,0).G$$(0,0)
Similarly other marginal probabilities r4(172)
n4(17
I>
=
F4(17 -G,
=
G(K)
2)
(4)
= 5.
at the nodes can be calculated:
(0,O) = 0.0324,
with Gc4)(0 0) = 1 N
F4(I’ ‘) Gc4)(0 1) = 0 0944 N
with Gc4)(0 7 N ’ 1) = -7
G(K)
7
L7
7
7
THE CONVOLUTION ALGORITHM
1) = 0.3112,
7r4(0,
‘)GC4)(0 2) = L, 0 0927 N ’
with GC4) 55 4 N (1 ’ 1) = L, with G(“)(O N ’ 2) = 33 ’
7r4(1)0)
= -F4(l,
n4(0,0)
F4(o,o)G(4)(l = m N
7r3(1,2)
=
-GN G(K)
~3(1,2)
(3)
(0,O) = 0.1011,
with GC3) N (0 ’ 0) = 1 ’
7r3(1,0)
=
-GN G(K)
F3(1,0)
(3)
(072) = 0.1600,
with GC3) 42 72 N (0 ’ 2) = L’
7r3(1,1)
=
-GN G(K)
F3(1,1>
(3)
(071)
with GC3)(0 88 N ’ 1) = A*
G(K)
’
2) = -L-----Y 0 3911
=
0.1977,
325
with GC4)(1 334 2 N ’ 2) = L,
The rest of the marginal probabilities are computed in the same way. The mean number of jobs can be determined with the help of Eq. (7.40). For example, for node 3 the mean number of class 1 and class 2 jobs is respectively given by: = 0.4588, K31 = ~3(1,0)+~3(1,1)+~3(1,2) 1?32 = r3(0,1) + x3(1,1) + 2r3(0, 2) + 2~3(l, 2) = 0.678. For the IS-node 4 the mean number of jobs by class type is: x41
K41 = -
= 0.219,
x42
z42 = -
= 0.627.
I-L42
P41
The throughputs at each node by class type can be computed with Eq. (8.28):
For the computation of the utilizations of each node by class type, Eq. (8.29) is used to get: p11 = 0.0914, p12 = 0.2611,
p21 = 0.1463, pz2 = 0.2611,
p31 = 0.2926, /I32 = 0.3917.
326
ALGORITHMS FOR PRODUCT-FORM NETWORKS
The algorithm we presented in the preceding text is applicable only to networks without class switching. For networks with class switching, [Munt72] proved that a closed queueing network with U ergodic chains is equivalent to a closed network with U job classes without class switching. Therefore, the computation of the normalization constant for networks with class switching is an extension of the technique for networks without class switching. More details are given in [BBS771 and [.BrBa80] and Section 7.3.6. In [ReKo75] and [Saue83] the convolution method has been extended for analyzing open queueing networks with load-dependent service rates and for analyzing closed queueing networks with load-dependent routing probabilities. The methods can also be extended for networks with class specific service rates. These are networks in which the service rates depend not only on the overall number of jobs of each class at the node, but also on the number of jobs at the node. For this case, [LaLi83] modified the convolution algorithm so as to make the best use of storage space. Their algorithm is known as tree-convolution. Because the computation of the normalization constant can cause numerical problems, other techniques were developed that allow the calculation of the performance measures without using the normalization constant. One of the key development in this regard is the mean value analysis (MVA), which we discuss next.
8.2
THE MEAN VALUE ANALYSIS
The MVA was developed by Reiser and Lavenberg [ReLa80] for the analysis of closed queueing networks with product-form solution. The advantage of this method is that the performance measures can be computed without explicitly computing the normalization constant. The method is based on two fundamental equations and it allows us to compute the mean values of measures of interest such as the mean waiting time, throughput, and the mean number of jobs at each node. In these computations only mean values are computed (hence the name). For the case of multiserver nodes (mi > 1)) it is necessary, however, to compute the marginal probabilities. The MVA method is based on two simple laws: 1. Little’s theorem, which is introduced in Eq. (6.9) to express a relation between the mean number of jobs, the throughput, and the mean response time of a node or the overall system: K=X.T. (8.30) 2. Theorem of the distribution at arrival time (in short, arrival theorem), proven by [LaRe80] and [SeMi81], for all networks that have a productform solution. The arrival theorem says that in a closed product-form queueing network, the pmf of the number of jobs seen at the time of arrival to a node i when there are Ic jobs in the network is equal to the
THE MEAN VALUE ANALYSIS
327
pmf of the number of jobs at this node with one less job in the network (= (k - 1)). Th is property has an intuitive justification [LZGS84]. At the moment a job arrives at a node, it is certain that this job itself is not already in the queue of this node. Thus, there are only Ic - 1 other jobs that could possibly interfere with the new arrival. The number of these at the node is simply the number there when only those (K - 1) jobs are in the network. At first we introduce the MVA for single class closed queueing networks and explain it in more detail. This algorithm is then extended to multiclass networks, mixed networks, and networks with load-dependent service rates. 8.2.1
Single
Class Closed
Networks
The fundamental equation of the mean value analysis is based on the arrival theorem for closed product-form networks [ReLa80, SeMi81] and it relates the mean response time of a job at the ith node and the mean number of jobs at that node with one job less in the network, that is: Ti(K)
= 2-m [I +QK - I)] , i = l,...,N. (8.31) Pi For single server stations (mi = 1) with an FCFS strategy, it is easy to give an intuitive explanation for Eq. (8.31) because for each FCFS node i the mean response time Ti(.K) of a job in a network with K jobs is given by the mean service time (l/pi) of that job plus the sum of the mean service times of all jobs that are ahead of this job in the queue. Equation (8.31) can also be derived without using the arrival theorem. For this purposes, we use the formulae for computing the utilization (Eq. (8.15)) and the mean number of jobs (Eq. (8.16)): dK)
ei G(K - 1) = E’ G(K) 7
and:
17i(K) =2 (~)k*G&? k=l
From Eq. (8.16) it follows:
If we transform Eq. (8.15) for G(K-1) and insert the result of the preceding equation, then, after rearranging the equation, we get: pi(K)*?7i(K
- 1) = 5 k=2
(E)”
* G$i)li)
7
ALGORITHMS FOR PRODUCT-FORM NETWORKS
328
and substituting Ki(K)
this equation to Eq. (8.16), we have:
G(K - 1) + Pi(h’)Z(K - 1) (-qK) = 1;' = Pik) + Pi(K)-&(K - 1) = pi(K)ei
[l + K;(K
- l)] .
If we assume constant service rates and rni = 1, then we get the desired result by using pi(K) = Xi(K)/pi and Little’s theorem: -
T;(K)
Ki(K) = Xi(K)
pi(K) = -Xi(K)
- 1)] = 1. [l + Ki(K
[l +Ki(K
- l)] .
Pi
For computing the mean response time at the ith node we need two othLer equations in addition to Eq. (8.31) to describe the MVA completely. Both equations can be derived from Little’s theorem. The first one is for determining the overall throughput of the network: (8.32)
whereas the other one determines the mean number of jobs at the ith node: Ki(k) = X(k).Ti(k). ei, (8.33) where ei is the visit ratio at the ith node. The three equations, (8.31), (8.32), and (8.33), allow an iterative computation of the mean response time, mean number of jobs, and throughput of the closed product-form queueing network. The iteration is done over the number of jobs Ic in the network. Equation (8.31) is valid for FCFS single server nodes, PS-nodes, and LCFS PR-nodes. The description of the MVA is complete if we extend Eq. (8.31) to the case of IS-nodes and FCFS-nodes with multiple servers. In the case of IS-nodes we have: Ti(K)
= ;.
i
(8.34)
For the latter case, consider a job that arrives at -/M/m-FCFS node containing j - 1 jobs, given that the network population is Ic - 1. This event occurs with probability 7ri(j - 1 1Ic - 1). Then we have:
T&k)=k
3
jE1 Pi * %(d
&i(j) =
- 7ri(j - 1 1k - l),
if j < mi, 3, 1.m,i , otherwise.
329
THE MEAN VALUE ANALYSIS
To obtain an expression for the probability ni(j ( F;), we use the formulae presented in the convolution algorithm for computing the performance measures with help of the normalization constant, namely:
ni(jp)=Fi(j)*
G’~~k~ ‘)*
(see Eq. (8.7)) G;‘((k
ei = pi . &)
- 1) - (j - 1))
* Fi(j - 1) -
1
W)
= pi * w(j)
ei - G@ - 1) . G:)(@ - 1) - (j - 1)) . qj _ l) ‘L G(k - 1) , *, wv / L v ri(j--ilk-1) see Eq. (8.7) h(k)
W) = pi *&i(j)
* %(j - 1 ( k - l),
By induction and rearrangement of the equations, the following can be obtained:
=-’
1 mi f Pi
rni -2
1+K&-1)+ (
C(mi j=o
-
-
1)-%(jIk- 1) , (8.35)
=I--.
%(j I q =
1 mi
b@) Pi ’ w(j)
77lj-j)
*ri(j 1k)
queueing networks can
For i = 1,. . . , iV and j = 1,. . . , (mi - 1):
Xi(O) = 0,
Ti(O 10) = 1,
(8.36) (8.37)
- 7ri(j - 1 1 Ic - 1).
Now the MVA for closed single class product-form be described as follows: Initialization.
7
Xi(j I 0) = 0.
330
ALGORITHMS FOR PRODUCT-FORM NETWORKS
Iteration over the number of jobs k = 1,. . . , K. L.STEP 2 I Fori=l,... the ith node:
, N, compute the mean response time of a job at
; a[I l tKi(k- l)] , T&q = A-
l +l?$c
L 1
mi-2
- 1) +c
[
Pi**i
Type-1,2,4 (*i = I>,
(rni - j - l)*%(j
I Ic - 1)
1
Type- 1
’ (*i
j=o
>
I),
Type-37
ii’
(8.38)
where the conditional probabilities are computed using Eqs. (8.37) and (8.36). [STEP]
Compute the overall throughput:
X(k)= N Ic c ei.T&
(8.39)
i=l
The throughput
of each node can be computed using the following equation: Xi(k) = ei* X(k),
(8.40)
where the ei can be determined with Eq. (7.5). I STEP 2 3 node:
For i = 1, . . . , N, compute the mean number of jobs at the ith xi(k)
= ei. X(k)‘Ti(k)*
(8.41)
The other performance measures, e.g., utilization, mean waiting time, mean queue length, etc., can be derived from the calculated measures using the wellknown equations. The disadvantage of the MVA is its extremely high memory requirement. The memory requirement can be considerably reduced by using approximation techniques (e.g., SCAT, or self-correcting approximation technique), discussed in Chapter 9. Another disadvantage of the MVA is the fact that it is not possible to compute state probabilities. In [AkBo83], the MVA has been extended for computing the normalization constant and for computing Only Step 2.2 needs to be extended to include the the state probabilities. equation: G(k) =
G(k - 1) qjq 7
(8.42)
with the initial condition G(0) = 1. This follows immediately from Eq. (8.14). When the iteration stops, we have the normalization constant G(K) that
THE MEAN VALUE ANALYSIS
331
can be used to compute the state probabilities with the help of the BCMP theorem, Eq. (7.80). T wo applications of the mean value analysis are now given. Example 8.3 The central-server model shown in Fig. 8.5 has N = 4 nodes and K = 6 jobs. The service time of a job at the ith node, i = 1, 2, 3, 4, is exponentially distributed with the following mean values: 1 1 - = O.O2sec, - = 0.2sec, P2 I-L1
-
1
-
= 0.4sec,
P3
1
= 0.6sec.
CL4
-cYutGL,H3PI )lIIn-@-clYIE+ Fig. 8.5 The central-server model. The visit ratios are given as follows: el = 1,
e2 = 0.4,
e3 = 0.2,
e4 = 0.1.
With the help of the MVA, the performance measures and normalization constant of the network are computed in three steps: Initialization: x1(0) = K2(0) = K3(0) = z4(0) = 0, Iteration m
over the number of jobs in the network starting with
Mean response times, Eq. (8.38): E(1) T3(1)=
1-1
G(0) = 1.
= ; ;
[l -t%(o)]
= 0.02,
T2(1) = ;
[l +Kz(O)]
= 0.2,
[l+K3(0)]
=0.4,
T4(1)
[l+K4(0)]
=oJ%
Throughput, X(l)=
*
= ;
Eq. (8.39)) and normalization constant, Eq. (8.42): l
C &(l) i=l
= 4.167,
G(1) = #
= 0.24.
332
ALGORITHMS FOR PRODUCT-FORM NETWORKS
Tab/e 8.4 steps.
Performance measures after completing six iteration
Node
1
2
3
4
Mean response time !Fi
0.025
0.570
1.140
1.244
Throughput Xi Mean number of jobs K, Utilization pz
9.920 0.244
3.968 2.261
1.984 2.261
0.992 1.234
0.198
0.794
0.794
0.595
-1
Mean number of jobs, Eq. (8.41): El(l)
= X(l)Ti(l)ei
= 0.083,
Kz(l)
= X(l)Tz(l)ez
E,(l)
= X(l)Ts(l)es
= 0.333,
F4(1) = X(l)Td(l)e4
= 0.333, = 0.25.
Iteration for k = 2: -1
Mean response times: Tl(a> = ;
[l +%(I)]
= 0.022,
Tz(2) = ;
[l +K2(l)]
= 0.267,
573(2>
[l + K3(1>]
= 0.533,
T4(2) = i
[l +K4(1)]
= 0.75.
=
-1
;
Throughput X(2) =
4
and normalization
2
= 6.452,
constant:
G(2) = z
= 3.72. 10-2.
C eiTi(2) i=l
STEP 2 3 Mean number of jobs: l
Ki(2)
= X(2)?;1(2)el = 0.140,
1?2(2) = A(2)T2(2)e2 = 0.688,
z,(2)
= X(2)F3(2)e3 = 0.688,
K4(2) = A(2)T*(2)e4 = 0.484.
After six steps, the iteration stops and we get the final results as summarized in the Table 8.4. The normalization constant of the network is G(K) = 5.756 +lo-“. With Eq. (7.80) we can compute the steady-sate probability that the network is, for example, in the state (3,1,1,1): h
=
5.337.10-4.
THE MEAN VALUE ANALYSIS
Fig. 8.6
333
A closed queueing network.
Example 8.4 As anot her example, consider the closed queueing network given in Fig. 8.6 with K = 3 jobs. At the first node we have ml = 2 identical processors having exponentially distributed service times with mean l/,~r = 0.5 sec. Node 2 and node 3 have exponentially distributed service times with means l/p2 = 0.6 set and l/bs = 0.8 set, respectively. At the fourth node (terminals), the mean service time is l/p* = 1 sec. The routing probabilities are as follows: P12
=
~13
= 0.5
and
~24 = p34 = p41 = 1.
From Eq. (7.5) we compute the visit ratios: el = 1,
e2 = 0.5,
e3 = 0.5,
e4 = 1.
The analysis of the network is carried outin the following steps: ~~~
Initialization: mo)
= K2(0)
Iteration
I-2.1
IsTEP
= 0,
x1(0 IO) = 1,
7rl(l
IO) = 0.
over the number of jobs in the network starting with
Mean response times:
Ti(1) = T3(1)
= K3(0)
&
= i
[1+ [1+
K(0)
K3(0)]
Throughput:
+ v(o = 0.8,
1 O)] = o.s,
Tz(1)
= ;
[l + F2(0)]
T4(1)
= ;
=A.
= 0.6,
334
ALGORITHMS FOR PRODUCT-FORM NETWORKS
-1
Mean number of jobs: El(l)
= ~(l)~l(l)el
= 0.227,
z,(l)
= X(l)Tz(l)ez
= 0.136,
Es(l)
= X(l)Fs(l)es
= 0.182,
??~(l) = X(l)FA(l)ed
= 0.454.
Iteration for k = 2: -1
Mean response times: 532)
=&
[1+~1(1)+m(o1l)] =0.5,
with: 7rl(O 11) = 1- $
[
1
:x(l)
+7r1(1 1 1) = 0.773,
and:
d1 1l> = ~~(l)n,(o 10) = 0.227,
T2(2)
%(2>= ; [I t-%(l)] = 0.946, -1
= ; [I+??$)] = 0.682,
T4(2) = 1.
Throughput: x(2) =
= 0.864. 4 2 T f?;?;;(2) u -6
i=l
-1
Mean number of jobs: K1(2)
=
X(2)T1(2)el
0.432,
Kz(2)
= A(2)Tz(2)ez = 0.295,
=
X(2)T3(2)e3 = 0.409,
F*(2)
= X(2)?;4(2)ed = 0.864.
=
-
K3(2)
Iteration for k = 3: -1
Mean response times: C(3)
= f-& [I +x1(2)+
nl(0 I2)] = 0.512,
with:
1
= 0.617,
THE MEAN VALUE ANALYSIS
335
and:
w(l
l’sTEf2.23
12)
=
32)711(0
1 1) = 0.334,
%(3) = ;
[l -t-&(2)]
= 0.776,
T3(3) = i
[l +??,(a)]
= 1.127,
Ej(3) = 1.
Throughput: X(3) =
/?!?%~~
= 1.217.
3 5 &(3) i=l
Mean number of jobs:
E,(3)
= X(3)T1(3)el = 0.624,
17,(3) = X(3)&(3)e2
KS(~) = X(3)T3(3)e3 = O.686! z*(3)
= 0.473,
= X(3)F4(3)e4 = 1.217.
The throughput at each node can be computed with Eq. (8.40): X1 = X(3). el = 1.218, X3 = X(3). e3 = 0.609, For determining the utilization
p1=
8.2.2
h
~ m1p1
Multiclass
= 0.304,
Closed
X2 = X(3). e2 = 0.609, X4 = X(3). e4 = 1.218.
of each node, we use Eq. (7.21): P2
x2
= z
= 0.365,
x3
p3 = -
P3
= ().487. -
Networks
The algorithm for computing the performance measures of single class closed queueing networks can easily be extended to the multiclass case in the following way [ReLa80] : Initialization.
For i = 1,. . . , N, T = 1,. . . , R, j = 1,. . . , (mi - 1):
J&(O,O...,O) =o, 7ri(O IO)= 1, %(j IO>=a IteraCon: k = 0, . . . , K:
336
ALGORITHMS FOR PRODUCT-FORM NETWORKS
[ STEP 2.1 1 For i = 1, . . . , N and r = 1, . . . , R, compute the mean response time of class-r jobs at the ith node:
’
&
. [
1+ e Eis(k - 1,) s=l
1 Fir(k)
1
Type-1,2,4 (mi = I>,
1+ F Fis(k - 1,)
= ( /-hr. mi [
s=l
(8.43)
m,-2
-t- c (m; - j - l)r&’ j=o
] k - 1,)
1 -3
1
, Type-1 (mi > l), Type-3.
x Pir
Here (k - 1,) = (ICI, . . . , Ic, - 1, . . . , /CR) is the population vector with one class-r job less in the system. The probability that there are j jobs at the ith node (j = 1,. . . , (mi - 1)) given that the network is in state k is given by: ri(j
1k)=
h 2
"i'Xr(k)ri(j-
3
Pir
r=l
11 k-
1,)
1 ,
(8.44)
and for j = 0 by: ?ri(O]k)=l-;
z
5
%X,(k)
+ mgl(mi
r=l
- j)r&’
j=l
] k)
1
(8.45)
where ei, can be computed by Eq. (7.72). 1 STEP 2.2 ]Forr=l,... , R, compute the throughput: X,(k) = N C
(8.46)
Icr *
eirTir(k)
i=l
1_______1
STEP 2 3 For i = 1, . . . , N and r = 1, . . . , R, compute the mean number of class-r jobs at the ith node: Fir(k)
= Xr(k)*Tir(k)*
ei,.
(8.47)
With these extensions the MVA algorithm for multiclass closed product-form queueing networks is completely specified. Akyildiz and Belch [AkBo83] extended the MVA for computing the normalization constant and the state probabilities. For this purpose, Step 2.2 in the preceding algorithm is expanded to include the formula:
G(k) =
G(k - 1,) L-(k)
’
(8.48)
THE MEAN VALUE ANALYSIS
337
with the initial condition G(0) = 1. After the iteration stops we get the normalization constant G(K) that can be used to compute the steady-state probabilities with the help of the BCMP theorem, Eq. (7.80).
Fig. 8.7
Sequenceof intermediate values.
To explain iteration Step 2, we use Fig. 8.7 where the sequence of intermediate values is shown for a network with R = 2 job classes. The first job class has K1 = 2 jobs and the second class has K2 = 3 jobs. The mean value algorithm starts with the trivial case where no job is in the system, that is, the population vector (0,O). From this, solutions for all population vectors that consist of exactly one job are computed; in our example these are the population vectors (1,0) and (0,l). Then the solution for all population vectors with exactly two jobs are computed, and so on, until the final result for K = (2,3) is reached. In general, to compute the solution for a population vector k we need R intermediate solutions as input, namely the solutions for all population vectors k - l,, r = 1,. . . , R. Now, the algorithm is illustrated by means of an example: Consider the queueing network shown in Fig. 8.8 with N = Example 8.5 3 nodes and R = 2 job classes. Class 1 contains K1 = 2 jobs and class 2 contains Kz = 1 jobs. Class switching of the jobs is not allowed. The service time at the ith, i = 1, 2, 3, node is exponentially distributed with mean values: 1 = 0.2sec, Pll 1 -=0.2sec, Pl2
-
1
= 0.4sec,
P21
-
1
i-422
-
1
= 1 set,
I-431
=0.6sec,
-
1
p32
=2sec.
338
ALGORITHMS FOR PRODUCT-FORM NETWORKS
Another closed queueing network.
Fig. 8.8
The visit ratios are given as follows: ell = 1,
e31 = 0.4,
e21 = 0.6,
= 1,
el2
e32 = 0.7.
e22 = 0.3,
The queueing discipline at node 1 is FCFS and at node 2, processor sharing. The terminal station is modeled as an IS-node. We analyze the network using the MVA in the following three steps: Initialization: Xi,(O)
= 0 f or i = 1,2,3 and T = 1,2,
7r1(0 10) = 1,
x1(1 10) = 0.
Iterate over the number of jobs in the network beginning with the population vector k = (1,O): -1
Mean response times, Eq. (8.43):
%(l,O) = & T&O)
= ;
T31(1,0) = ; -1
[I
+
%l(O,
0)
[1+~21(0,0)
+
%z(O,O>
+~22(0,0)]
+
m(0
I 0, o,]
= 0.4,
=I.
Throughput
h(U)
for class 1 using Eq. (8.46):
= 3 C eiA
= 1.190.
l (LO>
i=l
Note that the class 2 throughput for this population vector is 0.
=
0.2,
THE MEAN VALUE ANALYSIS
-1
Mean number of jobs, Eq. (8.47): E,,(i,O)
= X1(l,O)T&O)ell
= 0.238,
~,,(i,O)
=
= 0.286,
X1(l,0)~21(l,0)e2~
E31(1, 0) = X1(1, O)T31(1,O)e31 = 0.476. Iteration for k = (0,l): [I
Mean response time: ~12(0,1>
=
&
T22(0,1)
=
&
[I + %(o,
.
[l + 17&O, 0) +
?;32(0,1) = ; /?%$-%%I
0) + 1712(0,0)
E22(0,
o)]
+ m(0
I 0, O)]
= 0.2,
= 0.6,
= 2.
Throughput
for class 2:
X2(0,1)
=
3
C
= 0.562.
-l
1)
ei2Ti2(O:
i=l
Note that the class 1 throughput (-1
for this population vector is 0.
Mean number of jobs:
E,,(O,l)
= X2(0, 1)552(0,l)e12
= 0.112,
F22(0,i) = X~(O,l)T22(0,Q322 = 0.101, ES2(0, 1) = X2(0,1)5732(0,1)e32= 0.787. Iteration for k = (1,l): I-2.1
Mean response times:
Class 1:
T11(1,1) =&
[I + %2(0,1)
+ 7r1(0 10, l)] = 0.2,
where: n1(0 ) 0,l) = 1 - - l
%,(o,
ml
[ p12
1) + Tl(1 10,l)
1
= 0.888,
339
340
ALGORITHMS FOR PRODUCT-FORM NETWORKS
m(l I071) =
%,(O, CL12
1)7Tl(O ( 0,O) = 0.112,
T&,
1) = k
[I +2721(0,1)+~22@,1)]
%(l,
1) = &
= 1.
= 0.44,
Class 2: F&,1)
= -A2. P.12
[1+1711(1,0)
~~(1 1 I, 0) = $X1(1,O)n1(0
-1
( l,O)] = 0.2,
+ ~~(1 1 1,0)
E&(1,0)
~(0 ( l,o) = 1 - $
T22(1, 1) = k
+w(o
1
= 0.762,
( 0,O) = 0.238,
[l + K&O)]
= 0.772,
5%2(1,1)
Throughputs: =
J&,1>
3
C eil%(L
i=l X2(1,1>
= 1.157,
l
1) = 0.546.
1
=
5 ei2R2 (1, 1)
i=l
-1
Mean number of jobs: K&,
1) = X1(1, l)Tll(l,
l)ell
= 0.231,
K12(1, 1) = X2(1,1)T12(1,l)el2
= 0.109,
F&(1,1)
= 0.305,
= Xl(l,l)T21(1,l)e21
?f22(1, 1) = X2(1,1)T22(1,+22
= 0.126,
F&(1,1)
= !LB&
K32(I,
= X1(1,l)T31(l,l)e3l I) = X2(1, I)T32(1,l)e32
= O-764.
= h
= 2.
THE MEAN VALUE ANALYSIS
Iteration for k = (2,0): -1
Mean response times:
=&
%(2,0>
[1+~11(1,0)+n(O 1LO)] =0.2,
Tz1(2,0) = k
[I + &(l,
T31(2,0) = A
= 1.
1-1
O)] = 0.514,
Throughput: 2 WV)
= 2.201.
=
5 dil(2,O) i=l
p?%?$3
Mean number of jobs: S&(2,0)
= X1(2,O)T&
O)ell = 0.440,
xz1(2, 0) = &(2,O)Tz1(2,O)ezl
= 0.679,
KS1(2, 0) = X1(2, 0)‘?‘,1(2,O)e31 = 0.881. Iteration for k = (2,l): -1
Mean response times:
Class 1: [1+ K&,
1) + -T(12(1!1) +
n(O
I 1, I)]
=
0.203,
where: 7Tl(O ) 1,1) = 1 - -L %l(l,l) ml [ PII
+ %X&,1)
+ 7rl(l 11,l)
1
= 0.685,
and: v(l
I 171)
5%(2,1>
=
y$‘(l,
=
;
1)7rl(O 10,l) + EX?(l, [l +E&
1) +Ezz(l,
l)*m(O
I LO)
= 0.289,
I)] =
Class 2: T12(2,1>
=
&
[l +E,1(2,0)
+ 7rl(O 12,0)] = 0.205,
341
342
ALGORITHMS FOR PRODUCT-FORM NETWORKS
where: 7r1(0 ) 2,O) = 1 - -
%X1(2,0)
+ ~~(1 12,0)
1
= 0.612,
and: m(1
I w>
%(2,1) -1
=
~&(2,0)7rl(O
= &
) 1,o) = 0.335,
[1+ Kar(2,0)]
= 1.008,
Ts2(2,1) = &
= 2.
Throughputs: = 2.113, C d%
(2,1>
i=l
-1
i=l
Mean number of jobs: &(2,1)
= X1(2, 1)5711(2,l)ell
= 0.428,
E12(2, 1) = X2(2, l)T12(2,l)el2
= 0.108,
X2r(2, 1) = X1(2, l)T21(2,+21
= 0.726,
K22(2, 1) = X2(2, 1)5722(2,l)e22
= 0.158,
X731(2,1) =
0.8457
A1(2,l)T3i(:qu&eing
??,,(a, 1) = X2(2, l)T32(2,+32
= 0.734.
We can see that the MVA is easy to implement and is very fast. For networks with multiple server nodes (mi > 1) and several job classes, this technique has some disadvantages such as stability problems, accumulation of roundoff errors, and large storage requirements. The storage requirement is 0 (N. n,“=, (K, + 1)). For comparison, the storage requirement of the convolution algorithm is 0 (n,“=,(Kr cases is approximately
+ 1)). The time requirement in both
the same: 2 . R(N - 1). fi (K, + 1) [CoGe86]. Recall r=l
that N is the number of nodes in the network and K, is the number of jobs of class T. 8.2.3
Mixed
Networks
In [ZaWo81] the MVA is extended to the analysis of mixed product-form queueing networks. But the networks are restricted to consist only of single server nodes. Before we introduce the MVA for mixed queueing networks,
THE MEAN VALUE ANALYSIS
343
we consider the arrival theorem for open queueing networks. This theorem says that the probability that a job entering node i will find the network in state (ICI.. . . , Ici, . . . , ICN) is equal to the steady-state probability for this state. This theorem is also called PASTA theorem. r Therefore we have for the mean response time of a job in an open network:
(8.49)
As from [ZaWo81], we have, for any class r and s, Fi,
= PisKi,
and,
Pir
therefore, with the help of the relation Fir(k) Eq. (8.49):
= Xir*Tir(k),
Pir
pir
?Tir (k) =
1 -
5
it follows from
Type-1,2,4 (mi = l), Pis’
s=l
Pir 7
Type-3,
(8.50) with pir = Xir/pir, where Xi, = &.a ei, is given as an input parameter. These results for open queueing networks are now used for analyzing mixed queueing networks. The arrival theorem for mixed product-form queueing networks says that jobs of the open job classes when arriving at a node see the number of jobs in equilibrium, while jobs of the closed job classes when arriving at that node will see the number of jobs at that node in equilibrium with one job less in its own job class in the network [ReLa80, SeMi81]. If we index the open job classes with op = 1,. . . , OP and the closed job classes with cl = 1, . . . , CL, then we have for the mean number of jobs in an open class T, taking into account Eq. (8.50): Fir(k)
= Air* -L
1+ 5
Pir
Ki,cl(k)
+ E op=l
cl=1
Xi,,,(k)
1 (8.51)
I
= 1 -
E
>
Pi,op
op=l
‘PASTA:
Poisson
arrivals
see time
averages
(see [Wolf82]).
344
ALGORITHMS FOR PRODUCT-FORM NETWORKS
where k is the population vector of the closed job classes. Equation (8.51) is valid for Type-1,2,4 single server nodes, while for Type-3 nodes we have:
&r(k)
= pir.
(8.52)
For the mean response time of a job in a closed class r at node i we have: OP
R, .(k)
= L
1+
Pir
E
Ki,cl(k
-
1,)
c
+
Pir
-
1 CL
1 +
1+
Ki,,,(k
op=l
cl=1
r
=I
1 I,)
C
1 Ki,cl(k
-
cl=1 5
Ki,cl(k
-
lr)
+
1,)
OP C
1 op=
pi,op 1
OP
1- c Pi,op
cl=1
op=l
=-
11 1L1+
5
Ki,cl
(k
-
L-)
cl=1
1
A
(8.53)
j%r
1- E
Pi,op
op=l
Equation (8.53) is valid for Type-1,2,4 single server nodes. For Type-3 nodes we have: Fir(k)
=
(8.54)
$
With these formulae the MVA for mixed product-form queueing networks can be described completely. We assume for now that all arrival and service rates are load independent. The algorithm is as follows: Initialization. For all nodes i = 1,. . . , N, compute the utilization of the open class jobs op = 1,. . . , OP in the mixed network: 1 Pi,op
=
-X0,*
ei,0p,
(8.55)
Pi,op
and check for the ergodicity condition (pi,OP 5 1). Set Ki,cl(O) = 0 for all i= I,..., N and all closed classes CL= 1, . . . , CL. Construct a closed queueing network that contains only the jobs of the closed job classes and solve the model with the extended version of the MVA, which considers also the influence of jobs of the open job classes. The results are the performance measures for the closed job classes of the mixed queueing network.
THE MEAN VALUE ANALYSIS
Iteration:
345
k = 0,. . . , K:
I
STEP 2 I For i = 1,. . . , N and r = 1,. . . , CL, compute the mean response times with Eqs. (8.53) and (8.54). CL, compute the throughputs with Eq. (8.46). I STEP 2 2 Forr=l,..., STEP 2 3 For i = 1,. . . , N and T = 1,. . . , CL, evaluate the mean number of jobs with Eq. (8.47).
I
With the help of the given in Section 7.2, compute classes of the network starting mean number of jobs Ei,, i =
solutions of the closed model and the equations the performance measures for the open job with Eqs. (8.51) and (8.52) for computing the 1,. . . , N and r = 1,. . . , OP.
If we use the iterative formula (8.48) f or computing the normalization constant in Step 2, then the steady-state probabilities can be derived from the BCMP theorem, Eq. (7.76) [AkBo83]: (8.56)
n(S = Sl . . . ) S,) = where X(j) is the (possibly state dependent) arrival rate.
As an example of a mixed queueing network, consider the Example 8.6 model given in Fig. 8.9 with N = 2 nodes and R = 4 job classes. Class 1 and 2 are open, the classes 3 and 4 are closed. Node 1 is of Type-2 and node 2 is of Type-4. _---
Source
-
Open Class Closed Class
LJ
Fig. 8.9
A mixed queueing network.
The mean service times are given as follows: -
1
= 0.4sec,
Pll
-
1
P21
= 0.6sec,
1 = 0.8sec, CL12 1 = 1.6sec, CL22
-
1
-
1
p23
1
= 0.3seq
-
= 0.5sec,
1 = 0.8sec. k24
I-L13
P14
= 0.5sec,
346
ALGORITHMS FOR PRODUCT-FORM NETWORKS
The arrival rates by class for the open class jobs are Xi = 0.5 jobs/ set and X2 = 0.25 jobs/ sec. In each of the closed classes there is K3 = K4 = 1 job. The routing probabilities are: Pll,ll
= 0,
1,
p12,12
= 0,
PO,13 = 0,
P13,13
=
PO,14 = 0,
p14,14
= 0.6,
1,
PO,11 = PO,12 =
0.5,
0.5,
Pll,Zl
=
~12,22
= 0.6,
p13,23
=
pl4,24
= 0.4,
0.5,
P21,ll
=
1,
~22~2
=
1,
p23,13=
1,
=
1.
p24,l4
Thus, by Eq. (6.27) the visit ratios are computed to be: 2, e12 = 2.5, = 1, e22 = 1.5,
ell
=
e21
e13 = 1, e23 = 0.5,
el4 e24
= 1, = 0.4.
The determination of the performance measures of this mixed queueing network is carried out with the MVA in the following three steps: Initialization. job classes:
Compute the utilization
of both nodes by the open
PII = hell+ -!- = 0.4, l-L11
p21 = Xle21. L
~12 = X2el2.
~22 = X2e22-
2-
Pi2
= 0.5,
=0.3,
IQ21 k
=0.6.
Set Kii (0) = Xi2 (0) = X21(Q) = K22(Q) = 0. Using MVA, analyze the closed queueing network model, obtained by leaving out the jobs of the open job classes. Mean value iteration for k = (1,O): Mean response times, Eq. (8.53): 1+ Pl3
Throughput,
1 -
&3(Q) (Pll
= 3,
T23(1,0) = J-
+ P12)
CL23
1 + z,,(a) 1 -
Eq. (8.46) : X3(1,0) =
2 C
K3
= 0.182.
ei3%3(1,0>
i=l
Mean number of jobs, Eq. (8.47): K13(1,0)
= ~3(1,0)*
ei3.T13(1,0)
z23(1,0)
= ~3(1~o)’
e23- T23
= 0.545,
(1, 0) = 0.454.
(p21
+ ,022)
= 5’
THE MEAN VALUE ANALYSIS
347
Mean value iteration for k = (0,l):
%4(0,1> = 5,
T24(0,1> = s,
X4(0,1) = 0.122,
K14(0, 1) = 0.610,
&(O,
1) = 0.390.
Mean value iteration for k = (1,1):
T&(1,1)
= 4.829, T2&,1)
X3(1, 1) = 0.120, K13(l,1)
= 6.951, ?;I&,
1) = 7.727, %(l,
1) = 11.636.
1) = 0.624, %,(l,
1) = 0.376.
X4(1, 1) = 0.081.
= 0.582, &(l,
1) = 0.418, &(l,
With these results and using Eq. (8.51), the performance measures of the open classes can be computed: F1&
1> =
Pll
(1+
%3(L
1) + %4(L
1))
,
0.1 K21(1 1> = Pzl (1 + E23(1~~> + %dl, , 0.1 K12(1
1) =
P12 (1 + %3(1,1>
+ %4(L
1> =
p22 (1 + rc,,&
+ K24(1,1>>
7 K
(1 22
7
0.1
l>>
1))
= 8.822, = 5.383, = 11.028, = 10.766.
For the computation of the mean response times we use, Eq. (7.43): Tll (1,1) = 8.822,
T12(l, 1) = 17.645,
T21(1, 1) = 10.766,
T22(1, 1) = 28.710.
Now all the other performance measures can be computed. There are some suggestions to extend this algorithm to mixed queueing networks that are allowed to contain additional multiple server nodes. The most important approaches can be found in [KTK81] and [BBA84]. 8.2.4
Networks
with
Load-Dependent
Service
The MVA can be extended to examine product-form queueing networks where the service rates at node depends on the number of jobs at that node, [Reis81]. Let pi(j) denote the load-dependent service rate of the ith node when there are j jobs in it.
348
ALGORITHMS FOR PRODUCT-FORM NETWORKS
8.2.4.1 Closed Networks The algorithm for single class queueing networks with nodes having load-dependent service rates can be described in the following steps: Initialization. Iteration:
For i = 1, . . . , N:
Ic = 1, . . . , K.
1 STEP 2.1 ] For i=
l,...
, N, compute the mean response time:
Ti(lc)= & -7ri(j 3 j=l l-G>
-1
7ri(O 10) = 1.
(8.57)
- 1 1Ic - 1).
Compute the throughput:
X(k)= N Ic . c ei.T,(k)
(8.58)
i=l
I STEP 2
3 Fori=l,..., N, compute the conditional probabilities of j jobs at the ith node given that there are Ic jobs in the network:
W)
-ri(j cLi(j) ri(j
I k) =
- 1 ) k - l)ei,
for j = 1,. . . , Ic, (8.59)
I
for j = 0.
The extension of this algorithm to multiclass queueing networks where service rate of a node depends on the overall number of jobs at the node be done very easily. Let hir(j) be the the service rate of class-r jobs at ith node when there are j jobs in it. Then the MVA can be presented in following steps: Initialization. Iteration:
For i = 1, . . . , N:
the can the the
7ri(O I 0) = 1
k = 0,. . . , K.
[ STEP 2.1 ] For i = 1, . . . , N and r = 1, . . . , R, compute the mean response times: Tir(k)
= 2 j j=l f4.d
T;(j--Ilk-l,),
where k=e&. r=l
(8.60)
THE MEAN VALUE ANALYSIS
I STEP 2
2 Forr-
349
, R, compute the throughputs:
l,...
k-(k) = N
.
”
(8.61)
C eir*Tir(k) i=l
I STEP 2
3
Fori=l,...
, N, determine the marginal probabilities:
cR
A-(k) ari(j zr
r=l ri(j
1 k)
-
1 I k-
L)eir,
f0r.i
=
l,---,k,
(8.62)
=
for j = 0.
Example 8.7 Consider again the example given in Fig. 8.6 where the multiple-server node 1 is replaced by a general load-dependent server. Loaddependent service rates are: ~~(1) = 1.25, ,u4(1) = 1, ~~(2) = 1.25, ~~(2) = 2, ~~(3) = 1.25, ~~(3) = 3.
,ul(l) = 2, ~~(1) = 1.667, ~$2) = 4, /~(3) = 4,
~~(2) = 1.667, ~~(3) = 1.667,
Initialization:
r;(O ) 0) = 1 for i = 1,2,3,4. Iteration
[I
over the number of jobs in the network starting with
Mean response times, Eq. (8.57): ??i(l) = 1n1(0 Pi(l) Ts(l)
= ~ l
10) = 0.5, n3(0 ( 0) = 0.8,
T2(1) = --&(O TJ(1) =
P3U)
[I
Throughput, X(1) =
4 C i=l
l -7r4(0 Y4P)
I 0) = 0.6, 1 0) = I.
Eq. (8.61) and normalizing constant, Eq. (8.42) : l
eiTi(l)
= 0.454,
G(1) = z
= 2.203.
350
ALGORITHMS FOR PRODUCT-FORM NETWORKS
-1
Marginal probabilities, m(l
Eq. (8.62):
w
1 1) =
7rl(O ( 1) = ~~(0 ( 1) = 7r3(0 ( 1) = n-4(0 1 1) =
------n-1(0 10)el = 0.227, CL1 (1) 1 - 7ri(l ( 1) = 0.773, 0.864, ~~(1 1 1) = 0.136, 0.818, ~~(1 ( 1) = 0.182, 0.545, Q(l ) 1) = 0.454.
Iteration for k = 2: -1
Mean response times: Tr(2) = --7rl(O 1
2 ni(1 I 1) = 0.5, 11) + ___ I-L1 (2)
CL1(1)
2 1)+ -7r2( Tz(2) = 1 P2(1) sTr2(o I-Q(2) 2 T3(2) = J1) + -7r3(1 P3(1) T3(o CL3(2)
1 1 1) = 0.682, ) 1) = 0.946,
2 n4(1 I 1) = 1. T4(2) = J1) + ___ P4(1) T4(o P4 (2)
-1
Throughput X(2) =
4
and normalizing constant: 2
= 0.864,
G(2) = g
= 2.549.
C @i(2) i=l
-1
Marginal probabilities: 742
I 2) =
w
---7ri(l
I 1)er = 0.049,
Plc4
n(l
I 2)
=
m) -7rl(O
1 1)ei = 0.334,
I-Q)
q(O 12) = 1 - 2
7rl(l ( 2) = 0.617.
1=1
n2(2 ) 2) = 0.035, ~~(2 12) = 0.063, n4(2 ( 2) = 0.196, Iteration for k = 3:
7r2(1 12) = 0.224,
x2(0
r3( 1 12) = 0.283, 7r4(l 12) = 0.472,
x3(0 n4(0
12) = 0.741, 12) = 0.654, ( 2) = 0.332.
THE MEAN VALUE ANALYSIS
-1
351
Mean response times: F(3)
= --&do
12) + 2
4
I 2) +
Pl (2) &(3)
= 0.776,
-1
F3(3)
Throughput
-1
TA(3) = 1.
and normalizing
Marginal
1 2) = 0.512,
I-L1 (3)
= 1.127,
X(3) = 1.218,
3 -7r1(2
constant:
G(2) G(3) = -q3)
= 2.093.
probabilities:
v9 = -P1(3)r1(2 w 7r1(2 1 3) = /11(2P Jw I 3) = -rl(O
2)el = 0.015,
7r1(3 13)
v(1
2)el = 0.102, I2)el
= 0.375,
I-L1(1)
3
7rl(O 1 3) = 1 - Cxl(Z
) 3) = 0.507.
1=1
3) = 0.082,
7r2(3
1
0.013,
7rz(2
~(1
] 3) = 0.271,
~~(0
3) = 0.634,
~~(3
)
7r3(2
3) = 0.138,
7r3(1 I 3) = 0.319,
7r3(0
3) = 0.512,
7rITq(313) = 0.080,
7r4(2
3) = 0.287,
7rA(l ] 3) = 0.404,
7rJ(O ] 3) = 0.229.
3) 3)
=
=
0.031,
Now the iteration stops and all other performance measures can be computed. For the mean number of jobs at the nodes we have, for example: x1(3)
13)
0.624,
k=l
-
&,(3) The MVA dence:
=-&q(k = = = 0.473,
is extended
X3(3)
0.686,
- = K4(3)
by [Saue83] to deal with
1.217.
other kinds of load depen-
l
Networks in which the service rate of a node depends on the number of jobs of the different classes at that node (not just on the total number of jobs at the node).
l
Networks
with
load-dependent
routing
probabilities.
352
ALGORITHMS FOR PRODUCT-FORM NETWORKS
In [TuSa85] and [HBAK86], the MVA is extended to the tree mean value anulysis, which is well suited for very large networks with only a few jobs. Such networks arise especially while modeling distributed systems.
8.2.4.2 Mixed Networks The MVALDMX algorithm (mean value analysis load dependent mixed) was first presented in [BBA84] and can be used to analyze mixed BCMP networks with load-dependent service stations. Apart from the algorithm presented in [BBA84], another suggestion is made in [KTK81] to extend the MVA to analyze multiclass load-dependent queueing networks. In order to extend the MVA, the equations derived in the previous sections, and especially the equation for the mean response time, need to be extended. Since a mixed network contains open as well as closed job classes, we need to consider the open classes in the network when computing the performance measures for the closed classes. Furthermore the load-dependency must be explicitly accounted for. To do so we need to consider marginal probabilities in the corresponding equations. Bruell, Balbo, and Afshari [BBA84] introduce the capacity function to describe the behavior of load-dependent service stations. The capacity function ci(j) of node i is the number of jobs completed by that station if there are j jobs in the queue. The inverse function of the capacity function, C(j), dependence factor. The loud-dependent mean service time si,(j) tion i is given by the relation
is called the Zoud-
for class-r jobs at service sta-
(8.63) The function Sir(j) can be interpreted as the nominal service time. Using the capacity function c,(j), we can model a general load-dependency, which means, ci(j) can be any function with: ci : W -+ lR+ and ci(l) = 1. In the following, load-dependency shall be restricted to multiple server nodes. The load-dependency factor for an -/M/m node is then:
1 Ic’ I-
if k < m;,
Ci(k) = 1 = c,(k)
1 , ifkzm;. mi
(8.64)
THE MEAN VALUE ANALYSIS
353
In the MVA for mixed queueing networks, and especially in the MVALDMX algorithm, the procedure is to consider the performance measures for open and closed classes separately from each other. At first the performance measures of the closed classes are computed and then the performance measures of the open classes are computed. This separation can be achieved in the MVALDMX algorithm by checking the marginal probabilities of different job combinations. In [BBA84] a recursive equation is given for the computation of the conditional probabilities ni(lc, K) such that there are exactly Ic jobs of the closed job classes at node i for a given population K. This equation is only valid for the closed job classes. This recursive equation is the only one necessary for the analysis of mixed, load-dependent queueing networks:
T~(JC,K) = 5
s,,cl +ECi(~)
. &cl(K)
+ri(k
- 1, K - lc&
(8.65)
Here, K is the population vector of jobs in the closed classes, and 1,~ is the vector whose components are all zero except the clth component, which is one. The variable Ic is the overall number of jobs in closed classes in a certain population vector during the iteration, and ECi is the inverse of the effective capacity function. The reduction of the mixed network to a closed network is given by modifying the load-dependence factor Ci (Ic). In a mixed queueing network, the service stations are used by jobs belonging to open as well as closed classes. The availability of open job classes in the network reduces the capacities of the service stations, which means that the closed job classes do not have the full capacity available any more as previously defined in the capacity function. It is for this reason that we need to define the eflective capacity function: eciw
1 = Eci(q
= c&c) *
Ei(k - 1) E;(k)
’
(8.66)
This function describes the capacity that is effectively available for the closed classes if we consider the jobs of open classes. This step makes a separate analysis of the open and closed classes possible. As we can see, Ei(k) (which is a function that helps in computing the effective capacity of station i) is of basic importance for this reduction. The computation of the auxiliary parameters can be done for each service station i independently from the other service stations. Vk>mi-
1
:
Ei(k)= Li = E op=l
1
1 _ Li . Ci(mi) ' Ei(k - l)' Xi,op ' Si,op,
354
ALGORITHMS FOR PRODUCT-FORM NETWORKS
Ei(k) = E@) I
I
&l(k)
+ E&k)
1 1 - LiCi(mi)
-
@3(k)
. ci C,(k)
. Eil(k - l),
k > 0,
= ( k = 0,
mi-2
EiZ(k) = c &,(k,j), j=o mi-2 &3(k)
=
c
‘dk 5 rni - 1
F3(k&
j=o
‘k+j j
. Li +Ci(mi) . Fis(k, j - l),
$$
- Fi3(k - l,O), i
mi-lC,(Z) II1=1ciCrni> ’
IC> 0, j > 0, k > 0, j > 0,
k = 0, j = 0.
Here Li denotes the load factor of open classes with regard to station i. Now we have all necessary equations to describe the MVA algorithm for mixed product-form queueing networks with multiple server nodes. In the algorithm, K denotes the overall number of jobs in the closed job classes of the network and k denotes the state vector for an iteration step for the closed job classes. Initialization. the following values: [I
For all nodes i = 1,. . . , N, of the network compute
Load factor Li of the open job classes: Li = E
Xi,*p ’ Si,op*
op=l
Let K = E Kcl and compute for all k = 1, . . . ,K+l cl=1 auxiliary functions Ei(k) and ECi (k).
-1
1 STEP 1.3 1 ni(O, 0) = 1.0.
the
THE MEAN VALUE ANALYSIS
355
Construct a closed model that contains only jobs of closed job classes of the original network and solve this closed model. The results that we get from this model are the performance measures for the closed classes of the mixed network. For all vectors k = 0, . . . , K, do:
I
STEP 2 i For all cl = 1, . . . , CL and i = 1, . . . , N, compute the mean response time:
k
F+(k)
=
““’
cj. ’ j=l
ECi(j)
. ni(j - 1, k - lcl),
Type-3.
Si,cl,
I
STEP 2 2 put:
Type-1,2,4,
For all cl = 1,. . . , CL and i = 1,. . . , N, compute the through-
J&&)
z
N
ICC1’ ei7cz
.
C ej,cz . Tj,dk) j=l
STEP 2 3 jobs:
I
For all cl = 1, . . . , CL and i = 1, . . . , N, compute the number of
(-?EEZGL]
Adapt the marginal probabilities
for Type-1,2,4 nodes:
For i = 1, . . . , N do: sum = 0.0 for k = 1, . . . , k do: n&k)
= E
sZ,CZ. ECi(k) * h,cl(k)
* ri(k - 1, k - lcl),
cl=1
sum = sum + ni(lc, k),
ri(O, k) = 1.0 - sum. 1-1 Compute all other performance measures using the well-known formulae (see Section 7.1) Compute from the solution of the closed model, the performance measures for the open classes of the network. For op = 1, . . . , OP and i =
356
ALGORITHMS FOR PRODUCT-FORM NETWORKS
1,“‘7 N, compute: + 1) . EC@ + 1) . ri(lc, K),
Type-Q,&
Type-3.
k,op * %,op, %,0,(K) Ti,,,(K)
Xi,op
=
’
Type-1’2’47
In the following example, we show how to apply the method to a simple queueing network: Consider the network given in Fig. 8.9. The mean service Example 8.8 times are given as follows: 1 = msec, Pll 1 = @set, P21
-
1
= usec,
-
1
= Qsec,
1
P22
=
usec,
1 -
1
= osec,
Pl4
Pl3
Pl2
-
usec,
=
CL23
1 -
P24
=
@Jsec.
The arrival rates for the open class jobs are Xi = 0.5 jobs/set and X2 = 0.25 jobs/ sec. In each of the closed classes there is K3 = K4 = 1 job. The routing probabilities are: 0.5,
P21,ll
=
1,
= 0.6,
p22,12
=
1,
pl3,23
= 0.5,
pl4,24
= 0.4,
p23,13= 1, p24,14= 1.
PO,11 =
1,
Pll,ll
= 0,
P11,21
=
PO,12 =
1,
P12,12
= 0,
m2,22
PO,13 = 0,
P13,13
= 0.5,
PO,14 = 0,
p14,14
= 0.6,
For both nodes of the mixed network compute:
~1,~~ = X11 . sll
+
X12 - ~12 = 0.9 jobs,
op=1 L2
=
5 op=l
x2,op
-
~2,~~= Xzl + ~21 + X22 - ~22 = Q jobs.
CL
-1
Set: K = C Kc1 = K3 + K4 = 2. cl=1
THE MEAN VALUE ANALYSIS
Fork=
l,...
i=l:
, KcL + 1 compute Ei(k) and ECi(k):
&l(l)
=
'%1(z)
= h(2)+
k(3)
=
ECl(k)
&l(l)
+
&2(l)
+
h(3)
-
&3(l)
=
100,
&z(2) - &s(2)
= 1000,
-
= 10000,
~%2(3)
E13(3)
1 =-=lo, 1 - Ly
= 1 _ L$(ml) 1
i=2:
357
* G(m)
computed in a similar manner as for i = 1.
~~(O,(O,O))
= 7r2(0,(0,0))
= 1.0.
Analysis of the closed model, which we get after removing all open job classes in the network. Iteration for k = (1,O): [STEP]
Compute the mean response times:
‘T13(1,0)
=
. &
s13
.ECl(T)
* @,
(O,O))
r=l
T&1,0) -1
= . . . = 5.
Compute the throughputs: X13(1,0)
=
2
C
k3 *e13
-
ej3
= 0.182,
- Tj3(W
j=l
X23(1,0) = . . . = 0.091. 1-1
-1
Compute the mean queue lengths: rir,,(l,O)
= X13(1,0)'T13(l,0)
K23(l,O)
= ... =
0.455.
Adapt the marginal probabilities: k = “c” kcl = 1. cl=1
= 0.545,
=
3,
358
ALGORlTHMS FOR PRODUCT-FORM NETWORKS
i=l:
sum = 0.0, for Ic = 1: m(L(LO)> = ...
= 5 sl,cl - EG(l) cl=1
- ~l,cz(LO)
.m(O,k
= 0.546,
sum = sum+ rr(l,
(1,O)) = 0.546,
7rl(O, (1,O)) = 1.0 - sum = 0.454,
i=2:
sum = 0.0, for k = 1: 7rz(l, (1,O)) = *. . = 0.454, ~(0, (1,0)) = . . . = 0.546.
Iteration for k = (0,l): -1
Mean response times: T14(0,1) = . . . = 5,
-2.2)
T24(0,1) = . . . = 8.
Throughputs: X14(0,1) = 0.122,
X24(0,1) = 0.049.
-1
Mean queue lengths: K14(0,1) = 0.610, X24(0,1) = 0.392.
11
Adapt the marginal probabilities:
~~(1, (0,l)) 7ry,(1, (0,l))
= 0.610, = 0.390,
7r1(0, (0,l)) x2(0, (0,l))
= 0.390, = 0.610.
Iteration for k = (1,l): L-2.1
Mean response times: T13(1, 1) = 4.829,
T&l,
1) = 6.951,
T14(1, 1) = 7.727, Tz4(1, 1) = 11.636.
- lcl))
THE MEAN VALUE ANALYSIS
-1
359
Throughputs:
-1
&(I,
1) = 0.120,
X&,
1) = 0.081,
X&,1) = 0.060, X2& 1) = 0.032.
Mean queue lengths:
m
&(l,
1) = 0.582,
&(l,
&(l,
1) = 0.624,
K2*(1, 1) = 0.376.
1) = 0.418,
Adapt the marginal probabilities: k = “c” ICC1 = 2. cl=1
i=l:
sum = 0.0, ~~(1, (1,l)) = 0.324, sum = sum + 0.324 = 0.324. 7rr(2, (1,l)) = 0.441, sum = sum + 0.441 = 0.765, rl(O, (1,1)) = 1 - sum = 0.235,
i=2:
sum = 0.0, ~~(1, (1,1)) = 0.332, 7r2(2, (1,1)) = 0.234, 7r2(0, (1,1)) = 0.434.
Now we can compute the performance measures for the open job classes. First the mean number of jobs by class and node:
Kll(1,
K&l)
1)
=
x11
* 311
.~(k+l).~c~(lc+l).rjl(ii,(l,l)) lc=o
= 8.822, = 5.383,
7C12(l,l) = 11.028,
&(l,
1) = 10.766,
Then the mean response time by class and node: T&l)
= “‘;!;)
Tr2(1, 1) = 17.648,
l) = 8.822, T,, (1,1) = 16.766,
T&l,
1) = 28.710.
360
ALGORITHMS FOR PRODUCT-FORM NETWORKS
8.3
THE RECAL METHOD
Another technique for the exact solution of closed product-form queueing networks is the RECAL (recursion by chain algorithm) algorithm. RECAL is very well suited for networks with a large number of job classes but a small number of nodes. For the explanation of this method we assume that we have a product-form network N consisting only of single server or infinite server nodes. We consider only the case of a multiple class network (R > 1) and introduce an extended normalizing constant GR(v) for the network with the Here, GR (0) is the standard normalizing constant vector v = (wi,...,~~). G(K), called G for short in the following. In [CoGe86] a recursive expression for computing G = GR (0) is ’ given, which relates the normalizing constant of a network with r classes and the normalizing constants of a set of networks with each of the (r - 1) classes. The RECAL algorithm uses a simplified version of this general relation. This simplification can be given when each job class contains exactly one job, because if K, = 1 for r = 1,. . . , R, then the normalizing constant GR(O) is recursively given by the expression [CoGe86]: Gr(vr)
= c(1
EG,-l
fvd)
(vr + li)
for vT E F,,
(8.67)
i=l
with:
vr = (‘ulr,...,~Nr), Vir
>Ofori=
, OLr
l,...
r = R,
6i =
1, Type-LW, 0, Type-3,
li = (0,. ss,O,l,O,* v~70)
is an N-dimensional vector with 1 at the ith position.
With the help of the normalizing constant GR (0)) the throughput and the mean number of class-R jobs can be computed because for KR = 1 the correctness of the following equations are proven in [CoGe86]: N
GR-I($) eiR* c j=l
GR(O)*
XiR = e R GR-l(lz) i * GR(“)
(N
+ K
-
1) ’
if there are no Type-3 nodes in the network, for a Type-3 node with index 5,
(8.68)
THE RECAL METHOD
361
and: K,
=
eiR
ih--l(b)
ar
(8.69)
-*%(o)
/-hR’
The basic idea of the RECAL algorithm is to generate a fictitious network containing one job per job class in order to be able to compute the performance measures using Eqs. (8.67), (8.68), and (8.69). Therefore, it must be possible to partition each class T into K, identical subclasses with exactly one job per job class. This partitioning changes the state space and the normalizing constant of the network, but the performance measures remain the same. Let N* denote the fictitious network that we get after partitioning the job classes into subclasses. The overall number of jobs in this fictitious network:
K*=eK,=K, r=l
and the number of job classes is, by definition of N*, exactly R* = K*. Each class r = l,... , R* contains exactly K: = 1 job. Let the jobs be numbered 1,2,. . . , K* and let the function c(k) denote the class index of the lath node in the original network. Using Eq. (8.67), the normalizing constant Gk, (0) of the fictitious network JV is recursively given by:
G;(vk) = $(l
+ vikSi)*~, (k)GLlbk
for Ic= l,...
(8.70)
+ li>T
ZC
i=l
, E(* and vk E .?$ with: N vi,&
2
0
for
i =
1, . . . , N; c
Vik = K* - k
,
i=l
and the initial conditions: G;T(vo) = 1 for all va E .7--i. With Eqs. (8.68) and (8.69) the performance measures A& and ??& relating to a class-c(R*) job in the original network N can be computed: N Gk*-l(1.d eic(R*)’
c j=l G;, (0). (N + K* - 1) ’ f;;~;;;lte,,n~e~w~;
(8.71) \
G&,(L) e4R*Y G;,(o)
’
for a Type-3 node with index 2,
362
ALGORiTHMS FOR PRODUCT-FORM NETWORKS
E&* = G*K*&i) .-eicp) Gil* (0) Pic(R*)*
(8.72)
From the fact that all the jobs of a given class in the network N are identical, the throughputs and the mean number of jobs in class c(R*) of the network N are given by: &c(R’) = Kc(R*)’ &* KiccR*) = Kc(R*&Re.
(8.73)
7
(8.74)
Performance measures for classes other than c(R*) can be obtained by relabeling the jobs in the fictitious network p with the goal that the assignment c(R*) belongs to another class in the original network N. At the beginning, the numbering of jobs in W is chosen with the following assignment to the jobs in N:
1, I
k= l,...,Kr - 1, 2 5 i 5 R and
i,
i-l
k = 1 + c(K7-
- l), . . . , -&G-
I
R
R
R-k+l+C(Kr-l),
- 9,
r=l
r=l
k=l+)(Kr-I),**-yK** r=l
r=l
(8.75)
After computing G&, (0) using Eq. (8.70), performance measures for class s = 1 in the original network N can be determined using Eqs. (8.71)-(8.74), due to Eq. (8.75). For the computation of performance measures of class(s + 1) jobs in the network N, the jobs in N* have to be relabeled afterwards in the following way: R .,S-1+x
(Kr
-
I>,
r=l e(Kr r=l
-
l+e(Kr-
(8.76)
I>,
l),...,K*.
r=l
Now, the performance measures for class-(s + 1) jobs can be computed after computing G&, (0) anew. The RECAL algorithm can be described in the following four steps:
THE RECAL METHOD
363
Initialization: Gc(ve) = 1 for all vo E .7$. Number the jobs in the fictitious assignment c(l)(k) given in Eq. (8.75).
network n/* according to the
Compute and store the values of Gz(v,) - R by using Eq. (8.70).
for all w, E .7$ with
Iteration over all classes s = 1,. . . , R in the network M -1 Compute Gk, (0) with Eq. (8.70) by using the stored values for the Gi (v~). -1 Compute the performance measures of class s using Eqs. (8.71)(8.74). If s = R then the algorithm stops. 1-1 For computing the performance measures of class-s + 1 jobs, relabel the jobs in P corresponding to the assignment (8.76). 1-1 Increase x by 1 and compute and store the values of the Gc(v,) anew for all v, E -Tjc*by using Eq. (8.70). By storing the values Gz(v5) computed in Step 3 and Step 4.4, it is not necessary to compute Eq. (8.70) in Step 4.1 anew for all k = 1,. . . , K* when evaluating the performance measures. Instead, Eq. (8.70) needs to be calculated only for k = x, . . . , K* with x = s + C,“=, (K, - 1). The algorithm is illustrated by the following example: Consider a closed queueing network with N = 2 nodes and Example 8.9 R = 3 job classes and with K = 5 jobs distributed as follows: K1 = 2, KS = 1, KS = 2. We have K = R* = K* = 5. Class switching is not allowed. Node 1 is a Type-2 node and node 2 is a Type-l node. The service rates are given as follows: Pll
=
20%
p12
=
50,
p13
= 2,
p21 = p22 = p23 1 4,
and the visit ratios are: ell = e12 = e13 = 1,
e21 = 0.72,
e22 = 0.63,
e23 = 0.4.
The computation of the performance measures is carried using the RECAL algorithm as follows: Initialization: Gg(ve) = 1 for all ve E Fi, with F; =
364
ALGORITHMS FOR PRODUCT-FORM NETWORKS
Labeling the jobs in the network w k k k k k
= = = = =
with Eq. (8.75) yields: 1, 2, 3, 4, 5.
Compute Gz(w,) for all v, E ,FG and x = K* - R = 2. We get:
zli2>0,
= uo, % (19% (2, I>, (3,0>> *
&=K*-2=3 i=l
With Eq. (8.70) we have: $
G;(l, 3) + 4.2.
G;(O, 4) = 0.727,
G;(1,2)
= 2.2.
G;(2,2) + 3.2.
G;(l, 3) = 0.774,
G;(2,1)
= 3.2.
G;(3,1) + 2.:.
G;(2,2)
G;(3,0)
= 4$.G;(4,0)
G;(O, 3) =
+
E.G;(3,1)
= 0.681, = 0.448,
0.905,
z.
G;r(l, 4) + 5.2.
G;(O, 5) =
G;(l, 3) = 2.2.
G;(2,3) + 4.2.
G;(l, 4) = 0.73,
G;(2,2)
= 3.2.
G;;(3,2) + 3.2.
G;;(2,3) = 0.555,
G;(3,1)
= 4.E.
G;(4,1) + 2.2.
G;(3,2)
= 0.38,
G;(4,0)
= 5.2.
G;(5,0) +
2.
G;(5,0)
= 0.205.
GT(O, 4) =
Iteration with s = 1:
over all job classes s of the given network JV, starting
THE RECAL METHOD
of G:(O) using Eq. (8.70). For this computation
I] Computation we additionally need: G;(O, 2) =
E-G;(l,2)
G;(l, 1) = 2$G;(2,1) = 3-E.
G;(2,0)
G;(O, 1) = G:(l,O)
365
E.
+3 ~2.
G;(O, 3) = 0.605,
+ 2.2. G;(l, 2) = 0.836, CL23 e23. G;(2,1) = 0.740, /-k23
G;(3,0) +
G;(l, 1) + 2-E.
= 2.$.Gj(2,0)
G;(O, 2) = 0.207,
“22-G;& Pm
+
1) = 0,161.
From this i t follows: G;(O) = E,
G;(l, 0) + 2.
G;(O, 1) = 0.038.
1-1 Computation of the class 1 performance measures. Because of Eqs. (8.71) and (8.72) we have: 2
XT5= ellsC j=l
GXlj) + 5 - 1) = 1.6117
G;(o)-(2
2
AZ5= e2,+C j=l
Wlj)
G;(o). (2 + 5 - 1) = m7 2
= 0.021,
G;(O) 1). 2 -* K25 = -- 1 G;(O)
= 0.979.
G;(l,O)1 -* I-Cl5 = -.G;(O)
These results are needed to compute the throughputs jobs in class 1 using Eqs. (8.73) and (8.74). = 3.222,
xl1 = &.l;i;,
= 0.042,
(8.77)
X21 = K1. A;, = 2.320,
1721 = Q??;,
= 1.958.
(8.78)
Xl1 = K14;5
-1
and mean number of
Relabel the jobs in JV* in accordance to Eq. (8.76): 17
37
d2)(k) =
17
I
3, 27
k = 1) k = 2) k = 3) k =4, k = 5.
366
ALGORITHMS FOR PRODUCT-FORM NETWORKS
[ml Increase x by 1, i.e., x = 3 and compute and store Gz (vz) for all v, E Fz = {(0,2), (1, l), (2,O)):
G;(0,2) =
$G;(l,2)
+3=G;(O,3)
= 0.396,
k421
G;(l, 1) = 2.2.
G;(2,1) + 2.2.
G;(l, 2) = 0.285,
Pll
G$(2,0) = 3$.G;(3,0)+
G;(2,1) = 0.129.
$
Iteration for s = 2: I-$,11 additionally
Computation of G;(O) with Eq. (8.70). For this computation we need: E.Gj(l,
1) + 2 .z.G;(0,2)
= 2$.G;(2,0)+
2.
G;(O, 1) = G;(l,o)
= 0.222,
G;(l, 1) = 0.158.
Hence: G;(O) =
z.G;(l,O)
im Computation Eq. (8.71) and Eq. (8.72):
+
E.G;(O,
1) = 0.038.
of the performance measures for class 2 using
A& = e12-C2 ~Gm jzl Gw
6
= 1.661,
Furthermore: A;, = 1.046,
K;,
= 0.083,
E& = 0.917.
Because of K2 = 1 with Eqs. (8.73), (8.74) we get: x12 = A;,,
mq
x22 = A;,,
E,,
= IT&,
Relabel the jobs: 1, fk = 1,
3, cqk)
k =
2,
= I 1, k = 3,
2, 3,
k = k =
4, 5.
7722 =
I?;,.
THE RECAL METHOD
11 Increase z by 1, i.e., x = 4, and compute and store Gz(v,) all v, E -T4*= ((0, l), (1,O)): z+
G:(O, 1) =
G;(l, 1) + 2.2.
‘%(I, 0) = 2$.G;(2,0)
+
367
for
G;(O, 2) = 0.131,
z.G;(l,
1) = 0.050.
Iteration for s = 3: -1
Computation
of GE(O):
G;(O) = E.G;(l,O) 1-1
Computation
+ $G;(O,
1) = 0.038.
of the class-3 performance measures:
XT:, = 0.790,
A;, = 0.316,
rr5
= 0.658,
?7& = 0.343.
Xl3 = 1.580,
XZ3 = 0.632,
El3 = 1.315,
K15 = 0.685.
Hence:
Now the iteration ends and all other performance measures can be obtained using formulae from Section 7.1. If the number of job classes in the network is very large, then the RECAL algorithm is much more efficient than the convolution algorithm or the MVA. But if there are only a few job classes and many nodes in the network, then RECAL is not very well suited (see Figs. 8.10 and 8.11). If we assume that K, = K for all T and N is fixed, then the time complexity and memory requirement grow polynomially with the number of job classes in the network. In this case, the memory requirement is:
and the time requirement in number of operations is:
0( (y-$RN+’ >. The algorithm can be extended further to analyze Type-l nodes with multiple servers and to analyze nodes with load-dependent service rates. Based on the RECAL algorithm, [CSL89] developed the MVAC algorithm (mean value analysis by chain), which directly computes the mean values of performance measures. The RECAL method was extended to the tree RECAL technique by [McKe88].
368
ALGORITHMS FOR PRODUCT-FORM NETWORKS classes (R)
18 16 14 12 10
8
(NJ fig. 8.10 Regions in the (N x R) space in which the storage requirement of RECAL is less than that of convolution, for K = 1 and 12 (on or above the curves the storage requirement of RECAL is less than that of convolution).
8.4
FLOW EQUIVALENT
SERVER METHOD
The last method for analyzing product-form queueing networks that we consider is called flow equivalent server (FES) method. The method is based on Norton’s theorem from electric circuit theory [CHW75b]. We encounter this method again in Chapter 9 when we deal with approximate solution of non-product-form networks. Here we consider this technique in the context of exact analysis of product-form networks. If we select one or more nodes in the given network and combine all the other nodes into an FES, then Norton’s theorem says that the reduced system has the same behavior as the original network. 8.4.1
FES Method for a Single Node
To describe the FES method, we consider the central-server model given in Fig. 8.12 where we suppose that a product-form solution exists. For this network we can construct an equivalent network by choosing one node (e.g., node 1) and combining all other nodes to one FES node c. As a result we get a reduced net consisting of two nodes only. This network (shown in Fig. 8.13) is much easier to analyze then the original one, and all computed performance measures are the same as in the original network. To determine the service rates ~Jlc), k = 1,. . . , K of the FES node, the chosen node 1 in the original network is short circuited, i.e., the mean service time of this node is set to zero, as shown in Fig. 8.14. The throughput in the short circuit path with k = 1,. . . , K jobs in the network is then equated to
FLOW EQUIVALENT SERVER METHOD
369
classes (R) v/c=1
22
RECAL
20 f
I’ ,,
/’
18 16 14 12 10 8 6 4 2 2
4
6
8
10
12
Fig. 8.11 Regions in the (N x R) space in which the time requirement of RECAL is less than that of convolution, for K. = 1 and 12 (on or above the curves the time requirement of RECAL is less than that of convolution).
Fig. 8.12
The central-server network.
the load-dependent service rate ,u~(Ic) , Ic = 1, . . . , K of the FES node in the reduced queueing network. The FES algorithm can now be summarized in the following three steps: In the given network, choose a node i and short-circuit it by setting the mean service time in that node to zero. Compute the throughputs XT(k) along the short circuit, as a function of the number of jobs k = 1, . . . , K in the network. For this computation, any of the earlier solution algorithms for product-form queueing networks can be used. From the given network, construct an equivalent reduced network only of the chosen node i and the FES node c. The visit ratios in both nodes are ei. The load-dependent service rate of the FES node is the throughput along the short-circuit path when there are Ic jobs in the network, that is: p,(k) = X?(k) for F; = 1,. . . , K. Compute the performance measures in the reduced network with networks (e.g., convolution or MVA).
any suitable algorithm for product-form
370
ALGORITHMS FOR PRODUCT-FORM NETWORKS
chosen
node FES node
fig. 8.13
Reduced queueing network.
Fig. 8.14
The short-circuited
model.
The technique just described is very well suited for examining the influence of changes in the parameters of a single node while the rest of the system parameters are kept fixed. To clarify the usage of the FES method, we give the following simple example:
Fig. 8.15
Yet another closed queueing network.
Example 8.10 Consider the closed single class product-form network shown in Fig. 8.15. It consists of N = 4 nodes and K = 2 jobs. The service times are exponentially distributed with parameters: fh=l,
The routing probabilities are
p2 =
2,
given
as:
pug=
3, puqz4.
0.5, $I42 = 0.4, = 0.5, p43 = 0.4,
P12 = 0.5,
P2l = O-5,
P31 =
P13 = 0.5,
p24 = 0.5,
p34
p44 = 0.2.
FLOW EQUIVALENT SERVER METHOD
371
With Eq. (7.5) we compute the visit ratios: el = 1,
e2 = 1,
e3 = 1,
e4 = 1.25.
Now, the analysis of the network is carried out in the following three steps: Choose a node, e.g., node 4 of the network shown in Fig. 8.16, and short circuit it. The throughput of the short circuit is computed for Ic = 1,2
Fig. 8.16
Network after node 4 is short circuit.
by using the MVA, for example. Then we get the following results: X(1) = 0.545,
X(2) = 0.776.
XT(l) = 0.682,
x7(2) = 0.971.
and therefore:
Construct the reduced network that consists of the selected node 4 and the FES node c as shown in Fig. 8.17. For the service rates of the FES
Fig. 8.17
Reduced network.
node we have: ,uc(l) = XT(l) = 0.682,
~~(2) = X7(2) = 0.971.
The visit ratios are: e4 = e, = 1.25.
ALGORITHMS FOR PRODUCT-FORM NETWORKS
372
Analyze the reduced network shown in Fig. 8.17 using, for example, the MVA for load-dependent nodes. The load-dependent service rates of node 4 can be computed by Eq. (7.49) as pq (1) = ,Q (2) = 4. MVA-Iteration
for Ic = 1:
T&) = -7rTTq(O 1
( 0) = 0.25,
T,(l)
=
P40)
10) = 1.467,
l-4)
X(1) = 0.466, ~~(1 1 1) = 0.146, MVA-Iteration
1 -n,(O
G( 1) = - ’ = 2.146, w n,(lll) = 0.854, 7r4(0 11) = 0.854,
7r,(OIl) = 0.146.
for Ic = 2: T&)
= -
1 P4W
x4(01
1>+
2 -7r4(l
( 1) = 0.286,
P4 (2)
2 ;?;c(a>= - 1 -ire(W) = 1.974, + -7r,(l(l) PC(1) PC(2) G(l) = 3.032, X(2) = 0.708, G(2) = x(a> ~~(112) = 0.189,
n4(212) = 0.032,
7@12) = 0.189,
7rc(2[2) = 0.779,
~(012) = 0.779, 51r,(O12)= 0.032.
Now all other performance measures of the network can be computed: l
Mean number of jobs: For node 4 we get from Eq. (7.26): IT;4 = 2
Jc’7r4(Ic12) = 0.253.
k=l
For the other nodes we have by Eq. (8.16):
l
772 = 0.436,
x3 = 0.273.
X3 = 0.708,
X4 = 0.885.
Throughputs, Eq. (8.14): = 0.708,
X2 = 0.708,
a Mean response times, Eq. (7.43):
T1= 5 = A? 1466 Xl
T2 =
0.617,
T, = 0.385.
FLOW EQUIVALENT SERVER METHOD
8.4.2
FES Method
for Multiple
373
Nodes
The usage of the FES method, when only one node is short-circuited, has nearly no advantage in reducing the computation time. Therefore [ABP85] suggest an extension of the concept of [CHW75b]. For this extension, the closed product-form network is partitioned in a number of subnetworks. Each of these subnetworks is analyzed independently from the others, i.e., the whole network is analyzed by short-circuiting the nodes that do not belong to the subnetwork that is to be examined. The computed throughputs of the shortcircuited network form the load-dependent service rates of FES node j, which represents the jth subnetwork in the reduced network. When analyzing this reduced network, we get the normalizing constant of the whole network. The normalizing constant can then be used to compute the performance measures of interest. The FES method for this case can be described in the following five steps. -1 Partition the original network into &! disjoint subnetworks SN-j (1 5 j 5 &!). It is even allowed to combine nodes in the subnetwork that do not have a direct connection to this subnetwork. -4 Analyze each of these subnetworks j = 1, . . . , M by short-circuiting all nodes that do not belong to the considered subnetwork. The visit ratios of the nodes in the subnetworks are taken from the original network. For analyzing the subnetworks, any product-form algorithm can be used. Determine the throughputs &iv-j (k) and the normalizing constants Gj (Ic), for Ic = 1, . . . , K. -C om b ine the nodes of each subnetwork into one FES node and construct an equivalent reduced network out of the original network by putting the FES nodes (with visit ratio 1) together in a tandem connection. The loaddependent service rates of the FES node j are identical with the throughputs in the corresponding jth subnetwork: I.Lcj(k) =
for j = 1,. . . ,&! and Ic = I,. . . ,E(.
b~-j(k),
r@Jm] The normalizing constant of the reduced network can be computed by convoluting the 111normalizing vectors
of the subnetworks [ABP85]: G=G1@G2cc..@GM,
(8.79)
where the convolution operator @ is defined as in Eq. (8.1). To determine the normalizing constant, the MVA for networks with load-dependent nodes (see Section 8.2.4) can be used instead.
374
ALGORITHMS FOR PRODUCT-FORM NETWORKS
[m] Compute the performance measures of the original network and the performance measures of the subnetworks with the help of the normalizing constants and the formulae given in Section 8.1. The FES method is very well suited if the system behavior for different input parameters needs to be examined, because in this case only one new normalizing constant for the corresponding subnetwork needs to be recomputed. The normalizing constants of the other subnetworks remain the same. Example again.
The queueing network from Example 8.10 is now analyzed
8.11
ar tti ion the whole network into M = 2 subnetworks where SubIW-JP network 1 contains nodes 1 and 2 and Subnetwork 2, nodes 3 and 4 of the original network.
Fig. 8.18
Subnetwork 1 with subnetwork 2 short-circuited.
Imj Analyze the two subnetworks: To analyze Subnetwork 1, nodes 3 and 4 are short-circuited (see Fig. 8.18). We use the MVA to compute for Ic = 1,2 the load-dependent throughputs and normalizing constants of this subnetwork: XsN-1(1) = 0.667, G1(l) = 1.5,
XSN-r(2) = 0.857, Gr(2) = L.75.
To analyze Subnetwork 2, nodes 1 and 2 are short-circuited
(see Fig. 8.19).
Analogously, we get the following results for Subnetwork 2: XsN-&)
MI
= 1.548,
XSNM2(2) = 2.064,
G2(1) = 0.646,
G2(2) = 0.313.
Construct the reduced network shown in Fig. 8.20.
FLOW EQUIVALENT SERVER METHOD
fig. 8.19
Short-circuit
Fig. 8.20
375
of Subnetwork 1.
Reduced network.
The first FES node describes the nodes of Subnetwork 1 with service rates /car = XSN-r(Ic) for Ic = 1,2, and the second FES node describes the nodes of Subnetwork 2 with service rates p,I;z(k) = XSN-z(JG). The normalization using Eq. (8.79):
constants of the reduced network are computed
By using Eq. (8.1) we get: G(l) = G(O)G(l)
+ Gr(l).Gz(0)
= 2.146,
G(2) =
+ G(l)G(l)
+ Gr(2).G2(0)
WWa(2)
= 3.032.
Compute the performance measures: Mean number of jobs at each node is computed using, Eq. (8.16): = 1.038,
E2 = 0.436,
r3 = 0.273,
hr4 = 0.253. As we can see, the same results are obtained as in Example 8.10. The FES method can also be applied to the case of multiple class queueing networks. The method is very flexible in the need for resources and lies between the two extremes: MVA and convolution. This flexibility is based on how the subnetworks are chosen.
376
ALGORITHMS FOR PRODUCT-FORM NETWORKS
Table 8.5 Storage requirement in number of storage elements and time requirement in number of operations [CoGe86] Method
Storage Requirement
Convolution
Time Requirement O(2. R(N - 1). fi (K, + 1))
O( fi WY + 1)) r=l
r=l
MVA
= nN+l + 1 o$-=-ijT-
RECAL* FES
> O(3 * &K,
O(2.
R(N - 1). &G
-=-L(R(N O( (N + l)!
+ 1)) + 1))
i=2O(2 * R(N - 1). fi (K, + 1)) ?-=l
+ 1))
* Kr=KforT=l,...,R
8.5
SUMMARY
As shown in Table 8.5, the time requirement for convolution, MVA and FES is approximately the same and differs for RECAL in dependency of the number of nodes and number of classes (see also Fig. 8.11). Also, the storage requirement depends greatly on the number of nodes and classes for RECAL compared to the other methods (see Fig. 8.10). For larger numbers of nodes, MVA is much worse than FES and convolution. For FES, the storage requirement depends also on the number of subnetworks. The larger the number of subnetworks, the smaller is the storage requirement. In Table 8.6 the main advantages and disadvantages of the convolution algorithm, the MVA, the RECAL met hod, and the FES method are summarized. For the closed queueing network of Problems 7.3 and 7.4, Problem 8.1 compute the performance measures using the convolution method. Problem 8.2 For a closed queueing network with N = 2 nodes and R = 2 job classes, compute the performance measures for each class and node using the convolution method. Node 1 is of Type-3 and node 2 is of Type-2. Each class contains two jobs Ki = K2 = 2 and class switching is not allowed. The service rates are given as follows:
PII = 0.4sec-l,
p21 = 0.3seCl,
~12 = 0.2sec-‘,
p22 = 0.4sec-1,
The routing probabilities are: Pll,ll
= p11,21 =
0.5,
P21,ll
= p22,12 = p12,z
=
1.
Problem 8.3 Compute performance measures of the network in Problems 7.3 and 7.4 again, with the MVA.
SUMMARY
Tab/e 8.6
Comparison
of the four solution
Method
methods
for product-form
377
networks
Advantages
Disadvantages
MVA
Mean values can be computed directly without computing the normalizing constant. But if required, the normalization constant and hence state probabilities can also be computed.
High storage requirement (for multiple classes and multiple servers) Overflow and underflow problems when computing the marginal probabilities for -/M/m nodes
Convolution
Less storage requirement MVA
than
Normalizing constant (NC) has to be computed Overflow or underflow problems when computing the NC
RECAL
Less storage and time requirement for a large number of job classes and nodes than MVA and convolution
More storage and time requirement for small number of job classes and nodes than MVA and convolution
FES
The time and storage requirement is reduced considerably when examining the influence of changing the parameters of an individual or a few nodes while the rest of the system parameters remain unchanged Basis for solution techniques for non-product-form networks
Multiple application of convolution or MVA Network needs to be transformed
Problem 8.4 For a mixed queueing network with N = 2 Type-2 nodes and R = 4 classes (see Fig. 8.21), compute all performance measures per node and class. The system parameters are given as follows: X1 = 0.2 see-‘,
X2 = 0.1 set-‘,
pll = 1.5sec-l,
~12 = lsec-l,
~21 = 2 set-‘,
p22
P21,ll
= P22,12
PO,11=
PO,12
= P21,o
=
Pll,Zl
=
~13 =
= 0.5 set-l,
= P22,o = Pl2,22
=
K2 = KS = 2,
/L23
3sec-l,
= 2 set-l,
~14 = /L24
2sec-l,
= 1 set-l,
0.5. P13,23
=
P14,24
=
P23,13
=
P24,14
= 1.
Problem 8.5 Consider a closed queueing network, consisting of a Type-3 node and a Type-l node (m2 = 2). This model represents a simple twoprocessor system with terminals. The system parameters are given as: 1 1 K = 3, ~12 =pal = 1, - = 2sec, - = lsec. p2 Pl. (a) Apply the mean value analysis to determine the performance measures of each node, especially throughput and mean response time of the terminals.
378
ALGORITHMS FOR PRODUCT-FORM NETWORKS
Source cl ‘- t --w I I L
-
q&-@-~--, I
i cl Sink
Fig. 8.21
Mixed queueing network.
Q4 Determine
for both nodes the load-dependent service rates and analyze the network again, using the load-dependent MVA.
Problem 8.6 Compute the performance measures for the network given in Problem 8.2 using the RECAL algorithm. Problem 8.7 Consider a simple central-server model with N = 3 Type-l nodes (mi = 1) and K = 3 jobs. The service rates are given as follows: pl = lsec-l,
p2 = 0.65 set-‘, p3 = 0.75 set-‘,
and the routing probabilities are: ~12 = 0.6,
~21 = ~31 =
1,
p13 =
0.4.
Combine nodes 2 and 3 into a FES node. Use the FES method and compare the results with the results obtained without the use of FES method. Consider a closed queueing network with N = 4 Type-l Problem 8.8 nodes and K = 3 jobs. The service rates are: p1 =
4sec-l ? p2 = 3sec-l,
p3 =
2sec-l
7
CL4
= lsec-l,
and the routing probabilities are: p23 =p24
= 0.5,
P12 = P31 = p41 =
1.
For computing the performance measures, use the extended FES method by combining nodes 1 and 2 in subsystem 1, and nodes 3 and 4 into subsystem 2.
9 Approximation Algorithms for Product-Form Networks In Chapter 8, several efficient algorithms for the exact solution of queueing networks are introduced. However, the memory requirements and computation time of these algorithms grows exponentially with the number of job classes in the system. For computationally difficult problems of networks with a large number of job classes, we resort to approximation methods. In Sections 9.1, 9.2, and 9.3 we introduce methods for obtaining such approximate results. The first group of methods is based on the MVA. The approximate methods that we present need much less memory and computation time than the exact MVA and yet give very accurate results. The second approach is based on the idea that the mean number of jobs at a network node can be computed approximately if the utilization pi of this node is known. The sum of all the equations for the single nodes leads to the so-called system equation. By solving this equation we get approximate performance measures for the whole queueing network. In some cases it is enough to have upper and lower bounds to make some statements about the network. Therefore, the third approach (Section 9.4) deals with methods on how to get upper and lower bounds on performance measures of a queueing network. In Section 9.5 we introduce a method that allows intervals as input parameters. For other techniques that have been developed to analyze very large networks see [ChYu83, SLM86, HsLa89].
379
380
9.1
APPROXIMATION ALGORITHMS FOR PRODUCT-FORM NETWORKS
APPROXIMATIONS
BASED
ON THE
MVA
The fundamental equation of the MVA (8.31) describes the relation between the mean response time of a node when there are k jobs at the node and the mean number of jobs in that node with one job less in the network. Therefore, to solve a queueing network it is necessary to iterate over all jobs in the network, starting from 0 to the maximum number K. For this reason, an implementation of the MVA requires a lot of computation time and main memory. For multiclass queueing networks with a large number of jobs in the system, we will very quickly reach the point at which the system runs out of memory, especially if we have multiserver nodes. An alternative approach is to approximate the fundamental equation of the MVA in a way that the mean response time depends only on K and not on K - 1. Then it is not necessary to iterate over the whole population of jobs in the system but we have to improve only the performance measures iteratively, starting with an initial vector. This approach leads to a substantial reduction in computation time and memory requirement. In the following, two techniques based on this idea, are introduced. 9.1.1
Bard-Schweitzer
Approximation
Bard and Schweitzer [Bard79, Schw79] suggested an approximation of the MVA for single server queueing networks that is based on the following idea: Starting with an initial value Ei,(K) for the mean number of class-r jobs at node i for a given population vector K, make an estimate of the mean number of jobs for a population vector (K - 1,). This estimating needs to be done for all classes s. These Ki,(K - ls) estimates are then used in the MVA to compute the mean response times, and only one iteration step is now needed to evaluate the performance measures for a given population vector K. In particular, new values for the K;,(K) are computed. The values we get are used to estimate the K+(K - 1,) again. This iteration stops if the values of the Ki,.(K) in two iteration steps differ by less than the chosen error criterion E. Obviously, the problem with this MVA approximation is how to estimate the mean value xi, (K - I,), given xi, (K) . An approximate formula for the case of a very large network population was suggested by [Schw79] since in this case a reasonable explanation can be given as to why the mean number of jobs of class T at node i remains nearly the same if the number of jobs in the network is reduced by one. For the mean number of class-r jobs at node i given a population vector (K - ls), they suggest the approximation:
x,r(K
-
1s)
=
cK
- 'd,( K r
zr
(K)
y
(94
APPROXIMATIONS
BASED ON THE MVA
381
where:
P-2)
:fE' ,
'
gives the number of class-r jobs when the network population K is reduced by one class-s job. If we assume that the jobs are initially equally distributed over the whole network, then the Bard-Schweitzer approximation can be described in the following steps: 1-1
For i = 1,. . . , N and s = 1,. . . , R:
Initialization.
r?,,(K)
= $.
[wa For all i = 1,. . . , N and all r, s = 1,. . . , R, compute the estimated values for the mean number of jobs for the population vector (K - 1,): Kis(K
- 1,) = (K G1””
*Eis(K).
S
-\A na 1yze the queueing network for a population vector K by using one iteration step of the MVA.
I STEP 3
1 For i = 1, . . . , N and T = 1, . . . , R, compute the mean response
times:
1 +&i,(K
-
- 1,)
s=l
I1
j&r
I STEP 3
1 ,
Type-l&t
(mi = I),
Type-3. ’
2 Forr=l,...
, R, compute the throughputs: X,(K)
=
N
Kr
.
C ?Tir(K)eir i=l
I STEP 3
3 For i = 1, . . . , N and T = 1, . . . , R, compute the mean number
of jobs:
K,,.(K)
= Ti,.(K)X,.(K)ei,.
I/ Check the stopping condition: If there are no significant changes in the xi, (K) values between the nth and (n - 1)th iteration step, that is, when: max K!:)(K) i,r
- K~~-l)(K)~
< E,
382
APPROXIMATION ALGORITHMS FOR PRODUCT-FORM NETWORKS
for a suitable E (here the superscript cn) denotes the values for the nth iteration step), then the iteration stops and all other performance values are computed. If the stopping criterion is not fulfilled, then return to Step 2. This approximation is very easy to program and faster than the exact MVA. The memory requirements are proportional to the product of N and R. Therefore, this approximation needs considerably less memory than the exact MVA or the convolution method, especially for networks with a large number of job classes. A disadvantage of this approximation is that networks with multiple server nodes cannot be solved using it. We give a simple example of the use of the algorithm: We revisit the single class network of Example 8.2 (Fig. 8.4) Example 9.1 with network parameters as shown in Table 9.1. For & we choose the value 0.06. The analysis of the network (K = 6) is carried out in the following Table 9.1 Input parameters for Example 9.1. i
ei
l/Pi
1 2 3 4
1 0.4 0.2 0.1
0.02 0.2 0.4 0.6
7%
1 1 1 1
steps: Initialization: 17,(K) = K&x-)
= 11;;#iy = T-c@) = g =l&
1. Iteration: Estimated values, Eq. (9.1): Ir,(K
- 1) = ~.7qK)
rjz(K
- 1) = K&c
= 1.25, - 1) = Kq(K
- 1) = 1.25.
One MVA iteration: 1-1 ?;@c) =;
Mean response times: [1+ Ei(K
- 1)] = 0.045,
T2(K) T/&+-$1
= ;
[I + 1T,(K - l)] = 0.45, +i7,(K
- l)]
= 1.35.
APPROXIMATIONS
-4
BASED ON THE MVA
Throughput:
X(K)= 4 K
= 11.111.
C eiTi(K) i=l
I
Mean number of jobs: xl(K)
= Fl(K)X(K)el
= 0.5,
E,(K)
= Tz(K)X(K)ez
= 2,
x,(K)
= Ts(K)X(K)es
= 2,
zd(K)
= FA(K)X(K)eA
= 1.5.
mj
Check the stopping condition: max IKjl)(K) i
- $“(K)l
> 0.06.
= 1
2. Iteration: Estimated values: xl(K
- 1) = y.
0.5 = 0.417,
K2(K - 1) = 1.667, mi?ma
K3(K - 1) = 1.667,
K4(K
- 1) = 1.25.
One MVA Iteration: ?&(K) = ;
[l+
T2(K) = 0.533, X(K) =
4
K1(K - l)] = 0.028 Fs(K)
= 1.067,
T4(K) = 1.35,
= 10.169,
K
C eiTZ(K) i=l
Lmj
XI(K)
= Tl(K)X(K)el
x2(K)
= 2.169,
= 0.288,
Es(K)
= 2.169,
r4(K)
= 1.373.
Stopping condition: max E{“’ (K) - $l’(K) i
1 = 0.212
> 0.06.
3. Iteration: [w]
Estimated values: j&(K E,(K
- 1) III y.
0.288 = 0.240, - 1) = 1.808, K3(K - 1) = 1.808,
c4(K
- 1) = 1.144.
383
APPROXIMATION
384
ALGORITHMS FOR PRODUCT-FORM NETWORKS
MVA Iteration: T&T)
= 0.025,
T2(K)
= 0.562,
T3(K)
= 1.123,
??#i-) = 1.286,
X(K) = 9.955, ??,(A-) = 0.247, wm@
K2(lX-) = 2.236, I,
= 2.236, x4(K)
= 1.281.
Stopping condition: max zd3’(K) i
- $“‘(K)I
= 0.092
> 0.06.
4. Iteration:
li;,(K
- 1) = 0.206,
K2(K
- 1) = 1.864,
K3(K
- 1) = 1.864,
??.*(K - 1) = 1.067.
[y?#p@gj T&T) = 0.024, X(K) = 9.896,
T2(K)
= 0.573,
T3(K)
= 1.145,
T4(K)
= 1.240,
77,(K)
K2(K)
= 2.267,
x3(K)
= 2.267,
E4(K)
= 1.227.
= 0.239,
m max $‘)(K) i
- f?i3’(K)I
= 0.053
< 0.06.
Now, the iteration stops and the other performance measures are computed. For the throughputs of the nodes we use Eq. (8.40): X1 = X(K). el = 9.896, For the utilization
X2 = 3.958,
X3 = 1.979,
X4 = 0.986.
of the nodes we get:
pl = K = 0.198,
p2 = 0.729,
p3 = 0.729,
p4 = 0.594.
The final results are summarized in Table 9.2. A comparison with the values from Example 8.2 shows that the BardSchweitzer approximation yields results that in this case are very close to the exact ones. The storage requirement of the Bard-Schweitzer approximation depends only on the number of nodes N and the number of classes R (= O(N x R)) and is independent of the number of jobs. This is much smaller than those for convolution and MVA.l The accuracy of the Bard-
lRecal1 0
N.
that fi r-=1
the storage
p&+1)
>
requirements
) respectively.
for convolution
and MVA
are 0
fi (K, r=l
and
+ 1) >
APPROXIMATIONS
Tab/e 9.2
BASED ON THE MVA
385
Bard-Schweitzer approximation for Example 9.1 1
2
3
4
0.024 9.896 0.239 0.198
0.573 3.958 2.267 0.729
1.145 1.979 2.267 0.729
1.240 0.986 1.240 0.594
Node: Mean response time T, Throughput A, Mean number of jobs E; Utilization pZ
Schweitzer approximation is not very good (the mean deviation from exact results is approximately 6%) but is sufficient for most applications. An essential assumption for the Bard-Schweitzer method is that the removal of a job from a class influences only the mean number of jobs in that class and not the mean number of jobs in other classes (see Eq. (9.1)). This assumption is not very good and we can construct examples for which the computed results deviate considerably from the exact ones. In Section 9.1.2, the self-correcting approximation technique (SCAT) is introduced, which does not have this disadvantage. Of course, the algorithm is more complex to understand and implement than the Bard-Schweitzer approximation. 9.1.2
Self-Correcting
Approximation
Technique
Based on the basic idea of Bard and Schweitzer, Neuse and Chandy [NeCh81, ChNe82] developed techniques that are an improvement over the Bard-Schweitzer approximation [ZESSS]. Th e main idea behind their approach is demonstrated on queueing networks consisting only of single server nodes. 9.1.2.1 Single Server Nodes The SCAT algorithm for single server nodes [NeCh81] gives better results than the Bard-Schweitzer approximation, especially for networks with small populations, because in the SCAT approximation we not only estimate the mean number of jobs but also the change in the mean number of jobs from one iteration step to the next. This estimate is used in the approximate formula for the Ei, (K - 1,). We define F&. (K) as the contribution of class-r jobs at the ith node for a given population vector K as: &r(K)
=
E,,(K) K. r
If a class-s job is removed from the network, Dir,(K) contribution: Dirs(K)
= Fir(K =
Fir(K
gives the change in this
- 1,) - Fir(K) -
1s)
(K - l,),
Fir(K)
- K,’
(94
386
APPROXIMATION
ALGORITHMS FOR PRODUCT-FORM NETWORKS
where (K - l,), is defined as in Eq. (9.2). Because the values Eir(K - 1,) are unknown, we are not able to compute the values Dirs(K) as well. Therefore we - estimate values for the difference D+,(K) and then we approximate the K,,(K - 1,) values by the following formula based on Eq. (9.4):
%r(K - 1s)= (K - L)r~ [Fir(K) + Dim(K)].
(9.5)
Note that if Dirs(K) = 0 we get the Bard Schweitzer approximation. The core algorithm of SCAT approximation needs the estimated values for Di,,(K) as input and, in addition, estimates for the mean values C,,.(K). The core algorithm can be described in the following three steps:
1STEP Cl 1 For i = 1,. . . , N and T, s = 1,. . . , R, compute with Eq. (9.5) the estimated values for the mean number xi, of jobs for the population (K - 1,). 1 STEP C2 1 Analyze the queueing network by using one iteration step of the MVA as in Step 3 of the Bard- Schweitzer algorithm. 1 STEP C3 ] Check the stopping condition. If: z;:)(K) max = i,r
- $-l’(K)1 KT
< E,
then stop the iteration, otherwise return to Step Cl (here the superscript cn) denotes the results of the nth iteration step), The use of & = (4000+ 16(KI)-l as a suitable value is suggested by [NeCh81]. The SCAT algorithm produces estimates of the differences DirS needed as input parameters of the core algorithm. So the core algorithm is run several times, and with the results of each run the differences Dir,(K) are estimated. The estimated differences Dirs(K) are computed by the SCAT algorithm, hence the name self-correcting approximation technique. To initialize the method, the jobs are distributed equally over all nodes of the network and the differences DirS (K) are set to zero. The SCAT algorithm can be described by the following four steps: m Use the core algorithm for the population vector K, and input values K+(K) = Kr/N and Dirs (K) = 0 for all i, r, s. The auxiliary values of the core algorithm are not changed by the SCAT algorithm. Use the core algorithm for each population vector (K - lj) for j = 1 * * 7R, with the input values Kir(K - lj) = (K - lj)r/N and Dirs(K 1;; = 0 for all i, r, s. ~ For all i = 1,. . . , N and r, s = 1,. . . , R, compute the estimated values for the Pi,(K) and &(K - l,), respectively, from Eq. (9.3) and the estimated values for the Dirs(K) from Eq. (9.4).
APPROXIMATIONS
BASED ON THE MVA
387
/%$$$$&I Use the core algorithm for the population vector K with values l?i7.(K) computed in Step 1, and values Dirs(K), computed in Step 3. Finally, compute all other performance measures using Eqs. (7.21)-(7.31). The memory requirement and computation time of the SCAT algorithm grows as O(N. R2) and produces, in general, very good results. Now, the algorithm is used on an example: Consider the central-server model of Example 8.3 again, but Example 9.2 now we use the SCAT algorithm to solve the network. L-1 Use the core algorithm with K = 6 jobs in the network and the input data ??i (K) = K/N = 1.5 and Di (K) = 0, for i = 1, . . . ,4. With these assumptions and E = 0.01, the usage of the core algorithm will produce the same results as the Bard-Schweitzer algorithm as in Example 9.1. Therefore, we use those results directly: T2(6)
= 0.573,
T3(6)
= 1.145,
T4(6)
= 1.240,
X2(6)
= 2.267,
3?3(6) = 2.267,
K4(6)
= 1.227.
?;1(6) = 0.024, X(6) = 9.896, K1(6)
[m]
= 0.239,
Use the core algorithm for (K - 1) = 5 jobs and the inputs 1?,(5) = = 0, for i = 1,. . . ,4.
5/N = 1.25 and Di(5) -1
- 1) from Eq. (9.5):
Compute the estimated values for the K,(K Ki(4)
=
4.
(i(5) --g-
=
1
for i = 1,...,4.
1 STEP C2 1 One step of MVA: T1(5)
= i
[l + K1(4)]
= 0.04,
= 10.417,
K2(5)
(1
izzl = 1.667,
K3(5)
T2(5) Z,(5)
= 1.667,
= 0.4,
T3(5)
= T1(5)X(5)el
E4(5)
= 0.8, = 0.417,
= 1.25.
Check the stopping condition: pTp(5)-Kp(5))
max i
5
=0166 L
The iteration must be started again with Step Cl.
>&.
T4(5)
= 1.2,
APPROXIMATION ALGORITHMS FOR PRODUCT-FORM NETWORKS
388
For the sake of brevity, we do not present all the other iterations but give the results after the last (fourth) iteration: T3(5) = 0.989,
T1(5) = 0.024, X(5) = 9.405,
T2(5) = 0.495,
K1(5) = 0.222,
x2(5) = 1.861, &(5)
T4(5) = 1.123,
= 1.861, x4(5) = 1.056.
Compute with Eq. (9.4) th e estimated values for the differences Di(K): El (5) E,(S) 0.222 0.239 4 7. 1o-3 = 5 - = - = , 5 6 ’ 6 DQ(6) = m5) --j- K2(6) = -5 7 10-3 6 ** ’ X3(5) E3(6) -5 7 10-3 D3(6) = 5 - = 6 ” ’
&(6)
D4(6) = x4(5) --g-
- li;4@) ~ 6
=
6 72
lo-3
”
*
Use the core algorithm for K = 6 jobs in the network with the Ki (K) values from Step 1 and the Di(K) values from Step 3 as inputs. STEP cl
Estimated values: %(6) K1(5) = 5 * 6 X3(5)
[I
+ Dl(6)
= 1.861,
1
= 0.222,
E,(5)
= 1.861,
E4(5) = 1.056.
One step of MVA: Tl(6) = i
[l + fT1(5)] = 0.024,
T2(6) = 0.572,
T3(6) = 1.144,
T4(6) = 1.234,
X(6) = 9.908, x1(6) = Fl(6)X(6)el
= 0.242,
E,(6)
= 2.267, F3(6) = 2.267,
K4(6) = 1.223. -1
Stopping condition:
IK36) - $“(6) 1 = 8.128. 1O-3 < &, max i 6 is fulfilled and, therefore the iteration stops and the preceding results are the final results from the SCAT algorithm. The exact results (MVA) are (see Example 8.3) : K1(6) = 0.244,
G(6)
= 2.261,
K3(6) = 2.261,
z(6)
= 1.234.
APPROXIMATIONS
BASED ON THE MVA
389
9.1.2.2 Multiple Server A/odes If we wish to analyze networks with multiple server nodes, then from the MVA analysis, Eq. (8.31), we need not only the l?;i, values but also the conditional marginal probabilities ni(j 1 K - 1,). Similarly, for the SCAT algorithm, we need estimates not only for the values Ki,(K - ls) but also for the probabilities ni(j 1 (K - 1,). In [NeCh81] a technique is suggested to estimate the marginal probabilities by putting the probability mass as close as possible to the mean number of jobs in the network. Thus, for example, if 7?i(K - 1) = 2.6, then the probability that there are two jobs at node i is estimated to be 7ri(2 1 K - 1) = 0.4, and the probability that there are three jobs at node i is estimated to be ri(3 1 K - 1) = 0.6. This condition can be described by the following formulae. We define: Ceilingi, = 1Ei(K
- 1,)1 ,
flOOrir = 1ITi(K - lr)]
,
(9.6) (9.7)
with:
and estimate: ri(floori, ~i(ceiling;, ri(j
1K - 1,) = ceilingi, - E(i (K - 1,), ) K - lT) = 1 - ri(floori, ) K - 1,), I K - 1,) = 0 for j < floori, and j > ceiling;,.
(9.8) W) (9.10)
Now we need to modify the core algorithm of SCAT to analyze networks with multiple server nodes. This modified core algorithm, which takes suitable estimates of Dirs (K) and xiv as inputs, can be described in the following steps: Cl 1 For - i = 1,. . . , N and r, s = 1,. . . , R, compute estimates for the mean values Ki, (K - ls) using Eq. (9.5). Compute using Eqs. (9.7)-(9.10), for each multiple server node i, the estimated values of the marginal probabilities r;(j 1K - l,),j = 0,. . . , (mi - 2) and r = 1,. . . , R. (-1 Analyze the queueing network using one step of the MVA via Eqs. (8.38), (8.39), and (8.41). [STEP]
Check the stopping condition: E;;)(K) max i,r
- ??,‘,“-l’(K) K-
/ < e.
If this condition is not fulfilled, then return to Step Cl.
390
APPROXIMATION ALGORITHMS FOR PRODUCT-FORM NETWORKS
The SCAT algorithm, which computes the necessary estimated input values for the differences Dirs(K) and mean values Kir(K) for the core algorithm, differs from the algorithm for networks with single server nodes only in the usage of the modified core algorithm, as follows:
all Dir,(K)
Use the modified core algorithm for the population vector K, where values are set to 0. Use the modified
core algorithm
for each population
vector
(K - lj). Compute the Fi,, and Dirs values from the previous results (see Eqs. (9.3) and (9.4)). Use the modified core algorithm with the computed F and D values for the population vector K. The closed network from Example 8.4 (Fig. 8.6) is now anaExample 9.3 lyzed with the SCAT algorithm. The network contains N = 4 nodes and K = 3 jobs. The network parameters are given in Table 9.3. The analysis of the network is carried in the following four steps: Table 9.3
Input parameters for the net-
work of Example 9.3 i
ea
1 2 3 4
1 0.5 0.5 1
l/Pi
mi
0.5 0.6 0.8 1
2 1 1 00
The modified core algorithm for K = 3 jobs in the network with the input parameters ??i (K) = K/N = 0.75 and Di (K) = 0 for i = 1, . . . ,4 is executed as follows: (ml
Estimate values for the xi(K Ki
E.;(2) = 2 * 3
(3)
=O.J
- 1) from Eq. (9.5): for i=
1,...,4.
Estimate values for x1(0 1K - 1). B ecause of floor1 = 1x1 (K - l)] = Q and ceiling, = floor1 + 1 = 1, we get: nr(O 12) = ceiling, - Kr(2)
= 0.5.
APPROXIMATIONS
I]
BASED ON THE MVA
391
One step of MVA: ml-2
%(3) = +--&
+ c (ml - j - 1). m(j 12) j=o
1 -t%(2) [
1
= pm1
ml - l)q(O
[1+X1(2)+( -
%(3) = ;
[l +
T4(3) = $ = 1.
X(3) =
4
) 2)] = 0.5,
Ts(3) = ;
= 0.9,
K2(2)]
1
[l +Ks(2)]
= l.2,
= 1.176,
3
C G(3) i=l
F,(3)
= T1(3)X(3)el
Ks(3) = 0.706, [STEP]
= 0.588,
&(3)
x2(3) = 0.529,
= 1.176.
Check the stopping condition: /fT,(l)(3) - FiO’(3)) max i
-1
= 0.142 > E = 0.01.
3
Estimate values for the Ki(K
- 1):
= 0.392,
K2(2) = 0.353, r,(2)
Ks(2) = 0.471,
= 0.784.
Estimate values for nr(0 1 K - 1). With floor-r = L??r(K - l)] ceiling, = floor1 + 1 = It, we get: 7rr(O ( 2) = ceiling, - Kr(2) -1
= Q and
= 0.608.
One step of MVA:
After three iteration steps, we get the following results for the mean number of jobs: x,(3)
= 0.603,
K2(3) = 0.480,
Ka(3) = 0.710,
17,(3) = 1.207.
The modified core algorithm for (K - 1) = 2 jobs in the network with the input parameters K,(2) = 2/N = 0.5 and Di(2) = 0 for i = 1,. . . ,4 is executed. After three iteration steps, for the mean number of jobs we have: 37,(2) = 0.430,
p2(2) = 0.296,
Ks(2) = 0.415,
x4(2) = 0.859.
392
APPROXIMATION ALGORITHMS FOR PRODUCT-FORM NETWORKS
Determine the estimates for the I$ and Di values using Eqs. (9.3) and (9.4)) respectively: K(3)
Ji(3)
= y-
= 0.201,
m) Pg2) = y--
h(3)
K(3) = -j-
= 0.160,
F2(2) = 172(2) 2 = 0 . 148,
F3(3)
E3(3) = 0.237, = y-
F4(3)
= 3
K4
(3)
= 0.402,
= 0 .215 )
F3(2) = K3P) 2
= 0 .208 ,
F4(2) = X4(2) 2
= 0 .430 ,
Therefore we get: D1(3) = Fl(2) - Fl(3) = 0.014, &(3) = &(2) - F2(3) = -0.012, D3(3) = F3(2) - 5’s(3) = -0.029, D4(3) = F4(2) - F4(3) = 0.027. Execute the modified core algorithm for K = 3 jobs in the network with values Ki(K) from Step 1 and values Di(K) from Step 3 as inputs. I]
Estimate values for the IIi(K Z(3) yj[ J&(2) = 0.415, r;;,(2) =2
+ D1(3)
- 1):
1
= 0.430,
z,(2)
= 0.296,
?;;4(2) = 0.859.
Estimate value for: floor1 = [Kr (2)J = 0,
ceiling1 = floor1 + 1 = I.,
slrr(O 12) = ceiling1 - ‘I?, (2) = 0.570.
After two iterations, results:
the modified core algorithm yields the following final
T1(3) = 0.5, X(3) = 1.224,
Tz(3) = 0.776,
x,(3)
z,(3)
= 0.612,
T3(3) = 1.122, T4(3) = 1,
= 0.475, x3(3) = 0.687, x4(3) = 1.224,
The exact results are (see Example 8.4): T1(3) = 0.512, X(3) = 1.217, KI(3)
T2(3) = 0.776,
T3(3) = 1.127, T4(3) = IL,
= 0.624, Kz(3) = 0.473, K3(3) = 0.686, x4(3) = 1.217.
APPROXIMATIONS
BASED ON THE MVA
393
9.1.2.3 Extended SCAT Algorithm The accuracy of the SCAT algorithm for networks with multiple server nodes depends not only on the estimates of the ?i;i, (K - 1 s) values, but also on the approximation of the condition al marginal probabilities 7ri(j ] K - 1,). Th e suggestion of [NeCh81], to distribute the whole probability mass only between the two values adjacent to the mean is not very accurate. If, for example, ??i(K- IT) = number of jobs K,(K-l,), 2.6, then the probability of having 0, 1, 4, and 5 jobs at the node cannot be neglected. Especially in the case when values ??i(K - 1,) are close to rni (number of identical service units at the ith node), the discrepancy can be large. In this case, all relevant probabilities are zero because the values ~(j ( K - 1,) f or computing the mean response time are determined for j = 0 * * 7(mi - 2) only. In [AkBo88 a] an improvement over this approximation of-the conditional probabilities is recommended. In Akyildiz and Belch’s SCAT scheme, the marginal probabilities are spread over the whole range . . . ,2 \Ki(K)] + 1. They define the following two functions, 0 * * 7K,(K-l,), the first function, PR, being a scaling function with:
PR(1) = Q PR(n) = pmPR(n - 1)
for n = 2,. . . , rn~x{~~},
where the authors give 45 and 0.7, respectively, as the optimal values for ctr and p. The second function W is a weighting function that is defined for all i = l,... , max(K,) as follows: W(O,O) = 1, W(i, j) = W(i - 1, j) - wti
- Lww~) 100
7
for j = 1,. . . ) (i _ 1)
7
i-l W(i,
i) = 1 - c W(6.i). j=O
The values of the weighting function W(i, j), j = 0,. . . ,5 are given in Table 9.4. Table 9.4
Values of the weighting function W
i
W(i, 0)
W(i, 1)
W(i, 2)
W(i, 3)
W(i, 4)
W(i, 5)
0 1 2 3 4 5
1.0 0.55 0.377 0.294 0.248 0.222
0.45 0.308 0.240 0.203 0.181
0.315 0.245 0.208 0.185
0.221 0.187 0.166
0.154 0.138
0.108
In addition to these two functions, we need the values of floor, ceiling, and maxval, where floor and ceiling are defined as in Eqs. (9.7) and (9.6) and
394
APPROXIMATION
maxval
is defined
ALGORITHMS FOR PRODUCT-FORM NETWORKS
as follows: maxvali,
= min(2 flooriT + 1, mi).
(9.11)
The term maxval describes the maximum variance of the probability mass. To compute the conditional probabilities, we divide the probability mass into (floori, + 1) pairs, where the sum of each of these pairs has the combined probability mass W(i, j). Formally this computation can be described as follows: l
For j III 0 are j jobs’at 7ri (floori,
.
. , floor+, compute the ith node:
- j ] K - 1,) = W(floorir,
l-dist
l
For j = ceiling;, ni(j
with l
the cond it ion al probabilities
upperval upperval
upperval
= ceiling;,
lowerval
= flooriT - 1-dist = j.
, . . . , maxvali,,
there
- Zi(K - lT) - lowerval
’ (9.12)
= flooriT - j, + 1-dist,
we have:
1K - lT) = W(fl oar;,, u-dist) u-dist
l-dist)
that
- ni(floori,
- u-dist
] K - l,), (9.13)
= j - ceiling+.
For j > maxvali,
we get: ri(j
1K - 1,) = 0.
This approximation assumes that the Ki(K - lr) values are smaller than KT/2. If any of these values is greater than (K, - 1)/2, then we must also make sure that the value of upperval is not outside the range 0, . . . , (Kr - 1). In this case upperval is set to (K, - 1). More details of this SCAT technique can be found in [AkBo88a].
Example 9.4 Consider the network shown in Fig. 9.1. To compute the marginal probabilities we apply the improved technique of [AkBo88a]. The service time of a job at the ith node, i = 1,2, is exponentially distributed with rates ~1 = 0.5 and ,Q = 1. The queueing disciplines are FCFS and IS, respectively. There are K = 3 jobs in the network and ei = e2 = 1. m Execute the modified and the input parameters Ei(K)
core algorithm for K = 3 jobs in the network = K/N = 1.5 and Di(K) = 0 for i = 1,2.
APPROXIMATIONS
Fig. 9.1
(sTEpc1]
BASED OAJ THE MVA
395
A model of a multiprocessor system.
Estimate values for the Ki(2), Eq. (9.5): Jw) =l i=12 Ki(2) = 2 * yj9 * -7 ( )
Estimate value for nr(0 1K - 1). Since: floor1 = LKr (2)J = IL, ceiling1 = floor1 + 1 = 2, maxvalr = min(2. floor1 + 1,mr) = 2, ~r(0 12) is determined with: l-dist = floor1 - 0 = IL, upperval = ceiling, + 1 = 3, lowerval = floor1 - 1 = 0, from Eq. (9.12): 7rl(O 12) = W(l,l) I/
* 3-W2)3
=rJ
* .
One iteration step of the MVA:
After four iterations, we get the following results: Kr(3)
= 2.187,
x2(3) = 0.813.
Apply the modified core algorithm for K = 2 jobs with the input parameters Ki(2) = 1 and Di(2) = 0, for i = 1,2. msT”Ep
Estimate values for the xi(l):
396
APPROXlMATiON
ALGORITHMS FOR PRODUCT-FORM NETWORKS
Estimate values for nr(0 1 1). With: floor1 = LX,(l)]
=Q,
ceiling1 = floor1 + 1 = 1,
maxvalr = min(2. floor1 + 1, ml) = 1, we determine: 1-dist = Q,
upperval = Il,
lowerval = 0,
and use these in Eq. (9.12) to compute: 7rl(O 1 1) = W(O,O) * l-ml) -1
=u
1
. .
One iteration step of the MVA:
After two iterations, we get the following results: 1?;1(2) = 1.333, -1
E,(3)
= 0.667.
Estimate the difference with Eq. (9.4): Dr(3) = Fr(2) - Fr(3) = -0.062,
02(3) = Fz(2) - Fz(3) = 0.062.
Apply the modified core algorithm
for K = 3 jobs in the net-
work: -1
Estimate values for the zi(2): ??r(2) = 1.333,
z2(2) = 0.667.
Estimate value for ~r(0 ) 2). With: floor1 = [;ilr (Z)] = 1,
ceiling, = 2,
maxvalr = 2,
we determine: 1-dist = 1,
upperval = 3,
lowerval = 0,
and, therefore: n1(0 12) = W(l,l)* 1-1
3 - F,(2)
One iteration step of the MVA:
= 0.25.
SUMMATION METHOD
397
Table 9.5 Storage requirement for convolution, MVA, Bard-Schweitzer, and SCAT Algorithms Storage Requirement
Methods Convolution
Oc&~
MVA
OW.
+ 1))
r&G-
Bard-Schweitzer
O(N . R)
SCAT
O(N . R2)
+ 1))
After two iterations, we get the final results: Tr(3) = 2.57,
TZ(3) = 1,
X(3) = 0.84,
x1(3)
= 2.16,
&(3)
= 0.84.
A comparison with the exact values shows the accuracy of this technique for this example: Ti(3)
= 2.45,
Tz(3) = 1,
X(3) = 0.87,
Er(3)
= 2.13,
K2(3) = 0.87.
If we use the SCAT algorithm from Section 9.1.2.2 we get: E(3)
= 3,
Tz(3) = 1,
X(3) = 0.75,
Er(3)
= 2.25,
111;2(3)= 0.75.
The storage requirement of SCAT is independent of the number of jobs (= O(N . R2)) but higher th an for the Bard-Schweitzer approximation, especially for systems with many job classes and many jobs. The storage requirement for SCAT is much less than those of MVA or convolution (see Table 9.5). The accuracy is much better for the SCAT algorithm (the average deviation from exact results is approximately 3%) than for the Bard-Schweitzer approximation and, therefore, in practice SCAT is used more often than MVA, BardSchweitzer, or convolution to solve product-form queueing networks. Problem 9.1 algorithms.
Solve Problem 7.3 with the Bard-Schweitzer and the SCAT
Problem 9.2 algorithms.
Solve Problem 7.4 with the SCAT and the extended SCAT
9.2
SUMMATION
METHOD
The summation method (SUM) is based on the simple idea that for each node of a network, the mean number of jobs at the node is a function of the
398
APPROXIMATION
throughput
ALGORITHMS FOR PRODUCT-FORM NETWORKS
of this node: (9.14)
For nodes with load-independent service rates, function fi has the following properties: l
Xi must be in the range 0 5 Xi 5 mipi. For IS-nodes we have 0 < Xi 5 K* pi 3 because Xi = Xi/pi and Ki 5 K.
l
fi(Xi) 5 f(xi + A&) for AX, > 0, that is, fi is a monotone nondecreasing function of Xi.
l
fi(0)
=O.
Monotonicity does not hold for nodes with general load-dependent service rates and must therefore be examined for each individual case. For multiple server nodes the monotonicity is maintained. To analyze product-form queueing networks, [BFS87] suggest the following formulae: t
Pi 1-K-l
Type-1,2,4 (mi = l),
’ KPi
fi
(Xi)
=
Ki
=
{
T?2ifli
Pi
+
AL,,
l-K-mi-l
K -mi
Type-l
(mi > l),
pi
-Ai \ pi ’
Type-3. (9.15)
The utilization
pi is computed using Eq. (7.21) and the waiting probability with Eq. (6.80). It should be noted that Eq. (9.15) gives exact values only for Type-3 nodes while the equations for the other node types are approximations. The approximate equations for Type-l (-/M/l), Type-2, and Type-4 nodes can be derived from Eq. (6.13), Ki = pi/(1 - pi), and the introduction of a correction factor (K - 1)/K, respectively. The correction factor is motivated by the fact -that in the case of JO s are at node i, hence, for pi = 1 we have Ki = K. The idea pi=l,allK’b of the correction factor can also be found in the equation for Type-l (-/M/m) nodes by considering Eq. (6.29). If we assume that the functions fi are given for all nodes i of a closed network, then the following equation for the whole network can be obtained by summing over all functions fi of the nodes in the network: Pm2 with Eq. (6.28) or approximately
CRi i=l
= C i=l
fi(Xi)
= K.
(9.16)
399
SUMMATION METHOD
With the relation A; = A. ei, an equation to determine the overall throughput X of the system can be given as: 2
(9.17)
fi(X+ ei) = g(X) = K.
i=l
In the multiple class case, the Function (9.14) can be extended:
Kir = fir(b).
(9.18)
The function fir ( Air) has the following characteristics: l
l
fir is defined in the range 0 2 Ai, 2 pi, . rni and, in case of an infinite server node, 0 5 Ai, 5 pir . K,. fir is a monotone non decreasing function. fir (Air) 5 fir (Ai, + e), e > 0.
. fir(O) = 0. For multiclass product-form queue&g networks, the following formulae, as extensions of Eq. (9.15), are given: Pir
l--
K-l K
Pi
Type-1,2,4 (mi = 1)
’ Pir
. P,,,
l-K-mi-l K -mi
Type-l
(mi > 1),
(9.19)
Pi Type-3,
with: Pi
=
(9.20)
5Pir7
r=l R
(9.21)
K=xK,.
For the probability of waiting P,,, we can use the same formula as for the single class case (Eq. (6.28)) and Eq. (9.20) for the utilization pi. Now we obtain a system of equations for the throughputs A, by summing over all functions fir:
5K;r
i=l
=
5
fir(Air)
i=l
=
Kr
(T
=
I,,
.
l
,
R).
(9.22)
400
APPROXIMATION
ALGORITHMS FOR PRODUCT-FORM NETWORKS
With Xi, = Xreir we obtain the system of equations:
fJi.(&-ei,)=g&b)=K,
r = l,...,R.
(9.23)
i=l
In the next section, we introduce solution algorithms to determine the throughput X in the single class case or the throughputs X, in the multiple class case using the preceding fixed-point equations. 9.2.1
Single
Class Networks
An algorithm for single class queueing networks to determine X with the help of Eq. (9.17) and the monotonicity of fi (Xi) has the following form: m-!
Initialization.
Choose Xl = 0 as a lower bound for the throughput as an upper bound. For -/G/co-nodes,
1-1
rni must be
Use the bisection technique to determine X:
STEP 2 1 CZYI
Xl + L Set x = 2 *
-1
Determine g(X) = 2 fi (X . e,,) , w h ere the functions f; (X. ei) are i=l determined by using Eqs. (9.15) and (9.14). -1 If g(X) = K f e, then stop the iteration and compute the performance measures of the network with the equations from Section 7.1. If g(X) > K, set X, = X and go back to Step 2.1. If g(X) < K, set Xl = X and go also back to Step 2.1. The advantage of this method is that the performance measures can be computed very easily. Furthermore, if we use suitable functions fi, then the number K of jobs in the network and the number of servers rni does not affect the computation time. In the case of monotonicity, the convergence is always guaranteed. By using suitable functions fi (Xi), this method can also be used for an approximate analysis of closed non-product-form networks (see Section 10.1.4). In [BFS87] and [AkBoBBb], it is shown that the summation method is also well suited for solving optimization problems in queueing networks (see Chapter 11). In the following example we show the use of the summation method: Consider Example 8.4 from Section 8.2.1. This network Example 9.5 is examined again to have a comparison between the exact values and the approximate ones as computed with the summation method. The network
SUMMATION METHOD
Table 9.6 Input parameters work of Example 9.5 i
e,
1 2 3 4
1 0.5 0.5 1
401
for the net-
mi
lIPi
2 1 1 cm
0.5 0.6 0.8 1
parameters are given in Table 9.6. The analysis of the network is carried out in the following steps:
Initialization: Xl = Q and
A, = min i
Bisection: Xl + A, X = 2=
STEP 2 1 r---Y-y
1.25.
[STEP/ Computation of the functions fi(Xi), For the -/M/2 node 1 we have: p1 =
~J-1 = 0.3125 kw ml
i = 1,2,3,4 with Eq. (9.15).
and Pm1 = 0.149
Eq. (6.28),
and, therefore: f&Q
= 2p1 + PIP,,
= 0.672.
Furthermore: fzP2)
= 0.5
= fi
with p2 = 0.375,
3 f3P3)
=
= 0.75
&
with p3 = 0.5,
3 f4(X4)
=
;
= 1.25.
Thus, we get: g(X) = 5
fi(Xi)
= 3.172.
i=l
l-2.3] Check the stopping condition. A, = X = 1.25 and go on with the iteration.
Because of g(X) > K, we set
402
APPROXIMATION
ALGORITHMS FOR PRODUCT-FORM NETWORKS
[STEP x
fr(xr> = 2Pr + Pi&, f$i2) = 0.214 f3(X3)
= 0.3
f&b)
= 0.625.
=
AZ
-
+
AA
2
= 0.319
= 0.625.
with pi = 0.156 and Pm1 = 0.042, with p2 = 0.1875, with pa = 0.25,
It follows that: g(X) = -&(Xi)
= 1.458.
i=l
-1
Because of g(X) < K, we set Xl = X = 0.625.
x
=
-= AZ
+
0.625 + 1.25 = o g375 -2
LL
2
This iteration is continued until the value of g(X) equals (within E = 0.001) the number of jobs K in the system. To make the bisection clearer, we show the series of Xl and X, values in Table 9.7. Table 9.7
Intervals for Example 9.5
Step:
0
1
2
3
4
5
6
7
8
9
10
h Lt
0 2.5
0 1.25
0.625 1.25
0.9375 1.25
1.094 1.25
1.172 1.25
1.172 1.211
1.191 1.211
1.191 1.201
1.191 1.196
1.191 1.194
The overall throughput of the network is X = 1.193 as computed by this approximation method. Compare this with the exact value of the overall throughput is X = 1.217 The approximate values of the throughputs and the mean number of jobs calculated using the SUM (Eq. (9.14)) and the exact values for this network calculated using MVA (see Example 8.4) are summarized in Table 9.8.
SUMMATION ME T/-l00
403
Table 9.8 Exact and approximate values for the throughputs Xi and the number of jobs K, for Example 9.5
i
xZSUM &MVA
K K ZMVA
-ZSUM
9.2.2
Multiclass
1
2
3
4
1.193 1.218 0.637 0.624
0.596 0.609 0.470 0.473
0.596 0.609 0.700 0.686
1.193 1.218 1.193 1.217
Networks
Now we assume R job classes and K = (Ki, K2, . . . , KR) jobs in the network. Therefore, we get a system of R equations. In general, g, are non-linear functions of A,: gT(Xi, X2,. . . , AR). This is a coupled system of non-linear equations in the throughput of the R job classes and, hence, it is not possible to solve each equation separately from the others. We first transform the system of equations into the fixed-point form:
x = f(x). Under certain conditions f has only one fixed-point and can then be used to calculate x iteratively using successive substitution as follows:
This method of solution is also called fixed-point iteration. In the case of the SUM, for the calculation of the throughputs A,, we get the following system of equations: ~~=f$-(A,.ei,)
= K,,
T= l,...,
R.
(9.24)
i=l
i=l
To apply the fixed-point iteration, this equation can easily be transformed to:
or: Jb=
N
K?-
C fixi&b
see [BaTh94].
= fr(h, . ei,)
* * JR)
= h-(X),
(9.25)
404
APPROXlMATlON
ALGORITHMS FOR PRODUCT-FORM NETWORKS
For the functions fixiT, we get from Eq. (9.25):
,
ei, Pir
I-K-1K
eir mi *j&r
fiXir = { eir
Type-1,2,4 m; = 1,
. Pi ’
I
(9.26)
l-K-mi-l K - mi
/Jir
*
% (Pi>, Type-l
‘pi
ei,
\b t%r
Type-37 ’
and the throughputs ~~
rni > 1,
A, can be obtained by the following iteration:
Initialization: Xl = x2 = * * * = AR = 0.00001; Compute the throughput
E = 0.00001.
A, for all r = 1, . . . , R:
Determine error norm:
e=
C( k2 - h-n+l)2’ 11r=l
If e > E, go to Step 2. ~
Compute other performance measures.
The Newton-Raphson method can also be used to solve the non-linear system of equations for the SUM [HahnSO]. There is no significant difference between fixed-point iteration and the Newton-Raphson method in our experience. The computing time is tolerable for small networks for both methods (below 1 second), but it increases dramatically for the Newton-Raphson method (up to several hours) while it is still tolerable for the fixed-point iteration (several seconds) [BaTh94], Therefore, the fixed-point iteration should preferably be used. The relative error can be computed with the formula: difference =
(exact value - computed value]. 1oo 7 exact value
BOTTAPPRO>( METHOD
405
and gives, for the previous example, an error of 2.6% for the overall throughput. Experiments have shown that the relative error for the SUM method is between 5% and 15%. It is clear that the results can be further improved if better functions fi could be found. Problem
9.3
9.3
Solve Problem 7.3 and Problem 7.4 using the SUM.
BOTTAPPROX
METHOD
We have seen that, compared to the MVA, the SUM approximation needs much less memory and computation time. But if we take a closer look at it, we will see that the number of iterations to obtain reasonably accurate results is still very large for many networks. Now the question arises whether it is possible to choose the initial guess of throughput X(O) so that the number of iterations can be reduced while still maintaining the accuracy of the results. A good initial value of throughput based on the indices of the bottleneck node is suggested by [BoFi93]. Recall that the node with the highest utilization pi is called the bottleneck node. The performance measures of the bottleneck node are characterized by the index bott. The resulting approximation method is called bottapprox (BOTT). 9.3.1
Initial Value of X
For the throughput
X we have the relation: Pi * mi A= -*Pi,
i = l,...,N,
ei
with: 0 5 pi < 1,
i = l,...,
N.
In contrast to the SUM in the BOTT we assume that the utilization &,tt of the bottleneck node is very close to 1, and therefore we can choose for the initial value of the throughput X(O):
where argmin is the function that returns the index of the node with the minimum value. 9.3.2
Single Class Networks
The BOTT is an iterative method where for each node i in each iteration step, the mean number of jobs Ki is determined as a function of the chosen
APPROXIMATION
406
throughput
ALGORITHMS FOR PRODUCT-FORM NETWORKS
A!“): (9.27)
?;;i = f@i).
For the function fi, the same approximations are used as in the SUM. By summing over all nodes, we get for the function g(X) : (9.28) of jobs K in the
This sum g(X) is compared to the correction factor: K
and a
(9.29)
corr = SC4 ’
is determined. We assume th .at in each node i the error in computing the mean number of jobs K; is the same, and therefore for each node the following relation holds: (9.30) For the bottleneck node the new mean number of jobs in the node is computed as follows: jp+1) bott
and the new utilization
= jp) .- K bott g(X) ’
(9.31)
(k+l) can then be determined as a function of the Pbott
new mean number of jobs -iTfoTtl’ at the bottleneck node using Eq. (9.15), that is: (k+l) Pbott
=
qptl’)*
(9.32)
Now the new value for the throughput X at the bottleneck node can be computed as a function of the new utilization (Eqs. (7.23) and (7.25)): A(‘“+l)
_ -
(k+l)
Pbott
* mbott
’ pbott
.
(9.33)
ebott
If none of the following stopping conditions is fulfilled then the iteration repeated: l
is
If the correction factor corr is in the interval [(l - c), . . . , (1 + E)], then stop the iteration. Tests have shown that for E = 0.025 we get good results. For smaller E, only the number of iterations increased but the accuracy of the results did not improve significantly.
BOTTAPPROX METHOD
407
l
If in the computation the value of Pb&t = h(??b,tt) is greater than 1, then set P&t = 1 and stop the iteration.
l
To prevent endless loops, the procedure is stopped when the number of iterations reaches a predefined upper limit.
The function h previously mentioned for different node types is defined as follows (see Eq. (9.15)): 1. For node types -/M/l-FCFS,
-/G/l-PS,
h(Kbott) =
and -/G/l-LCFS-PR: xbott
2. For node type -/G/oo-IS: xbott
h(%ott ) = 7.
3. For node type -/M/m-FCFS it is not possible to give an exact expression for fi and therefore an appropriate approximation is chosen (see [BoFi93]):
2-f
- mbott
’
with: f=
K
-
-
1 ?
K
b = Kbott C
mbott -
mbott . f + mbott
-I- Pmbott,
= b2 - 4 * mbott * Kbott . f.
Now we give an algorithm for the BOTT: Initialize:
Iteration: [d
Compute the following performance measures for i = 1, . . . , N:
408
APPROXIMATION ALGORITHMS FOR PRODUCT-FORM NETWORKS
-1
Compute the new throughput
g(X)= &,
A:
Kbott=Ebott*-$jY
i=l
x = Pbott . mbott
Pbott = @bott),
[I
. pbott
ebott
Check the stopping condition:
I I K g(X)
< E*
If it is fulfilled, then stop the iteration, otherwise return to Step 2.1. For i = 1, . . . , N compute:
To estimate the accuracy of the BOTT and to compare it with the SUM, we investigated 64 different product-form networks. The mean deviation from the exact solution for the overall throughput X and for the mean response time T was 2%. The corresponding value for the summation method was about 3%. The number of iterations of the SUM were about three times the number of iterations of the BOTT. For details see Table 9.9. Table 9.9
Deviation in percent (%) f rom the exact values (obtained by MVA) Error in X
min mean max
9.3.3
Error in T
Number of Iterations
SUM
BOTT
SUM
BOTT
SUM
BOTT
0 2.60 7.40
0 2.10 13.90
0 2.70 7.80
0 2.10 12.20
1 6.60 13
1 2.30 5
Multiclass
Networks
One of the advantages of the BOTT method is that it can easily be extended to multiple class networks. The necessary function h,(zi,) is given by (Eq. (9.19)): 1. For node types -/M/l-FCFS, hr(%ott,r)
-/G/l-PS
= Kbott,r
’
and -/G/l-PR: 1 - 7
’ pbott)
.
(9.34)
409
BOTTAPPROX METHOD
2. For node type -/G/oo-IS,
/L,(K~,~): WKbott,r
) =
Fbott K.
r
(9.35)
3. For the node type -/M/m-FCFS: h-(Kbott,r)
Ebott,r
=
1
I mbott
’ Pmbott
+
1 _
K
K
mbott -
-
1
(Pbott
)
’ Pbott
mbott
(9.36) Remark: Eqs. (9.34) and (9.36) are approximations because we used &ott,r part of &Ott = c,r”=, Pbott,i when we created the h,(Ki,,) function. Now we give the algorithm for the multiple class BOTT: Initialize:
,s.,,Nj{e}, A-= iEIyin
forr=L..,R,
bott, r = argmin {y},r. Iteration: IsTEh,
Compute the performance measures: Air
= b%-,
pir
=
Air
zr
mi
* j&r
= f&h,
’ * * a, AR),
r = l,...,R, i = l,...,N.
Pi=~Pir7 r=l
-1
Compute the new throughputs: N g(Xr)
= CKir, i=l
Ebott,r
= Ebott,r
Pbott,r xr
KYS(h)
=
h- (Ebott,r),
=
Pbott,r
* mbott
b
’
* pbott,r ,
eb0tt,7-
r = l,...,R.
410
APPROXIMATION ALGORITHMS FOR PRODUCT-FORM NETWORKS
-1
Stop the iteration for class T. If:
go to Step 2.1, else stop iteration for class r. Final value of Ki,:
Ki, = Fir . - KT dM
’
r=l,...,
R, i=l,...,
N.
The deviation from the exact results is similar to the one in the single class case for the SUM and is slightly worse but tolerable for the BOTT (see Table 9.10). The average number of iterations for the examples of Table 9.10 Tab/e 9.10 Deviation in percent (%) from the exact results for the throughput X (average of 56 examples) BOTT min mean max
SUM
0
0
6.7 38.62
4.68 40.02
used by the bottapprox algorithm is 4.5, and the maximal value is 20, which is much smaller in comparison with the SUM. Problem
9.4
9.4
Solve Problem 7.3 and Problem 7.4 using the BOTT.
BOUNDS ANALYSIS
Another possibility for reducing the computation time for the analysis of product-form queueing networks is to derive the upper and lower bounds for the performance measures instead of the exact values. In this section we introduce two methods for computing upper and lower bounds, asymptotic bounds analysis (ABA) and balanced-job- bounds analysis (B JB). These two methods use simple equations to determine the bounds. Bounds for the system throughput and for the mean number of jobs are calculated. The largest possible throughput and the lowest possible response time of the network are called optimistic bounds, and the lowest possible throughput and greatest possible mean response time are called pessimistic bounds. So we have:
BOUNDS ANALYSIS
411
With these relations other bounds for the performance measures such as throughputs and utilizations of the nodes can be determined. In the following we make a distinction between three different network types: l
Type A describes a closed network containing no infinite server nodes.
l
Type B describes a closed network with IS nodes.
l
Type C describes any kind of open network.
We restrict our consideration to networks with only one job class. Generalizations to networks with several job classes can be found in [EaSe86]. 9.4.1
Asymptotic
Bounds Analysis
The ABA [MuWo74a, Klei76, DeBu78] gives upper bounds for the system throughput and lower bounds for the mean response time of a queueing model (optimistic bounds). The only condition is that the service rates have to be independent of the number of jobs at the node or in the network. As inputs for the ABA method, the maximum relative utilization of all non-IS-nodes, 2 miLX,and the sum xsum of the relative utilizations of these nodes are needed:
X max
--
max(xi) i
and
xs,,
= c
xi,
where xi = ei/pi is the relative utilization of node i (Eq. (7.6)). The relative utilization of an IS node (thinking time at the terminals) is called 2. At first we consider the case of an open network (Type C): Since the utilization pi = Xi/pi with Xi = X. ei, the following relation holds: p; = A* x;.
(9.37)
Due to the fact that no node can have a utilization greater than 1, the upper bound for the throughput of an open network cannot exceed Xsat = l/xmax if the network is to remain stable. Therefore, Xsat is the lowest arrival rate where one of the nodes in the system is fully utilized. The node that is utilized most is the node with the highest relative utilization xrnax. For this bottleneck node we have pmax = X. x max 5 1, and the optimistic bound for the throughput is therefore: (9.38) To determine the optimistic bounds for the mean response time, we assume that in the best case no job in the network is hindered by another job,. With this assumption, the waiting time of a job is zero. Because the relative utilization xi = ei/pi of a node is the mean time a job spends being served at
412
APPROXIMATION
ALGORITHMS FOR PRODUCT-FORM NETWORKS
this node, the mean system response time in the non-IS nodes is simply given as the sum of the relative utilizations. Therefore, for the optimistic bounds on mean response time we get:
For closed networks we consider networks with IS nodes (Type B). From the results obtained for such networks we can easily derive those for networks of Type A by setting the mean think time 2 at the terminal nodes to zero. The bounds are determined by examining the network behavior under both light and heavy loads. We start with the assumption of a heavily loaded system (many jobs are in the system). The greater the number of jobs K in the network, the higher the utilization of the individual nodes. Since:
Pi(K)= qqxi I 1, the highest possible overall network throughput is restricted by each node in the network, especially by the bottleneck node. As in open models we have:
X(K) 5 1. Xmax
Now assume that there are only a few jobs in the network (lightly loaded case). In the extreme case K = 1, the network throughput is given by +Z). For K = 2,3,... jobs in the network, the throughput reaches l/(Xsum its maximum when each job in the system is not hindered by other jobs. In this case we have X(K) = K/(xsum + 2). These observations can be summarized in the following upper bound for the network throughput : (9.40) To determine the bound for the mean response time F(K), Eq. (9.40) with the help of Little’s theorem: K T(K)
we transform
1
< min + 2 C Xmax
‘2
and get:
(9.41) or
max {xsum, K*xmax - 2) I T(K)*
The asymptotic bounds for all three network types are listed in Table 9.11.
BOUNDS ANALYSIS
Table 9.11 Network
413
Summary of the ABA bounds
Type
ABA
Bounds
A
x
B
Xl1
C
xmax
T
T(K)
2 max { xsurn
) K*
xmax
-
2)
As an example for the use of ABA, consider the closed Example 9.6 product-form queueing network given in Fig. 9.2. There are K = 20 jobs in the network and the mean service times and the visit ratios are given as follows: = 4.6, el = 2,
~/PI
Fig. 9.2
~/,cL~ = 8, e2 = e3 = 1.
l/p3
Single class queueing
= 120 = 2,
network.
At first we determine: 2 max
=max
el
--,--{ Pl
e2
= 9.2,
x,,,
= 3Al + ;
= 17.2,
Pa >
With these values and the formulae asymptotic bounds immediately.
given in Table 9.11, we can determine
the
414
APPROXIMATION
ALGORITHMS FOR PRODUCT-FORM NETWORKS
Throughput: K
X(K) 5 min {
xsurn
‘> + 2’
=min{-$&;$-}
=0.109.
xmax
Mean response time: T(K)
2 max {CC,,,, PC.xmax - 2) = max(17.2; 64) = 64.
Now the bounds for the other performance measures can be computed. With the formula pi = X. xi, we can, for example, compute the upper bounds for the utilizations pr 5 1 and p2 < 0.87. For this example, the exact values for throughput and response time are X(K) = 0.100 and T(K) = 80.28, respectively. 9.4.2
Balanced
Job Bounds
Analysis
With the BJB method [ZSEG82], upper as well as lower bounds can be obtained for the system throughput and mean response time of product-form networks. In general, using this analysis we get better results than with the ABA method. The derivation of bounds by the BJB method is demonstrated only for closed networks of Type A. The derivation of the bounds for the other two network types is similar and is therefore left to the reader as an exercise. In the derivation of BJB, it is assumed that the network is balanced: that is, the relative utilizations of all nodes are the same: xi = . . . = xN = x. The throughput of a balanced network can be computed by applying Little’s theorem:
X(K)= N K C Ti(K)’ i=l where Fi (K) is the mean response time in the ith node with K jobs in the network. For product-form networks, Reiser and Lavenberg [ReLa80] have shown that: Ti(K)
= [I + K,(K
- I)] .x.
If we combine the two preceding formulae, we get: K X(K)
=
N
C [l +Ki(K
- I)] .x
= (N+ff-
l).x’
(9.42)
i=l
Equation (9.42) can be used to determine the throughput bounds of any type of product-form queueing network. For this computation, let xmin and xrnax,
BOUNDS ANALYSIS
415
respectively, denote the minimum and maximum relative utilization of an individual node in the network. The basic idea of the BJB analysis is to enclose the considered network by two adjoining balanced networks: 1. An “optimistic” network, where the same relative utilization assumed for all nodes,
x,i,
is
2. A “pessimistic” network, where the same relative utilization assumed for all nodes.
x,,,
is
With Eq. (9.42), the BJB bounds for the throughputs can be determined as:
(N +“K -1).-xi,,LX(K) L(N +“K -1).-Xii,’
(9.43)
Similarly, the bounds for the mean response time can be determined by using Little’s theorem: (N+K-1)*x
min 5 T(K)
5 (N + K - l)‘Xmax-
(9.44)
The optimistic bounds can be improved by observing that among all the networks with relative overall utilization xsum, the one with the relative utilization, xi = xsUrn/N for all i, has the highest throughput. If xave = xsum/N denotes the average relative utilization of the individual nodes in the network, then the improved optimistic bounds are given by:
X(K) L
K .-= 1 K + N - 1 x:,,,
K (9.45) Xsum
+ (K
-
l)xa,,
and: Xsum
+ (K
-
1)xaw
(9.46)
T(K),
5
respectively. The optimistic network, the network with the highest throughput, has, therefore, N = xsum/xave nodes with relative utilization x,,. Analogously, among all networks with relative overall utilization x,,, and maximum relative utilization xmax, the network with the lowest throughput has altogether xsum/xmax nodes with relative overall utilization x,,,. Therefore the improved pessimistic bounds are given by: K K+z-l
.-= 1 xmax
K xsum
+ (K
I X(K) -
(9.47)
1)xmax
and:
T(K) L Gum respectively.
+ (K
-
l)Xmax,
(9.48)
416
APPROXIMATION ALGORITHMS FOR PRODUCT-FORM NETWORKS
Table 9.12
Summary of the BJB bounds [LZGS84]
Network Type A
T
B
BJB Bounds K
K Xsum + (K - l)xmax
’ X(K) ’
+ CK - ljxave
< gicK) I
xsum
l+&
1 -
Zsum
+ (K
-
1)~s~
+ (K - l)xmax 1+*
Gum
C
Zsum
IT<
XXave
sum
zsum l-
AXmax
The inequalities for open networks can be obtained in a similar way. In Table 9.12, the BJB bounds for all three network types are summarized. The equations for the two closed network types are identical, if the value of 2 is set to zero. Consider the network of Example 9.6 again, but now we Example 9.7 determine the bounds with the BJB method. With x,,~ = x,,,/N = 8.6, we can obtain the bounds directly from Table 9.12. Throughput: 20 20 < X(K) I 17.2 + 120 + 129.5 17.2 + 120 + 20.5’ 0.075 < X(K) < 0.127. Response time: 174.8 17.2 + = < ‘?;(K) 5 17.2 + 1349’ 7.977 37.70 < T(K) < 146.8. A comparison of the results of both methods shows that for this example we get better results with the ABA method than with the BJB method. Therefore the conclusion is irresistible that using both methods in combination will reduce the area where the performance measures can lie. Figures 9.3 and 9.4 show the results of the ABA and BJB values for the throughput and mean response time as functions of the number of jobs in the network, respectively. The point of intersection for the two ABA bounds is given by: Jq* = 2sum +Z . Xmax
BOUNDS ANALYSIS
417
From the point: (Xsum
K+ =
+ 2)2
(X sum + 2).
-
Xmax
Xsumxave -
Xsumxave
upwards, the optimistic curve of the ABA method gives better results than the B JB one. These bounds say that the exact values for the throughput and the mean response time reside in the shaded area.
BJ&,t ABA
B JB,,,
I I
Y
1
Fig. 9.3
K+
ABA and BJB throughput
A
I
I I
1
bounds as a function of K.
ABA
B J&w,,
Fig. 9.4
-,K
,
K*
I I
K*
1
B JB,,t
;-K
K+
ABA and BJB response time bounds as a function of K.
The performance bound hierarchy method for determing the bounds of the performance measures was developed by [EaSe83]. In this method we have a hierarchy of upper and lower bounds for the throughput. These bounds converge to the exact solution of the network. The method is based on the
418
APPROXIMATION ALGORITHMS FOR PRODUCT-FORM NETWORKS
MVA. Thus, there is a direct relation between the accuracy of the bounds and the cost of computation. If the number of iterations is high, then the cost is higher but the bounds are closer together. On the other hand, when the number of iterations is lower, the cost is lower but the bounds are farther apart. The extension of this method to multiclass networks can be found in [EaSe86]. 0th er versions can be found in [McMi84], [Kero86], [HsLa87], and [ShYaSS]. Problem 9.5 Solve Problem 7.3 with the ABA and BJB methods and compare the results.
9.5
NETWORKS
WITH VARIABILITIES
IN WORKLOAD
Performance models need values of input parameters in order to predict values of performance measures. In practice, input parameters are estimated from measurements on real systems. Thus each input parameter will have a confidence interval in addition to a point estimate. Traditional algorithms such as MVA or SCAT accept only single values as input parameters and compute single values for the performance measures. In this section we describe a method that allows intervals as input parameters. As a result, the predicted performance measures are also given as intervals. The treatment here is adapted from [HLM96] and [MaRa95]. We can describe the workload variabilities, e.g., service rate variabilities, by intervals: I = [i-,if].
(9.49)
If the performance measures are monotone with respect to the input parameters, given by intervals, then a simple interval based algorithm can be derived from a given single value algorithm y = y (~1, ~2, . . . , pi), where y is the considered performance measure and the service rates are given by intervals: Ipn = [p,,pL]
n=
l,,..,
N.
Now the interval based algorithm can be described as follows: Calculate the performance measures y for the lower and upper bounds of the service rates:
!I- =Y(/q?.**&A
(9.51)
!I+ = Y(/& - * - 7P$).
(9.52)
Calculate the intervals for the performance measure y . If y is monotonically increasing in the parameter, then:
Iy = [Y-4+1.
(9.53)
NETWORKS WITH VARIABILITIES IN WORKLOAD
419
If y is monotonically decreasing, then: (9.54)
Iy = [Y+,Y-l.
Example 9.8 Consider a closed queueing network with two nodes and er = e2 = 1, K = 5, and the two intervals: Ipl = [l/
set, 2/ set]
Ip2 = [3/ set, 4/ set].
(9.55)
The interesting performance measures are the throughput Xr = X2 = X, which is monotonically increasing with service rates and the mean response time T, which is monotonically decreasing with service rates.
A- = X(1/ set, 3/ set) = 0.99/ set,
(9.56)
set) = 1.97/ set,
(9.57)
A+ = x(2/
set, 4/
T- = T( l/ set, 3/ set) = 5.01 set,
(9.58)
T+
(9.59)
= T(2/sec,4/sec)
Interval for the throughput
= 2.54sec.
X:
1~ = [A-, A+] = [0.99/ set, 1.91/ set].
(9.60)
Interval for the mean response time T: IT = [T+,T-]
1 Workstations
Ethernet
Fig. 9.5
= [2.54sec, 5.01 set].
(9.61)
1
Model of the client-server system.
As another example, we consider a client-server system where Example 9.9 the client workstations (labeled as node 1) submit SQL requests to a server, consisting of a CPU and n disk devices (labeled as nodes 3 and 4) [HLM96]. The workstations are modeled as -/G/l-IS station, while the CPU and disk device are modeled as -/M/l-FCFS stations. The ethernet (labeled as node 2), which connects the workstations with the server, can approximately be modeled by a -/G/l-IS server. The resulting network is shown in Fig. 9.5. The parameters of the model are listed in Table 9.13. Note that workstation service rate is specified as three intervals each having a given probability. For further details see [HLM96].
APPROXIMATION ALGORITHMS FOR PRODUCT-FORM NETWORKS
420
Parameters
Table 9.13 Node
I1 = [2.3,2.7] : p(Il) = 0.2 I2 = [7.4,8.4] : ~(12) = 0.5 13 = [16,21] : p(Iz) = 0.3 0.00185 set 0.12 set 0.054 set
Ethernet CPU Disk
system model Visit Ratios
Service Time
Workstation
9.6
of the client-server
1.0 2.0 1.0 1.0
SUMMARY
The main advantages and disadvantages of the approximation algorithms for product-form queueing networks described in this chapter are summarized in Table 9.14. Tab/e 9.14 networks.
Comparison
of the approximation
algorithms
queueing
Disadvantages
Advantages
Algorithms
for product-form
Very low storage requirement
SCAT
Good accuracy Very low storage requirement compared with MVA or convolution
Needs more iterations
SUM
Easy ment Low ment Easy form
Accuracy is not very high (but sufficient for most applications)
to understand
and
time
No multiple server nodes Low accuracy
Bard-Schweitzer CBS)
and imple-
than BS
storage and time requireto extend to non-productnetworks
BOTT
The same advantages as SUM Fewer iterations than SUM For multiple class networks, easier to implement than SUM Accuracy slightly better than SUM
Accuracy is not very high (but sufficient for most applications)
ABA, BJB
Well suited for a bottleneck analysis In the design phase, to obtain a rough prediction (insight, understanding) of the performance of a system Extremely low storage and time requirement
Only for single class networks Only upper and lower bounds
IO Algorithms for Non-Product-Form Networks
Although many algorithms are available for solving product-form queueing networks (see Chapters 8 and 9), most practical queueing problems lead to non-product-form networks. If the network is Markovian (or can be Markovized), automated generation and solution of the underlying CTMC via stochastic Petri nets (SPNs) is an option provided the number of states is fewer than a million. Instead of the costly alternative of a discrete-event simulation, approximate solution may be considered. Many approximation methods for non-product-form networks are discussed in this chapter. These algorithms and corresponding sections of this chapter are laid out as shown in the flowchart of Fig. 10.1. Networks with non-exponentially distributed service times are treated in the next section, while networks with FCFS nodes having different service times for different classes are treated in Section 10.2. Priority queueing networks and networks with simultaneous resource possession are treated in Sections 10.3 and 10.4, respectively. Network models of programs with internal concurrency are treated in Section 10.5. Fork-join systems and parallel processing are treated in Section 10.6. Networks with asymmetric server nodes and blocking networks are discussed in Section 10.7 and Section 10.8, respectively. 421
422
ALGORITHMS FOR NON-PRODUCT-FORM
NETWORKS
Fig. 10.1 Different algorithms for non-product-form tions.
networks and corresponding sec-
NON-EXPONENTIAL
10.1 10.1.1
NON-EXPONENTIAL Diffusion
DISTRIBUTIONS
423
DISTRIBUTIONS
Approximation
Although we present diffusion approximation as a method to deal with networks with non-exponential service time and interarrival time distributions, it is possible to apply this technique to Markovian networks as well. The diffusion approximation is a technique based on an approximate productform solution. In this approximation, the discrete state stochastic process {K(t) I t 2 O> d escribing the number of jobs at the ith node at time t is approximated by a continuous state stochastic process (diffusion process) {Xi(t) ] t 2 O}. For the fluctuation of jobs in a time interval we assume a normal distribution. In the steady-state, the density function fi(x) of the continuous state process can be shown to satisfy the Fokker-Planck equation [Koba74] assuming certain boundary conditions. A discretization of the density function gives the approximated product-form-like state probabilities ?i (Ici) for node i (see Fig. 10.2). K;(t)
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+f(j&) A
Substitute the discrete process by a continuous diffusion process
Discretize the density function
t Xi@> +i (4 Determine the density function of the diffusion process for the steady state case Fig. 10.2
The principle
of the diffusion
approximation.
Although the derivation of this method is very complex, the method itself is very simple to apply to a given problem. Steady-state behavior of networks with generally distributed service and interarrival times at nodes can be approximately solved with the restriction of a single single server at each node. At present no solutions are available for multiple class networks [Mitz97]. Consider a GI/G/l queueing system with arrival rate X, service rate p, and the coefficients of variation cA and cg of the interarrival and service times, respectively. Using the diffusion approximation, the following approximated state probabilities can be obtained [Koba74, ReKo74]:
qk) =
l-P, { p(l - &P’,
k = 0, Ic > 0,
with:
(10.2)
424
ALGORITHMS FOR NON-PRODUCT-FORM
NETWORKS
and p = X/p < 1. Note that y is defined in Eq. (10.2) for convenience so that fi = exp (y ) . For the mean number of jobs we then have: (10.3) Differential equations underlying the diffusion approximation need boundary conditions for their solution. Different assumptions regarding these boundary conditions lead to different expressions. The preceding expressions are based on the work of [Koba74] and [ReKo74]. Using different boundary conditions, [Gele75] and [Mitz97] get different results that are more accurate than those of [Koba74] and [ReKo74] for larger values of utilization p. They [Gele75], [Mitz97] d erived the following expression for the approximated state probabilities: k = 0,
1 - P,
(10.4)
with ,6 and y as in Eq. (10.2). The mean number of jobs is then: K=p
pc; + c; 1+ 2u -4 1
1
Next we show how the diffusion approximation networks.
(10.5)
*
can be applied to queueing
10.1.1.1 Open Networks We can make use of the results for the GI/G/l system for each node in the network, provided that we can approximate the coefficient of variation of interarrival times at each node. We assume the following: l
The external arrival process can be any renewal process with arrival time 1/X and coefficient of variation CA.
inter-
l
The service times at node i can have any distribution time l/pi and coefficient of variation cgi.
l
All nodes in the network are single server with FCFS service strategy.
with mean service
According to [ReKo74] the diffusion approximation for the approximated state probabilities of the network have a product-form solution
it(kl, I I.
(10.6) i=l
NON-EXPONENTIAL
with the approximate marginal probabilities
DISTRIBUTIONS
425
as in Eq. (10.1):
jqk;) = l - Pi) 1pi(l- lj@-l,
I$= 0, k 2 1,
with X . ei
pi = -
Pi pi = exp
(10.8)
7
( -
w - Pi) c& * pi + c& > *
We approximate the squared coefficient of variation of interarrival node i using the following expression:
(10.9) times at
N
cf$ = 1+
cc j=o
C&
- 1) .p$. ej . e,l,
(10.10)
where we set: 2 -2
cBO -
(10.11)
cA*
For the mean number of jobs at node i we get: (10.12) k,=l
Similar results are presented in [Gele75], [GeMi80], and [Koba74]. Source
Sink
Fig. 10.3
A simple open queueing network.
Example 10.1 In this example (see Fig. 10.3), we show how to use the diffusion approximation for a simple open network with N = 2 stations. The
426
ALGORITHMS FOR NON-PRODUCT-FORM NETWORKS
external arrival process has mean interarrival time l/X = 2.0 and the squared 2 - 0 .94. The service times at the two stations have coefficient of variation cA the following parameters: pl = 1.1,
p2 = 1.2,
c;r = 0.5,
cg2 = 0.8.
With routing probabilities ~110= ~112= p21 = p22 = 0.5 and pal = 1, we compute the visit ratios, using Eq. (7.4): el
e2 = el . ~12 + e2 . p22 = 2.
= ~01 + e2 . ~21 = 2,
Node utilizations
are determined using Eq. (10.8): = 0.909,
X . e2 = 0.833. p2 = P2
We approximate the coefficients of variation of the interarrival the nodes using Eq. (10.10): 2 Gil = 1 + C(c& j=o
time at both
- l)p& * 3 el
= 1 + (ci - I)&
* z +
QPL
*z
+
(c”,, - l)p& * 2
= 0.920, 2 42 = 1 + c(c” Bj - 1)~;~ - 3 = 0.825. e2 j=o With Eq. (10.9) we get: & = exp
-
$2 = exp
-
at1- Pd c;1 * p1+ C&l > &
For the mean number of jobs zi,
K1=P1= 1 - bl
= 0.873,
21 - P2) = 0.799. - P2 + ($2 > we use Eq. (10.12) and obtain:
7.147,
K2 = A-S- = 4.151, 1 - p2
and the marginal probabilities are given by Eq. (10.7): %1(O)= 1 - pr = 0.091,
?r(k) = pr(1 - /?1)&-’ = 0.116.0.873k-1
%2(O)= 1 - p2 = 0.167,
for k > 0,
;r2(k) = p2(1 - @2)&1 = 0.167.0.799”-1
for k > 0.
NON-EXPONENTIAL
DISTRIBUTIONS
427
10.1.1.2 Closed Networks To apply the diffusion approximation to closed queueing networks with arbitrary service time distributions, we can use Eq. (10.7) to approximate marginal probabilities %i(ki), provided that the throughputs Xi and utilizations pi are known. In [ReKo74] the following two suggestions are made to estimate Xi:
1. For large values of K we use the bottleneck analysis: Search for the bottleneck node bott (node with the highest relative utilization), set its utilization to 1, and determine the overall throughput X of the network using Eq. (10.8). With this throughput and the visit ratios ei, compute the utilization of each node. The mean number of jobs is determined using Eq. (10.12). 2. If there is no bottleneck in the network and/or the number K of jobs in the network is small, replace the given network by a product-form one with the same service rates pi, routing probabilities pij, and the same K, and compute the utilizations pi. The approximated marginal probabilities given by Eq. (10.7) are then used to determine the approximated state probabilities: (10.13) where G is the normalizing constant of the network. Then for the marginal probabilities and the other performance measures improved values can be determined.
Fig. 10.4
A simple closed queueing network.
Example 10.2 Consider the closed network shown in Fig. 10.4 with the following parameters: p1 = 1.1,
p2 = 1.2,
c& = 0.5,
ci2 = 0.8,
and the routing probabilities: ~12
= 1 and
~21 = ~22 =
0.5.
For the visit ratios we use Eq. (7.5) and get: el = 1,
e2 = 2.
At first we choose K = 6 and use the bottleneck analysis to study the network.
428
ALGORITHMS FOR NON-PRODUCT-FORM
10.1.1.2.1
Solution with Bottleneck
NETWORKS
Let node bott be the bottleneck
Analysis
with the highest relative utilization: pb&t = m:x {pi} = A. m”;” ,
= x . max{0.909,1.667},
which means that node 2 is the bottleneck and therefore its utilization to 1:
is set
p2 = 1. Using Eq. (10.8) we determine the overall throughput
x = p2 *E For the utilization
1 Z-----E 1.667
X of the network:
0.6.
of node 1 we get:
pl=+
0.545.
Now we use Eq. (10.10) to determine the coefficients of variation of the interarrival times: 2
cam= 1 -t- (&I - l)pfl * el . ell + (c”,, - 1)&r . e2 . ell = 0.9, &2
= 1 + (431 - l)Pf2
* el * e;l
+ (~$3,
- l)pz, se2 . e;l = 0.7.
The pi are given by Eq. (10.9): fi1 = exp
( -
w &'Pl+c;,
- Pl)
= 0.3995,
&j = exp
>
(
W-P4 42.P2
Now, by using Eq. (10.12) we can compute the approximate mean number of jobs: = 0.908,
x2 = K - El
+&2
>
=
1.
for the
= 5.092.
Table 10.1 compares the values obtained by the preceding diffusion approximation (DA) with those obtained via DES. We can see that in this case the results match very well. Such close matching does not always occur, especially when one node in the network is highly utilized and the coefficients of variation are very small. To show the second method, we assume that there are K = 3 jobs in the network.
NON-EXPONENTIAL
Table 10.1
Node 1 Node 2
DISTRIBUTIONS
429
Comparison of results for the example (K = 6) Pz (DES)
z,
0.544
0.965 5.036
0.996
(DES)
Pi (DA)
Et
0.545
0.908
(DA)
1
5.092
10.1.1.2.2 Solution Using Product-Form Approximation Now we substitute the given network by a product-form one. To determine the utilizations of this network we can use the MVA and get:
pr = 0.501
and
p2 = 0.919.
The coefficients of variation for the interarrival the previous met hod: &
= 0.9
and
times remain the same as in
ci2 = 0.7.
For the ,!I&we use Eq. (10.9) and obtain: j& = exp
21
cil
-
Pl)
* p1+
= 0.35,
>
c&
p2
=
0.894.
The approximated marginal probabilities can be computed using Eq. (10.7): %1(O)= 1 - pr = 0.499, *r(l) = pr(l - ,&) = 0.326, fh(2)
= pr(1 - /%)br = 0.114,
?I (3) = p1(1 - /qg
= 0.039,
h(O)
= 1 - p2 = 0.081,
%(l)
= pz(1 - @a)= 0.0974,
+2(2) = ,oa(l - fi2)&
= 0.0871,
h(3)
= 0.0779.
= ,%(1 - &)p;
Now the following network states are possible: (3,0),
whose probability
(24,
(1,2),
(0,3),
is computed using Eq. (10.13): k(3,O)
= fq3)
Gf2(0).
?(2,1)
= 0.01111~ $
;r(l,2)
= 0.02839.
$,
+(0,3)
= 0.03887.
$
;
= 0.00316 - $
With the normalizing condition: fi(3,O) + ?(2,1)
+ q1,q
+ q&3)
= 1,
ALGORITHMS FOR NON-PRODUCT-FORM
430
NETWORKS
we determine the normalizing constant G: G = 0.00316 + 0.01111 + 0.02839 + 0.03887 = 0.08153, which is used to compute the final values of the state probabilities: ii(3,O) = 0.039,
+(2,1) = 0.136,
ii(l,2)
= 0.348,
qo, 3) = 0.477,
and with these state probabilities the improved values for the marginal probabilities are immediately derived: ~~(3) = ~~(0) = 0.039,
ni(2) = ~~(1) = 0.136,
q(1) = 7r2(2) = 0.348,
7rl(O) = 7r2(3) = 0.477.
For the mean number of jobs we get -
3
Ki = c
Ice TV
= 0.737
and
?7, = 2.263.
k=l
In Table 10.2 DES values for this example are compared to values from the preceding approximation. Table 10.2
Node 1 Node 2
Comparison of results for the example (K = 3). ~z (DES)
K, (DES)
pi (DA)
‘is7i (DA)
0.519 0.944
0.773 2.229
0.501 0.919
0.737 2.263
A detailed investigation of the accuracy of the diffusion method can be found in [ReKo74]. The higher the utilization of the nodes and the closer the coefficient of variation is to 1, the more accurate are the results. 10.1.2
Maximum
Entropy
Method
An iterative method for the approximate analysis of open and closed queueing networks is based on the principle of the maximum entropy. The term entropy comes from information theory and is a measure of the uncertainty in the predictability of an event. To explain the principle of the maximum entropy, we consider a simple system that can take on a set of discrete states S. The probabilities 7r(S) for different states are unknown; the only information about the probability vector is the number of side conditions given as mean values of suitable functions. Because in general the number of side conditions is smaller than the number of possible states, in most cases there is an infinite number of probability vectors that satisfy the given side conditions. The question now is
NON-EXPONENTIAL
DISTRIBUTIONS
431
which of these probability vectors shall be chosen as the one best suited for the information given by the side conditions and which is least prejudiced against the missing information. In this case the principle of maximum entropy says that the best-suited probability vector is the one with the largest entropy. To solve a queueing network using the maximum entropy method (MEM) the steady-state probabilities r(S) are determined so that the entropy function
H(T) = -C7r(S)lnn(S)
(10.14)
S
is maximized subject to given side conditions. The normalizing condition -the sum of all steady-state probabilities is l-provides another constraint. In [KGTA88], op en and closed networks with one or several job classes and -/G/l, -/G/m nodes are analyzed using this approach. Queueing disciplines such as FCFS, LCFS, or PS as well as priorities are allowed. An extension to the case of multiple server nodes is given in [KoA188]. For the sake of simplicity we only consider single class networks with -/G/l-FCFS nodes based on the treatment in [Kouv85] and [Wals85]. 10.1.2.1 Open Networks If we maximize the entropy by considering the side conditions for the mean number of jobs I?,, Eq. (7.18), and the utilization pi, Eq. (7.26)) then the following product-form approximation for the steady-state probabilities of open networks can be derived [Kouv85]: (10.15) where the marginal probabilities
are given by:
(10.16)
and where: (10.17) b, = c-i - pi z Ei ’ Gi zz e-i--
1 - pi *
(10.18) (10.19)
To utilize the MEM, we thus need the utilizations pi and the mean number of jobs Ki at each node. The equation pi = Xi/pi can be used to easily determine pi from the given network parameters. The mean number of jobs
432
ALGORITHMS FOR NON-PRODUCT-FORM
NETWORKS
-
Ki can be approximated as a function of the utilization pi and the squared coefficient of variation of the service and interarrival times [Kouv85]: K,
2
= pi
I+
c$i + Pic%i
l-pi
2
(10.20)
> ’
if the- condition (1 - c2Ai)/(l + c’,i) 5 pi < 1 is fulfilled. The computation of the Ki is made possible by approximating the squared coefficient of variation c~i of interarrival times. The following iterative expressions for computing the squared coefficients of variation is provided by [Kouv85] : (-ii = -1+
C3i
(10.21)
($.gql~
= 1 + pji (Cgj - 1) )
c$i = pi(l
(10.22)
- pi) + (1 - pi)Cii
There are other approximations of jobs ITi and for the estimation variation. For more information on With this information, the MEM can be summarized in the following
(10.23)
+ pfC:i*
for the computation of the mean number of the necessary squared coefficients of these equations, see Section 10.1.3. for the analysis of open queueing networks three steps:
Determine the arrival rates Xi using the traffic equations and the utilizations pi = Xi/pi for all nodes i = 1, . . . , N. Determine -1 arrival
the squared coefficients
(Eq. (7.1))
of variation.
Initialize. The squared coefficients of variations times are initially set to one for i = 1,. . . , N.
c2Ai of the inter-
1 STEP 2.2 ] Compute the squared coefficients of variation c2oi of the interdeparture times of node i for i = 1, . . . , N using Eq. (10.23). Compute the squared coefficients of variation c& of node i for i = 1, . . . , N and j = 0, . . . , N using Eq. (10.22). From these the new values of the squared coefficients of variation cii of the interarrival times are computed using Eq. (10.21). 1 STEP 2.3 ] Check the halting condition. If the old and new values for the c2Ai differ by more than E, then go back to Step 2.2 with the new & values. Determine the performance measures of the network beginning with the mean number of jobs ??i for i = 1,. . . , N using Eq. (10.20) and then compute the maximum entropy solutions of the steady-state probabilities using Eq. (10.15). The MEM normally approximates the coefficients of variation less accurately than the method of Kiihn (see Section 10.1.3) and therefore the method of Kuhn is generally preferred for open networks with a single job class. The MEM is mainly used for closed networks.
NON-EXPONENTIAL
DISTRIBUTIONS
433
IO. 1.2.2 Closed Networks For closed networks that consist only of -/G/lFCFS nodes, the following product-form formulation for the approximated steady-state probabilities can be given if we maximize the entropy under the side conditions on the mean number of jobs ??i and the utilization pi:
7r(kl. . .$N) = A- Aqkl). . . . - cv@N), G(K)
(10.24)
with: (10.25) and:
G(K) =
F&)
c
. . . . . FN(kN).
(10.26)
As in the case of open networks, coefficients ai and bi are respectively given by Eqs. (10.17) and (10.18). Th e computation of the steady-state probabilities of closed networks is more difficult than that for open networks as the pi and 17, values of the individual nodes cannot be determined directly. Therefore we use the following trick: We construct a pseudo open network with the same number of nodes and servers as well as identical service time distribution and routing probabilities as the original closed network. The external arrival rates of this open network are determined so that resulting average number of jobs in the pseudo open network equals the given number of jobs K in the closed network: N c
11; = K,
(10.27)
i=l
where x: denotes the mean number of jobs at the ith node in the pseudo open network. The pseudo open network arrival rate X is determined iteratively by using Eqs. (10.20) and (10.27), assuming that the stability condition Xei max
434
ALGORITHMS FOR NON-PRODUCT-FORM
NETWORKS
Eq. (10.25), th e normalizing constants as well as the utilizations and mean number of jobs at each node of the closed network can be computed. This approximation uses values for the coefficients ai and bi, which are computed for the pseudo open network. To compensate the error in the approximation, we apply the work rate theorem (Eq. (7.7)), which gives us an iterative method for the computation of the coefficients ai : .(n+l) z
=
a,!“’
. K
(10.28)
. (pi.p$
. p;
with the initial values: (10.29)
For the coefficients bi we have: (10.30) The asterisk denotes the performance measures of the pseudo open network. The algorithm can now be given in the following six steps. Compute the visit ratios ei for i = 1,. . . , N, using Eq. (7.2) and the relative utilizations xi = ei/pi. Construct and solve the pseudo open network. r-1 Initialize. Initial value for the squared coefficients of the interarrival times at node i = 1, . . . , N:
Initial value for the arrival rate of the pseudo open network: x =
Oagg lCbott ’
where x&t
= max {xi} is the relative utilization i
of the bottleneck node.
[I Compute X from Condition (10.27) by using Eq. (10.20). Thus, X can be determined as solution of the equation: 1 +
Csi
+
X * Xi
* C~i
-K=O,
(10.31)
I.-X-Xi
if (1 - C~i)/(l + c”,i) 5 X . x; < 1 for all i is fulfilled. To solve the non-linear equation, we use the Newton-Raphson method.
NON-EXPONENTIAL
DISTRIBUTIONS
435
-1 Compute for i, j = 1,. . . , N the squared coefficients of variation 2 using the Eqs. (10.21)-(10.23). For this computation we need c&, C$ and & the values pi = X-Q and Xi = X.ei. p-1 Check the halting condition. If the new values for the & differ less than E from the old values, substitute them by the new values and return to Step 2.2. m% Determine for i = 1, . . . , N the utilizations pi = (A . ei)/y;, and with Eq. (10.20) the mean number of jobs l?r in the pseudo open network. Determine the coefficients ai and b; of the pseudo open network using Eqs. (10.29) and (10.30). w
Solve the closed network.
-1 Use the convolution method to determine the normalizing constant G(K) of the network. The functions Fi(ki) are given by Eq. (10.25). -1 Determine the performance parameters pi and ??i of the closed network. For this we use the normalizing constant Gc’ (Ic) as defined in Eq. (8.8). From Eq. (8.7) we have ni(k) = Fi(lc) +G$$(K - Ic)/G(k) and therefore: (10.32) (10.33)
(7 Use Eq. (10.28) to determine a new value for the coefficients ai. If there is a big difference between the old values and the new values, then return to Step 5.1. -1 Use Eq. (10.24) to determine the maximum entropy solution for the steady-state probabilities and other performance parameters of the network ils, e.g., Xi = /hipi or Ti = Xi/X,.
Fig. 10.5
Closed network.
The maximum entropy method is now illustrated on a simExample 10.3 ple network with N = 2 nodes and K = 3 jobs (see Fig. 10.5). The queueing
436
ALGORITHMS FOR NON-PRODUCT-FORM NETWORKS
discipline at all nodes is FCFS and the service rates and squared coefficients of variation are given as follows: 2 cB2 = 0.5.
2
and
p2 = 2,
Pl = 1,
cl?1 = 5,
The analysis of the network is carried out in the following steps. As condition for the termination of iterations we choose E = 0.001. Determine the visit ratios and relative utilizations: and
el = e2 = 1,
X2=05Aa
Xl =I,
Construct and solve the pseudo open network. I STEP 2 I
Initialize: x = 0.99/x1 = 0.99.
c& = ci2 = 1,
-1 Determine X. We use the Newton-Raphson iteration method to solve the following equation:
2 -x-xi 1+c&+x*xi*c& -3=o 1-X.X; > z 2 ( and get X = 0.5558. -1
Determine the squared coefficients of variation: &
= pl(l - ~1) + (1 - pl)c;,
3j2 = p2(1 - ~2) + (I2
+ p~c& = 2.236,
,a)~;,
+ p;~;~
= 0.961,
cl2 = 1 + ~12 s (c”,, - 1) = 2.236, 2 c21 = 1 + pzl . (4,
&
=
-1 +
ci2 = -1+
- 1) = 0.961,
x2 ’ P21
(
k
Xl * P12 x2 * (42
(-2.4
-l
= o g61
--I
= 2 236
(%, + 1) >
+ 1) >
LT
A.
Check the halting condition: 1Cy;f-) - c?;ld) 1= (0.961 - l( = 0.039 > E, Icy - c~;~~)I = 12.236- 11= 1.236 > E.
As the halting condition is not fulfilled, we return to Step 2.2 and determine a new value for the constant X.
NON-EXPONENTIAL
DISTRIBUTIONS
437
After 14 iterations, the halting condition is fulfilled and we get the following results: 2 CA1 = 2.130,
x = 0.490,
ci2 = 2.538.
Determine pi* and xz:
= 2.446, Determine the coefficients: ~ Pi . ~ PT = 0.241, a1 = 1 - pT KT - pi
- Pa . ~ PZ = 0.257, a2 = 1 - p; 17; - p;
b =K;-d -
bz =
1
-*
=‘)
o8
Xi-p; - E*
= La 0 558 2
Kl
Solve the closed network. -1 Determine the normalizing constant using the convolution method (see Section 8.1). For the functions Fi(Ici), i = 1,2, we use Eq. (10.25) and get: wo
=
F2(0)
I,
= L,
I-$(l) = 0.193, Fr(2) = 0.154,
Fz(1) = 0.144,
&(3)
F2(3) = 0.045.
= 0.123,
F2(2) = 0.080,
The procedure to determine the G(K) is similar to the one in Table 8.2 and summarized in the following tabular form: k
2
\n 0
1 0.193 0.154 0.123
1 2 3
1 0.3362 0.2618 0.2054
For the normalizing constant we get G(K) = 0.2054. -1 Determine pi and xi. For this we need the normalizing constant G!‘(5) of the network with Node i short-circuited. We have only two nodes and therefore it follows immediately that: G(l% nr
= Fz(k),
G% iv
= Fl (k) 3
438
ALGORITHMS FOR NON-PRODUCT-FORM
NETWORKS
and, therefore: p1 = 1 - @ Fl@)
-k=l
p2 z 1 - @#
= 0.782,
= L, 0 400
(1)
. G, (K - k) = 2.090,
G(K)
- k) = :* 0 910 [I
Determine the new coefficients ai. With Eq. (10.28) we get:
%
(new) _-. al +KS p; . (&$
=0.242,
(new) _-a2+K.p;+
=o.253.
a2
(p2.$$’
Because the old and new values of the ai differ more than E = 0.001, the analysis of the closed network has to be started again, beginning with Step 5.1.
After altogether three iteration steps the halting condition is fulfilled and we get the following final results: p1 = 0.787,
p2 = 0.394,
and
x1 = 2.105,
K2 = 0.895.
Compute all other performance measures: X1 = /xlpl = 0.787, T1 = K1/X1 = 2.675,
X2 = p2p2 = 0.787, T2 = J&/x:!
= 1.137.
To compare the results, we want to give the exact values for the mean number of jobs: x1 = 2.206,
K2 = 0.794.
The method of Marie (see Section 10.1.4.2) gives the following results: x1 = 2.200,
K2 = 0.800.
The closer the service time distribution is to an exponential distribution (c& = 1) , the more accurate are the results of the MEM. For exponentially distributed service and arrival times the results are always exact. Another important characteristic of the MEM is the fact that the computation time is relatively independent of the number of jobs in the network.
NON-EXPONENTIAL
10.1.3
Decomposition
for Open
439
DISTRIBUTIONS
Networks
This section deals with approximate performance analysis of open non-product-form queueing networks, based on the method of decomposition. The first step is the calculation of the arrival rates and the coefficients of variation of the interarrival times for each node. In the second step, performance measures such as the mean queue length and the mean waiting time are calculated using the GI/G/l and GI/G/ m f ormulae from Sections 6.12 and 6.14. The methods introduced here are those due to Kuhn [Kiihn’lS, Bolc89], Chylla [Chy186], Pujolle [PuAi86], Whitt [Whit83a, Whit83b] and Gelenbe [GePu87]. Open networks to be analyzed by these techniques must have the following properties: distributed
and
l
The interarrival times and service times are arbitrarily given by the first and second moments.
l
The queueing discipline is FCFS and there is no restriction on the length of the queue.
l
The network can have several classes of customers (exception is the method of Kuhn).
l
The nodes of the network can be of single or multiple server type.
l
Class switching is not allowed.
With these prerequisites the method works as follows: Calculate the arrival rates and the utilizations of the individual nodes using Eqs. (10.35) and (10.36), and (10.37) and (10.38), respectively. Compute the coefficient of variation of the interarrival each node, using Eqs. (10.41)-( 10.44).
times at
Compute the mean queue length and other performance measures. We need the following fundamental formulae (Step 1); most of them are from Section 7.1.2: Arrival rate Xij,T from node i to node j of class r:
Arrival rate Xi,, to node i of class r: N
k,r
= XOi,r + C j=l
Xj,r * Pji,r-
(10.35)
440
ALGORITHMS FOR NON-PRODUCT-FORM NETWORKS
Arrival rate Xi to node i: (10.36) Utilization
pi,r of node i due to customers of class T: pi,r = ~. mi f Pi,r
Utilization
(10.37)
pi of node i: R
Pi = C r=l
(10.38)
Pi,r*
The mean service rate pi of node i: 1
(10.39)
Coefficient of variation cgi of the service time of node i: 2
&
= -1+
R
Air A. r=l Ai c
(
&
>
(10.40)
* @fJ- + l),
with coefficient of variation ci,r for the service time of customers of class r at node i. The calculation of the coeficient of variation or interarrival times (Step 2) is done iteratively using the following three phases (see Fig. 10.6): 0 1291) C2z,l> \
. . .
VNi,R>
CK,,,)
(hl,l, (Lc;z)
)
(X2)z,CD,
. . .
-:
/
Phase 1: Merging
&,
(~N,R,
Phase 2: Flow
$~v,R)
Phase 3: Splitting
Fig. 10.6 Phases of the calculation of the coefficient of variation.
l
Phase 1: Merging Several arrival processes to each node are merged into a single arrival process. The arrival rate X is the sum of the arrival rates of the individual arrival processes. For the coefficient of variation of the interarrival time, the several authors suggest different approximate formulae.
NON-EXPONENTIAL
441
DISTRIBUTIONS
Phase 2: Flow
l
The coefficient of variation cg of the interdeparture times depends on the coefficients of variation of the interarrival times CA and the service times cg . Here again the different authors provide different approximate formulae. Phase 3: Splitting
l
For the splitting of the departure process, all authors apply the formula: = 1 +pij,r
cFj,r
* (c& - 1).
(10.41)
The iteration starts with phase 1 and the initial values are: cij,r = 1. The corresponding formulae for phases 1 and 2 are introduced in the following: l
Decomposition of Pujolle
[PuAi86]:
Merging:
*A j,r ’ Pji,r + &i,r ’
2 _ 1 R cAi - x i - cr=l
‘ii,r
XO,r
* POi,r
,
(10.42)
(10.43)
* k,r-
Flow: ‘Y&i
l
= Pf ’ <&i + l) + (l - Pi) ’
Decomposition of Whitt
C~i
+ Pi * (1 - 2pi).
(10.44)
[Whit83b, Whit83a]:
Merging: see Pujolle. Flow: 2 = 1 + ” ’ (~Cgi
l
‘) + (1 - pi) ’ (C~i - 1).
(10.45)
Decomposition of Gelenbe [GePu87]: Merging: see Pujolle. Flow:
c& = -1 + Ai 5 !Q * (c&
+ 1) + (1 - pi) * (Cii + 1+ 2 * Pi>.
r=l CLG~
(10.46)
442
ALGORITHMS FOR NON-PRODUCT-FORM NETWORKS l
Decomposition of Chylla
[Chyl86]:
N xj,r * Pji,r 2 x, CA+ = 1 + c 2,r j=o
* gi,,
- l>,
(10.47)
(10.48)
Flow: 2
cgi = 1 + P& (pi) +(& - 1) + (1 - P:,(pi))
fig. 10.7 l
Decomposition of Kuhn
Substitution
*
(10.49)
of a feedback.
[Kiihn79]:
Before starting with the iteration in the method of Kuhn, nodes with direct feedback are replaced by nodes without feedback (see Fig. 10.7 and Eq. (10.50)). x; = Xi(l - Pii), CL: = pip - Pii), Pij & = ~1 _ pii for j f i, p,’ = 0.
cg = pii + (1 - p&‘,,,
(10.50)
c& =
2&i
* x2i * (Ali +
A%)
* (I1 + I2 + 13 + I*) - 1.
(10.51)
Only two processes can be merged by this formula. If there are more processes to merge, then the formula must be used several times. The terms Zl,l = 1,2,3,4 are functions of the Xji and the cj;. For details see [Kuhn791 or [Bolc89].
NON-EXPONENTIAL
443
DISTRIBUTIONS
Flow: c& = Cii + a&&
- pf (&
+ &)
(10.52)
- GKLB.
For GKLB see Eqs. (6.72) and (6.92). To calculate the mean queue length (Step 3) and the other performance measures, GI/G/l and GI/G/ m f ormulae from Sections 6.12 and 6.14 can be used in the case of a single class of customers. Here we give the extension of the most important formulae to the case of multiple classes of customers: M/M/m-FCFS:
l
(10.53) with the probability
of waiting, P,;, given by Eq. (6.28)
Allen-Cunneen [AllegO] :
l
(10.54) l
Kramer/Langenbach-Belz
[KrLa76]:
Q~,TKLB
= Qi,rAC
(10.55)
’ GKLB,
with the correction factor: --.-.
G KLB
ei
2 3
(1
-
pi)
C1
Pi
Gi,rj2
(CL,, + C”,i,,>
c5i,r 5 ‘7
= (4,r -
i e l
-
(
(’
-
pi)
’ (C”,i,,
-
(10.56)
1)
+ C~i,~)
c;i,r > 1.
7
Kimura [Kimu85] :
<‘Zli,r-
. l-4,
T
l-4C(?&,)
*e
2 (l--Pi) Pi ( -3
)
+
4i,r) l-4,
+
T
1fChh
+ cf&
+ c&.
-
1’
(10.57) with the coefficient of cooperation C(mi,
pi) = (1 - pi)(mi
-l> J275&2 16mip;
’
(10.58)
444
ALGORITHMS FOR NON-PRODUCT-FORM
NETWORKS
Sink
Fig. 10.8 Open non-product-form queueing network. l
Marchal [Marc78]:
(1+ C”,i,,>*(c&l. + P%i,?-1+GKLB. Qi,rMAR z5Qi,rM,M/rn 20 + P5c;i.J
(10.59)
Example 10.4 Consider the open non-product-form queueing network of Fig. 10.8. The routing probabilities of the network are given by: ~13 = 0.5,
~12 = 0.5,
p31
=
0.6,
p21
=
p41 = 1.
Jobs are served at different nodes with rates: ~1 =
25,
p2 = 33.333
/!.L3 =
16.666,
20,
/A4 =
and arrive at node 4 with rate X04 = 4. The coefficients of variation of the service times at different nodes as well as of the interarrival times are given by:
c& =
2.0,
c& =
6.0,
c& =
0.5,
c&
= 0.2,
Co4 2 -
4.0.
With Eq. (10.35) we can determine the arrival rate at each node: Xl =2J,
x2 =a,
x3 =lo,
and using Eq. (10.37) we determine the utilizations p1=0.8,
p2=0.3,
p3 = 0.6,
x4 =4,
of different nodes: 0.2 p4 = -*
The initial values for the coefficients of variation cij are: Cf2 = CT3= c;1 = c;1 = c& = 1. Now we start the iteration using the decomposition of Pujolle to obtain the different coefficients of variation of the interarrival times c;~, i = 1,2,3,4.
NON-EXPONENTIAL
DISTRIBUTIONS
Iteration 1: Merging (Eq. (10.42)):
+ c41 X4P41) d, = &h?1)21 + & X3P31 = &lOl+llO~O.6+1~4l) YZ1. Similarly, we obtain:
Flow (Eq. (10.44)):
611= Pa3, + 1)+ (1- Pl)zl, + /a(1-
2Pl) = 0.82(2 + 1) + (1 - 0.8) +1 + 0.8 . (1 - 2 -0.8) =1.64.
Similarly, we get:
c& = 1.45, c& = 0.82, Splitting
c& = 3.368.
(Eq. (10.41)): CT2= 1+m2+201 -1) = 1 + 0.5 . (1.64 - 1) = 1.32, 2
cl3 = 1.32,
c& = 1.45,
c31 = 0.892,
c& = 3.368.
Iteration 2: Merging (Eq. (10.42)): cil = 1.666,
c;2 = 1.32,
ci3 = 1.32,
ci4 = 4.0.
Flow (Eq. (10.44)): cg, = 1.773, Splitting
c& = 1.674,
c& = 0.948,
c& = 3.368.
(Eq. (10.41)):
2 cl2 = 1.387,
cf3 = 1.387,
c& = 1.674,
c& = 0.969,
c& = 3.368.
After six iteration steps we get the values for the coefficients of variation of the interarrival times ci, at the nodes, shown in Table 10.3. In Table 10.4 the mean number of jobs at different nodes are given. We use the input parameters
446
ALGORITHMS FOR NON-PRODUCT-FORM
NETWORKS
Table 10.3 Coefficients of variation cii, i = 1,2,3,4 of the interarrival times at the nodes Iteration 1 2 3 4 5 6
1.0 1.666 1.801 1.829 1.835 1.836
4,
4,
4,
1.0 1.320 1.387 1.400 1.403 1.404
1.0 1.320 1.387 1.400 1.403 1.404
4.0 4.0 4.0 4.0 4.0 4.0
Table 10.4 Mean number of jobs ??, at node i using the formulae of Allen-Cunneen (AC), KrgmerLangenbach/Belz (KLB) , and discrete-event simulation (DES) Methods
EI
772
773
774
AC KLB DES
6.94 5.97 4.36
0.78 0.77 0.58
1.46 1.42 1.42
0.31 0.27 0.23
and values for the coefficients of variation for the interarrival times at the nodes after six iterations (given in Table 10.3). As we can see in Table 10.4, we get better results if we use the Kramer-Langenbach/Belz formula instead of the Allen-Cunneen formula. In this second example we have two servers at node 1. The Example 10.5 queueing network model is shown in Fig. 10.9. Source z* << < ”*:” c7
Sink Fig. 10.9
Open non-product-form
queueing network with two servers at node 1.
The service rate of each server at node 1 is ,~i = 12.5. All other input parameters are the same as in Example 10.4. We also have the same values for the arrival rates Xi and the utilizations pi. The initial values for the
NON-EXPONENTIAL
DISTRIBUTIONS
447
coefficients of variation c;j are again: 2 Cl2
=
2 Cl3
=
2 c21
= c& = CT;1= 1.0.
This time we use the formula of Whitt to determine the coefficients of variation cAi iteratively. Iteration 1: Merging: Whitt uses the same merging formula as Pujolle (Eq. (10.42)) and we get:
Flow (Eq. (10.45)): c& = l+
P?Gl - 1)+
(1 - Pxi, - 1) fi = 1 + O-64(2 - 1) + (1 - 0.64)(1 - 1) l.h = 1.453
c& = 1,45, Splitting
cg3 = 0.82,
c& = 3.848.
(Eq. (10.41)):
cT2 = 1.226,
cT3 = 1.226,
c;i = 1.450,
c& = 0.892,
& = 3.848.
Iteration 2: Merging (Eq. ( 10.42)) : cil = 1.762,
cl2 = 1.226,
c;, = 1.226,
c;, = 4.0.
Flow (Eq. (10.45)):
CL1= 1.727, Splitting
c& = 1.656,
c& = 0.965,
c& = 3.848.
(Eq. (10.41)):
2 cl2 = 1.363,
cT3 = 1.363,
c;i = 1.656,
c& = 0.979,
c& = 3.848.
After seven iterations, we get the values for the coefficients of variation of the interarrival times ci, at the nodes shown in Table 10.5. In Table 10.6, the mean number of jobs at the different nodes are given. We use the input parameters and values for the coefficients of variation for the interarrival times at the nodes after seven iterations (given in Table 10.5). These results are similar to the results of Example 10.4, and the differences with discrete-event simulation results are slightly smaller than in Example 10.4.
448
ALGORITHMS FOR NON-PRODUCT-FORM
NETWORKS
Table 10.5 Coefficients of variation cii of the interarrival at the nodes
4,
Iteration 1 2 3 4 5 6 7
times
1.0 1.762 1.891 1.969 1.983 1.991 1.992
1.0 1.226 1.363 1.387 1.401 1.403 1.405
4.0 4.0 4.0 4.0 4.0 4.0 4.0
1.0 1.226 1.363 1.387 1.401 1.403 1.405
Tab/e 10.6 Mean number of jobs ??, at the nodes using the formula of Allen-Cunneen (AC), Krtimer-Langenbach/Belz (KLB), and discrete-event simulation (DES)
AC KLB DES
10.1.4
Methods
6.48 6.21 4.62
for Closed
0.78 0.77 0.57
1.46 1.42 1.38
0.31 0.27 0.23
Networks
10.1.4.1 Robustness for Closed Networks In case of closed non-product-form queueing networks with -/G/l and -/G/m FCFS nodes, the easiest way to analyze them is to replace the -/G/l and -/G/m FCFS nodes by -/M/l and -/M/m FCFS nodes and to use a product-form method such as MVA, convolution, or the SCAT algorithm. This substitution can be done due to the property of robustness of such closed non-product-form queueing networks. Robustness, in general, means that a major change in system parameters generates only minor changes in the calculated performance measures. In our case it means that if we replace the values of the coefficients of variation of the service times c& (i = 1,. . . , N) by 1, i.e., we assume exponentially distributed service times instead of arbitrarily distributed service times, we obtain a tolerable deviation in the calculated performance measures. We have investigated more than 100 very different closed non-productform queueing networks with -/G/l and -/G/m FCFS nodes. The maximum number of nodes was 10 and the coefficient of variation varied from 0.1 to 15. For a network with only -/G/l FCFS nodes, the mean deviation is about 6% for both single and multiple class networks. In the case of only -/G/m FCFS nodes, the deviation decreases to about 2%. If we have only -/G/co nodes, the deviation is zero. The difference between single and multiple class networks is that the maximum deviation is greater for multiple class networks (40% instead of 20% for single class networks).
NON-EXPONENTIAL
DISTRIBUTIONS
449
For open non-product-form queueing networks, the influence of the coefficients of variation is much greater, especially if the network is a tandem one without any feedback. So we cannot suggest to use the property of robustness to obtain approximate results for open non-product-form queueing networks. It is better to use one of the approximation methods introduced in Section 10.1.2. Example 10.6 In this example we study the robustness property of several closed networks with different coefficients of variation. Consider the three network topologies given in Fig. 10.10 together with four combinations of the coefficients of variation (see Table 10.7). Note that the first combination (a) corresponds the case of all service times being exponentially distributed and the corresponding network being a product-form one. The analysis of the resulting models is done with K = 5 and K = 10 jobs in the system. Table 10.7
Squared coefficients of variation for the net-
works in Fig. 10.10 Combinations
Node 1
Node 2
Node 3
41
42
43,
z
0.2 1.0
0.4 1.0
0.8 1.0
1
4.0 2.0
4.0 1.0
8.0 0.2
We use two combinations of service rates. The first combination leads to a more balanced network, while the second combination results in a network with a bottleneck at node 1 (see Table 10.8). Tab/e 10.8
Two combinations of service rates
Balanced network Network with bottleneck
0.5 0.5
0.333 1.0
0.666 2.0
In Tables 10.9 and 10.10, the throughputs X for the three network topologies together with the different combinations of the coefficient of variation are given. If we compare the product-form approximation results (case a, Table 10.7) with the results for non-product-form networks (cases b, c, and &, Table 10.7), given in Tables 10.9 and 10.10, we see that the corresponding product-form network is a good approximation to the non-product-form one provided: l
The coefficients of variation are < 1 4.
(case
b) and/or not too big (case
450
ALGORITHMS FOR NON-PRODUCT-FORM
Network
NETWORKS
1
--l-/-F
)
p12 =0.6;
p13 =0.4
r Network 2
~12
=
P21 =
0.6; p13 = 0.4 0.3;
p23
= 0.7
Network 3
Pll
Fig. 10.10
= 0.2; p12 = 0.4;
p13
= 0.4;
P21 = 0.3;
P22 = 0.2; p23 = 0.5;
p31 = 0.6;
p32 = 0.2;
Different closed networks.
~3~
= 0.2;
NON-EXPONENTIAL
Table 10.9
Throughputs
of Variations
451
X for the networks in Fig. 10.10, for the balanced case Network
Squared Coefficient
DISTRIBUTIONS
K=5
1
Network 3
Network 2
K = 10
K=5
K = 10
K=5
K = 10
E
0.47 0.43
0.50 0.47
0.45 0.41
0.49 0.47
0.40 0.37
0.44 0.42
:
0.37 0.39
0.44 0.42
0.33 0.48
0.39 0.47
0.29 0.36
0.34 0.41
Tab/e 10.10
Throughputs
X for the networks in Fig. 10.10, for the case with a bottle-
neck. Squared Coefficient of Variations
Network K=5
1
Network 3
Network 2
K = 10
K=5
K = 10
K=5
K = 10
;
0.50
0.50
0.52 0.51
0.50
0.50
0.52
zi
0.49 0.47
0.50
0.47 0.52
0.50
0.48 0.49
0.50 0.52
l
The network has a bottleneck.
l
The number of jobs in the network is high.
The larger the coefficients of variation (c& > I), the larger are the deviations. Example 10.7 In this example we consider in more detail the dependency of the approximation accuracy on the number of jobs K in the network. The examination is based on the first network given in Fig. 10.10 together with the same input parameters except that ,~i = 1, for i = 1,2,3. The results for the throughputs X for different numbers of jobs K are shown in Table 10.11. As we can see in Table 10.11, the larger the number of jobs K, the better is the approximation. Table 10.11 Throughput X for the Network 1 shown in Fig. 10.10, for different values of K and combinations of the cii K
3
4
5
10
20
50
E
0.842 -
0.907 0.970
0.940 0.991
0.996 1.00
1.00
1.00
zi
0.716 -
0.856 0.766
0.894 0.805
0.972 0.917
0.998 0.984
1.00
452
ALGORITHMS FOR NON-PRODUCT-FORM
NETWORKS
10.1.4.2 Marie’s Method If the network contains single or multiple server FCFS nodes with generally distributed service times, then an approximate iterative method due to Marie [Mari79, Mari80] can be used. In this method each node of the network is considered in isolation with a Poisson arrival stream having load-dependent arrival rates Xi (Ic). To determine the unknown arrival rates Xi (Ic), we consider a product-form network corresponding to the given non-product-form network. The product-form network is derived from the original network by simply substituting the FCFS nodes with generally distributed service times by FCFS nodes with exponentially distributed service times (and load-dependent service rates). Because the service rates pi(lc) for the nodes of the substitute network are determined in the isolated analysis of each node, the method is iterative where the substitute network is initialized with the service rates of the original network. The load-dependent arrival rates Xi(k) of the nodes 1, , . . , N can be computed by short-circuiting node i in the substitute network (see Fig. 10.11). All other nodes of the network are combined into the composite node c (see Section 8.4).
fig. 10.11 Short-circuiting node i in the substitute network. If there are k jobs at node i, then there are exactly K - k jobs at the composite node c. The load-dependent service rate through the substitute network with K - k jobs and short-circuited node i is, therefore, the arrival rate at node i with k jobs at node i, which is: Xi(k)
= Ap)(K
- k),
for k = 0,. . . , (K - 1).
(10.60)
Here Xp)(K - k) denotes the throughput of the network with node i shortcircuited. This throughput can be computed using any algorithm for a product-form network. The arrival rates X;(k) can now be used for the isolated analysis of the single node of the network. The state probabilities ni(k) of the nodes in the substitute network and the values of the load-dependent service rates p;(k) for the next iteration step have to be computed. It is shown in [Mast771 that in the considered network for each isolated node, a partition of the state space can be found so that the following condition holds: vi(k)
. xi(k)
= Xi(k - 1) . ni(k - 1).
(10.61)
According to this equation the probability that a job leaves a node in state k is equal to the probability that a job arrives at the same node when this node is in state k - 1. The stochastic process, which can be assigned to the node
NON-EXPONENTIAL
DISTRIBUTIONS
453
under consideration, behaves exactly like a birth-death process with birth rates Ai and death rate vi(lc) (see Section 3.1). The state probabilities, which we are looking for, can therefore be computed as follows: k= l,...,K,
(10.62) (10.63)
The new load-dependent service rates for the nodes in the substitute network for the next iteration step are then chosen as follows: pi(k) = vi(k)
for i = l,...,N.
(10.64)
Now the main problem is to determine the rates vi(lc) for the individual nodes in the network. The fact that in closed queueing networks the number of jobs is constant allows us to consider the individual nodes in the network as elementary X(lc)/G/m/K st at ions in the isolated analysis. The notation describes a more specific case of a GI/G/ m node where the arrival process is Poisson with state-dependent arrival rates and for which an additional number E( exists, so that X(lc) > 0 for all Ic < K and X(lc) = 0 for all Ic 2 K. If we do the isolated analysis, we have to differentiate between the following l
Product-form
node
l
Single server node with FCFS service strategy and generally distributed service time
In the first case, a comparison of the Eqs. (10.62) and (10.63) with the corresponding equations to compute the state probabilities of product-form nodes with load-dependent arrival rates [Lave83] shows that the rates vi(lc) correspond to the service rates pi(k). For nodes with exponentially distributed service times and service discipline PS, IS, LCFS PR, respectively, it follows immediately from Eq. (10.64), that the service rates of the substitute network remain unchanged in each iteration step. Next, consider FCFS nodes with generally distributed service times, which can be analyzed as elementary X(k)/G/l/K-FCFS systems. Under the assumption that the distribution of the service time has a rational Laplace transform, we use a Cox distribution to approximate the general service times. Depending on the value of the squared coefficient of variation c‘& of the service time, different Cox models are used: if c& = l/m&& for m = 3,4, . . ., where E is a suitable tolerance area, then we use an Erlang model with m exponential phases, If ~5; >_ 0,5, then a Cox-2 model is chosen. For all other cases a special Cox model with m phases is chosen.
ALGORITHMS FOR NON-PRODUCT-FORM
454
NETWORKS
According to [Mari80], the parameters for the models are determined as follows: l
For a Coz-2 model (Fig. 10.12):
Fig. IO.12
l
Cox-2 model.
The special Cox-m model is shown in Fig. 10.13. A job leaves this model with probability bi after completing service at the first phase or runs through the rest of the (m - 1) exponential phases before it is completed. The number m of exponential phases is given by:
11 1 7j-.
m=
CBi
”
,
Fig. IO.13
”
**
Special Cox-m model.
From Section 1.2.1, we have: b, = 2mc& a /%
=
+ (m - 2) - j/m2
2(m -
For the Erlang-m
+ 4 - 4mcgi
+ 1)
9
[m - bi(m - l)] /%i*
Here bi denotes the probability phase. l
1)(c2Bi
that a job leaves node i after the first
model we have:
NON-EXPONENTIAL
DISTRIBUTIONS
455
General Cox-m model.
Fig. 10.14
Next consider the general Cox-m model, shown in Fig. 10.14. Each of the elementary X(k)/C,/l/K no d es can be assigned a CTMC whose states (k, j) indicate that the service process is in the jth phase and k jobs are at node i with corresponding steady-state probability ~(k, j). The underlying CTMC is shown in Fig. 10.15. The corresponding balance equations for each node i are given by: (10.65) j=l
x&k - 1)7ri(lc - 1,l) + &,&,j?ri(k
+ l,j)
j=l =
Xi(lc)%(b
1)
+
i&,1+,
x&k - 1)7r& - 1,j) + a;$-i&j-i7ri(k,j = h(+G(k?>
11,
(10.66)
- 1)
+ fii,jT(kj).
(10.67)
Let xi(k) denote the probability that there are k jobs at node i and vi(k) the conditional throughput of node i with k jobs: (10.68) j=l F Vi(k)
=
j=l
h,jl%j%
(k
j>
m
Jg 746 8
(10.69) ’
Here b,,j denotes the probability that a job at node i leaves the node after the jth phase. If we add Eq. (10.66) to Eq. (10.67) (for j = 2,. . . , m), we obtain for node i:
j=l
j=l
m
m
456
ALGORITHMS FOR NON-PRODUCT-FORM
Fig. 10.15
NETWORKS
CTMC for the general Cox-m model.
NON-EXPONENTIAL
457
DISTRIBUTIONS
Now we can apply Eq. (10.69) to Eq. (10.70) and obtain: &(k-l).~s(k-l,j)+u,(k+l).~?r~(k+l,j) j=l
j=l
= b(k)
* &i(k’j) j=l
+ Q‘q
&i(k, j=l
j),
Xi(k - l);lri(k - 1) + Vi(k + 1)7ri(k + 1) = Xi(lc)7ri(k) + Vi(k)7l’i(k). If we now consider Eq. (10.65), we finally get for the solution of an elementary X(k)/C,/l/K system in steady-state [Mari78]: Ui(k)7ri(k) = Xi(k - 1)7ri(k - 1). But this is exactly Eq. (10.61). For this reason the state probabilities of an FCFS node with generally distributed service time can be determined using Eqs. (10.62) and (10.63) w h ere Eq. (10.69) is used to compute the conditional throughputs vi (k) of the single nodes. To determine the conditional throughputs, efficient algorithms are given in [Mari80]. In the case of an exponential distribution, vk(k) is just given by: Vi(k)
(10.71)
= /Lis
In the case of an Erlang-m model, vi(k) is given by: n
Vi(k)
=
Y-1 ‘%d”)
Pi
.
(,i,
+ Jgl(-l)‘Ei;k(J)
n
v@sl)) fii
(10’72)
’
with: Y-1 -&k(j)
=
c
z,,,(j),
T,” j h=k-j
n=j
(h, h-1,. . . , &c-j) : b’k E {k,. . . , k - j}, k o
%,h
=
1 + (bi,h,
c h=k-j (pi,h
=
Xi(h),
vh=n-j
>
,
y=min{m,k}.
Pi
For the Cox-2 model (c& 2 0.5), an even simpler equation exists: (10.73)
b(k) - L&l . bill
+ fiil
- ,%2
vi(k) = (Xi(k) + /Gil + /Liz) - vi(k - 1) ’
for k > 1.
(10.74)
458
ALGORITHMS FOR NON-PRODUCT-FORM
NETWORKS
We do not discuss the algorithms for the computation of the conditional throughputs of the other model type (c& < 0.5). The interested reader is referred to the original literature [Mari80]. The computed vi(b) are used in the next iteration step as load-dependent service rates for the nodes in the substitute network, Eq. (10.64). To stop the iteration, two conditions must be fulfilled. First, after each iteration we check whether the sum of the mean number of jobs equals the number of jobs in the network:
i=l
(10.75)
< E,
K where E is a suitable tolerance and K Ki = ~hri(k) k=l
is the mean number of jobs at the ith node. Second, we check whether the conditional throughputs consistent with the topology of the network:
of each node are
1 N rj - -N c ri i=l
<E
for j=l,...,N,
(10.76)
1 N N c ri i=l
where: xj r.?=-=-ej
1 ej
is the normalized throughput of node j and ej is the visit ratio. Usual values for E are 10e3 or 10b4. Thus, the method of Marie can be summarized in the following four steps: Initialize. Construct a substitute network with the same topology as the original one. The load-dependent service rates pi(k) of the substitute network for i = 1,. . . , N and k = 0,. . . , K are directly given by the original network. A multiple server node with service rate ,QL~ and rrzi servers can be replaced by a single server node with the following load-dependent service rate (see also Section 6.5): for Ic = 0, “(Ic)=
?. i Ziin{i, 7rn,}-pi,
for Ic > 0 .
NON-EXPONENTIAL
DISTRIBUTIONS
459
With Eq. (10.60), determine for i = 1,. . . , N and k = 0,. . . ,(K-1) ependent arrival rates Xi(lc) by analyzing the substitute network. Here any product-form analysis algorithm from Chapter 8 can be used. For k = K we have: Xi(K) = 0. Analyze each individual node of the non-product-form network oad-dependent arrival rate X;(k). Determine the values vi(k) and the state probabilities ri(k). The computation method to be used for the vi(k) depends on the type of node being considered (for example Eqs. (10.73) and (10.74) for Cox-2 nodes), while the ~i (k) can all be determined using Eqs. (10.62) and (10.63). Check the stopping Conditions (10.75) and (10.76). If the conditions are fulfilled, then the iteration stops and the performance measures of the network can be determined with the equations given in Section 7.2. Otherwise use Eq. (10.64), pi(k) = v,(k), t o compute the new service rates for the substitute network and return to Step 2.
Fig. IO.16
Closed queueing network.
Example 10.8 In this example we show how to apply the method of Marie to a simple closed queueing network with N = 3 nodes and K = 2 jobs (see Fig. 10.16). The service strategy at all nodes is FCFS and the service times are generally distributed with the mean service times: -
1
= 2seq
Pl
-
1
= 4sec,
P2
-
1
= 2 set,
cl3
and the squared coefficients of variation are: 2 CBl
1,
2 cf32 = 4,
Ci3 = 0.5.
el = 1,
e2 = 0.5,
e3 = 0.5.
=
The visit ratios are given as:
We assume a tolerance of E = 0.001. The analysis of the network can then be done in the given four steps.
460
ALGORITHMS FOR NON-PRODUCT-FORM
NETWORKS
Initialize. The service rates of the substitute network can directly be taken from the given network: /-Q(l)
= Pl(2)
=
0.5,
/42(l)
= 0.25,
= p2(2)
,~(l)
= ~~(2) = 0.5.
We obtain the substitute network from the original model by replacing the FCFS nodes with generally distributed service times by FCFS nodes with exponentially distributed service times. Compute the load-dependent arrival rates Xi (k) for i = 1,2,3, and for k = 0, 1,2: When using the convolution method (see Section S.l), we first need to determine the normalizing constants of the substitute network: G(0) = 1,
G(1) = 5,
G(2) = 17,
and the normalizing constants GE’ for the network with the short-circuited node i with Eq. (8.11) are:
GE)(O)= 1, G;h)
= -9 3
G$‘2) = 2,
GE)O) = 1, G$)
= -7 3
Gg’2) = 1,
The throughput in the short-circuited network is then given by Eq. (8.14):
G~)O) = 1, ($1)
4 = -,
12 Gj;3)2) = -*
node with (K-k)
jobs in the substitute
Now the load-dependent arrival rates can be determined using Eq. (10.60): G;)(l) Xl(O) = X(l)(2) = el = 0.429. G(l) N (2) C
Similarly, the values for the other arrival rates can be determined: X2(O) = 0.214,
X3(O) = 0.167,
X1(1) = 0.333,
X2(1) = 0.167,
h(2)
x2(2>
As(l) = 0.125, A,(2) = 0.
= 0,
= 0,
Perform the isolated analysis of the nodes of the given network. Because c& = 1, the service time of node 1 is exponentially distributed (product-form node) and we have to use Eq. (10.64) vi(lc) = pi(k). The state
NON-EXPONENTIAL
probabilities
DISTRIBUTIONS
461
for node 1 can be determined using Eqs. (10.62) and (10.63):
1
v(o) =
Xl (0)
7rr(l) = 7rr(O) * Pi(l)
= 0.412,
= 0.353,
7rr(2) = 7rl(O) * fi x1(J) j=O/Ll(j+l)
=0.235*
Nodes 2 and 3 are non-product-form nodes. Because c2si 2 0.5, for i = 2,3, we use the Cox-2 distribution to model the non-exponentially distributed service time. The corresponding parameters for node 2 are given as follows: 1 = 0.125. a2 = 2cg,
= 0.0625,
jk?l = 2p2 = 0.5, For node 3 we get: b31
= @32 = a3 =
1.
Using Eqs. (10.73) and (10.74), we can determine the conditional throughputs: v2(1)
=
X2(1&21b2
+ b21P22
X2(l)+riL22
(X2(2)
=
+
LT
+ a2lP21
h(2)jhb2 v2(2)
= ()
357
+ fiaG22
&q + /r&9)
-
v2(1)
= o.152*
Then: v3(1)
For the state probabilities
0.471,
z+(2) = 0.654.
we use Eqs. (10.62) and (10.63) and get:
7r2(0) = 0.443, 7r3(0)
=
= 0.703,
7r2(1) = 0.266, ~~(1) = 0.249,
7r2(2) = 0.291, x3(2) = 0.048.
Check the stopping condition, Eqs. (10.75) and (10.76):
i=l
k=l
K
= 7.976 - 10-3.
Because the first stopping condition is not fulfilled, it is not necessary to check the second one.
462
ALGORITHMS FOR NON-PRODUCT-FORM NETWORKS
Now we compute the new service rates of the substitute network using Eq. (10.64): pi(k) = v,(k),
for i = 1,2,3,
and go back to Step 2. Determine the load-dependent arrival rates using Eq. (10.60) and analyze the nodes of the network isolated from each other:
After four iterations we get the following final results: Marginal probabilities: ~~(0) = 0.438, 7rl(l) = 0.311, 7rl(2) = 0.251,
~~(0) = 0.438, ~~(1) = 0.271,
n3(0) = 0.719, rg(l) = 0.229,
7r2(2) = 0.291,
n,(2) = 0.052.
Conditional throughputs: ~~(1) = 0.356, Utilizations,
z+(l) = 0.466,
V2(2) = 0.151,
~~(2) = 0.652.
Eq. (7.19): pl = 1 - ~~(0) = 0.562,
p2 = 0.562,
p3
= 0.281.
Mean number of jobs, Eq. (7.26): Fl = ~~(1) + 2 .7rr(2) = 0.813,
E2 = 0.853,
x3 = 0.333.
In most cases the accuracy of Marie’s method has proven to be very high. The predecessor of Marie’s method was the iterative method of [CHW75a], which is a combination of the FES method and a recursive numerical method. A network to be analyzed is approximated by K substitute networks where the kth substitute network consists of node Ic of the original network and one composite node representing the rest of the network. The service rates of the composite nodes are determined using the FES method under the assumption that the network has a product-form solution. Then, the K substitute networks are analyzed using the recursive numerical method. The performance measures obtained from this analysis are approximate values for the whole network. If the results are not precise enough, the substitute networks are analyzed again with appropriate modifications of the service rates. These results are then the approximate values for the given network. The results thus obtained are normally less accurate than the ones derived by Marie’s method.
NON-EXPONENTIAL
463
DISTRIBUTIONS
10.1.4.3 Extended SUM and BOTT The summation method and the bottapprox method can easily be extended to non-product-form queueing networks with arbitrarily distributed service times. For the mean number of jobs K; in a -/G/l-FCFS node, we use Eq. (6.54): (10.77) Again we introduce a correction factor in Eq. (10.77) as we did in the productform case, and we get: (10.78) For multiple server nodes the mean number of jobs ??i in a -/G/m-FCFS node is given by: Ki=mpi+L.
The introduction
ai - Pmi. 1 - pi of the correction factor yields: Pi 1 _
(10.80)
- ai +Pm,.
K-m,--a
Kmm, ’ pi
Now we can use the SUM method for networks with these two additional node types by introducing Eqs. (10.78) and/or (10.80) in the system equation (9.23) and calculate the throughput X and other performance measures in the same way as in Section 9.2. We can also use the BOTT method if we use the corresponding formulae for the function to h(K;) for the bottleneck ??i = zbott. For the node type -/G/l-FCFS, /L(K~,,~,) has the form: mbott)
=
-Cl + b ’ ??;bott) + d(l+
b * Kbott,j2 + 4 - Kbott . (a - b) 7 2. (a - b)
with b = (K - 1 - a)/(K - 1). For the node type -/G/m-FCFS, the function h(Kbott) cannot be given in an exact form and therefore an approximation is used:
h(Kb,,,)= b- dc 2 +f -mbott ’ with: f= b = zbott
K-m
. f -t mbott
bott
-
a 7
K
-
a=---
1+
cg 7
mbott
+ Pmbott
+ a,
c = b2
f(4
a mbott
* zbott
’ f>s
The SUM and BOTT methods for non-product-form queueing networks can also easily be extended to the multiclass case as it was done in Sections 9.2 and 9.3 for product-form queueing networks.
ALGORITHMS FOR NON-PRODUCT-FORM
464
10.1.5
Closing
Method
for Mixed
NETWORKS
Networks
In [BGJ92], a technique is presented that allows every open queueing network to be replaced by a suitably constructed closed network. The principle of the closing method is quite simple: The external world of the open network is replaced by a -/G/l node with the following characteristics: 1. The service rate pL, of the new node is equal to the arrival rate X0 of the open network. 2. The coefficient of variation cg, of service time at the new node is equal to the coefficient of variation of the interarrival time of the open network. 3. If the routing behavior of the open network is specified by visit ratios, then the visit ratio coo of the new node is equal to 1. Otherwise the routing probabilities are assigned so that the external world is directly replaced by the new node. The idea behind this technique is shown in Figs. 10.17 and 10.18.
Fig. 10.17
Open network before using the closing method.
A very high utilization of the new node is necessary to reproduce the behavior of the open network with adequate accuracy. This utilization is achieved when there is a large number of jobs K in the closed network. The performance measures are sufficiently accurate after the number of jobs in the network has passed a certain threshold value Ki (see Fig. 10.19). Depending on the chosen solution algorithm, the following values for the number of jobs are proposed (see [BGJ92]): K = 100 K = 5000 K = 10000
mean value analysis, summation method, SCAT algorithm.
To check whether the number of jobs K in the closed network is sufficiently large, we can repeat the closing method with a larger number of jobs K and compare the results.
NON-EXPONENTIAL
Fig. IO.18 method.
Closed queueing network with the additional
DISTRIBUTIONS
-/G/l
465
node for the closing
Performance measure 1
Exact value
Fig. IO.19
Threshold in the closing method.
The closing method can be extended to multiple class queueing networks and mixed queueing networks. In the case of multiple class networks, the service rate poor of the new node is given given by R . Xcr, with r = 1, . . . , R, where R is the number of classes. In the case of mixed queueing networks, the new node is added to the open classes only and then we can use any method for closed queueing networks also for mixed queueing networks. Open nonproduct-form networks are easier to analyze using the methods introduced in Section 10.1.3. But the closing method in combination with the SCAT algorithm or the SUM method is an interesting alternative to the MVA (see Section 8.2) for mixed queueing networks in the product-form case. The main advantage of the closing method is the ability to solve mixed non-product-
form-queueing networks.
466
ALGORITHMS FOR NON-PRODUCT-FORM NETWORKS
Example 10.9 In this first example, the closing method is shown for an open product-form queueing network with Xe = 5, p12 = 0.3, p13 = 0.7, and ~21 = ~31 = 0.1 (see Fig. 10.17). The service rates are ~1 = 7, ,Q = 4, and ~3 = 3. Using Jackson’s method for this network, we obtain the following values for the mean response time T and the mean number of jobs in the open network:
In Table 10.12 the values for these performance measures are shown as a function of &lo&, the number of jobs in the closed network. The results are obtained using MVA. From the table we see that we obtain exact results if Table 10.12 Mean response time ?; and mean number of jobs ?? for several values of the number of jobs in the closed network K closed
T-
10 20 30 50 60 100
K
0.95 1.24 1.29 1.30 1.30 1.30
4.76 6.17 6.47 6.52 6.52 6.52
K&,sed is large enough and that, in this case, K&s&
= 50 is the threshold.
Sink Fig. 10.20
Open queueing network.
In this second example, the closing method is used for Example 10.10 an open non-product-form network (see Fig. 10.20) with Xe = 3 and cg = 1.5. The routing probabilities are given by: Pl2 = Oe3,
P13 = O-7,
P33 = 0.1,
p34 =
0.9,
p22
= 0.3,
p41 = 0.4.
$I24 = 0.7,
NON-EXPONENTIAL
DISTRIBUTIONS
467
Table 10.13 Input parameters for the network of Fig. 10.20
1 2 3 4
9 10 12 4
0.5 0.8 2.4 4
The other parameters are listed in Table 10.13. With the method of Pujolle (see Section 10.1.3), we can analyze the open network directly to obtain: T = 4.43 and K = 13.47. The alternative method we use to solve the network of Fig. 10.20 is to first use the closing method and then use the SUM method (see Section 10.1.4.3) to approximately solve the resulting closed non-product-form network. The results are shown in Table 10.14. From this table we see that the threshold for Table 10.14 Mean response time T and mean number of jobs K as a function of K closed K closed 100 200 500 1000 5000 10000
T
Fz
3.94 4.16 4.29 4.34 4.37 4.37
11.83 12.49 12.88 13.02 13.12 13.12
K closed is about 5000 and the results of the closing method and the method of Pujolle differ less than 3%. The open networks of Example 10.9 and Example 10.10 Example 10.11 are much easier to analyze using the Jackson’s method and the Pujolle method, respectively, rather than by the closing method. Mixed product-form networks are easy to analyze using the MVA for mixed networks but also using SCAT or SUM in combination with the closing method, especially for networks with multiple server nodes. But the main advantage of the closing method is the solution of mixed non-product-form queueing networks. As an example we use a mixed network with N = 5 nodes and R = 2 classes. Class 1 is closed and contains K1 = 9 jobs (see Fig. 10.21). Class 2 (see Fig. 10.22) is open with arrival rate Xer = 5 and the squared coefficient of variation c& = 0.7.
468
ALGORITHMS FOR NON-PRODUCT-FORM
Fig. 10.21
Closed class 1 of the mixed network.
I
I -_____-_____--______-------62--KFig. 10.22
NETWORKS
~---..-.---.---------.-------’
Open class 2 of the mixed network with an additional
There is no class switching in this network. two classes are as follows:
= 0.7,
P31,l
=
p42,l
=
P53,l
=
The routing probabilities
p24,l
= 0.3,
p35,1
=
p23,2
= 0.8,
p30,2
=
0.7,
1.0,
p42,2
=
1.0,
1.0,
p53,2
=
1.0.
0.7,
of the
Class 2: p12,2 = 1.0,
Class 1: ~~2,~ = 1.0, p23,l
node.
0.3,
p24,2
= 0.2,
p35,2
=
0.3,
The other input parameters are listed in Table 10.15. Now we solve the equivalent closed queueing network with an additional node (CL, = 5, c& = 0.7) for class 2 using the SUM method (see Section 10.1.4.3) and K2 = 500. The results for the utilization pi and the mean number of jobs Ki are listed in Table 10.16. Since SUM is an approximation method, we obtain deviations from discrete-event simulation results, which are about 6% for the utilizations pi and about 5% for the mean number of jobs Ki.
DIFFERENT SERVICE TlMES AT FCFS NODES
Table 10.15 Input parameters mixed queueing network Nodes
Pi,1
4 4 6 4 5
1 2 3 4 5
Table 10.16 Utilizations discrete-event simulation
1 2 3 4 5
10.2
= Pi,2
= cf2
the
mi
3 4 3 2 2
0.4 0.3 0.3 0.4 0.5
p and mean number of jobs from (DES) and the closing method (CLS)
0.89 0.90 0.85 0.46 0.46
DIFFERENT
ctl
for
469
0.84 0.85 0.80 0.43 0.43
4.9 5.5 4.0 1.1 1.1
5.2 5.8 4.1 1.0 1.0
SERVICE TIMES AT FCFS NODES
If the given non-product-form network is a “small-distance” away from a product-form network, we may be able to modify the MVA algorithm to approximately solve the network. Based on this idea [Bard791 extended the MVA to priority scheduling as well as to networks with FCFS nodes and with different service rates for different job classes. The key idea for treating the latter case is to replace the MVA Eq. (8.31) by: T+(k)
R 1 _ = b + c his . Kis(k - I~). ir s=l
(10.81)
This equation can also be extended for multiple server nodes [Hahn881 as: Fir(k)
= $
a (10.82) m,--2
+l
C(?rLi-l-j) i-4 j=o
with: 1 ;=
R Ai, -.cs=l Ai
1 t%s ’
470
ALGORITHMS FOR NON-PRODUCT-FORM
NETWORKS
In order to demonstrate the use of this method, consider a queueing network consisting of N = 6 nodes and R = 2 job classes. Nodes 1 to 5 are FCFS nodes with one server each and node 6 is an IS node. Class 1 contains Ki = 10 jobs and class 2 has K2 = 6 jobs. All service times are exponentially distributed. Table 10.17 shows all service rates and visit ratios for the network. Service Tab/e 10.17 Parameters described in Fig. 10.22 i
PLzl
l-422
1 2 3 4 5 6
* 1 1 1 1 0.1
0.5 1 1 1 1 lo7
of the network
61
8 2 2 2 2 1
ei2
20 2 4 6 8 1
rate ~11, marked with an * in the table, is varied and the mean overall response time for all class-l jobs is determined. We compare the results of the extended MVA with DES. Tab/e10.18 Mean overall response time of a class-l job for different service rates ~11 Pll
MVA-Ext DES
.
0.5
2
128
260.1 260.1
143.1 141.1
105.4 102.0
The values given in Table 10.18 show that the extension of the MVA by Eq. (10.81) gives good results for queueing networks with FCFS nodes with different service rates for different job classes. SCAT and Bard-Schweitzer algorithm can also be extended [F&+39] in a similar fashion.
10.3
PRIORITY
NETWORKS
We will consider three different methods for networks with priority classes: Extended MVA, shadow server technique and extended SUM method. 10.3.1
Extended MVA (PRIOMVA)
If a queueing network contains nodes where some job classes have priority over other job classes, then the product-form condition is not fulfilled and the techniques presented in Chapters 8 and 9 cannot be used. For the approximate analysis of such priority networks [BKLC84] suggest an extension of the MVA.
PRIORITY NETWORKS
471
The MVA for mixed queueing networks is thus extended to formulae for the following priority node types: l
-/M/l-FCFS
PRS (preemptive resume):
As per this strategy, as soon as a high priority job arrives at the node it will be serviced if no other job with higher priority is in service at the moment. If a job with lower priority is in service when the higher priority one arrives, then this low priority job is preempted. When all jobs with higher priority are serviced, then the preempted job is resumed starting at the point where it was stopped. It is assumed that the preemption does not cause any overhead. The service strategy within a priority class is FCFS. l
-/M/l-FCFS
HOL (non-preemptive head-of-line):
Here a job with higher priority must wait until the job that is processed at the moment is ready to leave the node even if its priority is lower than the priority of the arriving job. The service strategy within a class is again FCFS. The priority classes are ordered linearly where class R has lowest priority and class 1 has the highest one. At first we consider -/M/l-FCFS PRS nodes and derive a formula for the mean response time T,. In this case the mean response time of a class-r job consists of: l
The mean service time:
l
The time it has to wait until all jobs with higher or the same priority are serviced:
r KS c-a s=l
l
PL,
The time for serving all jobs with a higher priority the response time Fr :
that arrive during
r-1 T, ’ A, . c PS s=l
If we combine these three expressions, then for the mean response time of a priority class-r job, 1 5 T 5 R, we get the following implicit equation:
T,$K”+C-+L, r-1 T, *A, s=l
PS
s=l
A
Pr
472
ALGORITHMS FOR NON-PRODUCT-FORM NETWORKS
where K, denotes the mean number of class-s jobs at the node. With the relation ps = X,/p3 it follows that:
(10.83)
If, on the other hand, we consider an -/M/l-FCFS HOL node, then the considered job can not be preempted any more. In this case the mean response time consists of: l
The mean service time:
l
The time needed to serve higher priority waiting time with the mean W =
l
The mean time needed for the job that is being served at the moment:
jobs that arrive during the
cRPs s=l
l
L’
The mean service time of jobs with the same or higher priority that are still in the queue:
cr K-P,. s=l
PS
If we combine these three terms for the mean response time of a priority class-r job, 1 5 T 5 R, we get:
rz-ps
Tr=C_+qf+C s=l
Rp s=l
r-l Tr-i s=l
.$+
(
)
If we use the relation ps = Xs/ps, then we get: 07 2+ c s=l Ps
e s=r+lPs r-l
CCPS
s=l
ps .
(10.84)
PRIORITY NETWORKS
473
The equations just developed give the exact T, values for M/M/lpriority queues. But in a priority queueing network the arrival process at a node is normally not Poisson. This condition means that if we apply the equations given to a priority network, we do not get exact results. Furthermore, these equations allow class-dependent service time distributions, while in product-form queueing networks a class-independent service time distribution is required at FCFS nodes. Nevertheless the approximation of [BKLC84] is based on these assumptions. Let xi,” denote the mean number of class-s jobs at node i that an arriving class-r job sees given the network population vector k. If Eq. (10.83) is applied to node i of a priority network, then this equation can be generalized as follows:
(10.85) 1 - c Pis s=l
Similarly, Eq. (10.84) is of form:
T;,.(k)= & +
r i+;‘(k) s=l
s=r+l
PiS
Pis
r-l 1 -
For the computation of $,T’(k), to hold:
++
c-
c s=l
(10.86)
Pi.3
the arrival theorem (see [BKLC84]) is assumed
wj,“(k) = K. eas(k
- l,),
for closed classes r, for open classes r.
&s(k),
For open class s, the equation: -
KiS(k) = A,* ci,.Ti,(k)
(10.87)
(10.88)
can be used to eliminate the unknown Ei, (k). Then in Eqs. (10.85) and (10.86) we only need to determine the utilizations pis. For open classes we have pis = &slPis, independent of the number of jobs in the closed classes. But for closed class s the utilization pis depends on the population vector and it is not obvious which pi,(k), 0 5 k 5 K, should be used. However, [ChLa83] suggest: Pis
=
pis
(k
-
K;,),
(10.89)
where (k - Kis) is -the population vector of the closed classes with Ki, jobs fewer in class s. If Ki, is not an integer, then linear interpolation is used to
474
ALGORITHMS FOR NON-PRODUCT-FORM NETWORKS
get a convenient value for pis. Equation (10.89) is based on the assumption that if there are already Ei, jobs at node i, then the arrival rate of class-s jobs at node i is given by the remaining k - Ki, jobs in the network, and hence: p
= hs(k - ;i?is)
is
=
pis(k
- Xi,).
PiS
If we insert Eqs. (10.87)-(10.89) into Eqs. (10.85) and (10.86), respectively, then the mean response time of the closed class r at the priority node i when there are k jobs in the closed classes of the network:
PRS:
T+(k)
= 1 -
sgl
ds
r &(k HOL:
- 1,)
+
5
pis(k
-
1,)
c
1 = -+
Ti,(k)
(10.90)
,
r-l
s=l
s=r+l
/JJiS
Pis
r-l
f%r 1 -
c
(10.91)
P:,
s=l
where: p, =
pis(k - Ki,),
for closed class s, for open class s.
is PiS,
Analogously, for the mean response time of the open class r at the priority node i we get:
PRS: Fir(k) =
(10.92) 1- c
s=l
HOL:
T+(k)
Pi,
=
r-l
l-
c
s=l
Pls
Now the modified MVA with the additional priority node types can be given. The closed classes are denoted by 1, . . . , CL and the open classes by 1 * * >OP. Because of the fact that the values of l?is (k) are also needed for the open class s, it is necessary to determine the performance measures for the open classes in each iteration step of the algorithm. Initialize. For each node i = 1, . . . , N determine the utilizations of the open classes op = 1, . . . , OP using Eq. (8.55), that is: Pi,op =
1 -X0P%,0p7 Pi,op
PRIORITY NETWORKS
and check for stability Set Ki,,l(O) 1?* - - ?CL.
&,
475
5 1).
= 0 for all nodes i =
1,
. . . , N and all closed classes cl =
Construct a closed model that contains only jobs of the closed classes and solve this model using the MVA. Iterate for k = 0,. . . , K: I] r = l,... [] (8.46).
Determine Ti, (k) for all nodes i = 1, . . . , N and all closed classes , CL using Eqs. (8.53), (8.54), and (10.90) and (10.91), respectively. Determine X,(k) for all closed classes r = 1, . . . , CL using Eq.
-22.31 Determine ??i, (k) for all nodes i = 1, . . . , N and for all closed classes r = l,... , CL using Eq. (8.47). [] Determine ??ir (k) for all nodes i = 1, . . . , N and all open classes r = l,... ,OP using Eqs. (8.51) and (8.52), and Eqs. (10.88), (10.92) and (10.93) respectively. Determine
all other performance measures using the computed
results. The computation sequenced from the highest priority down to the lowest one. The algorithm is demonstrated in the following example: Example 10.12 Consider the queueing network in Fig. 10.23. The network consists of N = 2 nodes and R = 3 job classes. Class 1 has highest priority and is open, while the classes 2 and 3 are closed. Node 1 is of type
----
Source
w
Fig. 10.23
-
Mixed
network
with
Open Class Closed Class
priority
nodes.
-/G/l-PS and node 2 is of type M/M/l-FCFS PRS. The arrival rate for class1 jobs is Xi = 1. Each closed class contains one job (K2 = Ka = 1). The
476
ALGORITHMS FOR NON-PRODUCT-FORM
NETWORKS
mean service times are given as: -
1
-
= 0.4,
Pll
-
1
= 0.6,
P21
1
= 0.3,
-
1
P12
Pl3
1 = 0.5, CL22
-
= 0.5,
1
=
0.8.
p23
The visit ratios are: ell = 2,
el2
=
1,
el3
1,
=
e21
=
1,
Initialize. Determine the utilizations ~11 = hell+ L
= 0.8,
e22 = 0.5,
e23 = 0.4.
of the open class:
~21 = JW21.
=0.6.
k
Pll
Set Frz(O)
= 7713(O)
=
FT22(0)
=
= 0.
X23(O)
Analyze the model using the MVA. MVA-iteration
for k = (0,O):
(-1 Mean number of jobs in the open class 1: Node 1, Eq. (8.51): Pll %l(O,
[l + 7G2(0,0>
+ K3(0,
O)]
= 4
0) = 1 -
Pll
Node 2, Eqs. (10.88) and (10.92): K21(0,0>
MVA-iteration
= Xl-
VP21 e21. -=& 1 - P21
for k = (0,l):
1-2.11 Mean response time for the closed class 3: We use Eqs. (8.53) and (10.90) to get:
A-+Eg?!.l Tl3(0,1)
= I*
1
CL13
[STEP-]
1 -
Throughput
= 2.5,
T23(O,
1) =
-4.25.
Pll
l-
P21
of class 3, from Eq. (8.46): X3(0, 1) = 0.238.
IJ
Mean number of jobs in class 3, from Eq. (8.47): T&3(0,1)
= 0.595,
K23(O,l)
= 0.405.
-
PRIORITY NETWORKS
477
1-1 Mean number of jobs in the open class 1, using Eqs. (8.51)) (lO.SS), and (10.92), respectively: &(o, 1) = 6.38, &(O, 1) = 1.5. MVA-iteration
for k = (1,O) :
p5TE-xq
C&(1,0)
ri+zFTq
X2(1,0) = 0.308,
-1
Kiz(l,
mj
3i;,,(l,O)
MVA-iteration
= 1.5,
%(l,
0) = 0.462, = 5.85,
0) = 3.5,
?&(l,
0) = 0.538,
K&,0)
= 1.5.
%(l,l)
= 3.5,
for k = (1,l):
piTiF%q
T&l)
= 2.393,
[I
X2(1, 1) = 0.241,
[I
&(l,
1) = 0.578,
z&l,
[I
FIs(l,
1) = 3.654,
??&l,
1) = 4.923,
1) = 0.422,
11
X3(1, 1) = 0.178,
jJ
3T&l)
= 0.65,
1?&1)
= 0.35,
[I
K&l)
= 8.91,
17&l)
=1.5.
These are the final results for the given mixed priority
network.
Up to now we considered a pure HOL service strategy or a pure PRS service strategy at a node. In many cases we may need to combine these two priority strategies. Let us therefore consider a combination of PRS and HOL: HOL-group
Fig. 10.24
I
Splitting
PRS-group
into HOL and PRS classes.
In this combination, some job classes are processed by HOL and some job classes are processed by PRS. It is assumed that HOL jobs always have higher priority than PRS jobs (see Fig. 10.24). In this case:
478
ALGORITHMS FOR NON-PRODUCT-FORM 0
NETWORKS
An arriving HOL job cannot preempt a HOL job that is still in progress, but it can preempt a PRS job that is still in progress. Therefore the response time of a HOL job is not influenced by the arrival of PRS jobs. r K-P,
T,=c,-
+k;+@-;).2+;.
s=l
s=l
s=l
By simplifying the above expression we get, in analogy to Eq. (10.91): %~glp) Tir(k)
= ;
+ s=$+l P&t-yL)
+ ‘=’ 1 -
j&s
0
=
pi,&
-
,
r-l c s=l
(10.94)
As
%s).
A PRS job can always be preempted by a job with higher priority. priority class r we therefore get: r-1 T, -A,
T,,f++c-+A. s=l
For
PS
PS
s=l
Pr
In analogy to Eq. (10.90) we get after simplification:
(10.95)
@is
=
pi+
-
Eis).
If these formulae for Tir (k) are used in the MVA instead of Fir, then we get an approximate MVA algorithm for closed queueing networks with a mixed priority strategy. In the next section we introduce the technique of shadow server that can also be used to analyze priority networks. 10.3.2
The Method
of Shadow
Server
10.3.2.1 The Original Shadow Technique This technique, shown in Fig 10.25, was introduced by [Sevc77] for networks with a pure PRS priority strategy. The idea behind this technique is to split each node in the network into R parallel nodes, one for each priority class. The problem then is that in this extended node, jobs can run in parallel whereas this is not possible in the original node. To deal with this problem and to simulate the effect of preemption, we iteratively increase the service time for each job class in the
PRIORITY NETWORKS
479
shadow node. The lower the priority of a job, the higher is the service time for the job. After all iterations we have: Si,r
L
Si,r*
Here 5i,r = l/pir denotes the approximate service time of class-r jobs at node i in the shadow node. Class 1 rfut-eClass i
cL
--g--@-
Original
Class R
l
’
IIIIIGShadow Node
Node
Fig. 10.25
-J-j-@-
. .
The idea behind the shadow technique.
Shadow Algorithm
Transform the original model into the shadow model. Set X+ = 0. Iterate: 1-1
Compute the utilization e Pi,r
(1
for each shadow node:
=
Xi,r
(10.96)
’ i?i,r.
Compute the shadow service times: Si,r
Si,r
=
r-l l
-
C s=l
'
(10.97)
&,s
Here si,r denotes the original service time of a class-r job at node i in the original network while s”i,r is the approximated service time in the shadow network. 1-1 Compute the new values of the throughput of class r at node i, Xi,r, of the shadow model using any standard analysis technique from Chapters 8 and 9. If the Xi,, differ less than E in two successive iterations, then stop the iteration. Otherwise go back to Step 3.1
ALGORITHMS FOR NON-PRODUCT-FORM
480
10.3.2.2
NETWORKS
Extensions of the Shadow Technique
Measurements on real systems and comExtension of Kaufmann 10.3.2.2.1 parisons with discrete-event simulation results have shown that the technique of shadow server is not very accurate. Because of the lack of accuracy, [Kauf84] suggested an extension of the original shadow technique. This extended method differs from the original method in that an improved expression &,r is used instead of pi,f. To derive this improved utilization we define: c = class of the job under consideration, Ni,r = number of class-r jobs at node i in the shadow node, s.Pt a,r -
approximated mean service position time, r-l
1=1
For a pure PR strategy the following relations hold: l
The improved utilization
/3ir is expressed using s$,.:
/%,r = k,r /k,r
= P(K,r
‘> O>,
Xi,r
= P(Ni,l
= 0 A NQ
A Ni,r
P(Ni,r
l
* sft7
= 0 A * * * A Ni,r-1
= 0
> 0) * Pi,r
= P(Mi,r
= 0 A Ni,r
> 0) * pi,r,
> 0) = P(Mi,r
= 0 A Ni,r
> 0) * pi,r
* of’,:.
The last equation can now be used to derive an expression for s$: Pt ‘i,r
-
P(Ni,r P(Mi,r
> 0)
= 0 A Ni,r
> 0)
* IL”’
1 =
l
By using conditional probabilities into:
the last expression can be rewritten
l
Now we rewrite the numerator of the last expression and define the correction factor Si,r which we use to express /3i,r: 1 p
i,r
=
1 -
P(Mi,r
> 0 A C # T 1 Ni,r
> 0)
’ “,”
PRIORITY NETWORKS
481
With: bi,r = 1 - P(A&,r > 0 A c # T ] N+. > 0) and: j4,r
= h,r
bi,r
’ gi ,7-7
= Xi ,r * SPt 2.,7-T
we obtain:
Oi,r
(Ji,r
As we see, the improved utilization &,r is just given by dividing the utilization fii,r by the correction factor 6i,T. In [Kauf84] it is shown that the following approximation holds:
6;r = PW - &-)I + 44 *Pb-+ 1) 7 Ptr)[l - Ptr>12 + a(r> ’ Pi,r
for r = 2,3, . . . , R,
(10.99)
with: r-l
r-l
‘I
This correction factor S changes only Step 3.2 in the original shadow algorithm: Si,r Si,r
=
r-l 1 -
(10.100)
1
C s=l
6
* Pi,s i,s
If the utilizations of the shadow nodes are very close to 1, then the expression: r-l 1 -
C $ * Pi,s SE1 2,s
can be negative because of the approximate technique. In this case the iteration must be stopped or the extended shadow technique with bisection can be used. In the original and extended shadow method the expression: r-l cs=l
Pi,s
can be very close to 1. Then the service time si,r gets very large and the throughput through this node is very low. In addition, the value of: r-l cs=l
Pi,s
482
ALGORITHMS FOR NON-PRODUCT-FORM
NETWORKS
will be very small in the next iteration step and in some cases the shadow technique does not converge but the computed values move back and forth between the two values. Whenever such swinging occurs, we simply use the average of the last two values of throughput in Step 3.3. Thus we have used bisection. If class switching is allowed in the network then 10.3.2.2.2 Class Switching we use the shadow technique in combination with the concept of chains (see Section 7.3.6.1). The resulting algorithm can now be given in the following steps: Determine the chains in the network. Transform the original model into the shadow model. Set X+. = 0. Iterate: [I Pi,r
s”i,r
=
Xi,r
* s”i 7r *
Si,r
=
r-l 1 -
c 3~1
1 6
* Pi,3 i,s
Here si,r is the original service time of a class-r job at node i in the original network. I] Transform the class values into chain values (to differentiate between class values and chain values, the chain values are marked with *). I] Compute the performance parameters of the transformed shadow model with the MVA. If the model parameters swing, then set
and compute the performance measures with this new response time. -1 Transform chain values into class values. If the XF,, (chain value) in two successive steps differs less than E, stop the iteration. Otherwise go back to Step 4.1.
PRIORITY NETWORKS
483
HOL Group
PR Group Original
Shadow Node
Node
fig. 10.26
Shadow technique and mixed strategy.
The original shadow tech10.3.2.2.3 Class Switching and Mixed Priorities nique [Sevc77] can only be used for networks with a pure PRS-strategy, but in the models of a UNIX-based operating system (see Section 13.1.4), for example, or in models of cellular mobile networks [GBB98], a mixed priority strategy is used. For this reason the original shadow technique is extended to deal with mixed priority strategies. Consider the situation where the classes 1,2, . . . , u are pure HOL classes and u + 1,~ + 2,. . . , R are pure PRS classes, as shown in Fig. 10.26. We define: u Yi,r =
c
Ni,l,
l=l,l#r
and get the following relations: The utilizations
l
can be expressed as: n Pi,r =A. v- .sPt z,r) fii,r = P(Ni,r >
0).
l
A PRS job can only be served if no other job with higher priority waiting:
l
For HOL jobs, two cases have to be distinguished:
is
ALGORITHMS FOR NON-PRODUCT-FORM NETWORKS
484
1. Class-r jobs as well as HOL jobs are at the node and a class-r job that is just served. Then we get: N,r > 0
A
c= r
A
yi,r > o,
or 2. Class-r jobs but no HOL jobs are waiting for service at a node. Then we get: Ni,r > 0
A
yi,r = 0.
For the HOL classes we therefore get: - The utilization of node i by class-r HOL jobs is given by the probability that at least one class-r job is at node i (and is served) and other HOL jobs are waiting, or at least one class-r job is at node i and no other HOL jobs are waiting for service: h,r
= Pi,r * /b,r > 0 A C = r A yi,r > 0) + P(Ni,r > 0 A yi,r = O)] * pi,r = [P(Ni,r > 0) - P(Ni,r > 0 A yi,r > 0 A c # T)] * pi,rs
= [P(n;i,r
- As in the case of a pure PRS strategy, we get analogously: P(Ni,r
sPt zz a,r
P(Ni,r
=
1-
> 0) - P(Ni,T
> 0) >
0A
-3 yi,r
>
P((yi,r >O/IL#T)ANi,r P(Ni,r
0 /I
c
# r) * “yr
>0)
“‘,‘*
> 0)
- The fraction in the numerator is now rewritten using conditional probabilities: = 1 - P(Yi,r l
> 0 A c # r) (
Ni,r
> 0) ’
As before, we now define the correction factor I/I~,~as:
@i,r = 1- P(yi,r >0 A c # r 1Ni,r
> O),
and get: ji,r
= 1;1- ’ &,7-e i,r
(10.101)
PRIORITY NETWORKS
485
As we see, the correction factor for the HOL case differs from the PRS case only in the summation index. We then get to the following equations:
e(r>[l - e(r)1 + PW * ix4 = @(r)[l - @(?+)I2 + p(r) ’ @i,r’
‘i”r
P(r) =
’ u Pi* r L c wr,k k=l,k#r r-l c
wr,k
=
1si k
if T E HOL, ’
LPi r wr,k
\ k=l
(10.102)
if r E PR, ’
7
si,r
u c k=l,k#r
t%,k,
if r E HOL,
r-l c k=l e(r)
&,k,
if r E PR,
+ pi,r*
Up to now we assuC/ass Switching and Arbitrary Mixed Priorities 10.3.2.2.4 med that the classes 1, . . . , u are pure HOL classes and the classes u + 1, . . . , R are pure PRS classes. In the derived formulae we can see that in the HOL case, the summation in the correction factor is over all HOL classes excluding the actually considered class. In the PRS case, the summation is over all classes with higher priority. With this relation in mind it is possible to extend the formulae in such a way that every mixing of PRS and HOL classes is possible. Let Jr be the set of jobs that cannot be preempted by a class-r job, r $ Jr. For the computation of the correction factor, this mixed priority strategy has the consequence that : l
For an HOL class r the summation is not from 1 to u but over all priority classes s E &..
l
For a PRS class r the summation is not from 1 to r - 1 but over all priority classes S E
For any mixed priority strategy we get the following formulae: Si,r si,r
=
1 - C 1 * fii,s ’ SE:E?. +i13
e(r)Cl - e(r)1 + P(r) - @W “J- = e(r)[l - e(r)12 + p(r) * @i,r ’
(10.103)
486
ALGORITHMS FOR NON-PRODUCT-FORM
c Pi,
NETWORKS
,r
=
wr,k
MET
‘h,k
=
’
%k -7 Sir
Fig. 10.27 Simple network for Example 10.13. Example 10.13 Consider the network, given in Fig. 10.27. Node 1 is a -/M/l-PRIORITY node and node 2 is a -/M/l-FCFS node. The network contains two priority job classes, Class 1: PRS, and Class 2: HOL, and class switching is allowed. The routing probabilities are given by: 0.4,
Pll,ll
=
0.5,
P11,12
P12,ll
=
0.5,
P21,ll
=
1.0,
p 12,12 = 0.4, p22,11 = 1.0.
=
p11,21 = 0.1, p12,22
= 0.1,
The other parameters are: Sll = 0.1, s12 = 0.1, K = 3, E = 0.001.
s21 = 1.0,
522 = 1.0,
Compute the visit ratios ei, in the network.
For this we use the
formula: eir = 5
2
ejsPjs,ir
j=ls=l
and get eri = 1.5, e12 = 1.0, epl = 0.15, e22 = 0.1. Determine the chains in the network and compute the number of jobs in each chain (refer to Section 7.3.6). In Fig. 10.28 the DTMC state diagram for the routing matrix for the chains is shown, and it can easily be seen that we can reach each state from every other state (in a state (i,r), i denotes the node number and T denotes the class number). Therefore we have only one chain in the network. IT ransform the network into the shadow network, as it is shown in Fig. 10.29.
PRIORITY NETWORKS
Fig. 10.28
DTMC state diagram of the routing matrix.
Originally
fig. 10.29
487
Node 1
The Shadow model for the network in Example 10.13.
The corresponding parameters are: sll = 0.1, 512
ell = 1.5,
e31 = 0.15,
e12 = 0.0,
e32 = 0.1,
= 0,
= 0, ~22 = 0.1,
S21
~31
= 1.0, e21 = 0.0,
532
= 1.0,
e22 = 1.0.
Computxe the visit ratios eTqper chain and remember that we have only one chain:
c %Then we get: eT1 =
ell +
el2
ell + el2
= 1,
e;, =
e21 + e22
ell +
el2
= -,2
3
e& =
e31 + e32
1 = -*
ell + el2
6
Compute the scale factors air. For this we use the equation:
488
ALGORITHMS FOR NON-PRODUCT-FORM
NETWORKS
air = ei, c eis’ SE9 and get: a11
=
ell ell +
= 1.0, el2
e21 Q21
+
= e31
a22
0,
+
ell +
= 0, e12
e21
= 0.6,
=
=
e22
e31 Q31
el2
e22
--
= e21
ai2 =
+ e32
0~32
= e31
e32
+
1.0,
e22
= 0.4. e32
Compute the service times per chain that are given by the equation:
which gives:
ST1 = Sll * 0111+ s12 * cl!12= 0.1, s& = s21 * a21 + s22 * a22 = 0.1, SGl
=
S31
* cY31 +
S32
* Ci!32
= 1.0.
Now we can analyze the shadow network given in Fig. 10.29 using the following parameters: 1 eT1 = 1.0, e;, = i, e& = 6’ sil = 0.1,
s;i = 0.1,
s& = 1.0,
a11 = 1.0,
a31 = 0.6,
a12 = 0.0,
cl!21= 0.0, cl!22= 1.0
K = 3,
N = 3,
Cl
J2
=
{2),
=
Q32
= 0.4, E = 0.001,
(1).
Start the shadow iterations. 1. Iteration: & = 0.505, XT1 = 5.049, q1 = 0.117,
& = 0.337, A;, = 3.366, g1 = 0.145,
& = 0.841, A;, = 0.841, s”zl = 1.000.
pT1 = 0.541, XT1 = 4.597,
& = 0.447, A;, = 3.065
& = 0.766, A& = 0.766,
;Tl = 0.125,
~$1~= 0.142,
s”& = 1.000.
2. Iteration:
PRIORITY NETWORKS
489
3. Iteration: & = 0.569, xT1 = 4.533, s”T1= 0.122,
pG1= 0.430, A;, = 3.022, z;I = 0.147,
pi1 = 0.756, A;, = 0.756, s;r = 1.000.
& = 0.556, A;, = 4.531, ST1= 0.124,
pal = 0.447, A;, = 3.012, s”gl = 0.144,
,o& = 0.755, A;, = 0.755, s”zl = 1.000.
/& = 0.565, Xi1 = 4.528,
& = 0.437, A;, = 3.018,
& = 0.754, A;1 = 0.754.
4. Iteration:
5. Iteration:
Now we stop the iteration because the difference between the X* in the last two iteration steps is smaller than E. Retransform the chain values into class values: Xl1 = all * XT1 = 4.528, AZ2 = o22 * A;, = 3.018, As1 = agl * A;, = 0.452, X32
=
Cl!32
* A;, = 0.301.
If we rewrite the results of the shadow network in terms of the original network, we get the following final results: Xpprox) = Xl1 = 4.528, XFprox) =
X31
= 0.452,
Xprox)
= X22 = 3.018,
AgProx) =
X32
=
0.301.
Verify the results. To verify these results we generate and solve the underlying CTMC via MOSES [BoHe96] (see Chapter 12). The exact results for this network are given by: AFact) = 4.52, $yt) = 0.45,
AFact) = 3.01, Apt)
= 0.30.
As we see, the differences between the exact MOSES results $Tact) and the approximated results X!Tpprox) are very small.
490
ALGORITHMS FOR NON-PRODUCT-FORM
NETWORKS
10.3.2.2.5 Quota Nodes The idea of quota is to assign to each user in the system the percentage of CPU power he or she can use. If, for example, we have three job classes and the quota are given by 41 = 0.3, q2 = 0.2 and q3 = 0.5, then class 1 customers get 30% of the overall CPU time, class 2 customers get 20% CPU time, and class 3 customers get 50% of the overall CPU time. In an attempt to enforce this quota, we give priority to jobs so that the priority of job class r, pri,, is: pri, = E.
(10.104)
1 CPU 1
I
fig.
10.30
Transformation of a quota node into the corresponding shadow node.
The higher the pri,, the higher is the priority of class r. As we can see, the priority of class-r jobs depends on the CPU utilization at node i and the quota qr of class T. A high priority job class is influenced only a little by lower priority job classes, while lower priority job classes are influenced much more by higher priority job classes. A measure for this influence is the weight i;l,,j, which is defined as:
(10.105)
The weights 13,,j are a measure for how much the rth job class is influenced by the jth job class. The closer these weights are to one, the higher is the influence. Since an active job influences itself directly (r = j), we have LZ~,~= 1. The mean virtual service time of a class-r job can approximately be given as: virt s . (27)
_ -
S&T-
1 - 5 pi,?-* iz;,,k lC=l
(10.106)
PRIORITY
Class
1
NETWORKS
491
1 . . . m
++I Class
4%
2
’
1
. . .
. . .
m
43
m
Class
R
1 . . . m
+--El -/M/m
Node
Shadow
Fig. 10.31
Shadow transformation
-/M/m
of a -/M/m
Node
node.
If we compare this equation with the shadow equation to compute the shadow service time, then the basic structure of the equations are the same. The idea, therefore, is to use the weights ;j,,j to include quota into the shadow technique. In order to transform a quota node with R job classesinto a shadow node, we apply the shadow technique as shown in Fig. 10.30. The resulting equation for computing the service time at each shadow node can now be given as: Si,r
if r E HOL,
1- 5 2 Si,r
=
/c=l kfr
r-l
Gi,k =
’ jji,k '
si,r _
priz prii + prif ’
(10.107) if r f PRS,
prik = E# 2,
By using the weights Gi,k, we make sure that each class gets the amount of CPU time as specified by the CPU quota. In the case of an arbitrary mixed priority strategy, we use the summation index < as defined before.
492
ALGORITHMS
FOR NON-PRODUCT-FORM
NETWORKS
10.3.2.2.6 Extended Nodes We now introduce a new node type, the so-called extended -/M/ m node. It is very easy to analyze a regular -/M/m node since we only have to transform it into the corresponding shadow form and use the equations for -/M/ m nodes when analyzing the node. The transformation of a -/M/m node into its corresponding shadow form is shown in Fig. 10.31. The node shown in Fig. 10.32 is called extended -/M/m node. This node contains R priority classesthat are priority ordered (job class 1 has highest priority, job class R has lowest priority). The CPU can process all job classes,while at the APU (associate processing unit), only the job classes For a better overview, a queue U 7* ** 7R (1 5 u 5 R) can be processed.
Fig.
Extended
10.32
-/M/m
node (original
form).
is introduced for each job class. The result of this transformation is shown in Fig. 10.33. For example, let u = R = 3, then the job classes1, 2, and 3 can
I I
R
Class
u
Class
I Fig.
Class
10.33
Class
:
u-l
1
Extended
I
t-
:
I
-/M/ m node with priority
queues.
enter the CPU, while the APU can only be entered by class-3 jobs. Priority class-3jobs can always be preempted at the CPU, but in this case they cannot be preempted at the APU because this job class is the only one entering the APU. In general, the actual priority of a job also depends on the node it
PRIORITY
NETWORKS
493
enters. If it enters the CPU, the priority order remains unchanged (priority class 1 has highest priority and priority class R has lowest priority), while at the APU priority, class-u jobs have highest priority. Since preemption is also possible at the APUs, if we have more than one priority class, the shadow transformation can be applied to each node, from which a switch to the APU is possible. The result of this transformation is shown in Fig. 10.34.
Class
Class
R
u-l
---0 Class
1
CPU
.. l
CPU
u-l
1
--o--
Fig. 10.34
Extended
-/M/ m node (complete
shadow form).
The components APU,, . . . , APUR form the original APU in the network. Now it is possible to apply the shadow algorithm to the network transformed in this way. Because the priority order at the APU can change, we have to separate the APU iterations from the CPU iterations meaning that in one shadow step we first iterate over all CPUs and then over all APUs. Because over CPU and APU, we usually get different final of the separated iterations service times for CPUi and APUi, resulting in a so-called asymmetric node [BoRo94]. At the CPU, class-l jobs have highest priority, while at the APUs, class-u jobs have highest priority. In the shadow iteration, only Step 3.2 is to be
ALGORITHMS
494
FOR NON-PRODUCT-FORM
NETWORKS
changed as follows: Qc : &J.
Si,r
= l-
c S>C,SEE?-
c=l
R.
7”‘7
&*Pi,s’ (
An application for this node type can be found in Section 13.1.4 where it plays an important role. 10.3.3
Extended
SUM
The SUM method (see Section 9.2) can be applied to priority networks if we extend the formula for the mean number of jobs in the system, Fir of an individual priority node as we did for an FCFS node with multiple job classes. The extended method is referred to as PRIOSUM. Here we use the formulae for single station queueing systems with priorities (see Section 6.15). In this context it is convenient to consider class1 as the classwith the highest priority and class R as the one with the lowest priority. Of course, this convention leads to minor changes in the formulae. l
-/M/l-FCFS-HOL node After we have introduced multiple classesand considered the reverse order of priorities, we derive the following formulae from Eqs. (6.106) and (6.107): WiO
(10.108)
wir = (1 - air)(l - Crir+i) ’ with: r-l oar
pij
c j=l
=
7
r > 1,
(10.109)
otherwise,
0,
and the remaining service time: (10.110) We obtain the final result for the mean number of jobs xi, with: Kir
=
Pir
in node i (10.111)
+ Qir,
and the introduction of correction factors as in the FCFS case:
Fir = pir +
xi, F i+ j=l Pi, 1 - moir K-2
l-K-2
)(
mgir+l
* >
(10.112)
PRIORITY
l
NETWORKS
495
-/M/m-FCFS-HOL node In this case we also can use Eqs. (10.108) and (10.109). Together with:
(Pi>R Pij wio= -pm% mipi cj=l Lij
(10.113)
Fir = m&r + Qir7
(10.114)
and:
we obtain:
Fir
= mipir +
(
1_
K-m,-1 K-&
1_
Oir
>(
K-7&-1 ~~~~
pmi (Pi) . mifh Oir+l
>
(10.115)
a -/G/l-FCFS-HOL node Again with Eqs. (10.108) and (10.109) and the remaining
service time:
R Wio
=
i
C
(10.116)
XijO!2(Bij),
j=l we obtain: $xir Kir
=
pir
$J j=l
XijQJa(Bij)
+ 1
(
K-l-a
K-1
air
1-K-l-a >(
7 - K-1
Q’ir+l
(10.117)
>
with the second moment of the service time distribution: + 1 &2CBij) = (l-lij)2 , $j
(10.118)
c& + 1
(10.119)
a=:!. l
-/G/m-FCFS-HOL
node (10.120)
$hr Kir
=
W&r
+
1 _
K-mi-a K-m,
5 Pijaz(Bij)pij
j=l
Oir
pm% (Pi> .____ 1 _ )(
K-mi-a K-m,
Oir+l
miPi
(10.121)
ALGORITHMS
496
FOR NON-PRODUCT-FORM
NETWORKS
If preemption is allowed, we get: (10.122) Let WJ,‘:’ be the mean waiting time of a job of class r in node i if we consider only the classes with the r highest priorities. Then we get:
W(” 20
w!‘) = ” l
-/M/l-FCFS-PRS
(1 -
(10.123)
.
air+l)
node
r p.. Iv;;’ =x--$.
(10.124)
i=l
Using Eq. (10.111) and Little’s theorem we obtain: Fir = /3ir + AirWire
(10.125)
For wir we use Eqs. (10.122), (10.123), and (10.124) and introduce the correction factor:
.1
(10.126)
l
-/M/m-FCFS-PRS
node WCr)
= i0
-. p77;2i mi~ir+l
’ c j=l
Pij -7 Pij
(10.127)
with probability of waiting P?I;Li,which we get from the probability of waiting Pmi by replacing pi by gir+i, With Eq. (10.114) and Little’s theorem: Kir = mipir $
AirFir
and using Eqs. (10.122), (10.123), and (10.127)
r Fir = mif&r •t
(10.128)
7
we obtain:
SIMULTANEOUS
l
RESOURCE
POSSESSION
497
-/G/l-FCFS-PRS node With the remaining service time: (10.130) and Eqs. (10.118), factors:
l
(10.119),
(10.122), and (10.123) and the correction
-/G/m-FCFS-PRS node Using the corresponding equations
as before we get for this node type: (10.132)
.
I
(10.133) 10.4
SIMULTANEOUS
RESOURCE
POSSESSION
In a product-form queueing network, a job can only occupy one service station at a time. But if we consider a computer system, then a job at times uses two or more resources at once. For instance, a computer program first reserves memory before the CPU or disks can be used to run the program. Thus memory (a passive resource) and CPU (or another active resource such as a disk) have to be reserved at the same time. To take the simultaneous resource possession into account in performance modeling, extended queue&g networks were introduced [SMK82]. I n addition to the normal (active) nodes these networks also contain passive nodes that consist of a set of tokens and a number of allocate queues that request these tokens. These passive nodes can also contain additional nodes for special actions such as the release or generation of tokens. The tokens of a passive node correspond to the service units of a passive node: A job that arrives at an allocate queue requests a number of tokens from the passive node. If it gets them, it can visit all other nodes of the network, otherwise the job waits at the passive node. When the job arrives
498
ALGORITHMS
FOR
NON-PRODUCT-FORM
NETWORKS
at the release node that corresponds to the passive node, the job releases all its tokens. These tokens are then available for other jobs. We consider two important cases where we have simultaneous resource possession. These two cases are queueing systems with memory constraints and I/O subsystems. 10.4.1
Memory
Constraints
.. . @ T erminals
Fig. 10.35
Central-server
model with memory constraint.
In Fig. 10.35 we see the typical case of a queueing network with memory constraints where a job first needs to reserve some memory (passive resource) before it can use an active resource such as the CPU or the disks. The memory queue is shown as an allocate node. The job occupies two resources simultaneously: the memory and the CPU or memory and disks. The maximum number of jobs that can compete for CPU and disks equals the number of tokens in the passive node, representing available memory units. The service strategy is assumed to be FCFS. Such models can be solved by generating and solving the underlying CTMC (via SPN P for example). To avoid the generation and solution of resulting large CTMC, we use an approximation technique. The most common approximation technique is the flow-equivalent server method (FES) ( see Section 8.4). We assume that the network without considering the simultaneous resource possession would have product-form solution. The subsystem that contains the simultaneous resource possession is replaced by an FES node and analyzed using product-form methods. For simple systems with only one job class, the load-dependent service rates pi(k) of the FES node can be determined by an isolated analysis of the active nodes of the subsystem (CPU, disks), where the rest of the network is replaced by a short circuit path, and determination of the throughputs X(lc) for each possible job population along the short circuit. If K denotes the number of memory units,
SIMULTANEOUS
RESOURCE
then the service rates of the FES node are determined P(k) =
Je)l
POSSESSION
as follows
499
[SaCh81]:
for k = 1,. . . , K, for k > K.
X(K),
If there are no more than K jobs in the network, then there is no memory constraint, and the job never waits at the allocate node. Example 10.14 To analyze the central-server network with memory constraint given in Fig. 10.35, we first replace the terminals with a short circuit and compute the throughputs X(k) of th e resulting central-server network using any product-form method as a function of the number of jobs k in the network. Then we replace the subnetwork by an FES node (see Fig. 10.36). This reduced network can be analyzed by the MVA for queueing networks
Fig. 10.36
Reduced
network.
with load-dependent service rates or as we do it here using a birth-death type CTMC (seeSection 3.1). With the mean think time l/p at the terminals and the number of terminals M, we obtain for the birth rates: h=(M-k)p,
k=O,l,...,
K,
(10.134)
and for the death rates:
uk =
v4
i= l,...,K,
X(K),
i > K.
(10.135)
Marginal probabilities of the FES node are (seeEq. (3.11)): r(k) = n-(O)(“;ii& -
(10.136)
. I-I”, fp+ i=l
and with Eq. (3.12): .
(10.137)
500
ALGORITHMS
FOR NON-PRODUCT-FORM
NETWORKS
Now the throughput X of the FES node that is also the throughput terminal requests is obtained by:
x = 5 n(i)b$= 5 +& + VK 5 i=l
i=l
n(i).
of
(10.138)
i=K+l
As a numerical example, assume that the mean think time l/p = 15sec, the maximum number of programs in the subnetwork is K = 4 and the other parameters are as shown in Table 10.19. We calculate the mean response time T for terminal requests as a function of the number of terminals M and, in comparison, the mean response time + for the case that the main memory is large enough so that no waiting for memory space is necessary, i.e., K 2 M. In the case with ample memory, the overall network is of product-form type and, hence, T can be determined using algorithms from Chapter 8. From Table 10.19 network Node CL3 Pl,
Parameters
CPU
Disk
89.3 0.05
for the Central-server 1
Disk
44.6 0.5
Table 10.20 we see that Q! is only terminals M is not too large.
a
2
26.8 0.3
Disk
3
13.4 0.15
good approximation if the number of
Table 10.20 Response time for the centralserver system with memory constraint M
10
20
30
40
50
?;
1.023 1.023
1.23 1.21
1.64 1.46
2.62 1.82
7.03 3.11
LF
For other related examples see[Triv82], and for a SHARP E implementation of this technique see [STP96]. A n extension of this technique to multiple class networks can be found in [Saue81], [B ran82], and [LaZa82]. This extension is simple in the case that jobs of different job classesreserve different amounts of memory. But if on the other side, jobs of different job classesuse the same amount of memory, then the analysis is very costly and complex. Extensions to deal with complex memory strategies like paging or swapping also exist [BBC77, SaCh81, LZGS84]. For the solution of multiple class networks with memory constraints see also the ASPA algorithm (average subsystem population algorithm) of [JaLa83]. This algorithm can also be extended to other casesof simultaneous resource possession.
SIMULTANEOUS
Problem SHARPE.
10.1
Verify the results
RESOURCE
POSSESSION
501
shown in Table 10.20 using PEPSY and
Solve the network of Fig. 10.35 by formulating it as a Problem 10.2 GSPN and solve it using SHARPE or SPNP. Compare the results obtained by means of FES approximation. 10.4.2
I/O Subsystems
Fig. 10.37
I/O subsystem.
The other very important case of simultaneous resource possessionoccurs in modeling an I/O subsystem consisting of several disks that are connected by a common channel and a controller with the CPU. In the model of Fig. 10.37 channel and controller can be considered as one single service station because they are either simultaneously busy or simultaneously idle. An I/O operation from/to a certain disk can be subdivided into several phases: The first phase is the seek phase when the read/write head is moved to the right cylinder. The second phase is the latency phase, which is the disk rotation time needed for the read/write head to move to the right sector, Finally we have the data transfer phase. During the last two phasesthe job simultaneously needs the disk and the channel. In the special casesof rotational position sensing systems (RPS) the channel is needed only in the last phase. Delays due to simultaneous resource possessionoccur if a disk is blocked and no data can be transmitted because another disk is using the channel. For the analysis of I/O subsystem models, different suggestions have been published. In the formulation of [Wilh77], an increased service time for the disks is computed approximately. This increased service time takes into account the additional delay for the competition of the channel. Each disk is then modeled as an simple M/G/l queueing system. To determine the disk service time, another model for the probability that the channel is free when needed is used. Hereby it is assumedthat only one accesspath for each disk exists, and a disk cannot be used by several processors simultaneously. This limitation is removed by [Bard801 and [Bard821 where the free path probability is determined using the maximum entropy principle. We consider a method for analyzing disk I/O systems without RPS, introduced in [LZGS84]. The influence of the channel is considered by an additional
502
ALGORITHMS
FOR NON-PRODUCT-FORM
NETWORKS
term in the mean service time si of the disk. disk; is given then as: Si
=
S,i
+
Sli
+
Sti
The mean service time of the
+
(10.139)
sCi,
with: sli = mean latency time,
s,i = mean seek time, Sti
= mean transfer
time,
sci = mean contention
time.
The contention time sci is calculated iteratively. To obtain an approximate formula for s,i , we first use the formula for the mean number of customers in a -/M/l FCFS node in an open queueing network (see Section 6.3): (10.140)
K,=-J&. i
In the case of the channel, any requests ahead of a diski request at the channel must be associated with some disk other than i, so the equation is slightly modified to: zch
=
Pch
-
&h(i)
1 -
Pch
(10.141) ’
where:
Pch
=
5 i=l
(10.142)
&h(i)
is the utilization of the channel and &h(i) is the utilization of the channel due to requests from diski. Here we have assumed that the number of disks is equal to U. Because the mean channel service time is the sum of the mean latency time S,i and the mean transfer time sti , we obtain for the contention time: sci
=
Pch
1 -
&h(i) p Ch
* hi
+
(10.143)
St&
and, hence, for the mean disk service time: Si
=
S.yi +
(Sli
+
Sti)(l 1 -
-
p&(i))
Pch
(10.144) *
The measures &h(i) and &h, and other interesting performance measures can be calculated using the following iterative algorithm. We assume that the overall model is a closed queueing network that will be of product-form type if we have no contention at the channel. Initialization.
System throughput
X = 0.
SIMULTANEOUS
RESOURCE
POSSESSION
503
Iteration:
I
STEP 2 1 channel:
For each diski, the contribution
of diski to the utilization
(10.145)
pch(i) = A. ei(qi + sti). . STEP 2 2
of the
Channel utilization:
ikh
=
(10.146)
x&h(i). i=l
-9
Compute the mean service time for each diski using Eq. (10.144).
I] Use MVA or any other product-form method method to calculate system throughput X. Return to Step 2.1 until successive iterates of X are sufficiently close. Obtain the performance
measures after the final iteration.
Table 10.21 Iterative calculation of the throughput and the channel utilization. k’ch (i>
2 3 4 5 6 7 8 9 10 11 12
0 0.056 0.030 0.050 0.038 0.047 0.041 0.045 0.042 0.044 0.043 0.043
0 0.167 0.090 0.150 0.113 0.140 0.123 0.135 0.127 0.132 0.129 0.130
l’ch
0 0.836 0.449 0.749 0.564 0.701 0.611 0.674 0.635 0.659 0.645 0.651
si
s ei
11.00 23.24 12.96 18.16 14.10 16.63 14.77 15.96 15.18 15.63 15.36 15.48
x out
0.056 0.030 0.050 0.038 0.047 0.041 0.045 0.042 0.044 0.043 0.043 0.043
Example 10.15 We consider a batch computer system with multiprogramming level K = 10, the mean CPU service time of 15sec, one channel, and five disks [LZGS84], each with the mean seek time s,i = 8sec, the mean latency time sli = 1 set, and the mean transfer time sti = 2 sec. We assume ei = 1 for each node. With a queueing network having six nodes, we can solve the problem iteratively using the preceding method. The first 12 steps of the iteration are given in Table 10.21. This iterative algorithm can be easily extended to disk I/O systems with: l
Multiple classes
. RPS
504
ALGORITHMS
l
Additional
FOR NON-PRODUCT-FORM
NETWORKS
channels
0 Controllers For details see [LZGS84]. [STP96]. Problem algorithm
F or related examples
see
10.3 Verify the results shown in Table 10.21 using the iterative of this section and implement it in SHARPE.
Problem 10.4 Develop a stochastic and solve using SPN P. 10.4.3
of the use of SHARPE,
Method
of Surrogate
reward
net model for Example 10.15
Delays
Apart from special algorithms for the analysis of I/O subsystem models and queueing networks with memory constraints, more general methods for the analysis of simultaneous resource possessionin queueing networks are also known. Two general methods with which a satisfactory accuracy can be achieved in most casesare given in [FrBe83] and [JaLa82]. Both methods are based on the idea that simultaneously occupied service stations can be subdivided into two classes:the primary stations that are occupied before an access to a secondary station can be made. In the caseof I/O subsystem models, the disks constitute the primary stations and the channel the secondary station. In the case of models with memory constraints, the memory constitutes the primary stations and the CPU and disks constitute the secondary stations. The time while a job occupies a primary station without trying to get service from the secondary station is called non-overlapping service time. This non-overlapping service time can also be zero. The time while a job occupies both the primary and the secondary station is called overlapping service time. Characteristic for the method of surrogates [JaLa82] is that the additional delay time can be subdivided into: l
The delay that is caused when the secondary station is heavily loaded. Heavy load means that more jobs arrive at a station than can immediately be served.
l
The delays due to an overload at the primary station. This is exactly the time one has to wait for a primary station while the secondary station is not heavily loaded.
To estimate the delay times, two models are generated, where each model provides the parameters for the other model. In this way we have an iterative procedure for the approximate determination of the performance measuresof the considered queueing model. This method can be applied to single class closed queueing networks that would otherwise have product-form solution
if we neglect the simultaneous resource possession.Studies have shown that the method of surrogate delays tends to overestimate the delay and therefore
SIMULTANEOUS
RESOURCE
POSSESSION
505
underestimate the throughput. In all examples we considered , the deviation compared to the exact values was found to be less than 5%. 10.4.4
Serialization
Up to now we only modeled contention for the hardware resources of a computer system. But delays can also be caused by the competition for software resources. Delay that is caused because a process has to wait for software resourcesand that makes a serialization in the system necessary is called serialization delay. Examples for such software resources are critical areas that have to be controlled by semaphoresor non-reentrant subprograms, Serialization can be considered as a special case of simultaneous resource possession becausejobs simultaneously occupy a special (passive) serialization node and an active resource such as CPU or disk. The phrase serialization phaseis usedhere for the processingphase in which at most one job can be active at any time. A job that wishes to enter the serialization phase or that is in the serialization phase is called a serialized job. The processingphasein which a job is not serialized is called the non-serialized phase, and a job that is in a non-serialization phase is called a non-serialized job. There can always be more than one job in the non-serialization phase. All nodes that can be visited by serialized jobs are called serialized nodes, all other nodes are called non-serialized nodes. Each serialized node can be visited by serialized jobs as well as non-serialized jobs. An iterative algorithm for the analysis of queueing networks with serialization delay is suggested by [JaLa83]. This algorithm can also be used for multiple class networks where the jobs can be at most in one serialization phase at one time. The starting point of this method is the idea that the entrance of a job in a serialization phase shall lead to a class switch of that job. Therefore a class-r job that intends to enter the serialization phase s is denoted as a class-(s,r) job. When the job leaves the serialization phase, it goesback to its former class. In each serialization phase, there can be at most one job at a time. Non-serialization phasesare not limited in this way. The method uses a two layer model and the iteration takes place between these two layers. For more information see [JaLa83]. There are also other formulations for the analysis of queueing networks with serialization delays, but these methods are all restricted to single class queueing networks. In the aggregate server method [AgBu83], an additional node for each serialization phase is introduced and the mean service time of the node is set equal to the mean time that a job spends in the serialization phase. The mean service time of the original node and the additional node have to be increased in a proper way to take into consideration the fact that at the original node the service of non-serialized jobs can be hindered by the service of serialized jobs and vice versa. Therefore, iteration is needed to solve
the model,
506
ALGORITHMS
FOR NON-PRODUCT-FORM
NETWORKS
Two different techniques to model the deleterious effect of the serialization delays to the performance of a computer system are proposed by [Thom83]: an iterative technique and a decomposition technique. The decomposition technique also uses two different model layers. At the lower layer the mean system throughputs for each possible system state is computed. These throughputs are then used in the higher level model to determine the transition rates between the states from which the steady-state probabilities can be determined by solving the global balance equations of the underlying CTMC. More details and solution techniques can be found in [AgTr82], [MiiRo87], and [SmBr80]. A SHARPEimplementation can be found in [STP96]. Another complex extension of the models with serialization delays are models for the examination of synchronization mechanism in database systems (concurrency control). These mechanisms make sure that the data in a database are consistent when requests and changes to these data can be done concurrently. These models can be found in [BeGo81], [CGM83], [MeNa82], [Tay87], or [ThRy85].
10.5
PROGRAMS
WITH
INTERNAL
CONCURRENCY
A two-level hierarchical model for the performance prediction of programs with internal concurrency was developed by Heidelberger and Trivedi (see [HeTr83]). Consider a system consisting of A4 servers (processors and I/O devices) and a pseudo server labeled 0. A job consists of a primary task and a set of secondary tasks. Each primary task spawns a fixed number of secondary tasks whenever it arrives at node 0. The primary task is then suspended until all the secondary tasks it has spawned have finished execution. Upon spawning, each of the secondary tasks is serviced by network nodes and finally on completion return to node 0. Each secondary task may have to wait for its siblings to complete execution and return to node 0. When all its siblings complete execution, the primary task is reactivated. It should be clear that a product-form queueing model cannot be directly used here as the waiting of the siblings at node 0 violates the assumptions. Of course, under the assumption of service times being exponentially distributed, a CTMC can be used. In such a CTMC, each state will have to include the number of tasks of each type (primary, secondaryl, secondary2, etc.) at each node. The resulting large state space makes it infeasible for us to use the one-level CTMC model even if it can be automatically generated starting from an SPN model. The two-level decomposition we use here is based on the following idea: Assume that the service requirements of the primary and secondary tasks are substantial so that on the average, queue lengths in the network will have time to reach steady state before any spawned task actually completes. We could then use a product-form queueing network model to compute throughputs and other measures for a fixed number of tasks of each type. This is assumed to be a closed product-form network and, hence, efficient algorithms such as MVA
PARALLEL
PROCESSING
507
can be used to compute steady-state measures of this lower level queueing network. A CTMC is constructed as the outer model where the states are described just by a vector containing the numbers or each type of tasks currently in the system. To describe the generator matrix of the CTMC, we introduce the following notation. For simplicity, we assume that the parent task spawns exactly two children after arriving at node 0. The parent tasks are labeled 0, while the children tasks are labeled 1 and 2. Let ai denote the number of tasks of type i. Then a generic CTMC state is a = (a~, ar,az). The state space of the CTMC will then be:
S ={a:O~a~~N,O~a~~N-ao,O~a~~N-ao,N
(a0- l,w + l,aa + 1)
x0(4
(ao,m (u0,u1,u2
X1(8)(1
(a0 (a0
+ +
-
La2)
- 1) l,Ul 1, al,
-
142) a2 - 1)
Transition
da, b)
Xa(a)(l
-n(a)) - r2(a))
Xl
(a>>
(a)@1
A:!(4 6-2(4
>
Task Task Task Task Task
0 1 2 1 2
Explanation completion, completion, completion, completion, completion,
tasks 1 and 2 spawned sibling active sibling active sibling waiting sibling waiting
implementation of this algorithm see [STP96].
10.6
PARALLEL
PROCESSING
Two different types of models of networks with parallel processing of jobs will be considered in this section. The first subsection deals with a network in which jobs can spawn tasks that do not need any synchronization with the spawning task. In the second subsection we consider fork-join networks in which synchronization is required. 10.6.1
Asynchronous
Tasks
In this section we consider a system that consists of m active resources (e.g., processorsand channels). The workload model is given by a set of statistically
508
ALGORITHMS
FOR NON-PRODUCT-FORM
NETWORKS
independent tasks. Each job again consists of one primary and zero or more statistically identical secondary tasks. The primary task is labeled 1 and the secondary tasks are labeled 2. A secondary task is created by a primary task during the execution time and proceeds concurrently with it, competing for system resources. It is assumedthat a secondary task always runs independently from the primary task. This condition in particular means that we do not account for any synchronization between the tasks. Our treatment here is based on [HeTr82]. The creation of a secondary task takes place whenever the primary task enters a specially designated node, labeled 0; the service time at this node is assumedto be 0. Let eii, i = 1,2, . . . , m denote the average number of visits to node i per visit to node 0 by a primary task. Furthermore ei2, i = 1,2,. . . , m denotes the average number of visits to node i by a secondary task. After completing execution, a secondary task leaves the network and eir, ei2 < 00. Let l/pij denote the average service time of a type j task at node i. Each task type does not hold more than one resource at any time. The number of primary jobs in the system is assumedto be constant K and concurrency within a job is allowed only through multitasking, while several independent jobs are allowed to execute concurrently and share system resources. In the casethat the scheduling strategy at a node is FCFS, we require each task to have an exponentially distributed service time with common mean. In the case of PS or LCFS-PRS scheduling, or when the node is an IS node, an arbitrary differentiable service time distribution is allowed and each task can have a distinct service time distribution. Since concurrency within a job is allowed, the conditions for a product-form network are not fulfilled (see [BCMP75]). Th erefore we use an iterative technique where in each step a product-form queueing network is solved (the product-form solution of each network in the sequenceis guaranteed due to the assumptions on the service time distributions and the queueing disciplines). The queueing network model of the system consists of two job classes,an open classand a closed class. The open classesare used to model the behavior of the secondary tasks and, therefore, the arrival rate of the open classis equal to the throughput of the primary task at node 0. The closed class models the behavior of the primary tasks. Because there are K primary tasks in the network, the closed class population is K. Notice that the approximation assumesthe arrival process at node 0 is a Poissonprocess that is independent of the network state. Due to these two assumptions (Poisson process and independence of the network state), our solution method is approximate. A closed form solution is not possible because the throughput of the closed chain itself is a non-linear function of the arrival rate of the open chain. For the solution of this non-linear function, a simple algorithm can be used. In [HeT’r82] regu1a f aIsi was used and on the average only a small number of iteration steps was necessary to obtain an accurate solution of the non-linear function.
PARALLEL
PROCESSING
509
Let Xuz denote the arrival rate and X2 the throughput of the open class, respectively. If any of the queues is saturated, then X2 < X02. Furthermore let Xi denote the throughput of the closed class at node 0. We require X2 = Xi and for the stability of the network, we must have X2 = X02. Therefore the following non-linear fixed-point equation needs to be solved: Wo2)
=
x02.
(10.147)
Here X1(X02) is the throughput of the primary task at Node 0, given that the throughput of the secondary task is X02. Equation (10.147) is a non-linear function in X02 and can be evaluated by solving the product-form queueing network with one open and one closed job class for any fixed value of X02. Let XT and Xz, denote the solution of Eq. (10.147), i.e., Xi = X& = Xr (Xi,). Here Xi, is the approximated arrival rate of a secondary task. At this point, the arrival rate of a secondary task is equal to the throughput of a primary task at node 0. To get the approximate arrival rate X02, we can use any algorithm for the solution of a non-linear function in a single variable. Since the two classes share system resources, an increase of X02 does not imply an increase in Xi; we conclude that Xi is a monotone non-increasing function of X02. If of the open class, then the stability x yax denotes the maximum throughput condition is given by: x02 < ,yax. The condition for the existence of a stable solution, no queue is saturated, for Eq. (10.147) is given by: lim X1(X02) < AFax. x02+Xy=
(10.148) i.e., a solution where (10.149)
If a stable solution exists, it is also a unique one (monotonicity property). If the condition is not fulfilled, primary tasks can generate secondary tasks at a rate that exceeds the system capacity. The node that presents a bottleneck determines the maximum possible throughput of the open class. The index of this node is given by: (10.150) Here argmax is the function that determines the index of the largest element in a set. Therefore the maximum throughput is given by: max x2
=
Pbott,2
(10.151)
ebott,2
In Fig. 10.38 the three types of possible behaviors, depending on the system parameters, are shown. In case (a), node bott is utilized by the primary task, i.e.: ebott,l pbott,l
>
0.
(10.152)
510
ALGORITHMS
FOR NON-PRODUCT-FORM
NETWORKS
6 i
Open
Chain
Open
Throughput
Open
Chain
Chain
Throughput
Throughput
Fig. 10.38 Three types of possible behavior of the approximation method tention at bottleneck (b) N o contention at bottleneck, moderate contention devices (c) No contention at bottleneck, little contention at other devices.
(a) Conat other
When the queue of node bott grows without bound due to an excessive arrival of jobs of the open job class, the throughput of the closed job class will approach zero. If Condition (10.152) is not satisfied, but some other network nodes are shared by two job classes,then either case (b) or case (c) results depending on the degree of sharing. Because case (b) yields a unique solution to (10.147), we can conclude that (10.152) is a sufficient but not necessary condition for convergence, while (10.150) is both necessary and sufficient. Compared to Condition (10.150), Condition (10.152) has the advantage that it is testable prior to the solution of the network. We can also extend this technique to include caseswhere more than one type of secondary task is created. Assume, whenever a primary task passes node 0, a secondary task of type k, (Ic = 2,. . . , C) is created with probability pk. To model the secondary tasks, C - 1 open job classes are used with arrival rate Xek each. Let Xe = CF=, Xek denote the total arrival rate of the secondary tasks. The individual arrival rates are constrained so that AOk = pkAO$ 2 2. The total throughput of secondary tasks is set equal to the throughput of the primary tasks at node 0. This condition again defines a non-linear equation in the single variable X. This equation is similar
PARALLEL
PROCESSING
511
(b)
Fig. 10.39 I/O (b).
The central-server
model without
overlapped
I/O (a) and with overlapped
to Eq. (10.147) and must be solved. As a concrete example we consider the central-server model. The standard model without overlap is shown in Fig. 10.39a. Node 1 represents the CPU and nodes 2, 3, 4, and 5 represent I/O devices such as disks. In the case without overlap, the number of primary tasks is constant C. The processing time of the primary tasks at the CPU is assumed to be random with mean l/pp. As soon as a primary task finishes processing, a secondary task is created with probability f. Once a secondary task is generated, the primary task returns to the CPU for processing (with mean l/pi), while the secondary task starts at the CPU queue (with mean service time l/piz). With probability ~12, the secondary task moves to an I/O device. The secondary tasks leaves the system with probability 1 - p2 and returns to the CPU with probability ~2. In the case that a primary task completes CPU processing and does not create a new secondary task (with probability 1 - f), it begins an overhead period of CPU processing with mean ~/PO. On completion of this overhead processing, the primary task moves to I/O device i with probability pi and then returns to the CPU. As scheduling strategies, we assumePS at the CPU and FCFS at all I/O devices with mean service times l/p;.
512
ALGORITHMS
FOR NON-PRODUCT-FORM
NETWORKS
A primary task can issue I/O requests that are processed by the secondary task. Then the task continues processing without waiting for these I/O requests to complete. If we assumel/,~c = l/piz, then this is the average time to initiate an I/O request. In the notation of Section 10.5 we have On th e average, each secondary task generates l/p11 = 1/1-+ + (I- f)l/po. l/(1 - pz) I/O requests and thus the fraction of all I/OS that overlap with a primary task, is given by: -. f”l =
f
1
(10.153)
(1-f)+&’
bP2)
In the case of p2 = 0, we get fOl = f. In Fig. 10.39b, the central-server model with overlap is shown. Here the additional node 0 with service time 0 is introduced. Once a primary task leaves the CPU, it moves to node 0 with probability f and to I/O device i with probability (1 - f)pil. The rate at which secondary tasks are generated is equal to the throughput of primary tasks at node 0. Secondary tasks are modeled by an open class of customers where all arrivals are routed to the CPU. We will assumethat the arrival process is Poisson. The arrival rate of the open class customers X02 is set equal to the throughput of the closed class (primary tasks) at node 0. The routing of the secondary task is as described earlier. A secondary task leaving the CPU visits the I/O device labeled i with probability pi2 and returns then to the CPU with probability p2 or leaves the system with 1 - ~2, respectively, For this mixed queueing network, the stability condition is ei2Xo2 < pi2 for all i where er2 = l/(1 -ps) and ei2 = piz/(l - p2), i > 1. Table10.23 Input parametersfor the central-servermodel Model Set I II III IV V VI
rCL2
-iG
lx1
0.04 0.04 0.04 0.04 0.04 0.04
1 P4
1-15
0.04 0.04 0.04 0.10 0.20 0.40
P22
P21
=
P31
= P32
P41
= P42
0.25 0.25 0.30 0.30 0.30 0.30
P51
= P52
0.25 0.25 0.10 0.10 0.10 0.10
PO2
0.00 0.50 0.00 0.00 0.00 0.00
Six sets of central-server models with 60 models for each set were analyzed by [HeTr82]. Each set contains models with a moderate utilization of the devices as well as heavily CPU and/or I/O bound cases. Each set also contains a range of values of multiprogramming levels, K, the mean service time at the CPU, l/pi, and an overlapping factor f. A model within a set is then characterized by the triple (K, 1/,~1, f) where K = 1,3,5, l/pi = 0.002,0.01,0.02,0.10,0.2, and f = 0.1,0.25,0.5,0.75. The differences
PARALLEL
PROCESSING
513
between the sets are the I/O service times and the branching probabilities pij and ~02, respectively. Furthermore they assumefor all models that the overhead processing times are equal, l/p0 = l/pi2 = 0.0008. The mean service times of the least active I/O device is varied in an interval around the mean service time of the other I/O devices. In addition the I/O accesspattern ranges from an even distribution to a highly skewed distribution. The input parameters for each set of 60 central-server models are shown in Table 10.23. As a second example, we consider the terminal-oriented timesharing systems model shown in Fig. 10.40a. Basically it is the samemodel as the centralserver model except that it has an extra IS node (node 6) that represents a finite number of terminals. The mean service time at node 6, l/psr, can be considered as the mean thinking time. The number of terminals submitting primary tasks to the computer system is denoted by K. Each primary task first uses the CPU and then moves to I/O device i with probability pil. As soon as a primary task completes service at the I/O, it returns to the CPU with probability pl or enters the terminals with probability 1 - pl. We assumethat during the execution of a primary task a secondary task is generated which executes independently from the primary task (except for queueing effects). In particular, the secondary task can execute during the thinking time of its corresponding primary task. Routing and service times are the same as in the central-server model. The approximate model of this overlap situation is shown in Fig. 10.40b. Between the I/O devices and the terminals, node 0 (with service time 0) is inserted. Therefore the throughputs at node 0 and the terminals are equal. The secondary tasks are modeled as open job classesand arrive at the CPU. The corresponding arrival rate X02is set equal to the throughput of primary tasks at node 0. The stability condition for this model is the same as in the model before: Xo2/(pr2 ’ (1 -pz)) < 1 and X02. pi2/(,t42 . (1 - ~2)) < 1 for all i > 2.
Table 10.24 1 P61
10 10 15 15 15 15 15
---1 Pll
-
1 P12
0.010 0.015
0.002 0.010
0.020 0.020 0.030
Input parameters
1 CL,
Pi,
0.04 0.04 0.04 0.04 0.04 0.08 0.08
0.25 0.25 0.25 0.25 0.25 0.25 0.25
Pl
=
pa
0.10 0.10 0.10 0.10 0.10
0.02 0.02
for the terminal K (no.
system of terminals)
1, 5, 10, 20, 30, 40, 50,60, 70, 80, 90, 100 1, 5, 10, 20, 30, 40, 50,60, 70, 80, 90, 100 25, 50, 100 25, 50, 100 25, 50, 100 1, 5, 10, 20, 30, 40, 50 1, 5, 10, 20, 30, 40, 50
This model can be interpreted as follows: Each terminal interaction can be split into two tasks. A terminal user has to wait for completion of a primary task but not for completion of a secondary task. In [HeTr82] 47 models of this type were considered. The corresponding parameter settings for this model are described in Table 10.24. Each of the models was simulated using the IBM
514
ALGORITHMS
FOR NON-PRODUCT-FORM
NETWORKS
Sink
Fig. IO.40 tasks (b).
Terminal
model
without
overlapped
tasks
(a) and with
overlapped
PARALLEL
Table 10.25 Comparison of DES and analytic models with balanced I/O subsystem Models
approximations
Disk
CPU
Disk
P
P
Q
s
Mean Maximum
0.003 0.026
0.006 0.071
Mean Maximum
1.2 15.9
1.5 15.4
Mean Maximum
0.003 0.020
0.005 0.054
Mean Maximum
1.0 9.2
1.1 8.7
Absolute error 0.011 0.110 Relative
Absolute error 0.009 0.073 Relative
of simulation Tab/e 10.26 Comparison models with balanced I/O subsystem
Terminal Models
approximations
Disk
CPU P
Disk P
CPU s
0.002 0.010
0.002 0.008
0.429 2.261
Absolute 0.056 0.421
error
Mean Maximum
0.3 1.4
0.4 1.5
3.0 13.4
Relative 1.4 4.5
error
Mean Maximum
s
0.044 0.457
0.056 0.710
3.2 16.2
1.3 15.5
0.072 0.670
0.034 0.317
3.1 15.1
1.0 9.0
error
2.8 25.0
and analytic
x
error
2.8 26.2
Set II
515
for the central-server
CPU
Set I
Models
PROCESSING
Terminal G
for the terminal
X
T
0.115 0.674
0.010 0.036
0.228 1.141
0.6 2.7
0.7 2.2
1.7 5.4
516
ALGORITHMS
Tab/e 10.27 server
models
FOR NON-PRODUCT-FORM
Comparison of simulation with imbalance I/O
Models
CPU P
Disk P
NETWORKS
and
2
analytic
Disk
3-5
approximations
CPU
P
Set III Mean Maximum
0.003 0.025
0.007 0.076
0.003 0.025
Mean Maximum
1.2 14.7
1.4 14.4
1.9 13.9
Mean Maximum
0.003 0.029
0.007 0.078
0.007 0.069
Mean Maximum
1.4 15.8
1.6 16.2
2.1 17.4
Mean Maximum
0.003 0.024
0.005 0.059
0.008 0.102
Mean Maximum
1.4 15.6
1.5 15.3
1.6 15.9
Mean Maximum
0.002 0.011
0.002 0.180
0.007 0.067
Mean Maximum
1.0 7.2
1.2 7.2
1.3 8.1
Disk
3-5
central
A
v
0.050 0.631
3.8 18.3
2.0 13.1
1.3 14.3
Absolute error 0.010 0.044 0.095 0.555
0.030 0.323
0.055 0.636
3.1 13.6
4.2 29.2
1.4 15.9
Absolute error 0.009 0.043 0.065 0.878
0.394 5.499
0.039 0.490
3.2 37.4
11.4 73.2
1.4 15.3
Absolute error 0.008 0.041 0.032 0.394
0.592 4.224
0.019 0.149
13.7 89.0
1.1 7.1
error
2.9 27.7
error
2.9 27.0
Relative
Set VI
2
the
0.004 0.065
Relative
Set V
CT
Absolute error 0.010 0.060 0.101 0.799 Relative
Set IV
Disk
s
for
error
3.1 28.7
Relative 4.0 28.9
error
7.4 47.9
PARALLEL
PROCESSING
517
Research Queueing package RESQ, which offers a so-called split node that splits a job into two tasks that proceed independently through the network. This node type was used to model the overlap of jobs. In Table 10.25, 10.27, and 10.26, the results of the comparison of discrete-event simulation (DES) with the approximate analytic method are shown. Both absolute errors as well as relative errors are given. For each set of models, the mean and maximum absolute and relative errors are listed for utilizations, queue lengths, throughputs, and response times (for the terminal models). As we can see from the results, the approximation is particularly accurate for estimating utilizations and throughputs. For these quantities, the relative error is about 1.3010, and the maximum relative error is 17.4%. We can also see that the estimates for the mean queue length are somewhat less accurate, the mean relative error is 4.2%. The maximum relative error is very high (up to 89%). Th ese high errors always occur in very imbalanced systems such as the central-server model sets V and VI (in these sets the overlap factor is very high). If, on the other hand, the model is better balanced (lower values of fOl), we get quite accurate approximations. In this case the mean relative error for all performance measures is 1.9%, and the maximum relative error is 2.9%. 10.6.2
Fig.
10.41
Fork-Join
Systems
Task precedence graph with (u) four parallel
tasks and (b) four tasks.
In this section we consider computer systems that are able to run programs in parallel by using the fork-join or parbegin-parend constructs. The parallel programs (jobs) consist of tasks that have to be processed in a certain order. This order can be given by a task precedence graph. In Fig. 10.41a the task precedence graph for a program consisting of four parallel tasks Y”r, T2, Ts, T4 is shown, and in Fig. 10.41b another task precedence graph for a program consisting of four tasks Tr, T’, Ts, TJ is shown. In Case b, task Ts can start after both tasks Tr and Tz are finished while task T4 can be executed in parallel with task Tr , T2, and T3. The corresponding parbegin-parend constructs look as shown in Fig.10.42.
518
ALGORITHMS
FOR NON-PRODUCT-FORM
NETWORKS
10.6.2.1 Modeling Consider a model of a system in which a series of external parallel program execution requests arrive at an open queueing network where nodes indicate the processors and external arrivals indicate the jobs [Duda87]. It is assumed that each processor can execute exactly one special task. Thus task T! always needs to be served by processor Pi. Furthermore it is assumed that processors are always available when needed. The interarrival times of jobs are exponentially distributed with mean value l/X, and the service times of the tasks are generally distributed with mean value l/pi, i = 1,. . . , N. Figure 10.43 shows the queueing network model for jobs with the structure shown in Fig. 10.41~. Figure 10.44 shows the queueing network model of a parallel system, where jobs have the structure shown in Fig. 10.41b. The fork and join operators are shown as triangles.
fig. 10.42 Parbegin-pared construct for (a) four paralleltasks (Fig. 10.41a), parallel-sequentialtasks (Fig. 10.41b)
(b) four
The basic structure of an elementary fork-join system is shown in Fig. 10.43. The fork-join operation splits an arriving job into N tasks that arrive simultaneously at the N parallel processorsPi, . . . , PN. As soon as job i is served, it enters the join queue Qi and waits until all N tasks are done. Then the job can leave the fork-join system.
Fig, 10.43 Fork-join system.
A special caseof the elementary fork-join system is the fission-fusion system (see Fig. 10.45), where all tasks are considered identical. A job can leave the
PARALLEL
PROCESSING
519
Fig. 10.44 Queueing network model for the execution of a job with the task precedence graph shown in Fig. 10.41b.
system, as soon as any N tasks are finished. These tasks do not necessarily have to belong to the samejob.
Fig. 10.45
Fission-fusion
system.
Another special case is the split-merge system (see Fig. 10.46) where the N tasks of a job occupy all N processors. Only when all the N tasks have completed, can a new job occupy the processors. In this casethere is a queue in front of the split station and there are no processor queues. This variant corresponds to a blocking situation: A completed task blocks the processor until all other tasks are finished. Pl
PN
Fig. 10.46
Split-merge
system.
10.6.2.2 Performance Measures In this section some performance measures of fork-join systems are defined. They are based on some basic measuresof queueing systems that are defined as follows (seeFig. 10.47 and Section 7.2.1):
520
ALGORITHMS
FOR NON-PRODUCT-FORM
NETWORKS
-Ts,l
TS,N
T9,2
(4
DTJ,N
\
(b) Fig. 10.47 System used for performance allel system.
/
TFJ
comparison:
(a) sequential
system, (b) par-
Ts,+ = The mean responsetime of task Ti belonging to a purely sequential
job Ts = The mean -response time of a purely sequential job, equal to the sum of all Ts,i FFJ = The mean responsetime of parallel-sequential jobs with fork-join synchronization TJ,~ = The mean time spent by task Ti waiting for the join synchronization in queue Qi si = The mean number of tasks waiting for the join synchronization in queue Qi TJ = The mean join time KF.J = The mean number of parallel-sequential jobs with fork-join synchronization In addition to these measures,the following other measuresare of interest for parallel systems:
PARALLEL
521
PROCESSING
Speedup: Ratio of the mean response time in the system with N sequential tasks to the mean response time in a fork-join system with N parallel tasks:
Ts GN=TFJ.
(10.154)
Normalized Speedup: G=$,
GE [+,l]s
(10.155)
Synchronization Overhead: Ratio of the mean time that tasks have to wait altogether in the join queues until all tasks are finished to the mean responsetime of the fork-join system: N
sN
c TJ,i = -?l=l %J ’
(10.156)
Normalized Synchronization Overhead: s = $,s
E [O,l].
(10.157)
Blocking Factor: The blocking factor is the average total number of blocked tasks in the join queues: BN = 2Qi.
(10.158)
i=l
Normalized Blocking Factor: The normalized blocking factor is given by: B=N.
BN
(10.159)
10.6.2.3 Analysis The method of [DuCz87] for the approximate analysis of fork-join systems is based on an application of the decomposition principle: We consider fork-join systems as open networks and replace the subnetwork that contains the fork-join construct by a composite load-dependent node. The service rates of the composite node are determined by analyzing the isolated (closed and) short-circuited subnetwork. This analysis is done for every number of tasks (1. . . 00). For practical reasons this number is, of course, limited. During the analysis, the mean number of tasks in the join queues is determined by considering the CTMC and computing the corresponding probabilities using a numerical solution method.
522
ALGORITHMS
FOR NON-PRODUCT-FORM
NETWORKS
&I
Pl
PN
Fig, IO.48
:
Closed network
QN
r
with N- k tasks.
Consider an elementary fork-join system with N processors as shown in Fig. 10.43. To analyze this system, we consider the closed queueing network shown in Fig. 10.48. This network contains N. Ic tasks, Ic = 1,2,. . .. The analysis of the network, via the numerical solution of the underlying CTMC, gives the throughput rates X(j), which are used as the loaddependent service rates p(j) of a FES node that replaces the fork-join system (see Fig. 10.49). Th e analysis of the FES node finally gives the performance measuresof the overall system.
Fig. 10.49
Composite
service station.
In the similar fashion, the performance measuresof more complex fork-join systems can be computed because a job can have task precedence graphs with several nested fork and join constructs. The model for the service of a job has then the form of a series-parallel network with subnetworks that again contain fork-join systems. In Fig. 10.44 we have already given an example of such a model. The described decomposition principle is applied in this case several times in order to solve the fork-join subnetworks numerically so that they can be replaced by FES service stations. Example 10.16 Consider a two-server fork-join queue. Here we assume that the service times at both servers are exponentially distributed with mean l/p. This system is shown in Fig. 10.43 (with N = 2). In order to solve the fork-join queue, we consider the FES queue with state-dependent service rate, shown in Fig. 10.49. The decomposition step consists of analyzing the closed subsystem with 21ctasks, k = 1,2,. . . (Fig. 10.48). The service rate of the FES in Fig. 10.49 is set equal to the throughput of the closed subsystem. The solution of the FES gives overall performance measures. At the level of the closed subsystem we deal with forked tasks and delays caused by the join primitive, while at the FES level we obtain the performance measuresfor jobs.
PARALLEL
523
PROCESSING
Let X(t) denote the total number of tasks in collectively both the join queues at time t. The stochastic process is a CTMC with the state space {i = 1,2,.
. . ) k}.
The state diagrams of the CTMCs for various values of Ic are shown in Fig. 10.50. Each of these CTMCs is a finite birth-death process, and thus the
Fig. IO.50
CTMC
for the total number
of tasks waiting
in the join queues.
steady-state probabilities can be derived using Eqs. (3.11) and (3.12): 1 ro= 1+21c’ 7ri = 27r0, i = 1,2,. . . ) It,
(10.160)
k = 1,2 )....
In order to calculate the throughput of jobs A( Ic), we observe that a successfuljoin corresponds to a transition at rate ,Qfrom state i to state (i - 1) in the CTMCs of Fig. 10.50. Thus assigning reward rate p to each of the states i = 1,2,. . . ) Ic, we obtain the needed throughput as the expected reward rate in the steady states.
X(lc) =/&i = &,
k = 1,2,.
..*
(10.161)
i=l
State-dependent service rates of the FES are then set equal to the throughput
of the closed subsystem: p(k) = X(k),
k = 1,2,. . .. The solution of the FES
queue then gives the steady-state probabilities of the number of jobs in the
524
fork-join
ALGORITHMS
FOR NON-PRODUCT-FORM
NETWORKS
queue and is given by:
Vk = vOpki=lk
)
II= 1,2 )...)
n 2i i=l
and with Eq. (3.12): 1
(10.162)
v(-J =
fi2i+1
l+
E Pk i=l
k=l
fJ 2i i=l
=
(1 -
t$7
(see [BrSeSI-37P.32),
where p = X/p. These results hold for p < 1, which is just the ergodicity condition for fork-join queues. Knowing the probabilities of Eq. (10.162), the expected number of jobs in the system is:
j&=-&,k=3.P2
l-p'
k=l
(10.163)
and, using Little’s theorem: T&J =
KFJ 3 1 - x = 5 * /V(l - p)’
(10.164)
If these values are compared to the corresponding ones for the M/M/l queue, it can be seen that the mean number of jobs and the mean response time of the fork-join queue are both 3/2 times larger than those of the M/M/l queue. The performance indices of the synchronization primitives are derived in the following: With the mean responsetime Ts,i of task i of a purely sequential job (= mean responsetime of a M/M/l queue, Eq. (6.15))
Ts,+ = ~ 41
1
i = 1,2,
- P> ’
we obtain the mean responsetime of a purely sequential job:
T,=&,,p i=l
2 PO -P)'
Equations (10.154) and (10.164) are used to derive the speedup:
Ts GN=TFJ=3.
4
PARALLEL
The normalized
speedup is computed
using Eq. (10.155).
PROCESSING
525
With N = 2 we get:
To compute the synchronization overhead SN, we need the average time a task spends in the join queues. This time is just given by the difference between ?;FJ and TM/M/r (where TM/M/r is the time a task spends in the servers). If:
then the synchronized
and the normalized
overhead is given by (Eq. (10.156)):
synchronization
overhead (Eq. (10.157)):
In order to obtain the blocking factor BN, we need the mean number of tasks in the join queues. This number can be determined from Eq. (10.165), using Little’s theorem:
and the normalized
blocking
factor is given by:
Note that the speedup as well as the synchronization overhead do not depend on the utilization p, and that the speedup is rather poor. It can be shown that the speedup GZ increases from 4/3 to 2 if the coefficient of variation of the service time decreases from 1 (exponential distribution) to 0 (deterministic distribution). In Tables 10.28, 10.29, and 10.30, the speedup is given for different distribution functions of the service times. It is assumed that jobs arrive from outside with arrival rate X = 0.1. The service rate is ,Q = 10. For the case cx < 1, we use an Erlang-lc distribution and get the results shown in Table 10.28. For cx > 1, we use a hyperexponential distribution and get the results shown in Table 10.29. Because of the phase-type service time distribution, the method discussed so far can easily be extended. Underlying
526
ALGORITHMS
Tab/e
10.28
FOR NON-PRODUCT-FORM
NETWORKS
System speedup with Erlang-k
OeO
service times
Erlang-k
Exact var(X)
distributed
ii+5
iA
cx
0.0
&
53
Speedup
2.0
1.760
1.671
1
1
1
1
1
600
500
400
300
200
1 100
5
5
-A
5
5
1.0
1.630
1.605
1.569
1.524
1.453
1.333
Table 10.29 System speedup with hyperexponentially service times . var(X) Speedup
distributed
2.0
4.0
6.0
8.0
10.0
20.0
1.060
1.043
1.029
1.028
1.027
1.026
CTMCs will now be more complex than those shown in Fig. 10.50. We chose to generate and solve these CTMCs using the MOSES tool (see Chapter 12). Finally we show in Table 10.30 the system speedup if the service times are normally distributed with standard deviation 0~ and mean value p = 10. The results in this case were obtained using DES. Using a similar approach, Tab/e
10.30
System speedup with normally
distributed
service times
var(X)
0.001
0.005
0.01
0.02
0.08
0.1
0.2
0.4
Speedup
1.697
1.525
1.487
1.463
1.439
1.437
1.433
1.431
performance measures for the fission fusion system (see Fig. 10.45) and the split merge system (see Fig. 10.46) can be computed [DuCz87]. For fork-join systems with a large number of processors, [Duda87] developed an approximate method for the analysis of the subnetworks. To demonstrate this method we consider a series-parallel closed fork-join network as shown in Fig. 10.51. In this method it is assumed that all service times of the tasks are exponentially distributed. The method is based on construction of a product-form network with the same topology as the original fork-join subnetwork and approximately the same number of states in the corresponding CTMC. The main difficulty when following this procedure is to determine the number of states in the fork-join subnetwork. If N denotes the number of parallel branches where the ith branch contains exactly ni nodes and Ic tasks, then the number of ways to distribute Ic tasks to ni + 1 nodes (including the join queue &i) is given by: (10.166)
PARALLEL
PROCESSING
527
Nk Tasks Fig.
10.51
Series-parallel
closed fork-join
network.
Then the number of ways to distribute N. Ic tasks to all nodes is given by the following expression:
fi
(10.167)
xi 0k .
i=l
This number can be limited further. If there is exactly one job in each join queue Qi, then the jobs are passed immediately to the first station. Thus j-J:, zi(k - 1) of th e given combinations are not possible because this is the number of possible ways to distribute Ic - 1 tasks over ni stations, given that there is one task in each join queue Qi . The number of states in the fork-join subnetwork is therefore given by: &J(k)
= f&i(k) i=l
- fiZi(k
- 1).
(10.168)
i=l
In the following steps, the algorithm for the analysis of fork-join networks with possibly several nested fork-join subnetworks is given. The replacement of these fork-join subsystems by FES nodes is done iteratively. Choose one subsystem that does not contain any other fork-join structures and construct a closed network with N. Ic tasks by using a short circuit. Compute the number of states .&J(IC) of this closed subnetwork for k = 1,2, . . . (Eq. (10.168)).
528
ALGORITHMS
FOR NON-PRODUCT-FORM
Compute
the throughput
NETWORKS
of the subnetwork.
-1 Construct a product-form network with exactly the same M = CL, ni number of nodes as the subnetwork. The number of jobs K in the product-form network is chosen so that the number of states in the productform network is approximately equal to the number of states in the fork-join subnetwork. Since the state space structures of both networks are almost identical, the throughput of the product-form network XPF(K) will be approximatively equal to the throughput of the fork-join network XFJ(IC). Thus we need to determine K by using Eqs. (10.166) and (10.168) so that for Ic = 1,2,. . .: K
with &F(K)
: lzFJ(k)
-
zPF(K)I
= .z~-1(K)
=
mp[ZFJ(k)
-
ZPF@)i
?
(10.169)
holding.
1-1 Solve the product-form network for all possible numbers of jobs K using an algorithm such as MVA and determine the throughput as a function of K. This throughput XPF(K) of the product-form network is approxXFJ(/C) of the fork-join subnetwork for the imately equal to the throughput value of K corresponding to Ic. Replace the subnetwork in the given system by a composite node whose load-dependent service rates are equal to the determined throughput rates: p(k)
network.
=
XFJ(IC)
for k = 1,2.. . .
If the network under consideration contains only one FES node, nalysis of this node gives the performance measures for the whole Otherwise go back to Step 1.
From these performance measures, special characteristic parallel systems such as the speedup GN and the synchronization SN can be computed. The algorithm
is now demonstrated
Example 10.17 Consider depicted in Fig. 10.41b. The network is presented in Fig. service times of the tasks are 1 - = 10, CL1
values of overhead
on a simple example.
a parallel program with the precedence graph model in the form of a series parallel fork-join 10.44. The arrival rate of jobs is X = 0.04. All exponentially distributed with the mean values: -
1
= 5,
CL2
-
1
CL3
= 2,
-
1
ZE 10.
l-b
The analysis of the model is carried out in the steps that follow: At first the inner fork-join and P2, is chosen and short-circuited. 2. k tasks, k = 1,2, . . . (Fig. 10.52).
system, consisting of the processors Pi This short circuited model consists of
PARALLEL
fig. 10.52 Table 10.31
Short-circuited
Number
PROCESSING
529
model.
of states in the subsystem.
k
1
2
3
4
5
6
7
8
9
10
&‘J K m&F
3 3 3
5 5 5
7 7 7
9 9 9
11 11 11
13 13 13
15 15 15
17 17 17
19 19 19
21 21 21
The number of states ZFJ of this subsystem is determined using Eq. (10.168). For Ic = 1,. . . , 10, the results are shown in Table 10.31. Determine the throughput of the subnetwork. -1 Construct a product-form network with N = 2 nodes. The number of jobs K in this network are determined using Eq. (10.169). For k = 1,2..., 10 the values for K and the number of states ZPF(K) are given in Table 10.31. -1 Using MVA we solve the product-form network, constructed in Step 3.1, for all possible number of jobs K. The throughputs are given by: X43)
= 0.093,
&F(5) = 0.098,
&F(7)
= 0.100,
&(g)
= o-100,
These values approximately correspond to the throughputs XFJ(~) of the forkjoin subnetwork: XFJ(~) = XPF(3) = o-093, XFJ(2) = &F(5) = [email protected], &J(3) = XPF(7) = 0.100, XFJ(4) = XPF(g) = 0.100,
The analyzed fork-join subnetwork, consisting of the processorsPI and P2, is replaced by an FES node (see Fig. 10.53) with the load-dependent service rates p(k) = XFJ(k) for k = 1,2, . . . .
530
ALGORITHMS
FOR NON-PRODUCT-FORM
fig. 10.53
Fork-join
NETWORKS
subnetwork
with one FES node.
This network contains more than one node and therefore we return to Step 1.
fig. 10.54
Short-circuited
network.
The short circuit of the network shown in Fig. 10.53 results in a closed network with 2. k tasks, k = 1,2,. . . (Fig. 10.54). The possible number of states ZFJ in this network is given by Eq. (10.168). For k = 1,. . . , 10 the values are shown in the Table 10.32. Tab/e 10.32
The possible number of states
ZFJ
as a function
of k = 1,. . . , 10
k
1
2
3
4
5
6
7
8
9
10
ZFJ
5 2 6
12 3 10
22 5 21
35 7 36
51 9 55
70 10 66
92 12 91
117 14 120
145 16 153
176 17 171
K ZPF
Determine the throughputs. 1m3.1) Construct a product-form network consisting of N = 3 nodes and determine the number of jobs K in the network using Eq. (10.169). For k = l,... , 10 the values of K and .&F(K) are given in Table 10.32. \-I Solve the network for all number of jobs K. For the computed throughputs, the following relation approximately holds: XpF (K) = XFJ( k).
PARALLEL
PROCESSING
531
ITsing the MVA we get:
~~~(1) = X42) &~(3)
= 0.062,
&J(Z)
= APF(5) = 0.082,
= XPF(3) = 0.072,
XFJ (4) = xpF( 7) = 0.087,
XFJ(5) = &(9)
= 0.089,
XFJ (6) f8 iiPF(10) = o-090,
XFJ(7) = XPF(l2)
= o-092,
XFJ(~) % XPF(14) = o-093,
XFJ(S)
x
XFJ(~O) z XpF(l7)
Xp~(16)=0.094,
= 0.094.
The fork-join subnetwork is now an FES node (see Fig. 10.55) with the load-dependent service rates $(Ic) = XFJ(~), k = 1,2,. . . .
Fig.
10.55
FES queue for Example
10.17.
The fork-join network is now reduced to only one node and therefore the iteration stops and the performance measuresof the system can be determined by analyzing the FES node of Fig. 10.55. For this analysis we determine at first the steady-state probabilities of the node using Eqs. (3.11) and (3.12): r. = 0.428,
xl = 0.287, 7r2= 0.155,
7r3= 0.076,
7r4
= 0.034,
7r6= 0.007,
7r7
= 0.003, r8 = 0.001,
TT5
= 0.015,
7rg= 0.001, n-10= 0.000. These probabilities are then used to determine the mean number of jobs in the fork-join model, Eq. (6.8): KFJ
= 1.116,
and the mean response time, Eq. (6.9): TFJ
=
27.89.
With Eq. (6.15) and:
r;,=&T,;+=f: i=l
i=l
l /-a
we obtain: ?& = 41.76.
-
Pi>
’
532
ALGORITHMS
FOR NON-PRODUCT-FORM
NETWORKS
To obtain the speedup, we use Eq. (10.154) and get: GN = 1.497. From this result we can compute the normalized GN
G = N
speedup (Eq. (10.155)):
= 0.374.
To obtain the synchronization overhead SN, we need the mean waiting time queues. First we consider the fork-join TJ,.i, (i = 1,2,3,4) in synchronization queue of node 1 and node 2:
TJ& = TFJ12 - T.i,
(10.170)
where TF.J~ 2 is the mean response time in the fork-join construct consisting of node 1 and node 2. It is obtained from the state-dependent service rate ,~(lc) for the composite node of the inner fork-join construct. At first we need the state probabilities, using Eqs. (3.11) and (3.12): x0 = 0.580,
7rl = 0.249,
r2 = 0.102,
= 0.041,
r4 = 0.016,
7r5
7r3
r6 = 0.003,
= 0.001,
7r7
= 0.007,
iTTf.3 = 0.000,
and the mean number of jobs: &J12
= 2
hrr, = 0.701.
k=l
If we apply Little’s theorem, we obtain for the mean response parallel-sequential job with fork-join synchronizations: K
T FJu = -
FJn x
time of a
= 17.514.
In order to obtain the mean response time, we apply Eq. (6.15) and get: 1
= Pl(1
-
= 16.667,
T&=6.25.
Pl)
Using Eq. (10.170) we finally get the the results: TJ,1
=
- T1 = 0.874,
%J12
TJ,2
=
11.264.
From Fig. 10.53 we have: TJ,~
=
TFJ
-TFJ~~
-T3,
TJ,4
=
TFJ
-574.
NETWORKS
Applying
WITH
ASYMMETRIC
NODES
533
Eq. (6.15) again: T3 = 2.174,
T4 = 16.667.
Together with TFJ and TFJ~~ from the preceding we get: TJ,3 = 8.202,
TJ,4 = 11.233,
which finally gives us the synchronization 6 s
The normalized
N
synchronization
=i=l
-
overhead (see Eq. (10.156)):
TJ,i
= 1.132.
%J
overhead is given by (Eq. (10.157)): SN s=N = 0.292.
Now it is easy to calculate the blocking obtain, via Eq. (10.158):
BN = k?Ji
factor.
Using Little’s
theorem,
we
= 1.263,
i=l
and with Eq. (10.159) the normalized
blocking factor is given by:
B = 1BN
4
= 0.316.
A comparison of the mean response time computed with the preceding approximation with DES shows that the differences between the DES results (?T = 26.27) and the approximation results are small in our example. For fork-join systems with higher arrival rates, the differences become larger. It can be shown in general that the power of the fork-join operations is reduced considerably by join synchronization operations. Also the speedup is reduced considerably by using synchronization operations. Other approaches to the analysis of parallel programs can be found in [Flei89], [MiiRo87], [NeTa88], [SaTr87], and [ThBa86]. Problem 10.5 Consider a parallel program with the precedence graph shown in Fig. 10.56. The arrival rate is X = 1 and the service rates of the tasks are ~1 = 2, ~2 = 4, ,Q = 4, ,Q = 5. All service times are exponentially distributed. Determine the mean response time and the speedup.
10.7
NETWORKS
WITH
ASYMMETRIC
Whenever we have considered multiserver assumed that all the servers
NODES
nodes in a network, we have always
at the node are statistically
identical.
We now
534
ALGORITHMS
FOR
fig.
NON-PRODUCT-FORM
NETWORKS
Precedence graph for Problem
10.56
consider the case of networks where servers ent service times. We first consider closed followed by open networks with asymmetric cases we will only obtain an approximative 10.7.1
Closed
10.5.
at a multiserver node have differnetworks with asymmetric nodes nodes. Note that in each of these solution.
Networks
In this section we show how to modify the summation method, the mean value analysis, and the SCAT algorithm to deal with asymmetric nodes[BoRo94] as introduced in Section 9.1.2. 10.7.1.1 Asymmetric SUM (ASUM) To extend the SUM (seeSection 9.2) to queueing networks with asymmetric nodes, we need to adapt the formula for computing the mean number of jobs (see Eqs. (9.15) and (6.28)): Ki
= mi
Pi
f pi +
1-K-me--l K-m,
*pm,,
Type-l,
‘Pi
with:
‘mi(Pi)
=
m;! . (1 - pi) me1 (mi * pi)” + (mi . pi)mi ’ k! k=O mi * (1 - pi)
For symmetric nodes, pi is given by: pi =
Ai ____ 3 mi * pi
and for asymmetric nodes we can use Eq. (6.136) or Eqs. (6.150) and (6.146), respectively. Since the SUM is an approximate method, so is the extended SUM. By replacing p; and Pm, by the corresponding formulae, the BOTT can be extended to queueing networks with asymmetric nodes in the same way.
NETWORKS
WITH
ASYMMETRIC
535
NODES
10.7.1.2 Asymmetric MVA (AM VA) The MVA was developed by Reiser and Lavenberg [ReLa80] for the exact analysis of product-form networks and is based on Little’s theorem and the arrival theorem of Reiser, Lavenberg, Sevcik, and Mitrani (see Section 8.2) [LittBl, ReLa80, SeMi81]. To extend the MVA to networks with asymmetric nodes, the equation for the calculation of the mean response time for symmetric -/M/m nodes:
a.
i
i
1 + Ki(Ic - 1) + c (Mi - j - 1) *%(j I k - 1) ? j=o ) (10.171)
needs to be extended. Here 7ri(j 1Ic) is the probability that there are j jobs at the ith node, given that there are Ic jobs in the network. This condition probability ri(j 1Ic) is given by:
*dj-w-l),
j=l,...,
mi-1,
(10.172)
and ri(O 1Ic) is obtained by:
;Iri(O )Ic)= 1- L
mi
2 *X(k)+ mel(mi - j) *7ri(j) Ic) * j=l [ pi I
(10.173)
In Eq. (10.173), the factor ,Q,. rni is the overall service rate of the symmetric -/M/m node. For asymmetric nodes m; . pi is replaced by:
Thus the mean response time of an asymmetric node can be computed as follows:
T@) = -A2 j=l
(
m,-2
l+IiTi(k-l)+
,Uij
C j=O
( mi-j-l)*ni(jIIc-1)
. ) (10.174)
Because of the different service rates, it is necessary to take into account which servers are occupied as well as the number of servers when calculating
the marginal probabilities
n&j 1 k). If we assume that the free server with the
highest service rate is selected and the service rates are arranged in descending
ALGORITHMS
536
FOR NON-PRODUCT-FORM
order, then the marginal
NETWORKS
probabilities
are given by: (10.175)
m,-1
7ri(O 1 k) = 1 - *
* X(k) - -&- * C( mi j=l c Pi1
rni - j) .7ri(j
1 k).
(10.176)
l=l
If we consider Eq. (10.173) in more detail, we see that the factor: ei . X(k) E
Pi1
l=l
is the utilization pi and, therefore, Eq. (10.176) can be rewritten as: ni(O
1k) = 1 - pi(k) - k
z
* mgl(mi
- j) * ni(j I k).
j=l
Asymmetric SCAT (ASCAT) The core of the SCAT algorithm is a single step of the MVA in which the formula for the mean response time, Eq. (10.174), is approximated so that it depends only on K and not on (K- 1). However, it is necessary to estimate the marginal probabilities ni(j 1K - 1) that need a lot of computation time. For this reason, another formula for the mean response time is derived from Eq. (6.29). This derivation depends only on ??i and pi. For -/M/l nodes we apply Little’s theorem to the following equation: 10.7.1.3
li’,
=
pi
1 - pi’ and after reorganizing we get: Ti
-i(l
=
+Ki),
Pi
which is the basic equation of the core of the SCAT algorithm for -/M/l nodes. Similarly, from Eq. (6.29) we obtain:
and for -/M/m
nodes we have: Ti =
&
* (mi 2’
2
+
Ki
+
pmi(pi)
-
mi
* pi).
NETWORKS
WITH
This formula can easily be extended to asymmetric
ASYMMETRIC
NODES
537
nodes as follows:
1
Ti=-.
Crni + Ki + Pm% (pi) - mi . pi) . 2
Pij
j=l
At this point we proceed analogously to the SUM by using the equations belonging to the selected method to calculate pi and Pm%(pi). Our experience with these methods suggeststhat the mean deviation for ASUM and AMVA is in the order of 2-3% for the random selection of a server as well as for the selection of the fastest free server. For ASCAT, the situation is somewhat different. For random selection the deviation is considerably high (7%), while for the selection of the fastest free server the mean deviation is only of the order of 1%. Therefore, this method can be recommended. ASUM, AMVA, and ASCAT can easily be extended to multiclass queueing networks [AbBo97]. 10.7.2
Open
Networks
Jackson’s method for analyzing open queueing networks with symmetric nodes (see Section 7.3.4) can also be extended to analyze open queueing networks with asymmetric nodes. Note that this will result in an approximate solution. Jackson’s theorem can be applied to open single classqueueing networks with symmetric -/M/m-FCFS nodes. The analysis of an open Jackson network is done in two steps: Calculate the arrival rates Xi using traffic Eq. (7.1) for all nodes i = l...N. Consider each node i as an elementary -/M/m-FCFS queueing system. Check whether the ergodicity (p < 1) is fulfilled and calculate the performance measuresusing the known formulae. Step 1 remains unchanged when applied to asymmetric open networks. In Step 2, formulae given in Section 6.16 for asymmetric -/M/m-FCFS queueing networks should be used instead of formulae for symmetric -/M/m-FCFS queueing systems. Similarly, the BCMP theorem for open networks (see Section 7.3.6) can be extended to the approximate analysis of asymmetric queueing networks. Using the closing method (see Section 10.1.5) AMVA, ASCAT and ASUM can also be applied to open asymmetric queueing networks. Furthermore, it is also possible to apply the closing method to open and mixed queueing networks with asymmetric nodes. Example 10.18 To demonstrate different methods for networks with asymmetric nodes we use the example network shown in Fig. 10.57. Node 1 is an asymmetric -/M/2, nodes 2 and 3 are of -/M/l type, and node 4 is -/G/co. The routing probabilities are: 1312 = 0.5,
p13
= 0.5,
p24 =p34
=p41
= 1,
538
ALGORITHMS
FOR NON-PRODUCT-FORM
fig. 10.57
NETWORKS
Closed network with one asymmetric
node.
and the service rates are: /All = 3.333,
/Liz = 0.666,
/.L~= 1.666,
/.L~= 1.25,
ru4= 1.
The total number of jobs in the network is K = 3. From the routing probabilities we obtain the following visit ratios: el = 1,
e2 = 0.5,
e3 = 0.5,
e4 = 1.
We start with the ASUM method, which consistsof the samesteps asthe SUM algorithm (see Section 9.2), but with different formulae for the utilization pr and the probability of waiting Pml. For the stopping condition we use E= 0.001. Initialization:
Xl = 0 and X, = min i
=Q.
Bisection: r---T-] STEP 2 i
h + AL X = 2=
1.25.
-1 Calculation of the values fi(Xi), i = 1,. . . ,4. In order to get the result for node 1, Eqs. (6.146) and (6.150) for the exact analysis of networks with asymmetric nodes are used (selection of the fastest free server). With: p1 =
e1 Pll +
1.25 z---z 4
*A p12
0.3125
and: pm1 2-e
l-c
-= PA”’ N
1
1 - 0.3125
* -0.111 = 0.153, 1.050
we get:
Jwl)
= 2Pl + PI * Pm1 = 0.673.
NETWORKS
Accordingly
WITH
ASYMMETRIC
NODES
539
we obtain:
IT,(X,) = -E1 K3(X3)
=
+ x4
K,(X,)
= -
= 0.500
with
p2 = 0.375,
= 0.750
with
p3 = 0.500,
iP2
3P3
= 1.250,
P4
which yields: g(X) = -&xi)
= 3.173.
i=l
-1 X = 1.25.
Check the stopping condition. Becauseof g(X) > K, we set X, =
r] STEP 2 i
X= v
-1
Calculation of the values fi(&), 171
(h)
=
2Pl
+
= 0.625.
p1
= -Jk-
K&42)
1 -
= 0.319. with p1 = 0.156 and Pm, = 0.0386,
= 0.214 with p2 = 0.188,
$P2
= -&-
E3(X3)
* en,
i = 1,. . . ,4. With:
l-
= 0.300
with p3 = 0.250,
$P3
x4
77,(X,)
= Lq = 0.625,
g(X) = &Ti(hi) i=l (-2.31
= 1.458.
Check the stopping condition. Becauseof g(X) < K, we set Xl =
X = 0.625.
STEP 2 (r---T-j
I
-2.2 =
s?;,(X,)
=A
?74(X4)
= 0.938.
Calculation of the values fi (A,), i = 1, . . . ,4. With
K(h)
E3(X3)
*
X=
=
=
2p1
+
p1
= 0.489. with p1 = 0.234 and Pm, = 0.0881,
= 0.346
1 -
$P2
dz1 -
$P3
x4 L
* En,
with p2 = 0.281,
= 0.500 with p3 = 0.375, =
0.938,
ALGORITHMS
540
FOR NON-PRODUCT-FORM
NETWORKS
we get: g(X) = &?i(hi)
= 2.273.
i=l
-1 x = 0.9375.
After fulfilled.
Check the stopping
condition.
Because of g(X) < K, we set Xl =
11 iterations we have g(X) = 3.000 and the stopping condition For the overall throughput we obtain X = 1.193, which yields: x1
=
1.193,
For the utilizations p11 =
0.294,
X2
0.596,
=
X3
0.596,
=
X4
=
1.193.
of the nodes we get: ~12
= 0.317,
pl = 0.298,
p2 = 0.358,
p3 = 0.477,
and for the mean number of jobs in a node: K1 = 0.638,
K2 = 0.470,
Now we analyze the network
?T, = 0.700,
using the AMVA
??, = 1.193.
algorithm.
Initialization: K(O)
= F2(0)
Iteration k=
over the number
Pi(l
IO) = 0.
of jobs in the network
1.
I-2.11
Mean response times: Ti(1)
= Pll
+ I-L12
(1 -t%(O)
+m(0
T2(1)
= ;
(1+17,(o))
= 0.6,
T3(1)
=
;
(1+
=
;
= 1.
?54(1)=
I-2.2
p1(0 ( 0) = 1,
= 0,
= K3(0)
K3(0))
0.8,
Throughput: X(l)=
l 5 i=l
e;Ti(l)
= 0.455.
IO)) = 0.5,
starting
with
is
NETWORKS
1-1
NODES
Mean number of jobs: xi(l)
= A(l)rr(l)er
KS(~) = X(l)??s(l)es Iteration
WITH ASYMMETRIC
= 0.227,
x2(1)
= X(l)T2(l)e2
= 0.136,
= 0.182,
F4(1)
= X(l)TJ(l)e,
= 0.455.
for k = 2:
I]
Mean response times: Fi(2)
=
IL CL11 +
p1(0 1 1) = 1 -
(l-tJG(l)
elx(l) Pull
- -&(l
1 1) = 0.818
= ;
(1 +x2(l))
= 0.682,
Ts(2)
= ;
(1 +Ka(1))
= 0.945,
i
and
10) = 0.136,
%(2)
T4(2),=
1 1)) = 0.511,
+ Pl2
elx(l) p1(1 1 1) = -pl(O
I]
+p1(0
p12
= I.
Throughput: X(2) =
2 e eiTi(2)
= 0.860.
i=l
[]
Iteration ]STEP]
Mean number of jobs: 171(2) = X(2)&(2)el
= 0.440,
K2(2)
= X(2)T2(2)e2
= 0.293,
X3(2)
= 0.407,
K4(2)
= X(2)574(2)e*
= 0.860.
= X(2)T3(2)e3
for k = 3: Mean response times: Tr(3)
=
l Pll
+
(1 + K,(2) P12
+ p1(0 ) a>) = 0.530,
541
542
ALGORITHMS
FOR NON-PRODUCT-FORM
NETWORKS
where: p1(0 1 2) = 1 -
+
1 12) = 0.679
and
I-412
pr(1 12) = ~Pl(O e1x(2)
-2.2
- --&(
e1x(2) Pll
I 1)
=
0.211,
T2(3) = i
(1 +K2(2))
= 0.776,
T3(3) = ;
(l+&(2))
= 1.125,
Throughput: X(3) =
4
= 1.209.
3
C eiTi(3) i=l
(1
Mean number of jobs: Fr(3)
= X(3)T1(3)el
= 0.641,
K2(3)
= X(3)T2(3)e2
= 0.469,
Z,(S)
= X(3)T3(3)e3
= 0.681,
K4(3)
= X(3)T4(3)ed
= 1.209.
The algorithm
stops after three iteration
X1 = 1.209, For the utilizations
X2 = 0.605,
steps. For the throughput
X3 = 0.605,
we get:
X4 = 1.209.
of the nodes we obtain: p1 = 0.302,
p2 = 0.363,
p3 = 0.484,
and for the mean number of jobs in a node: K1 = 0.641,
K2 = 0.469,
K3 = 0.681,
r4 = 1.209.
Finally, the analysis of the network is performed using the ASCAT algorithm, where we use the approximate Eqs. (6.135) and (6.136) and Eqs. (6.27) and (6.28) f or asymmetric multiple server nodes. Modified input parameters:
core algorithm
Ki(K) = $
=m
for K = 3 jobs in the network
and Di(K)
=0
for i = 1,...,4.
with
the
NETWORKS
-1
Estimate
WITH
ASYMMETRIC
NODES
values for the ?Ti(K - 1) from Eq. (9.5): xi(s)
Ki(2)
= 23
= 0.5
for i = 1,. . . ,4.
For the next step we need an initial value for the throughput: +?)
-1
=
P4
%(2)
= 0.5.
One step of MVA: Tr(3)
1
= Pll
+
P12
*
(ml + %(2)
+ &nl (PI) - mlpl)
= 0.569,
with: elA(2) Pl
cl11
%(3)
=-
=
0.5 4
+cL12
= 0.125
= ;
(1 +17,(2))
= 0.9,
TS(3) = ;
(1+ Ks(2))
= l.2,
T*(3) = i
= 1.
and Pm1 (pr) = 0.0278,
Throughput: X(3) =
4 C i=l
3
= 1.145.
eiTi(3)
Mean number of jobs: Kr(3)
= X(3)T1(3)el
= 0.652,
??p(3)= A(3)T2(3)e2
= 0.515,
?T3(3) = X(3)T3(3)es
= 0.687,
K,(S)
= 1.145.
[STEP]
Check the stopping /$l’(3) max i
-1
{
Estimate J?,(2)
= 0.435,
= X(3)T4(3)e4
condition: - $O’(3) 3
1 = 0.132 > E = 0.01. 1
values for the Ici(K - 1) from Eq. (9.5): li;2(2) = 0.344,
Ks(2)
= 0.458,
z4(2)
For the next step we need an initial value for the throughput: x(2) = &q(2)
= 0.764.
= 0.764.
543
544
ALGORITHMS
r-1
After jobs:
FOR NON-PRODUCT-FORM
NETWORKS
One step of MVA:
three iterations
zr(3)
we get the following
= 0.626,
Ka(3)
= 0.475,
Modified core algorithm the input parameters: Ki(2)
-1
= ;
Estimate
= 0.5
Ka(3)
for the mean number
= 0.702,
Ed(S)
and Di(2)=0
= 27m4
for i=1,...,4.
from Eq. (9.5):
= 0.
for i = 1,...,4.
For the next step we need an initial value for the throughput: X(l)
[ml
=
err,(l)
p4
= 0.25.
One step of MVA: Tr(2) =
IL CL11 +
* cm,
Pl2
+
K(1)
+
en1
(Pl)
-
mlpl)
=
0.533,
with: Pl
fd(l>
= Pull
T2(2)
= i
T3(2) = ; ?&)
= i
+
= 7
= 0.0625 and Pm1(pl) = 0.00735,
Pl2
(l+&(1))
=0.75,
(l+K3(1))
= Lo,
= 1.
Throughput: X(2) =
4
2
C eiTi(2) i=l
= 0.831.
of
= 1.198.
for (K - 1) = 2 jobs in the network
values for the ?ri(l)
J?,(l)
results
with
NETWORKS
WITH
ASYMMETRIC
NODES
545
Mean number of jobs: Kr(2)
= X(2)T1(2)el
= 0.443,
1?3(2) = X(2)T3(2)e3
= 0.415,
-1
Check the stopping (R!“(2)
-1 X1(1)
- $O’(2) 2
Estimate
values for the xi(l) &(l)
= 0.311, = 0.831.
condition:
max i
= 0.221,
K2(2)= X(2)T2(2)e2 K*(2) = A(2)T4(2)e4
( = 0.165 > E = 0.01.
= 0.156,
from Eq. (9.5): Es(l)
= 0.208,
x4(1)
= 0.415.
For the next step we need an initial value for the throughput: X(1) = /J*K4(1) [I
= 0.415.
One step of MVA:
After jobs:
three iterations Kr(2)
= 0.434,
we get the following Kz(2)
Estimate
= 0.295,
results
I&(2)
the I?i and Di values with
for the mean number
of
1?4(2) = 0.857.
= 0.414,
Eqs. (9.3) and (9.4), respec-
t ively : Fl(3)
x1 (3)
= --j--
= 0.209,
h(3) = 3% (3) = 0.158, K3
F3(3)
= --Fj-
F4(3)
=
K4(3) --j-
(3)
JG(2) Fr(2) = 2 = 0.217, m2) F2(2) = 2 -
= 0 .148,
= 0.234, F3(2) = F =
0.399,
= 0.207,
F4(2) = 23w)
= 0 .429 .
Therefore we get: D1(3) = Fl(2) - Fl(3) = 0.00835, D3(3) = I;;(2) - F3(3) = -0.0270,
&(3)
= F2(2) - &(3) = -0.0106,
D4(3) = F4(2)
-
F4(3)
= 0.0292.
Modified core algorithm for K = 3 jobs in the network with the computed E(i (K) values from Step 1 and the Di (K) values from Step 3 as inputs.
546
ALGORITHMS
[I
FOR NON-PRODUCT-FORM
Estimate Er(2)
NETWORKS
values for the K,(K K(3) -yj-
= 2
17,(2) = 0.295,
- 1) from Eq. (9.5):
+ Dr (3)
= 0.434, > ?73(2) = 0.414, K4(2)
= 0.857.
For the next step we need an initial value for the throughput: x(2) = ~4 . F4(2) j]
= 0.857.
One step of MVA: T1(3) =
l Pll
+
p12
*
(ml + %(2)
+ pm, (~1) - mlpl)
= 0.520,
with: pr = 0.24 &(3)
= ;
and Pm1 (pr) = 0.0756, (1 + K2(2))
= 0.777,
?i3(3) = ;
(1 +Ks(2))
= 1.131,
T4(3)
= I.
=
;
Throughput: X(3) =
4
3
= 1.212.
C &T(3) i=l
Mean number of jobs: Kr(3) = X(3)T1(3)el Es(S) -1
= X(3)T3(3)es
= 0.631,
?72(3)= X(3)??2(3)e2 = 0.471,
= 0.686,
X4(3)
= X(3)T4(3)e4 = 1.212.
Check the stopping condition: /f7j1’(3) - r?i0’(3) / max i
{
3
= 0.00531 < E = 0.01. 1
The stopping condition is fulfilled. Therefore the algorithm stops and we get the following results: Throughputs: Xl = 1.212,
x2 = 0.606,
x3 = 0.606,
x4
= 1.212.
NETWORKS
WITH
BLOCKING
547
Mean number of jobs: -K1 = 0.631,
IT, = 0.471,
K3 = 0.686,
774 = 1.212.
Utilizations: p1 = 0.303,
p2 = 0.364,
p3 = 0.485.
Table 10.33 Throughput A, and mean number of jobs Ei for different methods compared with the DES results 1
2
3
4
ASUM AMVA ASCAT DES
1.19 1.21 1.21 1.21
0.60 0.61 0.61 0.61
0.60 0.61 0.61 0.61
1.19 1.21 1.21 1.21
ASUM AMVA ASCAT DES
0.64 0.64 0.63 0.64
0.47 0.47 0.47 0.47
0.70 0.68 0.69 0.68
1.19 1.21 1.21 1.21
Node Ai
E
In Table 10.33 we compare the results for the throughputs Xi and the mean number of jobs xi. For this example, the results of all methods are close together and close to the DES results. As in the case of asymmetric single station queueing networks, the preceding approximations are accurate if the values of the service rates in a station do not differ too much. As a rule of thumb, we consider the approximations to be satisfactory if pmin/pmax < 10 at each asymmetric node. Problem 10.6 Formulate the closed network of Fig. 10.57 as a GSPN and solve it exactly using SHARPE or SPNP. Compare the exact results with the approximative onesobtained in this section using ASUM, AMVA and ASCAT.
10.8
NETWORKS
WITH
BLOCKING
Up to now we have generally assumedthat all nodes of a queueing network have queues with infinite buffers. Thus a job that leaves a node will always find an empty place in the queue of the next node. But in the case of finite capacity queues, blocking can occur and jobs can be rejected at a node if the queue is full. Queueing networks that consist of nodes with finite buffer capacity are called blocking networks. Blocking networks are not only important for modeling and performance evaluation of computer systems and computer networks, but also in the field of production line systems. Exact results for general blocking networks can only be obtained by using the generation and
548
ALGORITHMS
FOR NON-PRODUCT-FORM
NETWORKS
numerical solution of the underlying CTMC (either directly or via SPNs) that is possible for medium-sized networks. For small networks, closed-form results may be derived. The most common techniques presented in the literature for the analysis of large blocking networks are, therefore, approximate techniques. 10.8.1
Different
Blocking
Types
In [Perr94] three different types of blocking are introduced: 1. Blocking after service (see Fig. 10.58): With this type of blocking, node i is blocked when a job completing service at node i wishes to join node j that is already full. This job then has to stay in the server of node i and block until a job leaves node j. This type of blocking is used to model production line systems or external I/O devices [Akyi87, Akyi88b, Akyi88a, Alti82, AlPe86, OnPe89, PeA186, Perr81, PeSn89, SuDi86]. In the literature, this type of blocking has been referred to by a variety of different terms such as Type-l blocking, transfer blocking, production blocking, and non-immediate blocking. Job blocks
Fig. IO.58
node
i
Blocking
after service.
2. Blocking before service (seeFig. 10.59): The job to be served next at node i determines its source node j before it enters service at node i. If node j is already full, the job cannot enter the server of node i and node i is therefore blocked. Only if another job leaves node j will node i be deblocked and the first job in the queue served [BoKo81, GoNe67b, KoRe76, KoRe78]. In the literature, this type of blocking has also been referred to as Type-2 blocking, service blocking, communication blocking, and immediate blocking. Job blocks
node
Fig. 10.59
i
Blocking
before service.
3. Repetitive blocking (seeFig. 10.60): If a job that has been served at node i wishes to go to node j whose queue is full, it is rejected at node j and goes to the rear of node i queue. This procedure is repeated until a job leaves node j, thereby creating one empty slot in the queue of
NETWORKS
WITH
BLOCKING
549
node j. This type of blocking is used to model communications networks or flexible production line systems [Akvo89, BaIa83, Hova81, Pitt79]. Other terms for this type of blocking are Type-3 blocking or rejection blocking.
Fig.
10.60
Repetitive
blocking.
In [OnPe86], different kinds of blocking are compared with each other. In [YaBu86], a queueing network model is presented that has finite buffer capacity and yet is not a blocking network. Here a job that would block node i because node j is full is buffered in a central buffer until there is an empty spot at node j. Such models are especially interesting for production line systems. 10.8.2
Product-Form
Solution
for Networks
with
Two
Nodes
Many different techniques for the solution of blocking networks are available. We only consider a simple exact solution technique in detail that can be applied to closed blocking networks with only two nodes and blocking after service [Akyi87]. This method can also be extended to queueing networks with several nodes, but then it is an approximate technique that is very costly and it can only be used for throughput analysis [Akyi88b]. We consider queueing networks with one job class, exponentially distributed service times, and FCFS service strategy. Transitions back to the same node are not possible (pii = 0). Each node has a fixed capacity iL& that equals the capacity of the queue plus the number rni of servers. Nodes with an infinity capacity can be taken into consideration by setting A4i > K so that the capacity of the node is bigger than the number of jobs in the network. The overall capacity of the network must, of course, be bigger than the number of jobs in the network for it to be a blocking network: K<Ml+M2.
(10.177)
In the case of equality in the preceding expression, deadlocks can occur [Akyi87]. Th e p ossible number of states in the underlying CTMC of a closed network with two nodes and unlimited storage capacity is: Z=K+l.
(10.178)
At first we consider the simple case where the number of servers at each node is ml = m2 = 1. The state diagram of the CTMC for such a two-node network without blocking is shown in Fig. 10.61.
550
ALGORITHMS
Fig. 10.61
CTMC
FOR NON-PRODUCT-FORM
NETWORKS
for a two-nodenetwork without blocking (ml =
m2
= 1).
As can be seen, the minimum capacity at each node is A4i = K (the queue must contain at least K - 1 spots) so that all states are possible. If the nodes have a finite capacity AL&< K, then not all K + 1 states of Fig. 10.61 are possible. Possible states of the blocking network are then given by the condition that the number of jobs at a node cannot be bigger than the capacity of that node. In the case of a transition to a state where the capacity of a node is exceeded, blocking occurs. The corresponding state is called a blocking state. Recall that due to Type-l blocking, the job does not switch to the other node but stays in the original node. The state diagram of the blocking network is shown in Fig. 10.62. P2
P2
P2
Fig. 10.62 CTMC for the two-nodeblockingnetwork (statesmarkedwith * are blocking states)ml = m2 = 1 (blocking after service).
The state diagram of the blocking network shown in Fig. 10.62 differs from the state diagram shown in Fig. 10.61 in two aspects. First, all states where the capacity is exceeded are no longer necessary and, second, Fig. 10.62 contains the blocking states marked with an asterisk. The number of states 2 in the blocking network is the sum of all possible non-blocking states plus all the blocking states: i = min{K, hli + 1) + min{K, AL?2 + 1) - K + 1.
(10.179)
It can be shown [Akyi87] that for this closed network with two nodes and blocking after service, an equivalent closed network with two nodes without blocking exists. Apart from the number of jobs i? in the equivalent network, all other parameter values remain the same. The population k in the
NETWORKS
equivalent network
can be determined
WITH
BLOCKING
551
using Eqs. (10.178) and (10.179):
I? = min{ K, IHr + 1) + min{K,
(10.180)
A/IQ + 1) - K.
The CTMC of the equivalent network isshown in Fig. 10.63. The equivalent network is a product-form network with Ici jobs at node i, 0 5 & 5 I?.
Fig.
10.63
CTMC
of the equivalent network corresponding
to the CTMC
in Fig. 10.61.
The steady-state probabilities of the equivalent network can be determined using the following equation: (10.181) By equating the steady-state probabilities of the blocking network with the corresponding steady-state probabilities of the equivalent non-blocking network, the performance measuresof the blocking network can be determined. To demonstrate how this technique works, the following simple example is used: Example 10.19 Consider a closed queueing network with two nodes, K = 10 jobs, and the routing probabilities: PI2 = 1,
P21 =
1.
All other system parameters are given in Table 10.34. Table IO.34
1 2
System parameters
1 1
2 0.9
for Example
1 1
10.19
7 5
All service times are exponentially distributed and the service discipline at each node is FCFS. Using Eq. (10.178), the number of states in the network without blocking is determined to be 2 = K + 1 = 11. The CTMC of the network without blocking is shown in Fig. 10.64, while the state diagram of the blocking network (seeFig. 10.65) contains all possible non-blocking states
552
ALGORITHMS
Fig. IO.64
FOR NON-PRODUCT-FORM
CTMC
of the Example
NETWORKS
10.19 network
without
.___-_______
blocking.
-
Fig. 10.65
CTMC
of the Example
10.19 network
-
-
- -
_ _ -
_ _ _ I
with blocking.
from the original network as well as the blocking states (marked with asterisk). The number of states 2 of this blocking network is given by Eq. (10.179) as the sum of all possible non-blocking states (= 3) and the blocking states (= 2): 2 = 5. The CTMC of the equivalent network (see Fig. 10.66) therefore contains 2 = 5 states and the number of jobs in the network is given by Eq. (10.180) as I? = 4.
Fig. 10.66 The throughput ly using the MVA network:
CTMC
of the equivalent
network
for Example
X,,(4) of the equivalent network and is equal to the throughput
A,( 10) = X,,(4) The steady-state probabilities ing with the corresponding blocking network:
10.19.
can be determined directX~(10) of the blocking
= 0.500.
of the blocking network steady-state probabilities
~(6,4)
= n(2,2)
= 0.120,
7r(7,3)*
= n(4,O)
= 0.550,
n(5,5)
7r(7,3)
= 7r(3,1)
= 0.248,
n(5,5)*
= n(l, = ~(0,4)
are obtained by equatof the equivalent non-
3) = 0.054, = 0.026.
NETWORKS
WITH
553
BLOCKING
The mean number of jobs at both nodes can be determined using the steadystate probabilities: K1 = 7 [7r(7,3)* + 7r(7,3)] + 6r(6,4) + 5 [7r(5,5) + 7r(5,5)*] = 6.73, K2 = 3 [7r(7,3)* + n(7,3)] + 4~(6,4) + 5 [7r(5,5) + 7r(5,5)*] = 3.27. With these results all other performance measures can be derived using the well-known formulae. Of special interest for blocking networks are the blocking probabilities: PB1 = n(5,5)* = 0.026, PB2 = 7r(7)3)* = 0.0550. Next consider the case where each node has multiple servers (mi > 1). The state diagram of the CTMC underlying such a two-node network without blocking is shown in Fig. 10.67, while the state diagram of the blocking
Fig. IO.67
CTMC
of a two-node network without blocking (m, > 1).
network (with Type-l blocking) is shown in Fig. 10.68. Here a state marked m2
P2
",2
-/---A+**..*/-*...*
2P2
(m2--1)C12 \
*
7n2P2
m2fi2 /---.+
l
mlP1
mlP1
mlP1
(ml-l)fil
2@1
‘*
Pl
Fig. IO.68 CTMC for the two-node network with blocking. Blocking marked with * (mi > 1).
states are
with asterisks shows the number of blocked servers in that state. Therefore we get for the number fi of states in the blocking network, the sum of all possible non-blocking states plus the blocking states (mi > 1): 9 = min{K, 1Mr + rnz} + min{K, Mz + ml} - K + 1.
(10.182)
I? is given by Eqs. (10.178) and (10.182): I? = min{K, Mr + m2) + min{K, Mz + ml} - K.
(10.183)
554
ALGORITHMS
FOR NON-PRODUCT-FORM
NETWORKS
The state diagram of the equivalent network is shown in Fig. 10.69. The equivalent network is again a product-form network with & jobs at node i, 0 5 ii 5 k. P2
ml/4
Fig. IO. 69
m2p2
21-12
2Pl
mlpl
CTMC
of the equivalent
map2
product-form
CL1
network
(m, > 1).
The steady-state probabilities of the equivalent network can be determined using Eq. (7.59): (10.184) with pi(&), Eq. (7.62). The following example is used to demonstrate this technique: Example 10.20 Consider again the network given in Example 10.19, but change the number of servers, m2 = 2. All other network parameters remain the same. From Eq. (10.178), the number of states in the network without blocking is 2 = K + 1 = 11. The corresponding state diagram is shown in Fig. 10.70. The state diagram of the blocking network (see Fig. 10.71)
Fig. 10.70
CTMC
of the Example
10.20 network
without
blocking.
contains all possible non-blocking states from the original network plus all the blocking states (marked with *). The number of states 2 of the blocking network is given by Eq. (10.182) as the sum of all non-blocking states (= 3) and the blocking states (= 3): 2 = 6. The state diagram of the equivalent network (see Fig. 10.72) contains, therefore, also 2 = 6 states, and for the number of jobs in the network, Eq. (10.180) is used: I? = 5. The throughput X,,(5) of the equivalent network can be determined using the MVA and, because of the equivalence of both networks, is identical with the throughput X~(l0) of the blocking network: X,(10) = X,,(5)
= 0.499.
NETWORKS
I I
Fig. IO.71
CTMC
P2
of the Example
2P2
Fig. 10.72
CTMC
10.20 blocking
2P2
of the Example
Pl
1
__-___--'
network.
a-42
10.20 equivalent
555
BLOCKING
Pl
Pl
Pl
Pl
t-------_----__-_______
WITH
product-form
2P.2
network.
The state probabilities of the blocking network are given by equating the corresponding steady-state probabilities of the equivalent non-blocking network. The steady-state probabilities are determined using Eq. (10.181): 7r(7,3)** = 7r(5,0) = 0.633,
n(6,4) = n(2,3) = 0.014,
7r(7,3)* = 7r(4,1) = 0.284, 7r(5,5) = 7r(l, 4) = 0.003, n(7,3) = x(3,2) = 0.064, n(5, S)* = 7r(0,5) = 0.001. The mean number of jobs at each node can be determined using the steadystate probabilities: Kl = 7 [7r(7,3)** + 7r(7,3)* + 7r(7,3)] + 6X(6,4) + 5 [n(5,5) + n(5,5)*] = 6.978, z,
= 3 [7r(7,3)** + n(7,3)* + 7r(7,3)] + 4n(6,4) + 5 [7r(5,5) + n(5,5)*] = 3.022.
All other performance measures can be determined using the well-known equations. Of special interest are blocking probabilities: P& = 7r(5,5)* = 0.001, P&
= 7r(7,3)** + $(7,3)’
= 0.775.
For computing PB~,the steady-state probability ~(7,3)* can only be counted half becausein state (7,3) * only one of the two servers of node 2 is blocked. For blocking networks with more than two nodes, no equivalent non-blocking network can be found. For these casesonly approximate results exist, which
556
ALGORITHMS
FOR NON-PRODUCT-FORM
NETWORKS
are often inaccurate or can only be used for very special types of networks such as tandem networks [SuDi84, SuDi86] or open networks [PeSn89]. Problem 10.7 Consider a closed blocking network with N = 2 nodes and K = 9 jobs. The service times are exponentially distributed and the service strategy at all nodes is FCFS. All other network parameters are given in the Table 10.35. Table 10.35 Network Problem 10.7
i
ea
1 2
1 1
input parameters
for
l/Pi
mi
Mi
1 1
1 1
6 4
(a) Draw the state diagram for the network without blocking as well as the state diagram of the blocking network and the equivalent non-blocking network. (b) Determine the throughput XB, the mean number of jobs xi, blocking probabilities I& (i = 1,2) for the blocking network.
and the
Problem 10.8 Directly solve the CTMC of the blocking network in Problem 10.7 by using SHARPE. Note that you can easily obtain transient probabilities and cumulative transients with SHARPE in addition to the steady-state probabilities. Problem 10.9 For Problem 10.8, construct an SRN and solve it using SPNP. Compare the results obtained from SPNP with those obtained by hand computation. Note that SRNs enable computation of transients and cumulative transients. Furthermore, easy definition of rewards at the net levels enables straightforward computation of all desired performance measures.
Optimization
Analytic performance models are very well suited as kernels in optimization problems. Two major categories of optimization problems are static and dynamic optimization. In the former, performance measures are computed separately from an analytic queueing or CTMC model and treated simply as functions (generally complex and non-linear) of the control (decision) variables. In the latter class of problems, decision variables are integrated with the analytic performance model and hence optimization is intimately connected with performance evaluation. We limit our discussion to static optimization. For a treatment of dynamic optimization in the context of computer and communication systems see [PGTP96], [DrLa77], [MeDii97], [MTD94], [MeFi97], [MeSe97], and [Meer92]. Designs of early computer and communication systems made little use of mathematical optimization techniques. Because the systems were not so complex, the number of possible design alternatives was limited and a good design decision was often obvious. This state of affairs has changed dramatically since modern systems are very complex and requirements for high performance and dependability have evolved. Therefore an optimal design is not obvious any more and the number of possible design decisions can be enormous. A system designer is called on to produce a design of an implementable system from a given design specification. The design specification may include cost, performance, and dependability specifications, in addition to functional specifications. The performance specification states how well the specified functions (as stated in the functional specification) should perform in terms of throughput, response time, etc. The dependability specification states how the system tolerates component/sub-system faults. The dependability spec557
558
OPTIMIZATION
ification also includes minimum acceptable reliability, availability, etc. Cost specifications may indicate maximization or minimization of total cost or provide a constraint in the optimization. The system design methodology takes the system specification as an input and transforms it into a system configuration (characterized by a set of parameters) that fulfills the specification. These (hardware) parameters include the number of processors, the number of memories, the speed and capacity of resources, and other architectural parameters such as the system topology. Software configuration parameters describe the distribution of processes over system resources as well as resource scheduling policies. In addition to the hardware and software specification, maintenance policies might be described (e.g., preventive vs. corrective). It is thus a very difficult task to choose the configuration that best meets the system specification. Traditionally, dependability, performance, and economic analyses of computer systems have been studied separately and independently. Optimization methods provide a feasible avenue for combining performance, dependability, and cost for deriving optimal system parameters that meet the stated design specifications. Usually these questions are multiobjective optimization problems. Although it is possible to deal directly with multiobjective problems [IPM82], it is common to pick a single relevant system specification to be used as the objective function and to let the remaining design factors provide constraints. An alternative is to combine multiple criteria into a single utility function.
11.1
OPTIMIZATION
PROBLEMS
AND COST FUNCTIONS
To formulate the optimization problem it is important to determine the system parameters that can be varied and hence considered to be decision variables. Possible types of optimization are: l
Minimization
of cost for given throughput.
l
Maximization
of throughput
l
Minimization
of the mean response time with cost constraint.
with cost constraint.
The simplest cost functions are linear functions: If device (resource) service rates /.~i are decision variables, the linear cost function is given by: C(p) = 2
tip; = COST,
(11.1)
i=l
with the vector of the service rates p = (~1,. . . , pi). COST is the total cost of the considered system, and ci is the cost factor for service rate pi of resource i.
OPTIMIZATION
BASED ON THE SUMMATION METHOD
559
More realistic estimates are obtained by using non-linear cost functions: (11.2) The service rate CL;of a server depends on the speed vi of device i and the mean number of work units di per visit:
For the CPU, the mean number of instructions per CPU burst is given by di and the CPU speed (number of instructions per time unit) by vi. In the case of a storage device, di is the mean number of words in an I/O operation and 21;is the speed of the storage device in words per time unit [TrKi80]. Thus if the mean number of work units di of the jobs at station i is given, then the optimal value of the device speed vi is immediately given using the optimal values of the service rates pi. We can extend the optimization problem if we also consider the cost for main memory. In the case of a multiprogramming computer system, this cost depends on the degree of multiprogramming K (the number of jobs in the system). The cost function is then given by: C(/J, K) = C(K) + 5
(11.3)
cipyz = COST
i=l
where the C(K) is the cost of the main memory satisfying the storage requirement of K concurrent jobs. The next step is to choose the performance model. This can be a DES model or an analytic model. Among the latter we can choose a product-form queueing network model or a non-product-form queueing network model, a CTMC model, or a hierarchical model. The final step is to choose the optimization method. There exist numerous optimization methods that can be used in combination with analytical methods for the performance evaluation introduced in this book. We show the use of a solution method based on Lagrange multipliers. We introduce an approximation method based on SUM (see Section 9.2) and a more complex but exact method, which is based on the convolution method (see Section 8.1). Finally, we show the use of geometric programming in the design of a memory hierarchy. For further study of performance optimization including the issue of convexity, see [TrKi80], and [SRT88].
11.2
OPTIMIZATION
BASED ON THE SUMMATION
METHOD
This method was introduced by Belch, Fleischmann, and Schreppel [BFS87] (therefore called BFS method for short) and is now demonstrated using the following example:
560 11.2.1
OPTIMIZATION
Maximization
of the Throughput
We assume a closed queueing network with N stations and K jobs of a single class. The visit ratios e; and the functions fi (see Eq. (9.15)) that describe the behavior of the queues are also given. The correction factor (K - 1)/K is neglected to obtain formulae that are as simple as possible. This approach is valid if the number of jobs is large, because then (K - 1)/K is close to 1. However, usable formulae are also obtained if the correction factor is included. These provide more accurate values for the optimum, especially for small values of K [AkBo88b]. For Xi = X . ei, rni = 1 and pi = $ we obtain: Type-l, 2,4, and mi = 1, Ki(X9P.i)
=
fi(X,/%)
=
Type-3 IS. We seek service rates pi that maximize the overall throughput X and satisfy a linear cost constraint: C(p) = g,,i
= COST +
i=l
Here the cost C(K) for the main memory is neglected but can easily be included by repeating the optimization for several values of K and choosing the one with the maximum throughput . The service rates must satisfy the system equation as well as the cost constraints. The system equation is as follows, using Eqs. (9.16) and (9.22):
The optimization problem can be solved using Lagrange multipliers [BrSeSl]. The Lagrange function L(X, ~1,. . . , PN, yi, 92) with objective function x and two Lagrange multipliers yi and y2 is given by: N
q~,Pl,...,
~N,yl,!/2)=~+~1
CG/X-COST i=l Xei +Y2
c(
#IS
+$-” Pi
-
. i
Xei
)
A necessary condition for optimal service rates pi and maximum throughput X is obtained by differentiating with respect to X7 Pi7 Yl, Y2: dL ax=
E=o @i
-= dL 0
0, ’
8Yl
i=l
aL ,“‘7 N ? -= dY2
(11.4)
’
(11.5)
0 *
OPTIMIZATION
BASED ON THE SUMMATION METHOD
561
The following optimal values are obtained by solving the preceding system of equations:’ Jj* =
COST *K
(11.6)
(~~)‘+K~w
type i # IS, (11.7) type i = IS. It can be seen from Eq. (11.6) that the maximum throughput A* is directly proportional to the total cost COST. The optimal service rates pf cannot be explicitly stated in more complicated optimization problems with non-linear cost functions, several types of stations, and further auxiliary constraints. They can be obtained iteratively from the system of equations. Further results for some of the optimization problems already mentioned are given in the following. 11.2.2
Minimization
of Cost
Next we consider the minimization of cost subject to a minimum throughput requirement, A. By the same method as in Section 11.2.1, we obtain a closedform formula for the approximate optimal values of service rates:
The minimal cost is given by:
C”(p)= &lCi. i=l
‘Since the summation method is an approximation values are approximate optimal values.
method for performance
analysis, these
562
OPTIMIZATION
The following iteration method can be used for a non-linear cost function: pi = 1, for i = 1, . . . , N. Initialize: Iterate until the deviations are sufficiently small: y=x.
($&q2,
(11.9)
i=l
, Jwi + 7a&pqt-1
type i # IS, (11.10)
1 hei ( >
a,--1
,
11.2.3
Minimization
type i = IS.
QiC,
of the Response
Time
The problem of minimizing the mean response time T is equivalent to the problem of maximizing the overall throughput. This result follows from Little’s theorem, T = K/X, and the fact that the number of jobs K in a closed network is constant. Explicit formulae can be provided if the cost function is linear. In the following we wish to maximize the throughput with Example 11.1 cost constraints. The service rates that produce maximum throughput are to be calculated. The example consists of a product-form queueing network with N = 5 stations, three of which are of type -/M/l and two of type -/G/oo. The queueing network contains 20 jobs. The visit ratios are as follows: el = 1, e2 = 0.2, e3 = e4 = 0.5, e5 = 0.3. We assume a linear cost function with the following coefficients: Cl = 10, c;! =
c3
= 5,
c4
= 2,
c5
= 1.
The total cost is constrained to be COST = 100. Using the BFS method, the following values are obtained for approximate optimal service rates that achieve the throughput X* subject to the given cost constraints: /A; = 6.189, /L; = 1.689, /L; = 3.812, & = 1.096, /~uj= 1.236, with the approximate optimal throughput: A* = 6.189. The exact value of the throughput X = 6.569 for the approximate optimal service rates pL;*is obtained by means of MVA. In order to verify these approximate optimal-values, we vary the service rates, subject to the constraint: N c i=l
Ci/..Li
= 100,
OPTIMIZATION
563
BASED ON THE SUMMATION METHOD
as before. The new throughput is then compared with the approximate optimal throughput. If the following values of the service rates are used: p1 = 7, /.L2= 1.7, pg = 3.8, pLLq = 1, pug= 0.5, the value for the throughput obtained by MVA is X = 6.473. This is less than X = 6.569, the value of the throughput obtained using the approximate optimal optimal service rates pa. Similar results are obtained for other values of the service rates that satisfy the constraint given previously. If we increase the distance between the selected service rates and those that produce the approximate optimal throughput considerably: PI
= 8,
then the throughput
P2 = 2,
P3 =
p4 =
2,
CL5 =
1,
decreases and approaches its minimal value X = 2.
Table 11.1 Maximization of the throughput Example BFS Method
1,
1
with cost constraints (COST = 100)
Example 2
Complex Method
BFS Method
Example 3
Complex Method
BFS Method
Complex Method
Pi & & 4 CL:
6.90 1.69 3.81 1.13 1.24
6.81 1.65 3.99 1.01 1.50
8.07 1.30 2.83 0.90 1.10
7.97 1.41 2.95 0.93 1.13
3.04 0.66 1.27 0.59 0.67
3.00 0.69 0.80 0.79 0.84
%FS hNA
6.19 6.57
6.56 6.55
6.66 7.54
7.56 7.57
2.63 3.00
2.95 2.97
100.00
99.81
100.00
99.97
100.00
100.00
COST
Table 11.1 contains additional examples of the maximization of the throughput of a network with linear cost constraints. It compares values obtained by the BFS method with values obtained by the complex method which requires much more computation. If we consider the simplicity of our method, it is surprising how close our values are to those obtained by the complex method. These examples show that optimization problems can be relatively easily solved using the system of equations. If the cost constraint is linear, then the optimal service rates ,LL;can be explicitly stated. If the cost function is non-linear, then a simple iteration method converges well. Table 11.2 contains the results for the optimization of the throughput with a non-linear cost function. The service rates that produce optimal throughput were used as input values for the exact MVA. The BFS method is well suited for solving optimization problems because it provides a simple relation between the quantity to be optimized (e.g., throughput) and the decision variables (service rates). If the cost constraint is linear, then explicit formulae for optimal service rates and throughputs are provided. It is also possible to apply the BFS
564
OPTIMIZATION
Tab/e 11.2 Optimization of throughput the non-linear cost function)
(CX= cxpouent of
a
1
1.25
1.5
2.0
2.5
3
6 CL; Pu5 4 d
6.912 1.689 3.182 1.096 1.236
4.911 1.242 2.727 0.921 0.998
3.884 1.017 2.173 0.821 0.883
2.860 0.806 1.625 0.735 0.781
2.361 0.716 1.363 0.706 0.774
2.067 0.674 1.214 0.704 0.732
%FS xMVA
6.189 6.596
4.438 4.711
3.532 3.752
2.625 2.790
2.180 2.319
1.918 2.039
method to non-product-form queueing networks with arbitrarily distributed service times if we use the corresponding formulae in the system equation (see Section 10.1.4.3).
11.3
OPTIMIZATION ALGORITHM
BASED ON THE CONVOLUTION
With the BFS method, combined with Lagrange multiplier, we obtained approximate results for the optimization problem. To obtain exact results, we can again use the concept of Lagrange multipliers together with the convolution algorithm [TrKi80, SRT88]. 11.3.1
Maximization
of the Throughput
In the convolution algorithm we use the following formula (see Eq. (8.14)) for the throughput:
G7 K) =
G(P7K - 1) G(CL’K) ’
(11.11)
with:
5 ki=K ‘=I
2=1
and:
Pi(h) =
ki!7
ki I mi7
mi!mFierni,
ki 2 mi,
17
mi = 1.
(11.12)
OPTIMIZATION
BASED ON THE CONVOLUTION ALGORITHM
565
As a cost function, we use the one suggested in Eq. (11.2) with additional main memory cost C(K) : N
C(p, IT) = C(K) + c
(11.13)
c~,cL;~= COST.
i=l
To maximize the throughput with a cost constraint, we can use the following Lagrange function, which is derived from Eqs. (11.11) and (11.13)):
-qcL,Y, K)
= X(/A, K) + y
C(K) - COST + 5 tips”
-
i=l
In order to obtain the optimal values for ~1, we differentiate the Lagrange function. This operation results in the following non-linear system of equat ions: 8L G =O, aL ay=
0
i=
l,...,N,
(11.14) (11.15)
*
We can use the computer algebra program MAPLE to obtain the derivatives and solve the non-linear system of equations. Alternatively, Moore’s formula [Moor721 for the normalization constant can be used to obtain explicit formula for the preceding derivatives [TrKi80]. The system of non-linear equations can then be solved using the Newton-Raphson or other suitable method. The solution of the system is the optimum service rate vector ~1that fulfills Eq. (11.15). Th e maximum throughput can then be obtained with Eq. (11.11) Example 11.2 Consider a multiprogramming computer system with a CPU and three disk devices. The product-form queueing network model is given in Fig. 11.1. The corresponding routing probabilities are: Pll
= 0.05,
P12 =
0.5,
p13 = 0.3,
$I14= 0.15,
p21 = p31 = p41 = l,(),
For special values of the cost coefficients ci and the exponents oi (given in Table 11.3), the total budget COST = 500 and a linear memory cost function C(K) = C,K with C, = 50, the optimal values of the throughput X”, the service rates &’ and degree of multiprogramming K are given in Table 11.4. We see that we have an optimum of throughput when degree of multiprogramming K = 2. The computation time for this method is high compared to the BFS method. On the other hand this method delivers exact results. Problem
11.1
Use the BFS method for Example 11.2.
566
OPTIMIZATION
Disk
CPU
Disk
Fig. 11.1
Central-server model of the multiprogramming
system in Example 11.2.
Table 11.3 Cost factors cz and exponents a, for the central-server model of Fig. 11.1
CPU Disk1 Disk2 Disk2
11.3.2
Optimal
Design
131.9 11.5 54.2 54.2
of Storage
0.55 1.00 0.67 0.67
Hierarchies
This example is intended to address the problem of optimally determining the technology and capacity of the different memory levels of a linear storage hierarchy. The storage hierarchy is used in a multiprogrammed environment. Our goal is to maximize the system throughput subject to a cost constraint. It is assumed that we know the number of storage levels as well as the capacity of the slowest level. Consider the storage hierarchy, given in Fig. 11.2 [TrSiSl]. It consists of m electronically accessed levels (numbered 1,2, . . . , m), as for example a cache and p mechanically accessed levels (numbered m + 1, m +
Table 11.4 Optimal throughput A* with the optimal service rates ~5 for the central-server model of Fig. 11.1
1 2 3 4 5
0.071 0.091 0.089 0.076 0.059
3.090 2.819 2.356 1.967 1.557
4.240 3.373 2.684 1.979 1.084
2.054 1.525 1.238 0.847 0.598
1.376 1.032 0.752 0.571 0.431
OPTIMIZATION
BASED ON THE CONVOLUTION ALGORITHM
567
Cost per byte
Access time and capacity
Delay boundary
Fig. 11.2
A
linear storage hierarchy.
2 - - 7m + p), e.g., disk. Let M = m + p. These two groups are divided bk a delay boundary. It is assumed that the speed increases with higher levels (lower numbers denote a faster level) and the capacity decreases with higher levels. Whenever a piece of information is not found in level 1, a request is sent to a lower level. This activity goes on until the request can be satisfied. If i < m (meaning the device is above the delay boundary), it is assumed that the CPU is kept waiting until the information is retrieved, meaning that there will be no more than one request at levels 1,2, . . . , m. For m + 1 5 i 5 m + p, a process swap is performed to allow another process to use the CPU, meaning that several requests may be queued at each of the memory levels m+l, m+2,. . . , m+p. This linear storage hierarchy is modeled using a closed queueing network where K (the multiprogramming degree) programs circulate sharing the resources of the system. The capacity of the ith level is bi bits, which are organized into blocks of si bits each. The block sizes si, i = 1,2,... , M, are assumed to be known. The K active programs share the capacity of each level equally. A success function Hi denotes the probability that a storage reference is a hit at level i. A reasonable assumption is that this function depends only on hi/K and si [TrSi81]. Hi = H&/K,
si)
i = 1,2,. . . ,M.
Let the miss ratio function Fi be defined as & = 1 - Hi. The hit ratio is the probability that a storage reference generated by the CPU is found in
568
OPTIMIZATION
level i and not in levels 1,2, . . . , i - 1. The hit ratio hi is given by:
The definition of bu and so is such that Fe(be/K, SO) = 1 and F~(blll/K, 0. Let pa be the probability that a given request is the last memory i.e., the program has finished execution). Then the average total ( of memory references per program is given by l/pe. The probability memory request at level i is satisfied is then given by: pi=(l-pa)hi,
i=1,2
,...,
SM) = request number that a
M.
The closed queueing network model of this system is shown in Fig. 11.3 [TrSiSl]. All 1evels where requests are not queued are merged into an equiva-
fig. 11.3
Queueing network model of the linear storage hierarchy shown in Fig. 11.2.
lent server. The branching
probabilities
PO0 =
=
are given as:
“; CPi’ i=l
Ipoj
for the network
l?iz
)
j = 1,2 ,...,
p.
1-EPi i=l In [TrSiSl]
the service time for the equivalent
server is derived:
(11.16)
OPTIMIZATION
BASED ON THE CONVOLUTION ALGORITHM
569
In this context tj denotes the average transfer time for a block of size sj-1 from the jth level to the (j - l)st level. Furthermore we assume that the service discipline at all nodes is FCFS and all service times are exponentially distributed. The goal is to maximize the system throughput, which is defined as the average rate of flow along the new program path, X = poo,x~po, where pe is the utilization of the equivalent server node and is given by:
po= s m-4 K - 1)) PO w4K) I-1=(Po,P1.1,...,ccp), x0=;
=gpi2ti+
5
i=l
=
(’
j=l
-PO)
tl
+
pi&j,
i=m+1
j=l
~tj~j-l(bj-l/K,
i
(11.18)
,
Sj-1)
j=2
1
P
1 -
(11.17)
= eiti+,
= ti+m
c
Pi
Pj+-rn,
j=l
=
(1 - PO)ti+mFi+m-l(Ki+m-l/K,
Si+m-l),
i =
1,2,.
. . ,P.
(11.19)
In these expressions, e = (ee, er, . . . , eP) is the vector of the visit ratios of the specified closed queueing network. Since the ei can only be determined within a multiplicative constant, we choose: m
ee-l-
c
Pi,
i=l
which yields: P ei =
c
Pj+m*
j=l
The system throughput
The optimization words, to minimize:
is given by:
problem is to maximize the throughput,
or, in other
Let the technology cost curve for the ith level be given by Ci(ti). The total cost of memory level i is given by ci(ti)Bi (bi), where Oi(bi) accounts for the
570
OPTIMIZATION
economies of scale that are present in the memory t ethnology. The optimization problem can now be defined as follows: Minimize x(p, K) = -1 G(CL,K) POG(CL,K - 1) ’ with p as defined in Eqs. (11.17), (11.18), and (11.19), and subject to: 5
ci(ti)Oi(bi)
5 COST.
(11.20)
i=l
In this case the decision variables are the capacities and the average access timesforeachlevel(bi,t;,i= 1,2 ,..., M). Thevaluesm,p,K,pa,bM,si ,..., SM-1, and COST are assumed to be fixed parameters. Furthermore, we assume that & and 8i are functions of the memory capacities bi. The b;, ci , and 8i are posynomial functions of the access times ti (posynomials are generalizations of polynomials where the coefficients are positive real numbers while the exponents are allowed to be arbitrary real numbers). In the special since: case of K = 1, x(p, 1) is a polynomial function of the ,UU;
The optimization problem in the case of K = 1 is then seen to be a standard geometric programming problem , provided we assume that bi, ci, and 1!3iare polynomial functions of ti. In [TrSi81] it is shown that even in the multiprogramming case, the optimization problem becomes a convex programming problem when logarithmically transforming the variables ti and bi (i.e., ti = e”z and bi = eWz,i = 1,2,. . . , A4) and requiring the functions Fi (bi ) , ci (t; ) and hi(bi), i = 1,2,. . . , M to be posynomial functions. Thus any local solution to the optimization problem is also its global solution.
12 Performance
Analysis Tools
Performance analysis tools have acquired increased importance due to increased complexity of modern systems. It is often the case that system measurements are not available or are very difficult to get. In such cases the development and the solution of a system model is an effective method of performance assessment. Software tools that support performance modeling studies provide one or more of the following solution methods: l
Discrete-event simulation.
l
Generation and (steady-state and/or transient) solution of CTMC and DTMC.
l
Exact and/or approximate solution of product-form
l
Approximate
l
Hierarchical (multilevel) models combining one or more of the preceding methods.
solution of non-product-form
queueing networks.
queueing networks.
If we use DES, then the system behavior can be described very accurately, but computation time and resource needs are usually extremely high. In this book queueing network solutions as well as Markov chain analysis methods have been introduced to analyze system models. Queueing networks are very easy to understand and allow a very compact system description. For a limited class of queueing networks (see Chapters 8 and 9), so-called product-form queueing networks, efficient solution algorithms (such as convolution, MVA, SCAT) are available. But many queueing networks do not fulfill the productform requirements. In this case approximation methods can be used (see 571
PERFORMANCE ANALYSIS TOOLS
572
Chapter 10). It is also possible to develop a multilevel model to approximately solve a non-product-form queueing network. If the approximation methods are not accurate enough or cannot be applied to a certain problem, as can, for example, be the case when we have non-exponentially distributed service times or when blocking is allowed in the network, then DES or CTMC is used. As we have seen in earlier chapters, the state space for even simple systems can be huge and grows exponentially with the number of nodes and the number of jobs in the network. Nevertheless, due to increasing computational power and better solution algorithms, CTMCs have acquired greater importance. For the solution of our models we realize that: l
The application of exact or approximate solution algorithms for queueing networks is very cumbersome, error prone, and time consuming and hence not feasible to carry out by hand calculations.
l
The generation and solution of the CTMC/DTMC tems by hand is nearly impossible.
for even small sys-
We therefore need tools that can automatically generate the state space of the underlying CTMC and apply exact or approximate solution algorithms and/or that have implemented exact and approximate algorithms for queueing networks. In this chapter we introduce four representative performance modeling and analysis tools: PEPSY, SPNP, MOSES, and SHARPE. As mentioned previously, tools differ not only in their solution algorithms (DES, exact and approximate solution algorithms for queueing networks, generation and transient and steady-state solution of CTMCs) but also in their input/output facilities. There are tools that have a graphical user interface (GUI) and/or a special input language (batch mode or interactive) or both. Some tools are based on special model types such as queueing networks, SPNs, CTMC models, or precedence graphs (and others) or a combination of these model types.
12.1
PEPSY
PEPSY (performance evaluation and prediction system) [BoKi92] has been developed at the University of Erlangen-Niirnberg. Using this tool it is possible to describe and solve PFQNs and NPFQNs. More than 30 solution algorithms are incorporated. It is possible to specify open, closed, and mixed networks where jobs cannot switch to another job class. Closed job classes are described by the number of jobs in each class, while open job classes are described by the arrival rate of jobs at the class. To compute performance measures such as throughput, utilization, or mean response time of a given network, different exact as well as approximation algorithms are provided for PFQN and NPFQN. The network is described in textual and/or graphical form. The X11-windows version of PEPSY is called XPEPSY [Kirs93]. PEPSY
PEPSY
573
can be used on almost all UNIX machines, whereas XPEPSY needs the X11 windows system. A Windows95, WindowsNT version WinPEPSY with similar features is also available but it has a restricted set of solution algorithms. 12.1.1
Structure
of PEPSY
12.1.1.1 The input File The PEPSY input file contains the specification of the system to be modeled. The specification file has a name of the form e-name and is divided into a standard block and possibly additional blocks that contain additional information specific to solution algorithms. The standard block contains all necessary descriptions for most of the solution methods, for example, the number of nodes and job classes, the service time distribution at each node, the routing behavior of the jobs, and a description of the classes. The input file is usually generated using the input program, which is provided by PEPSY. Information that is not asked for by the input program is provided using the addition program. The type of each node (and its characteristics) needs to be specified: the -/M/m-FCFS, -/G/l-PS, -/G/-IS, and -/G/l-LCFS) and several other node types. For example, two types of multiple server nodes with different service time distributions -/M/m-FCFS-ASYM and -/G/m-FCFS-ASYM can be specified. The service time distribution is defined by its first and second moment, or by the service rate and the squared coefficient of variation. For the solution of the specified models, many different algorithms are implemented in PEPSY. These algorithms can be divided into six groups:
1. Convolution algorithm for product-form 2. MVA for product-form
networks.
networks.
3. Approximation
algorithms for product-form
4. Approximation
methods for non-product-form
networks. networks.
5. Automated generation and steady-state solution of the underlying CTMC. 6. DES.
In addition to these six groups, there are some methods and techniques that do not fit in any of these groups. For example, the bounds method performs a calculation of the upper bounds of the throughput and lower bounds of the average response time. 12.1.1.2 The Output file For each file name e-name, an output file with the computed performance measures is generated. It is called xx-name where “name” is the same as in the input file and “xx” is an abbreviation for the used method. The output file consists of a short header with the name of
574
PERFORMANCE ANALYSIS TOOLS
the model, the corresponding input file, and the solution method used to calculate the performance measures. The performance measures of all nodes are separated by classes. The output file contains the performance measures for each node (throughput, visit ratio, utilization, average response time, average number of jobs, average waiting times, and average queue length) and the performance measures for the entire network (throughput, average response time and average number of jobs). 12.1.1.3 Control Files As already mentioned, PEPSY offers more than 30 solution methods to compute performance measures of queueing networks. In general, each method has certain limitations and restrictions. Because it would be impossible for the user to know all these restrictions and limitations, control files are used in PEPSY. Each control file contains the limitations and restrictions for each solution method, namely the network type (open, closed, or mixed), the maximum number of nodes and classes allowed, whether the method can deal with routing probabilities or visit ratios, whether the service rates at -/M/m-FCFS nodes have to be the same for different job classes, the range of the coefficients of variation, and the maximum number of servers at a -/M/m node. 12.1.2
Different
Programs
in PEPSY
The following lists the most important detailed reference see [Kirs94]. input:
programs of PEPSY. For a more
This program is used for an interactive description of a queueing network.
addition: For some solution algorithms, additional information is needed. This information is provided by invoking the addition program. selection: Using this program, all possible solution algorithms can be listed that are applicable to the system description (the file that was just created). The output of this program consists of two columns. The left-hand column contains all solution algorithms that can be applied directly, while the right-hand column contains the methods that can only be applied after specifying an additional block (using the addition program). analysis: After a complete system description, the queueing network desired solution algorithm can be applied by invoking the analysis program. The results are printed on the screen as well as written to a file. pinf:
This program is invoked to obtain information about the solution method utilized.
PEPSY
575
transform: As already mentioned before, it is possible to specify the network using either routing probabilities or visit ratios. By invoking the transform program, visit ratios can be transformed into routing probabilities. pdiffz By invoking pdiff, the results of two different solution methods can be compared.
Fig. 12.1
12.1.3
Example
Simple queueing network example.
of using PEPSY
The main steps in working with PEPSY are described herein. Starting with the model, first the model file is created, then the solution method is selected, and the analysis is carried out. The model to be analyzed is shown in Fig. 12.1. At first the input program is invoked by means the command input. The user will then be asked to enter the type and number of job classes (a closed network with only one job class in our case), the number of nodes (three in our case), the type of each node, and its service rate. After the basic network information is specified, the routing information is entered. In PEPSY we need to consider one special point when talking about routing: open job classes contain a (combined) source/sink-node that is used in PEPSY as a reference node (outside node) for relative performance values. For closed queueing networks the outside node also needs to be specified. Finally the number of jobs in the network needs to be specified. The corresponding model Now the file is saved in, say, e-first-example. description, generated by input, is shown in Fig. 12.2. The next step is the selection of the solution method. PEPSY supports the user with a built-in database with the limitations and restrictions of each algorithm and returns a list of applicable solution methods. In our example the result, of applying the selection program to the model description is shown in Fig. 12.3. This list contains all the methods that can be used to analyze the network e-simple-example. In addition to the standard network description that has been entered in the beginning, some methods need further parameters.
576
PERFORMANCE ANALYSIS TOOLS
fig. 12.2
Fig. 12.3
Model description
The list of applicable
solution
file e-simple-example.
methods
presented
by the selection
program.
577
PEPSY P
\ PERF?ORMANCE,INDICES FORNET: simple,exaqKle description of the network is In file ~e,slmpSe,example' the closed network was 801ved using z;he method 'mari@' jobclass 1 maa l/mu rho I lambda B llWZ marle -----------*---------------------------------------------------------------0.613 0.166 3.922 I.000 0.100 0,392 CPU 3.503 0.700 0,333 0,915 1.276 disk I 2.746 5.883 s.000 o.ooa 6 .a00 1.177 Q*300 tarminaX I characteristic indices: t lambda marie -----------$--------------*-----------mvz I
3.922
2.549
mwsl 0.221 2.588 0.000
maa 10.000
legend 6 : average number of visits rho : utilisation lllV2 : average respbnae time maa : average number of jab8 mwz : averaGe waiting time snws2: average queue-length
Fig. 12.4
mwz 0.056 0.943 0.000
mu : smvice rate lambda: mean throughput
/
Performance measuresof the network esimple-example.
For example, in order to run the sim2 method (DES), some information about the maximum simulation time and accuracy needs to be specified. If the user is not familiar with a particular method, PEPSY assists the user with online help. Suppose the user selects the method name marie and invokes the solution method by entering the command analysis marie simple-example. The resulting performance measures are shown on the screen and are also written into the file a-c-simple-example. The contents of this file are shown in Fig. 12.4. 12.1.4
Graphical
User Interface
XPEPSY
To demonstrate how to work with XPEPSY, the same queueing model as in Section 12.1.2 is used. After the program is started, the user can graphically draw the network as shown in Fig. 12.5. Nodes or connections can be created, moved, or deleted simply by mouse clicks. The necessary parameters are specified in the corresponding dialog boxes. Consider, for example, the node data dialog box, shown in Fig. 12.6~ that appears when clicking on the service station in the drawing area and that is used to specify node parameters. When the queueing network is fully specified, XPEPSY switches from the edit mode to the analysis mode and a new set of menus will appear. Now PEPSY assists you in selecting the solution method. Four possibilities are offered: 1. Select from a limited set of methods, i.e., the ones that are very well suited for the problem. 2. Select from all possible methods.
578
PERFORMANCE ANALYSIS TOOLS
1 22
~~~Wgr
Project w
Mode
Optlons
About
Help1
-----
1’ _ -__“_x”“-_‘L ._
Crld:
50 -J ---I
-.--.”
..--- -“-“.--_--
_--. “““-xI
Working field
_-.--A--
_--_ -‘-.--.~-..-“-2~.-~-.---_~-x_L.-.-.-’-
Standard J ] / Class 1 -A
J
/ Dtrectlon
9
/ Model-name
first
fig. 12.5 The main XPEPSY window.
--
Nods Data
ode name IEPLI --~~---“--~-
,~ ,,I. * I.I, ~::’ 2 2’
Node type Number of nodes
XPEPSY
(b) /
--
I
Class 1
TransItIon Probablhtles
-I 1
2 DISK b 7 r-----~ 3 TERM $1.3 r-
-*1 I----
iI
I”““” Analysis “x” ” , “” * Standard v’ Extended v‘ Results v Non feasible
L-
“-
”
x”x
” ~x~“.~-~xI”“I”“”
Selected method [yilzr--------
M/G/7 - PS
~-“-“--
Method lnformatlon
-m-v---
Analysis
method options 2: :: a’ I : j: F-------numerical parameters
Reget Value_d --
APP’Y
j I
r-
Fig. 12.6 (a) The node data dialog box and (b) the menu of solution methods corresponding to Fig. 12.5.
SPNP
579
3. Show only methods that have already been used and for which valid results can be found without resolving the model. 4. Show all methods that cannot be used due to inherent restrictions of the solution method. Now the solution can be carried out by simply activating the Analysis button from the analysis menu, shown in Fig. 12.6b, and the results are presented on the screen (see Fig. 12.4). The results can also be presented as a histogram or as a line drawing. In a line drawing it is also possible to show an output parameter as a function of an input parameter. Some additional programs from PEPSY are integrated into XPEPSY as well. Using the Method Information button, the PEPSY internal database can be searched for information about the selected method.
12.2
SPNP
The stochastic Petri net package (SPNP) d eveloped by Ciardo, Muppala and Trivedi, is a versatile modeling tool for performance, dependability, and performability analysis of complex systems. Input models developed based on the theory of stochastic Petri nets are solved by efficient and numerically stable algorithms. Steady-state, transient, cumulative transient, time-averaged, and up-to-absorption measures can be computed. Parametric sensitivity analysis of these measures is efficiently carried out. Some degree of logical analysis capabilities is also available in the form of assertion checking and the number and types of markings in the reachability graph. Advanced constructs available - such as marking-dependent arc multiplicities, guards, arrays of places and transitions, and subnets - reduce modeling complexity and enhance the power of expressiveness of the package. The most powerful feature is the capability to assign reward rates at the net level and subsequently compute the desired measures of the system being modeled. The model description language is CSPL, a C-like language, although no previous knowledge of the C language is necessary to use SP N P. 12.2.1
SPNP Features
The input language of SPNP is CSPL (C-based stochastic Petri net language). A CSPL file is compiled using the C compiler and then linked with the precompiled files that are provided with SPNP. The full power of the C programming language can be used to increase the flexibility of the net description. An important feature of CSPL is that it is a superset of C. Thus a CSPL user can exploit C language constructs to represent a large class of SPNs within a single CSPL file. Most applications will only require a limited knowledge of the C syntax, as predefined functions are available to define SPNP objects.
580
PERFORMANCE ANALYSIS TOOLS
Another important characteristic of SPNP is the provision of a function to input values at run time, before reading the specification of the SPN. These input values can be used in SPN P to modify the values of scalar parameters or the structure of the SPN itself. Arrays of places and transitions and subnets are two features of SPNP that are extremely useful in specifying large structured nets. A single CSPL file is sufficient to describe any legal SPN, since the user of SPNP can input at run time the number of places and transitions, the arcs among them, and any other required parameters. The SPNP package allows the user to perform steady-state, transient, cumulative transient, and sensitivity analysis of SPNs. Steady-state analysis is often adequate to study the performance of a system, but time-dependent behavior (transient analysis) is sornet imes of greater interest: instantaneous availability, interval availability, and reliability (for a fault-tolerant system); response time distribution of a program (for performance evaluation of software); computation availability (for a degradable system) are some examples. Sensitivity analysis is useful to estimate how the output measures are affected by variations in the value of input parameters, allowing the detection of system performance bottlenecks or aiding in design optimization. Sophisticated steady-state and transient solvers are available in SPNP. In addition, the user is not limited to a predefined set of measures: Detailed expressions reflecting exactly the measures sought can be easily specified. The measures are defined in terms of reward rates associated with the markings of the SPN (see Section 2.2.3). The numerical solution methods provided in the package address the stiflness problems often encountered in reliability and performance models (see Section 5.1.4.2). A number of important Petri net constructs such as marking dependency, variable cardinality arc, guards, arrays of places and transitions, subnets, and assertions facilitate the construction and debugging of models for complex systems. A detailed description of SPNP and the input language CSPL can be found in [CFMT94] and [CTM89]. SPNP has been installed at more than 100 sites and has been used to solve many practical problems at Digital Equipment Corporation, Hewlett Packard, Nortel and other corporations. Some very powerful new features that have been added to S PN P include the capability of DES of non-Markovian nets and fluid stochastic Petri nets [CNT97]. 12.2.2
The CSPL
Language
Modeling with SPNP implies that an input file describing the system structure and behavior must be prepared. Such an input file can be prepared automatically via the graphical user interface that has been recently developed as discussed in Section 12.2.3. Alternatively, the user may decide to prepare such a file himself. The language designed to do so is named CSPL, a superset of the C language [KeRi78]. What distinguishes CSPL from C is a set of predefined functions specially developed for the description of SPN
SPNP
Fig. 12.7
581
Basic structure of a CSPL input file
entities. Any legal C construct can be used anywhere in the CSPL file. All the f scanf, log, exp, etc., are available and C library functions, such as fprintf, perform as expected. The only restriction to this generic rule is that the file should not have a main function. In spite of being a programming language, CSPL enables the user to describe SPN models very easily. There is no need to be a programmer to fully exploit all the built-in features of SPNP. Just a basic knowledge of C is sufficient to describe SPNs effectively; although, for experienced programmers, CSPL brings the full power and generality of the C language. Fig. 12.7 shows the basic structure of a CSPL input file with six different segments. Parameters Segment The function parameters allows the user to customize the package. Several parameters establishing a specific behavior can be selected. The function iopt (fopt) enables the user to set option to have the integer (double-precision floating point) value. Any of the available options can be selected and modified. For example:
specifies that the solution method to be used is transient solution as uniformization and that the reachability graph is to be printed. The function input permits the input of parameter values at run time. A/et Segment The function net allows the user to define the structure and parameters of an SPN model. For built-in functions that can be used inside the net segment, Fig. 12.8 provides some illustrations.
582
PERFORMANCE ANALYSIS TOOLS
Fig. 12.8
The CSPL
net segment.
There are other functions such as guards, which allow the user to define the enabling condition of a transition as a function of tokens in various places, or functions that allow the user to define arrays of places and transitions or functions that pertain to sensitivity analysis. Assert Segment The assert function allows the evaluation of a logical condition on a marking of the SPN. For example, the assert definition:
f
assex% 0 I if tmark(“p2”3+mark(‘~p3”) ret= (RESJRROR) ; else returnCRES,NOERR); It
> != 4 I I enabledC’%ii”)
&& enabledf”t7”)3
J
will stop the execution in a marking where the sum of the number of tokens in places p2 and p3 is not 4, or where t 11 and t7 are both enabled. Ac-init and AC-reach Segment The function ac-init is called just before starting the reachability graph construction. It can be used to output data about
SPNP
583
the SPN in the “. out” file. This is especially useful when the number of places or transitions is defined at run time (otherwise it is merely a summary of the CSPL file). The function ac-reach is called after the reachability graph construction is completed. It can be used to output data about the reachability graph in the “. out” file. The function ac-final is called after the solution of the CTMC has been completed, to carry out the computation and printing of user-requested outputs. For example, the following function: AC-final Segment
writes in the “ . out” file for each place the probability that it is not empty and its average number of tokens, and for each transition the probability that it is enabled and its average throughput, respectively. The derivatives of all the preceding standard measures with respect to a set of previously chosen parameters are computed and printed. In the end it writes the message goodbye. Before applying the ac-f inal function, additional functions often have to be provided. This can be done, for instance, using the construct reward-type as it is shown in the following example: reward-type reward-type reward-type l
.
epl(l c return(mark(“p1”)) ep30 I return(mark(“p3”)) ep7 (1 < return(mark(“p7”)
; 3 ; 3 >; 3
.
ac,f inaL (1 -C x = expected(epl)*expectedCsp7)+expected(ep3)*1.2; printf (“%f I’ ,x1 ;
3
1
Example Using the example of Fig. 12.9, we show that the application of SPNP to analyse systems that can be modeled by an SPN is a straightforward procedure. The inhibitor arcs avoid a capacity overflow in places pl and p2. The CSPL file for this SPN is shown in Fig. 12.10. A short description of this specification follows:
Line 4-6: The reachability graph, the CTMC, and the state probabilities the CTMC are to be printed.
of
Line 10-22: The places, the initial marking of the places, the transitions, their firing rates, and the input, output, and inhibitor arcs are defined. Line 26-29: Check whether a capacity overflow in either place pl or place p2 has occurred.
584
PERFORMANCE ANALYSIS TOOLS
Pl
P2 fig. 12.9
Fig. 12.10
Petri net example.
CSPL file for the SPN in Firr. 12.9.
SPNP
Line 33-37: Data about the SPN and the reachability
585
graph of the SPN are
to be printed.
Line 40: The number
is the reward
Line 41: The number
is the reward
of tokens in place pl in each marking rate sl in that marking.
of tokens in place p2 in each marking rate s2 in that marking.
Line 42-49: Self-explanatory. Line 52: Data about the CTMC
and its solution
are to be printed.
Line 53-54: The mean number of tokens in the places pl and p2 are to be computed
and printed.
Line 55-57: The probabilities computed
12.2.3
that the places pl and p2 are empty
are to be
and printed.
iSPN
Input to SPNP is specified using CSPL, but iSPN (integrated environment for modeling using stochastic Petri nets) removes this burden from the user by providing an interface for graphical representation of the model. The use of Tcl/Tk [Welc95] in designing iSPN makes this application portable to all platforms [HWFT97]. The major components of the iSPN interface (see Fig. 12.11) are a Petri net editor, which allows graphical input of the stochastic Petri net, and an extensive collection of visualization routines to display results of SPNP and aid in debugging. Each module in iSPN is briefly described in the following:
Input
Data: iSPN provides a higher level input format to CSPL, which provides great flexibility to users. iSPN is capable of executing SPNP with two different file formats: Files created directly using the CSPL and files created using iSPN’s Petri net editor. The Petri net editor (see Fig. 12.12), the software module of iSPN that allows users to graphically design the input models, introduces another way of programming SPN P: The user can draw the SPN model and establish all the necessary additional functions (i.e., rewards rates, guards, etc.) through a common environment. The Petri net editor provides several characteristics normally available only in sophisticated two-dimensional graphical editors and a lot of features designed specifically for the SP N P environment.
iSPN also provides a textural interface, which is necessary if we wish to accommodate several categories of users. Beginners may feel more comfortable using the Petri net editor, whereas experienced SPNP users
586
PERFORMANTE
ANA1 YSIS TOOI 5
Modeling Environments
Manager
Y Fig. 12.11 Main software modules for iSPN.
may wish to input their models using CSPL directly. Even if the textual input is the option of choice, many facilities are offered through the integrated environment. In both cases, the “SPNP control” module provides everything a user needs to run and control the execution of SP N P without having to switch back and forth among different environments. Output Data: The main goal of most GUIs is only to facilitate the creation of input data for its underlying package. Usually, the return communication between the software output and the GUI is neglected. One of the advantages of iSPN is the incorporation of displaying SPNP results in the GUI application. iSPN’s own graphing capability allows the results of experiments to be graphically displayed in the same environment (see Fig. 12.13). Different combinations of input data may be compared against each other on one plot or viewed simultaneously. The graphical output format is created in such a way that it may be viewed by other visualization packages such as gnuplot of xvgr.
SPNP
fig. 12.12
The Petri net editor.
Fig. 12.13
The iSPN output page.
587
588
PERFORMANCE ANALYSIS TOOLS
12.3
MOSES
In this section the CTMC-based tool MOSES (modeling, specification, and evaluation system), developed at the University of Erlangen, is introduced. MOSES is based on the system description language MOSEL (modeling specification, and evaluation language) [BoHe96]. The core of MOSEL consists of several constructs to specify the possible state and the state transitions of the CTMC to be analyzed. The basic way that MOSES computes the performance measures is shown in Fig. 12.14.
Fig. 12.14
Different steps when computing the performance measures with MOSES.
MOSES gets as input the system description (in MOSEL) and develops the underlying generator matrix Q. Five solution methods for the CTMC are provided (Grassmann (see Section 3.3.2), Jacobi (see Section 3.4.3), Gauss-Seidel (see Section 3.4.4), multilevel method (see [Hort96]), and uniformization (see Section 5.1.4)). From the state-probability vector, the system performance measures of interest are computed. In the following section we provide a brief overview of the model description language M OSEL. 12.3.1
The Model
Description
Language
MOSEL
MOSEL consists of a series of segments, described in the following: The DECLARATION
Part:
This method of definition is similar to the corresponding C constructs except that it only allows numerical values (integer and floating point) and no expressions or other types of assignments.
#define
#enum This construct allows the definition of a set of constants. If we do not assign any integer value to a constant name in the enumeration list, then this constant gets a default integer value. For example, the following
MOSES
589
two enum declarations are both identical: enumcpu,stal;e (idle enum cpu,stats
= 0, user = 1, kernel = 2, driver ~idle,user,kern~l,drives);
= 3);
This defines how a state vector looks The VECTOR Description Part: and which state vectors are prohibited (NOT Part). (NODES Part) In this part of the specification the components of the state NODESPart vector are specified. For each component of the state vector, its name and capacity (since we consider finite CTMCs) must be given. As an example, consider: ; /a Range: Q,,K */ NODEnade,lCKI NUDEcpuLcpu,stateI; /* Range: 0+.3 */
For the definition of cpu-state, see the example in the #enum construct given previously. There is also the possibility for defining subnodes within a NODE declaration. For instance: NDDENICK; Nlifi];
N22[1]]
/* Node Nl could model a M/E2/1 queue *
Subnodes can, for example, be used for components with Erlang distributed service times. START Part This optional part specifies a valid start state for the transient analysis. NOT Part This construct is used to specify the prohibited system states. If we have, for example, a closed queueing network, then the sum of jobs over all stations in the network cannot be different from K (the overall number of jobs in the system). The RULES Part: This is the most important segment in the system because the RULE constructs are used to specify the state transitions. Each rule consists of a global part and, optionally, a local part. Consider, for example, the following two rules where the first rule consists only of a global part while the second one has also a local part, consisting of two local rules:
The first rule specifies a transition from node-l to node-2 with transition rate mu-l. Since we do not specify a routing probability, 1.0 is chosen as default value. It should be noted here that instead of directly specifying the state transitions of the underlying CTMC (although it can be done), service
590
PERFORMANCE ANALYSIS TOOLS
rates of stations and routing probabilities of the network are specified. This compact method of specification is much easier and less error prone. The second rule consists of two local rules. The global part states that transitions begin at node-l and all take place with transition rate mu-l. These transitions occur only if statenode-l == up. In the local part we define the transition It is even possible to specify local conditions for targets and probabilities. each local rule. In this case the global as well as local conditions have to be satisfied for the transition to occur.
The RESULT CTMC ed.
Part
In this part the basic performance measures of the from the steady-state probabilities are requested to be print-
computed
Some parts of a specification happen to be very similar, Loops in MOSEL differing, for example, only in one index. For situations like these, MOSEL offers the loop construct. Consider the following example specification:
It is very cumbersome and error prone to repeat every single specification since they differ from each other only in the index. These three lines of code can be shortened using the loop construct: <2..4> FROMnode-# TO node-1 W mu-# P p#l ; As we can see, the loop indices are specified 12.3.2
within
the angled brackets.
Examples
12.3.2.1 CentraLServer Mode/ In the following the central-server model is specified. By using MOSEL, the queueing network model shown in Fig. 12.15 can be specified in many different ways. Here we present two different versions: a longer one that is quite straightforward (see Fig. 12.16) and a second one that is a much shorter specification and demonstrates the power of loops (Fig. 12.17). A short description of the lines in the specification of Fig. 12.16 follows:
Line 2: Definition Line 5-7: Definition
of the number
of jobs in the system.
of the state space.
Line 9: Definition
of the prohibited states in the system. sum of all jobs at all nodes must be I(.
Line 13-14: Specification other nodes.
of all possible state transitions
At any time, the
from node N1. to all
MOSES
fig. 12.15
Fig. 12.16
MOSEL implementation
Line 15-16: Specification node Nl.
591
The central-server model.
of the central-server model without loops.
of all possible transitions from nodes N2 and N3 to
Line 18-20: Specification of the utilization of a node using the UTIL construct (which computes the utilization of a node using the state probabilities). Line 22-24: Specification of the mean number of jobs at a node using the MEAN construct. Line 26-32: Definition of other interesting performance measures. As we can see, the use of loops considerably reduces the length of the specification. Here we show the usability of 12.3.2.2 Fault Tolerant Multiprocessor System MOSEL for the model of a fault tolerant multiprocessor system. Fig. 12.18 shows a multiprocessor system with a finite buffer capacity. It is assumed that the failure and repair of servers are mutually independent. The following characteristics also apply:
592
PERFORMANCE ANALYSIS TOOLS
Fig. 12.17
MOSEL implementation
Fig. 12.18
of the central-server system with loops.
An example of a multiprocessor
l
The system has a finite job queue of length
l
The arrival
l
The time to repair is exponentially
l
The time llmtbf.
l
If a server that is currently again on a free server.
l
If no server is currently available, repaired or becomes free.
l
Service times are exponentially
system.
L.
rate to this queue is lambda.
between
failures
distributed
is exponentially
working
with the rate l/mttr. distributed
with
the rate
on a job fails, then the job is started
then a job must wait until
distributed
a server is
with mean l/p.
The implementation in Fig. 12.19 shows one way to describe this model of the fault tolerant multiprocessor system. Descriptions of the lines follow:
Line 2-4: Definition
of the states space.
593
MOSES
/
1 2 3 4 5 6 7 8 Q 10 11 12 13 14 15 16 17 18 IQ 20 21 22 23 L24
Fig. 12.19
/* VECTORdascription NODEqueue[Ll NODEactiveC31 NODEworkingC31 NOT queue*active
‘\
part */
> L+3;
/* RULESpart FROME FROMworking FROME
*/ TO queue W lambda; W l/mtbf; TOE TO working W l/mttr;
FROMqueue
TO active
IF working-active
3 0;
<1..3> FROMactive TOE W #*mu IF active==# AND work$ng>*#; <1.,2> FROMactive TOE W #*mu IF active># AND working-#; /* RESULTS part */ RESULTA = MEANactive; RESULTrho = A/3; RESULTIF (working > 0) p-working +* PROB; RESULTDIST queue; RESULTq = MEANqueue; RESULT>) throughput = 3*rho*mu*p,working;
MOSEL implementation
J
of the fault tolerant multiprocessor system
Line 6: Definition
of the prohibited states in the system. The sum of the active jobs and the jobs in the queue shall not exceed L+3.
Line 9: Specification
of the arrival for FROM External).
of jobs in the system.
(FROME: short form
Line 10-11: Specification
and the repair of a processor.
Line 13: Specification
from the queue to a processor.
of the failure short form for TO External). of the transition
Line 15-16: Specification
(TOE:
of the possible transitions from the processor to external. If a processor fails while the job is being processed, the job is started on a free server, if available, or has to wait.
Line 19: Request the mean number of active processors. Line 20: Request the utilization Line 21: Request the probability
of the processors. that at least one processor
is working.
Line 22: Request the pmf of the number of jobs in the queue. Line 23: Request the mean queue length. Line 24: Request the throughput.
594
PERFORMANCE ANALYSIS TOOLS
12.4
SHARPE
SHARPE (symbolic hierarchical automated reliability performance evaluator) tool was originally developed in 1986 by Sahner and Trivedi at Duke University [STP96]. It is implemented in C and runs on virtually all platforms. The advantage of SHARPE is that it is not restricted to one model type as PEPSY or SPNP. It offers seven different model types including product-form queueing networks, stochastic Petri nets, CTMCs, task precedence graphs and fault trees. The user can choose the model type that is most convenient for the problem. The models can also be hierarchical, which means the output of a submodel can be used as the input for another submodel. Several graphical user interfaces for SHARPE are now available but the original version used textual, line-oriented ASCII code. An interactive or batch oriented input is possible. The syntax for both cases is the same. The possibilities that SHARPE offers depend on the chosen model type. If the user wishes to analyze a product-form queueing network, the commands for the performance measures such as throughput, utilization, mean response time at a station, and mean queue length are provided by SHARPE. MVA is used to solve product-form queueing networks. For a continuous-time Markov chain, reward rates and initial state probability vector can be specified. Steadystate, transient, and cumulative transient measures can be computed. The tool offers two steady-state solution methods. At first it starts with the SOR method and after a certain period of time, it prints the number of iteration steps and the tolerance if no solution is found. In this case the user is asked whether he or she wants to go on with the SOR method or switch to Gauss Seidel. Three different transient CTMC solution methods are available. It is not possible in this context to describe the full power of SHARPE. Instead of describing the syntax and semantics in detail, we give examples that use the most important model types to show the reader that SHARPE is very powerful and also easy to use. SHARPE has been installed at more than 220 sites. For more detailed information the reader can consult [STP96]. 12.4.1
Central-Server
Queueing
Network
As the first example we consider a central-server queueing network (Fig. 12.20) consisting of a CPU and two disks with the following input parameters: /Lo = 1000/20, Pll
=
0.1,
p1 = 1000/30, p12 = 0.667,
p2 = 1000/42.9, p13 = 0.233,
P2l = P31 = 1.
At each node the scheduling discipline is FCFS and the service times are exponentially distributed. Figure 12.21 shows a SHARPE input file for this
SHARPE
595
Disk 2 Fig. 12.20
\
f
I
A central-server rnodcl.
1
* central-server
2 3 4 5 6 7 8 9 10 11 12 13 14 16 16
pfqn csm(jobs) cpu diski ~12 cpu disk2 ~13 disk1 cpu 2 disk2 cpu 1 end * servers cpu fcfs 1000/20 diski fcfs 1000/30 disk2 fcfs ~1000/42.918 end * numberof jobs chain1 jobs end Fig. 12.21
mod&
17 18 19 20 21 22 23 24 25 26 27 28 29 30
bind ~12 0.667 p13 0,223 end
Soap i,Z,lG,2
expr expr expr: expr end
tputCcsm,cpu;i) util(csm,cpu;i) qlength(csm,cputi) rtime(csm,cpu; i>
end /
SHARPE Input for central-server network.
model. Lines with comments begin with an asterisk. The description of the SHARPE specification is given as follows: Line 3: Specifies the model type pf qn (= product-form model named csm, and model parameter jobs.
queueing network),
Line 4-8: Specification of the routing probabilities. Line 9-13: Specification of the nodes and the service parameters of the nodes (name, queueing discipline, service rate). Line 15-16: Gives the number of jobs in the network. Line 18-21: The routing probabilities
are bound to specific values.
Line 23-28: Requests the computation and printing of the throughput, the utilization, the mean queue length, and the mean response time at the cpu for the number of jobs: 2, 4, 6, 8, and 10. The output produced by SHARPE in this case is shown in Fig. 12.22. We see that the throughput grows from 29 to 45, the utilization from 0.59 to 0.90, the mean queue length from 0.82 to 4.60, and the mean response time from 0.03 to 0.10, as the number of jobs increases from 2 to 10.
596
PERFORMANCE ANALYSIS TOOLS
f
L
2.9406&01 5.88lle-01 8.23318-01 2.7998e-02
i=8.000000 tput(csm,cpu;i): utilCcsm,cpu;i): qlength(csm,cpu;i): rtime(csm,cpu;i):
4,37E;3e+Ol 8.75066-01 3,6209e+OO 8.2768e-02
i=4.000000 tputCcsm,cpu;i): util(csm,cpu;i): qlength(csm,cpu;i): rtime(csm,cpu;i):
3.7976e+Ol 7.5962e-0 1.7202e+OO 4.6298e-02
i=10.000000 tput(csm,cpu;iI: util(csm,cpu;i): qlength(csm,cpu;iI: rtime(csm,cpu;i):
4*4992e+Ol 8.99836-01 4.59556+00 1.02146-01
i=6.000000 tput(csm,cpu;i): util(csm,cpu;i): qlength(csm,cpu;i): rtime(csm,cpu;i):
4.1733e+Ol 8.3466e-01 2.659ie+OO 6,3717e-02
fig. 12.22
12.4.2
\
i=2.000000 tput(csm,cpu;i): util(csm,cpu;i): qlength(csm,cpu;i): rtime(csm,cpu;i):
M/M/m/K
/
SHARPE output for the central-scrvcr model.
System
As a simple example for a CTMC model, we consider an M/M/5/100 queueing system. The state diagram for this CTMC is given in Fig. 12.23. It is of birth-
Fig. 12.23
Markov chain for the M/M/5/100
system.
death type and irreducible. Each state name gives the number of jobs in the system. The SHARPE input file is shown in Fig. 12.24. In the first two lines the values of the arrival and service rates are bound. Then the model type (markov = CTMC) and the name (mm5) of the model are given. Subsequent lines specify the transitions between the state together with their transition rates. Then five variables are defined. Pidle is the steady-state probability of being in state 0. This is the probability that there are no jobs being served, i.e., that the station is idle. Pfull is the steady-state probability of being in state 100. This is the probability that the queue is full. Lre j ect is the rate at which jobs are rejected. Mqueue is the mean number of jobs in the system. This can be calculated by the built-in function sum. Mresp is the mean response time of accepted jobs computed using Little’s theorem as Mqueue/(X-Lreject), expr expression prints the value of the expression in the output file (Fig. 12.25). Note that a recent extension to SHARPE includes a loop specification in the definition of a CTMC that can be used to make concise specification of a
SHARPE
597
structured CTMC such as the one just described. For an example of the use of this feature see Section 13.3.2.
Fig. 12.25
12.4.3
M/M/l/K
Output file for the M/M/5/100
System
with
Server
Failure
system.
and Repair
Now we extend the M/M/l/K queueing model by allowing for the possibility that a server could fail and could be repaired. Let the job arrival rate be X and the job service rate be p. The processor failure rate is y and the processor repair rate is 7. This system can be modeled using an irreducible CTMC, as shown in Fig. 12.26 for the case where m = 1 (one server) and IY = 10 (the maximum number of jobs in the server and queue is 10). Each state is named with a two-digit number ij where i E (0, 1, . . . ,9, a} is the number of jobs in the system (a is 10 in hexadecimal) and j E (0, l} denotes the number of operational processors. SHARPE input and output files for this example are shown in Figs. 12.27 and 12.28. The probability that the system + prob(mmlk,Ol), and the rate at which is idle is given by prob(mmlk,OO) jobs are rejected because the system is full is given by X (prob(mmlk, d) + prob(mmlk,
al)).
The generalized stochastic Petri net (GSPN) in Fig. 12.29 is equivalent to the CTMC model and will let us vary the value of K without changing the model structure. We use this example to show how to handle GSPNs
598
PERFORMANCE ANALYSIS TOOLS
Fig. 12.26
f
CTMC for an M/M/1/10
markov mmlk 01 11 LAM 1% ot Mu
11 10 CAM 10 ii TAU
system with server failure
and repair.
\
30 40 LAM
21 31 LAM
30 31 TAU
40 50 LAM 60 70 LAM ‘70 60 LAM 80 90 LAM 90 a0 LAM
31 21 Mu 31 41 LAM 41 31 Mu
41 40 GAM
end
40 41 TAU
41 62 LAM
50 51 TAU
bind LAM 1
61 41 Mu
61 60 GAM
MU 2
51 61 LAM 61 51 MU 62 71 LAM
60 61 TAU
GAM O.UUOl
70 71 TAU
71 70 GAM
TAU 0.1
71 61 MU 72 81 LAM
81 80 GAM
var Pidle
probCmmik,OO)+~rob~mlk,Ol)
80 81 TAU
var Pfull
prob~mmik,a0)+grob~mlk,al)
91 90 al a0 00 10
var Lreject exps Pidle
11 21 LAM 21 11 Mu
81 81 91 92 al 01 \, 00
71 91 81 al 91 00 01
Fig. 12.27
MU LAM MU LAM
Mu GAM TAU
21 10 GAM 20 21 TAU 31 30 GAM
61 60 GAM
90 91 a0 al IO 20
GAM TAU GAM TAU LAM LAM
end
LAM*PFULL
expr Lre j ccc end
20 30 LAM
/
SHARPE input for the M/M/1/10
system with server failure
and repair.
using SHARPE. The loop in the upper part of the GSPN is a representation of an M/M/l/K queue. The lower loop models a server that can fail and be repaired. The inhibitor arc from place server-down to transition service shows that customers cannot be served while the server is not functioning. The number within each place is the initial number of tokens in the place. All of the transitions are timed, and their firing rates are shown below the transitions. The input file for this model is shown in Fig. 12.30, and the description of this file follows:
Line l-7:
The input
parameters
are bound to specific values.
Line 9: Model type and name of the model. Line 11-15: Places and initial
numbers
of the tokens.
SHA RPE
fig. 12.28
SHARPE output
Fig. 12.29
for the M/M/1/10
system with server failure
GSPN model for queue with server failure
599
and repair.
and repair.
Line 17-21: Timed transitions. Line 23-27: Arcs from places to transitions and arc multiplicities. Line 29-33: Arcs from transitions to places and arc multiplicities. Line 35-36: Inhibitor
arcs and their multiplicities.
Line 38: Define a variable for the probability Line 40: Define a variable for the probability server.
that the server is idle. that a job is rejected at the
Line 42: Define a variable for the rejection rate. Line 44: Define a variable for the average queue length at server. Line 46: Define a variable for the throughput Line 48: Define a variable for the utilization
of server. of server.
Line 50-54: Values requested to be printed (Fig. 12.31). This GSPN is irreducible. For GSPNs that are non-irreducible, SHARPE can compute the expected number of tokens in a place at a particular time t, the probability that a place is empty at time t, the throughput and utilization of a transition at time t, the time-averaged number of tokens in a place during the interval (O,t), and the time-averaged throughput of a transition during (0,t). See [STP96] for the syntax for these functions. Other model types
PERFORMANCE ANALYSIS TOOLS
600
’
1. 2 3 4 6 6 7 8
9
10 11 12 13 14 16 26 17 18 19 20 21 22 23 24 25 26 27
bind Lambda 1.0 mu 2.0
28 29 30 31 32 33 34 36 36 37 38 39 40 41 42 43 44 46 46 47 48 49 60 51 52 53 64
gamma 0.0001.
tau 0.1 K 10 end gspn mmlk-fail * places jobsource K queue 0 servexup 1. serverdown 0 end * transitions job-arrival ind Lambda service ind mu faiLuxe ind gamma repair ind tau end * arcs, places-transitions jobsouxce job-arrival 1, queue service I serverup failure 1 serverdown repair 1 end
Fig. 12.30
Input
* axes, transitions-places job-arrival queue I service jobsource I faiLwe servexdown 1 repaix sexverup 1, and * inhibitox arcs serverdown service 1 end
I
vax Pidle prempty(mmik-fail,q
cue> u vax Pxeject prempty(mmlk-fail/jobsource) var Lrejeet prempty(mmik-failrjobsource) var avquelength etok(mmlk-fail,queue$ vax thxuput tput(mmlk-fail.,service) var utiY.ization expx expr exps expr end
uitl(mmlk-fail,service)
PicUe Lreject, Preject avquelength thruput, utilization
for GSPN model of t,he system with server failure
avquelength:
1
I and repair.
1,0028e+OO
9.9@04e-01 utilization:
Fig. 12.31
Output
for the GSPN model.
provided by SHARPE are multiple chain product-form queueing networks, semi-Markov chains, reliability block diagrams, fault trees, reliability graphs, and series-parallel graphs. For details see [STP96].
12.5
CHARACTERISTICS
OF SOME TOOLS
In Table 12.1 other important tools are listed together with their main features. There are pure queueing network tools such as QNAP2 or PEPSY, pure Petri net tools such as SPNP or PANDA, and tools that have a more general input and are therefore more flexible such as SHARPE or MOSES. Many of them provide a GUI and/or a textual input language. The GUI is very convenient for getting accustomed to the tools, but the experienced user often prefers the use of a textual input language.
Table 12.1 Tool
Location
Reference QN
RESQ QNAP2 MAOS PEPSY HIT
Performance
Modeltype SPN other
DES
Evaluation
Solution Method pqfn npqfn m-tr
*
(*)
* ILMAOS
1 -
i *
XPEPSY HITGRAPHIC
HISLANG
* * * * * *
iSPN
CSPL
1 -
* * * * * *
-
-
-
*
*
APNN
*
* -
* -
* * * -
* * * * *
GSHARPE
* MOSEL
* *
-
-
*
*
*
-
-
*
*
TU Miinchen Univ. Erlangen Univ. Dortmund
[FeJoSO] [Kirs94] [BMW891
* * *
-
1 -
* * *
* * *
i -
SPNP GreatSPN TIMENET DSPNexpress Ultra SAN PANDA
Duke Univ. Univ. Torino TU Berlin TU Berlin Univ. Illinois Univ. Erlangen
[CFMT94] [ABC+951 [GKZH94] [Lind94] [SOQW95] [AlDa97]
-
* * * * * *
-
* * *
-
-
i *
1 -
QPN
Univ.
[BaKe94]
*
*
-
-
SHARPE MOSES MARCA MACOM DNAmaca
Duke Univ. Univ. Erlangen NC State Univ. Univ. Dortmund Univ. Cape Town
[STP96] [BoHe96] [StewSO] [ScMiiSO] [KnKr96]
* * * * *
* * *
* * * *
(i) -
solver
steady-state
CTMC
solver
transient,
m-ss:
CTMC
Hierarchical
*
*
[VePo85]
m-tr:
Language
-
[SaMa85]
SIMULOG
Notes:
GUI m-ss
*
IBM
Dortmund
Tools
-
-
-
* * * * *
(*)
*
(Menue)
*
(I)
2
*
XMARCA *
(“SE*N”M)
1
C++-constructs
-
2i % 4
5 i a % $
m
13 Applications
This chapter considers several large applications. The set of applications organized into three sections. In Section 13.1, we present case studies queueing network applications. In Section 13.2 we present case studies Markov chains and stochastic Petri nets. In Section 13.3, case studies hierarchical models are presented.
13.1
CASE STUDIES
OF QUEUEING
is of of of
NETWORKS
Five different case studies are presented in this section. These range from multiprocessor system model, several networking applications one operating system model and a flexible production system model. 13.1.1
Multiprocessor
Systems
Models of tightly coupled multiprocessor systems will be discussed first followed by models of loosely coupled systems. 13.1,l.l Tightly Coupled Systems Consider a tightly coupled multiprocessor system with caches at each processor and a common memory, connected to the processors via a common bus (see Fig. 13.1). The system consists of m processors and a common memory with n memory modules. A processor sends a request via the common bus to one of the n memory modules when a cache miss occurs, whereupon the requested data is loaded into the cache via
603
604
APPLICATIONS
fig. 13.1
PTl
Processor n
72= 1,...,5
cn
Cache n
n=
iVIAl,
Memory Module n n = 1,. . . ,4
A multiprocessor
1,...,5
system with caches, common
the common bus. Figure 13.2 shows a product-form of such a multiprocessor system.
memory,
and common
bus.
queueing network model
Replies to Memory Requests
Replies to Memory Requests fig. 13.2
Queueing
network
Memory Modules
model of the multiprocessor
system shown in Fig. 13.1.
The m processors are modeled by an IS node and the bus and the memory modules by single server nodes. The number of requests in the system is m since a processor is either working or waiting until a memory request is finished. Using this model we can calculate the utilization of the bus or the mean response time of a memory request from a processor. The mean time between two cache misses is modeled by the mean thinking time of the IS node. Other parameters that we need are the mean bus service time, the mean memory request time, and, finally, pi, the probability of a request to memory module i. In the absence of additional information, we assume pi = l/n
CASE STUDIES OF QUEUEING NETWORKS
605
(i = 1,2,... , n). Note that for each cache miss, there are two service requests to the bus. It is for this reason that with probability 0.5, a completed bus request returns to the processor station. Similarly, the probability of visiting memory module i subsequent to the completion of a bus request is pi/2. We assume that the service times for the two types of bus requests have the same mean. This assumption can be easily relaxed. If we have explicit values for these parameters, then we can calculate interesting performance measures such as bus utilization and mean response time as functions of the number of processors. Another interesting measure is the increase in the mean response time assuming a fixed number of processors and memory modules, while changing the mean bus service time and the mean time between two cache misses. In Fig. 13.3 the mean response time is shown as a function of the mean bus service time and the mean time between two cache misses [BolcSl].
Mean
R.espont Time
4ce Time
Mean Time Between Cache Misses fig. 13.3 Mean response time as a function time between cache misses (m = 5, n = 4).
5o
of the bus service time,
and the mean
We can see that there is a wide area where the increase in the mean response o compared to the case with no waiting time at the bus time is tolerable (1501) or a memory queue (this is the minimum value of the mean response time). On the other hand, there is also a wide area with an intolerable increase in the mean response time. Thus the analysis of such a queueing network model can help the system designer to choose parameter values in a tolerable area. Problem 13.1 Verify the results shown in Fig. 13.3 by hand computation and using any software package available (e.g., SHARPE or PEPSY).
606
APPLICATIONS
Problem
13.2
Improve the model described in Section 13.1.1.1 by considering different mean service times for bus requests from processor to memory and from memory to cache (Hint: Use a multiclass queueing network model). In a loosely coupled multiprocessor system, the processors have only local memory and local I/O devices, and communicate via an interconnection network by exchanging messages. A simple queueing network model of such a system is shown in Fig. 13.4. The mean 13.1.1.2
Loosely
Coupled
Systems
Network
Processors
Fig. 13.4
A simple queueing
network
model of a loosely coupled
system.
service time at the processors is the mean time between the messages, and the mean service time at the network node N is the mean message delay at the network. The routing probability pi is the probability that a message is directed to processor i. Arriving
/ Departing
Jobs
Replies to I/O Requests
Processors Replies to Memory Requests
Fig. 13.5
A complex
queueing
network
I/O Processors model of a loosely coupled system.
A more complex and detailed model of a loosely coupled multiprocessor system is shown in Fig. 13.5 [MAD94]. Here we assume that we have n separate The computing processors processors. I/O processors and m “computing” send I/O requests via the network N to the I/O processors and get the replies to these requests also via the network N. We assume that a job that begins at a computing processor is also completed at this processor and that the processors are not multiprogrammed and are heavily loaded (for each job that leaves the system after being processed by a processor, a new job arrives immediate-
CASE STUDIES OF QUEUEING NETWORKS
607
ly at the processor). This system can be modeled by a closed product-form queueing network with m job classes (one for each computing processor) with the population of each class equal to 1 [MAD94]. As an example we consider a loosely coupled multiprocessor system with eight computing processors and n = 2,3,4 I/O processors. The mean computing processor service time is 30 msec, the mean I/O time is 50 msec, and the mean message delay at the network is 1 msec. The probability that a job leaves the system after it has been processed at a computing processor is p&ne = 0.05. we assume that pi = l/n for all i. Some interesting results from this model are listed in Table 13.1. Table 13.1 Performance measures for the loosely coupled multiprocessor different, numbers of I/O processors 2
Number of I/O Processors
4.15 set 1.93 set- l 0.145 0.070 0.867
mean response time throughput Pcomputingprocessor Pnetwork PI/Oprocessor
3 3.18 set 2.51 set-l 0.189 0.090 0.754
system for
4 2.719 set 2.944 set-l 0.220 0.106 0.662
Problem 13.3 Verify the results shown in Table 13.1 by hand computation and using any software package available (e.g., PEPSY or SHARPE). Problem
13.4 Extend the complex model described in Section 13.1.1.2 so as to allow distinct mean network service times for request from a computing processor to I/O processor and vice versa. 13.1.2
Client
Server
Systems
A client server system consists of client and server processes and some method of interprocess communication. Usually the client and the server processes are executing on different machines and are connected by a LAN. The client interacts with the user, generates requests for a server, transmits the request to the server, receives the results from the server, and presents the results to the user. The server responds to requests from the clients and controls access to resources such as file systems, databases, wide area networks, or printers. As an example we consider a client server system with a fixed number m of client workstations that are connected by an Ethernet network to a database server. The server consists of a single disk (node number 4) and a single CPU (node number 3). This leads to a closed product-form queueing network model shown in Fig. 13.6. The client workstations are modeled as an IS node (node number 1) [MAD94], and the number of jobs in the closed system is equal to the number of workstations m. The Ethernet network (carrier sense multiple access with
608
APPLICATIONS
Workstations
fig.
13.6 Closed
qucueing
network model of a client server system.
collision detection, or CSMA/CD, network) can be modeled as a server (node number 2) with the load-dependent service rate [LZGS84, MAD94, HLM96]:
(
-
y$ * 3
PIlet (W =
i
+ s * c(l))-l,
~.~+S*C(t+l))-l, (
k = 1, (13.1) k>l,
where C(k) = (1- A(k))/A(k) is tkre average number of collisions per request is the probability of a successful transmission and Ic and A(k) = (1 - l/k)'-' the number of workstations that desire the use of the network. Other parameters are described in Table 13.2 and shown in Fig. 13.6. We Table 13.2
Parameters for the client server example
Parameter
NP B S
GJ
Description Average number of packets generated per request Network bandwidth in bits per second Slot duration (i.e., time for collision detection) Average packet length in bits
compute the throughput X and other interesting performance measures using the load-dependent MVA (see Section 8.2). As an example we use the parameters from Table 13.3 and determine the throughput as a function of the number of client workstations m (see Fig. 13.7) Problem 13.5 Verify the results shown in Fig. 13.7 by hand computation and using an available modeling package such as SHARPE or PEPSY.
13.1.3
Communication
Systems
13.1.3.1 Description of the System As an example of a more complex queueing network and the performance evaluation of communication systems, we consider the LAN of a medium size enterprise [Ehre96].
CASE STUDIES OF QUEUEING NETWORKS
Fig. 13.7
” 20
4b
Throughput
as a function of the number of workstations.
Table 13.3 Numerical server example Np = 7 B = 10 Mb/ set S = 51.2 psec z, = 1518 bits pa1 = 0.5 P12 = 1 p23 = 0.5
m 0
6b m
parameter
8b
160
values for the client-
pl = pc~ = O.l/ set CL2 = pNet(k) p3 = pcpu
= 16.7/set p~q= PDisk = 18.5/set p32 = 0.5 P43 = p34 = 0.5
1
Bridges in the Computer Center Bridges in the Other Buildings WAN Router LAN Analyzer
@ Server •i Computer
Fig. 13.8
The FDDI backbone with 13 stations.
609
610
APPLICATIONS
The LAN connects several buildings that are close together and it is divided into several network sections. In each network section the devices of a building are connected to each other. Three other sections are located in one building and are used to connect the servers to the LAN. As a backbone for the sections, a fiber distributed data interface (FDDI) rin g is used. Eleven bridges for the sections, a WAN router, and a LAN analyzer that measures the utilization of the LAN constitute the stations of this ring. Figure 13.8 shows the FDDI ring with the stations schematically, wherein the structure of two stations is shown in more detail. The typical structure of a section in which all computers within a building are connected is shown in Fig. 13.9. The building has four floors. The computers on the second and third floor are connected to different segments, and the computers on the fourth and fifth floor are connected to the same segment. These three segments are connected to each other and to the FDDI ring via a multiport bridge. The computers are connected to a stack of four hubs that
5th Floor
4th Floor
Cheapernet I
I
I 3rd Floor
Chcapernet Optical Fiber
I
1 ---p-&
2nd Floor (Ground Floor) -x Hubs FDDI
Fig. 13.9
A section of a communication
Ring
system with three segments.
are connected to each other via Cheapernet, and there is a connection of the hubs to the multiport bridge via optical fibers. The CSMA/CD principle is used with a transmission rate of 10 Mb/set which is equal for all segments. Each segment is a different collision domain.
CASE STUDIES OF QUEUEING NETWORKS
611
The nodes of the queueing network model 13.1.3.2 Queueing Network Mode/ of the LAN are the FDDI ring, the Ethernets that connect the devices in the segment, the computers, the servers in the computer center, and the bridges. The FDDI ring can be considered as a server that serves the stations in the ring (12 stations without the LAN analyzer) in a fixed sequence. A station (or section) sends a request to another station via the FDDI ring, and the reply to the request is sent back to the station again via the FDDI ring. Thus we obtain a closed queueing network model for the LAN as shown in Fig. 13.10 with a simplified representation of the individual sections. The FDDI ring is modeled by a single server node. The LAN analyzer does not influence the other sections and for this reason it does not appear in the model. The WAN router can be modeled as a single server node and the WAN itself as an IS node (see Fig. 13.11). There are arrivals at the WAN router from the WAN
FDDI
P4
P5
P3
PG
P2
P7
Ring
*1 4-Ixt-+ +Ict-3 Sec&yye;swith
ps qT-+* pg q--q--@ Pl
WAN
pl”
+55--j-*
p11 +i-j--* p12 +-ii+--~ 8 Sections
with
Computers
Fig. 13.10 Closed queueing network model of the LAN with a simplified representation of the individual sections. and from the ring. The WAN router sends requests to the ring and to the WAN as well. A similar situation occurs at all the bridges/routers. In the sections, the segments are connected to the bridge via Ethernet. The Ethernet segments are modeled by a multiple server with only one queue where the number of servers is the number of segments in the section. Because of the CSMA/CD strategy, the frames are either transferred successfully to the computers of the sections to the bridge or in the case of a collision, sent back to the Ethernet queue with collision probability qi . Each computer sends requests to a server via the LAN and waits for the reply before it sends another request. Therefore the number of requests in the LAN equals the number of
612
APPLICATIONS From
To
From
To
A
The Ring
0.5 0.5
-oarWAN Router
WAN Fig. 13.11
Detailed
model of the WAN and the WAN router.
active computers, and the requests do not have to wait at the computers. For this reason the computers at a section can be modeled by an IS node. A queueing network model of a section of the LAN is shown in Fig. 13.12 and the queueing network model of a computer center section with the servers From
To
To
From
-IIllK+-
Bridge
42
:::
l--q, 2
l-q, 2
Computers Fig. 13.12
Detailed
model of a section of the LAN.
is shown in Fig. 13.13. Servers are modeled by a multiple server node with one queue where the number of servers at the node is equal to the number of real servers. The queue is necessary because there are more computers that can send requests to the servers than the number of available servers. The whole LAN model consists of 1 node for the FDDI ring, 12 nodes for the bridges, 11 nodes for the Ethernet segments, 8 nodes for the computers, 3 nodes for the servers, and 1 node for the WAN. As we will allow non-exponential service time distributions, it is a non-product-form network. As parameters we need the total number of 13.1.3.3 Model Parameters requests, the routing probabilities p,, the mean service times l/pi, and the
CASE STUDIES OF QUEUEING NETWORKS To
From
613
To
From
Server ”
Fig. 13.13
Detailed
model of a section with servers.
coefficient of variation c, of the service times of the nodes. Furthermore, we need the number of service units m; for the multiple server nodes. The total number of requests in the network is the mean number of active computers K = 170 (measured value). We estimate the routing probabilities pi from the FDDI ring to the bridges by measuring the amount of data transferred from the ring to the bridges, which can be measured at the bridges. This estimate was done based on measurements over a two-day period (see Table 13.4).
Table 13.4 Total data transferred and estimated routing probabilities
from FDDI
ring to the bridges for a two-day
period
p,
Section
1
2
3
4
5
6
7
8
9
10
11
12
Data/Mb Pzl%
2655 9.5
1690 6.2
2800 10
1652 5.9
2840 10.1
1500 5.4
3000 10.7
200 0.7
1940 6.9
1180 4.2
4360 14.8
4380 15.6
Notes: 1: WAN; 2,3,4: section with servers; 5, . . .,12: section with computers.
Table 13.5
Collision
qz
probabilities
Section
1
2
3
4
5
6
7
8
9
10
11
12
Qz/%
1
5
1
1
1
1
1
1
1
1
3
2
The other necessary routing probabilities are shown in Figs. 13.11, 13.12, and 13.13 and in Table 13.5. The number of servers m; of the multiple server nodes is given by the number of servers for the server nodes (Table 13.6) or by the number of segments for the Ethernet node (Table 13.7). To obtain the mean service times of the nodes, we use a LAN analyzer which can measure
614
APPLICATIONS
Table 13.6 Number of service units m, in server nodes
Table 13.7
Number
Section
2
3
4
mi
14
23
13
of service units nz; (= number
of segments)
in Ethernet
nodes
Section
2
3
4
5
6
7
8
9
10
11
12
ma
4
7
2
4
5
3
1
4
2
5
5
the interarrival times and the length of the frames transferred by the FDDI ring. The results of the measurement are shown in Tables 13.8 and 13.9. From Tab/e 13.8 Empirical pmf of the interarrival times of the frames. Interarrival
Time (ps)
15 5-20 20-82 82-328 329-1300 1300-5200
% 3.0 0.9 15.3 42.7 27.1 1.0
Tab/e 13.9 Empirical the frame length L
pmf of
Length (Bytes)
%
5 32 32-63 64-95 96-127 128-191 192-511 512-1023 1024-1526
11.1 10.0 33.0 7.9 6.8 4.1 5.8 21.3
these tables we obtain the mean values and the square coefficients of variation (Table 13.10) of the interarrival times and frame lengths. Given the measured throughput X = l/346 psec = 2890 per set, and the routing probabilities (see Table 13.4), the traffic equations are solved to produce the arrival rates of the individual sections (see Table 13.11). N ormally in a closed network, relative throughputs are computed from routing probabilities or visit ratios. However, in this case we have measured throughput at one node and hence the relative throughputs at each node are also the actual throughputs. The mean service time of the FDDI ring is given by the sum of the mean transfer time of a frame and the mean waiting time until the token arrives at a station [MSWSS]. We assume one-limited service, i.e., a station transmits at most one frame when the token arrives. An upper limit of the service time of the ring is the token rotation time. The utilization pi = Xi/p is the probability that a station wishes to transmit a frame, and 1 --pi is the probability that the token is transferred to the next station without transmitting a frame. In this case the transfer of the token can be considered as the transfer of a frame with
615
CASE STUDIES OF QUEUEING NETWORKS
Tab/e 13.10
Characteristic
values of the interarrival Interarrival
Length of the Frames
Times of the Frames
382.5 Byte 1.67
346 p set 1.48
Mean value Squared coefficient of variation
Tab/e 13.11
times and the frame length
Arrival
rates of frames at the stations
Station
1
2
3
4
5
6
7
8
9
10
11
12
A,
275
179
289
171
292
156
309
20
199
121
428
451
length
0. Accordingly,
the mean token rotation time ?& can be calculated as
follows: T,=U+R-‘*Z*&II~.
(13.2)
Here U denotes the free token rotation time (22psec), R denotes the transfer and 2; denotes the mean frame length (see Table 13.10). rate (100 Mb/set), With:
c
pi
=
Ipi x
(13.3)
-=-
P
P’
the approximation:
and Eq. (13.2), it follows that the service rate of the FDDI
‘=
R-XL U.R
ring is given by: (13.4)
*
With X = 2890/ set, z = 382.5 Bytes (see Table 13.10), we obtain service rate: p = 41435/set, and the mean service time at the FDDI
the mean
ring:
T, M 1 = 24 psec. I-L The variance
of the token rotation crgT = Re2 (p . var(L)
time T, is given by [MSW88]: + p. (1 - pep?)
z2)
,
(13.5)
616
APPLICATIONS
where o$ and z2 can be calculated using the values from Table 13.9, the values ofrpi are given in Table 13.4, and p = X/p = 0.070 is given by the values of X and p. Then we obtain the squared coefficient of variation:
The service time at each bridge is deterministic with the forwarding rate of 10,000 frames/set. And, of course, the coefficient of variation is 0 in this case. The service time of the Ethernet is given by the sum of the transfer time Tt and the delay time Td of a frame. To obtain the transfer time Tt we need the mean frame length, which we get from Table 13.9 given that the minimum frame size is 72 bytes in the CSMA/CD case. Thus, 54.1% of the frames have a size between 72 and 95 bytes and it follows: Leth = 395 bytes = 3160 bit,
ciplJL = 1.51.
Given a transfer rate of 10 Mb/set (see Table 13.12), we finally have: F = t
3160 bit = 316 psec 10 Mb/set
c$~ = 1.51.
The mean delay time ??d of the Ethernet can be calculated using Fig. 13.9 and Table 13.12: Td = 0.011 km. 5 psec/km + 0.005 km . 4.3 psec/km + 0.05 km - 4.8 psec/km = 0.3 psec. Table13.12 Characteristics of the optical fiber, the Cheaper-net,and the twisted pair Transfer Rate Optical fiber Cheapernet Twisted pair
Mean Length
10 Mb/set 10 Mb/set 10 Mb/s
11 m 5m 50 m
Signal Time 5 psec/km 4.3 psec/km 4.8 psec/km
In this case, Td can be neglected compared to Ft and we have: -
1
= 316psec.
peth
To obtain the service rates of the IS node (WAN or computers in a section) we use the formula: xi = ,i Pi ’ and have: (13.6)
CASE STUDIES OF QUEUEING NETWORKS
617
The values of the Xi are listed in Table 13.11. In order to get the K; we need the mean number of active computers. This value cannot be measured directly but we have the mean total number of active computers which is 170. To get an approximation, for the Ki, we divide 170 by the number of sections, which is 12 (see Table 13.4 or Fig. 13.17). This yields:
Ki For sections with
E $y
servers (multiple
= 14.17.
server node) we use:
pi= xi ml-La ’ and obtain: pi = xi
(13.7)
mipa ’
and can calculate the service rates using the utilization pZ of the servers, which was measured approximately as 90%, and the number of servers mi from Table 13.6. The values of the service rates are listed in Table 13.13. Table 13.13 Station Pa
Service rates of the computers,
servers, and the WAN
1
2
3
4
5
6
7
8
9
10
11
12
19.4
14.2
14.0
14.6
20.6
11.0
21.8
14.1
14.8
8.5
30.2
31.8
Now the closed non-product-form queueing network model of the considered communication system is completely defined. We solve it using Marie’s method (see Section 10.1.4.2). 13.1.3.4 Results The closed queueing network model from Section 13.1.3.2 together with the model parameters (as derived in Section 13.1.3.3) is solved using the queueing network package PEPSY (see Chapter 12). The PEPSY input file for this example is given in Fig. 13.14. In PEPSY, either routing probabilities pij or visit ratios ei can be used in the input file. Here we use the visit ratios, which are calculated from the originally given routing probabilities using Eq. (7.5). We have a closed queueing network, therefore we have the total number of jobs K = 170 as input. From the input file, PEPSY produces an output file, shown in Fig 13.15, with all performance measures. Since we have a non-product-form network, Marie’s method was used for the analysis. In the output file we see that the computed utilization of the servers is = 0.9, which matches with the estimated value from actual measurement, Pserv and that the utilization of the ring is p7. = 0.07, which also matches with the measured value. The mean queue length of the ring is negligible and queue
618
APPLICATIONS
fig. 13.14
PEPSY input file for the LAN example.
length at the servers is QCCZ= 3.1, GCCs= 1.9, and GCC4= 3.2. The mean response time ?? for the LAN is 0.59 sec. Now we can use this model for some experiments. For example, we can change the number of active computers K. As we can then see in Table 13.14, the server nodes are the bottleneck and the number of active computers should not exceed 250 because the queue lengths become very large. The mean response time ‘T is not influenced much by the number of active computers K in our example. Table 13.15 demonstrates the influence of changing the number of servers at server nodes. If the number of servers is reduced to m2 = 10, ms = 16, and m 4 = 9, then we have a bottleneck at the server nodes. The situation deteriorates if we further reduce the number of servers. The utilization of the ring decreases when the number of servers decreases, whereas the utilization of the server nodes increases.
CASE STUDIES OF QUEUEING NETWORKS
619
Fig. 13.15 PEPSY output file for the queueing network model of the communication system (cc; = computing center%, b, = building,).
These two experiments show that it is easy to study the influence of the variation in the parameters or the structure of the LAN once we have constructed and parameterized the queueing network model. We can use PEPSY, or any other queueing network package such as QNAP2 [VePo85] or RESQ [SaMa85], for analysis. To obtain the mean response time for a computer, its service time has to be subtracted from mean system response time T = 0.585 and we obtain for the computers in building 5, for example, the mean response time 0.54 sec.
Problem 13.6 Verify the results of the communication system model just described using another queueing network package such as RESQ or QNAP2.
620
APPLICATIONS
Table 13.14 Pcrformancc measures for the communication bers of active computers K
system for different num-
K
100
130
150
170
180
200
300
P2
0.55 0.55 0.55
0.71 0.71 0.71
0.81 0.81 0.81
0.90 0.89 0.90
0.93 0.92 0.92
0.97 0.97 0.97
0.99 0.99 0.99
0.05 0.02 0.11
0.6 0.3 0.6
1.4 0.8 1.4
3.1 1.9 3.2
4.8 2.9 4.8
7.9 8.3 9.3
50 30 43
0.56 0.043
0.56 0.055
0.57 0.063
0.59 0.070
0.60 0.072
0.64 0.075
0.94 0.077
P3 P4 g2 93
Q4 T
Pring
m2 = 14, mg = 23, rnq = 13.
Notes:
Table 13.15 Performance measures for the communication system for different number of servers m2, m3, and m4 of section 1, section 2, and section 3, respectively 7 12 7
10 16 9
14 23 13
16 18 16
18 26 17
28 46 26
0.999 0.95 0.92
0.96 0.98 0.98
0.90 0.89 0.90
0.69 0.998
0.64
0.73 0.88 0.71
0.47 0.47 0.47
G2
63 9.6 7.0
10 13 23
3.1 1.9 3.2
0.4 28 0.2
0.5 0.7 0.5
0 0 0
T
1.05 0.039
0.77 0.053
0.59 0.070
0.67 0.061
0.56 0.072
0.56 0.073
m2 m:3 m4
P2 P3 P4
Pring Note:
K = 170.
13.1.4
UNIX Kernel
We now develop a performance model of the UNIX operating system [GrBo96]. A job in the considered UNIX system is in the user context when it executes user code, in the kern context when it executes code of the operating system kernel, or in the driw context when it executes driver code to set up or perform an I/O operation. In Fig. 13.16 the life-cycle of a UNIX job is shown. The job always starts in the kern context from where it switches to the user context with probability p,,,, or to the driv context with probability pi,. After a mean service time s,,,, , the job returns from the user context to the kern context. From the driw context, where the job remains Sdrive time units, the job returns to the kern context with probability Pdrivdone or starts a I/O operation with probability 1 -Pdrivdone. After finishing the I/O, the job returns to the context driu and again remains there (with mean time Sdriv)
CASE STUDIES OF QUEUEING NETWORKS
fig. 13.16
Table 13.16
(14 w>
(a31
0 Pi0
0
621
Model of a UNIX job.
Routing probabilities
0
Pdrivdone Pdone
Puser
of the monoprocessor model
0
0 0
0 0
1 - Pdrivdone
1 0
0 0
0 0
0 0
0 0
0
0
0
0
0
0
0
0
0
0
and returns to the kern context with probability p&-ivdone. A job can only leave the system when it is in the kern context (with probability P&ne). The mean service time in the kern context is S&n. The I/O operation is carried out on an I/O device, whereas the remaining three activities are serviced by the CPU. 13.1.4.1 Model of the UN/X Kernel The UN IX kernel just described can be modeled by a closed non-product-form queueing network with two nodes (CPU, I/O) and three job classes, priorities, and class-switching (see Fig. 13.17). user
+O--. IO
driv
Fig. 13.17
1
The monoprocessor model.
Since we have only one processor, we call this the monoprocessor model. The three job classes correspond to the user, kern, and driv context. To obtain a closed network, we assume that when a job leaves the system a new job immediately arrives at the kern queue.
622
APPLICATIONS
A state in the routing DTMC of this model is described as a pair (node number, class number) with: node number 1: CPU, node number 240, class number 1 : h-iv, class number 2 : kern, class number 3 : user.
The routing probabilities are shown in Table 13.16. In the following we summarize the assumptions we make: l
The network is closed and there are K = 10 jobs in the system.
l
The service times are exponentially
l
The system has three job classes user, kern, driv with the priority order driv > kern > user.
l
Class switching is allowed.
l
Jobs in the context user can always be preempted by driv jobs. Other preemptions are not possible.
l
The time unit used in this section is msec.
l
The following parameter values are used:
l
and kern
= varied from 0.25 to 20.0, Skern= 1.0,
pi0 = 0.05, Pd one -- O.OIL/O.OO5,
Suser
Pdrivdone
Sdriv
= 0.4,
distributed.
= O-5.
The I/O system consists of several devices that are assumed to form a composite node (see Section 8.4) with load-dependent service times (which were measured from the real system) : sio(l) = 28.00, sio(2) = 18.667, Sio(3) = 15.555, Sio(4) = 14.000, sio(5) = 13.067,
sio(6) Sio(7)
sio(8) Sio(9) Sio(l0)
= = = = =
12.444, 12.000, 11.667, 11.407, 11.200.
The parameters Suser, Skern, and S&iv are the mean service times of class i = 1,2,3) at the CPU (node l), and sio( k) is the mean load-dependent service time of class 1 jobs at the I/O devices (node 2). At the CPU (node 1) we have a mixture of preemptive and non-preemptive service strategy, while at the I/O devices (Node 2) we have only one job class with the FCFS service discipline. The following concepts are used to solve this model: (i
CASE STUDIES OF QUEUEING NETWORKS
Fig. 13.18
623
The transformed monoprocessor model.
l
Chain concept (see Section 7.3.6)
l
Extended shadow technique with mixed priority strategy and class switching (see Section 10.3.2.2).
Since we have three job classes, we split the CPU into three CPUs, one for each job class (see Fig. 13.18). Approximate iterative solution of the resulting model can be carried out by PEPSY or SHARPE.
0
CPU
Fig. 13.19
The master slave model.
13.1.4.1.1 The Master Slave Mode/ For this model, shown in Fig. 13.19, we have the same basic assumptions as for the monoprocessor model but now we also have two APUs. An APU is an additional processor (associate processing unit) that works with the main CPU. It is assumed that only jobs in the user context can be processed at the APU. Since this is the only job class that can be processed at the APU, no preemption can take place there. If the APU as well as the CPU are free and there are jobs in the user queue, then the processor (CPU, APU) on which the job is served is decided randomly. If we apply the shadow transformation to the original model, we get the transformed master slave model, shown in Fig. 13.20.
624
APPLICATIONS
I
APU
0(I
user
6
CPUuser CPUkern CPUdriv
& 0
Fig. 13.20
The transformed master slave model.
Fig. 13.21
The associated processor model.
13.1.4.1.2 The Associated Processor Model The assumptions for the associated processor model, shown in Fig. 13.21, are the same as for the master slave model but now jobs in the kern context can also be processed on the APU. Since jobs in kern context have higher priority than jobs in user context, preemption can also take place at the APU. The model needs to be transformed so that it can be solved approximately using the shadow technique in combination with the MVA. The transformed associated processor model is shown in Fig. 13.22. We now solve the three non-product-form network models 13.1.4.2 Analysis using approximate techniques and compare the results to the ones obtained from the (exact) numerical solution of the underlying CTMC. We show the computation of the performance measures for the monoprocessor model step
CASE STUDIES OF QUEUE/NC NETWORKS
0
625
CPUuser CPUkern
CPU&-iv
Fig. 13.22
The transformed associated processor model
by step. The procedure is analogous only the results are given. 13.1.4.2.1 The Monoprocessor Model tains three job classes and two nodes: class 1 : &iv, class 2 : kern, class 3 : user,
for the other two models and therefore
The model,
given in Fig. 13.17, con-
node 1 : CPU, node 2 : I/O.
order of the job classes is driu > kern > user. Jobs in the context by driw and kern jobs. Other preemptions are not possible. The chosen numerical parameter values are: The priority
user can always be preempted P11,12
= Pdrivdone
P12,ll
=
P12,13
= Puscr
P21,ll
=
Sll
Pi0 = 0.05, --
0.94,
Pll,21
=
1 - Pdrivdone
P12,12
= Pdone
P13,12
=
1.0,
1.0,
= S&-iv
5’12=
= 0.4,
Skerp
=
0.5,
E = 0.001,
=
1.0,
N = 5 jobs,
s13
= Sllscr = 1.5,
~21
= sio = load dep.
= 0.01,
= 0.6,
626
APPLICATIONS
Now the analysis nine steps:
of the monoprocessor
Compute
the visit ratios
ei, using Eq. (7.15):
ell = edriv = 12.5, = euscr = 94,
e13
Fig. 13.23
model can be done in the following
e12
=
ekern
= 100,
e21 = e;, = 7.5.
The routing DTMC for the monoprocessor model.
Determine the chains in the network. The DTMC state diagram for the routing matrix in the monoprocessor model is given in Fig. 13.23. As we can see, every state can be reached from every other state, which means that we have only one chain in the system. Transform the network into the shadow network (see Fig. 13.18). The corresponding parameter values for the transformed model are: 4511= 0.5,
s22 = 1.0,
s33
ell = 12.5,
e22 = 100,
e33 = 94,
Compute
sA1 = load dep.
= 1.0,
e41 = 7.5.
the visit ratios ezq per chain and recall that we have only
one chain:
Then we get: eTl
=
ell
ell
e33
ezl = G
Compute
=
e22
1, eG1
= 7.52,
the scale factors
=
z e41
eil = G
8, =
= 0.6.
air, using the equation:
CASE STUDIES OF QUEUEING NETWORKS
and get: a11
ell
=
a12
= 1.0,
ell
+ el2 el2
+ e13
ell
+ el2 el3
+ ~13
=
Q13 = a21
ell +
el2 e21
+
e13
e21 + e22 + e23 e22 a22
= e21 + e22 + e23 e23
a23
= e21 + e22 + e23
1-1
= 0,
Cl!32 =
XZ 0,
a33
=
e31 + e32 + e33 e33
Q41 = e41 + e42 + e43 e42
1.0,
a42
= e41 + e42 + e43 e43
= 0,
a43
=
1 0, = 0,
= 1.0, = 1.0, = 0, =
0.
e41 + e42 + e43
Compute th e mean service times for the chain: srq zz -
1
=
t%q
ST1= 0.5, m] Iteration
=
e31 + e32 + e33 e32
e31 + e32 + e33 e41
= 0,
=
e31
Q31 =
sll = 1.0,
c
Si?” ’ air,
rE7rp
s& = 1.5,
s& = load dep.
Start the shadow iterations: 1: pi1 = 0.037,
p;1 = 0.594,
si;‘” = 0.5,
p& = 0.838,
pi1 = 0.659,
Ss;;‘” = 1.03842,
x;, = 0.075,
x;, = 0.594,
sS$;‘*= 4.06504,
x;, = 0.558,
xl;, = 0.045,
-(‘)* S41
pT1 = 0.016,
p& = 0.27,
p& = 0.994,
pi1 = 0.289,
sg* = 0.5, -(2)* = 1.01626, S21
XT1 = 0.033,
A;, = 0.26,
SE’* = 2.10084,
A;, = 0.245,
xi, = 0.02,
-(‘)* S41
pi1 = 0.030,
,o& = 0.481,
siy* = 0.5,
pG1= 0.935,
pi1 = 0.525,
g*
= 1.03092,
XT1 = 0.033,
x;, = 0.473,
-(3)* S31
= 3.16373,
Xi1 = 0.245,
xi, = 0.02,
-(3)* S41
= load dep.
= load dep.
Iteration 2:
= load dep.
Iteration 3:
627
628
APPLICATIONS
Iteration 15: Finally, after 15 iterations, the solution converges and we have the following results. Sll ,.(15)*
--
pi1 = 0.435,
@*
= 1.0256,
x;, = 0.392, xl;1 = 0.029,
S31 -(15)*
= 2.61780,
-(15)* S41
= load dep.
pi1 = 0.025,
&
/I& = 0.969, X;cl = 0.049, A;, = 0.368,
= 0.402,
0.5,
Retransform the chain values into class values:
x11 = allA;, = 0.049, x22 = cQ&,
= 0.392,
X33 = CX~~X;,
= 0.368,
x41
=
= a4&
0.029.
If we rewrite the results of the shadow network in terms of the original network, we get the following final results:
~~;ppro”) = 0.049, x~;pp’“) = 0.392, J@prox)
The overall throughput
= 0.368,
xFprox)
= 0.029.
of the network is then given by:
x(wp rox) = 1000 ’ pdone * ~$prox’
= 3.92.
Verify the results: To verify the results, we construct and solve the underlying CTMC using MOSES. The exact overall throughput is given by: X(exact) = 3.98. The difference between the exact and the approximate throughput
is small.
The other two models were also analyzed using the extended shadow technique with class switching in a similar manner. The overall system throughput of the UNIX models is plotted as a function of the user service time in Fig. 13.24. There is nearly no difference between the approximate values and the exact values obtained from the numerical solution of the CTMC for the chosen parameter set. As expected, the associated processor model has the highest system throughput. In a more detailed study it was found that for the master slave model the throughput cannot be increased any more by adding additional APUs because the CPU is the bottleneck. For systems that spend a lot of time doing system jobs (UNIX is said to spend approximately 50% of its time doing system work) it is worth having a parallel system kernel. For more I/O-intensive jobs, the number of
CASE STUDIES OF QUElJElNG NETWORKS
Fig. 13.24
Throughput
629
for the different models covered in Section 13.1.4.
peripheral devices must also be increased, otherwise the processors have to wait too long for I/O jobs to be completed. In this case it is not worth having a multiprocessor system. Another very important result from this study is that for suSer> 3 msec, the master slave and associated processor model behave nearly the same and only for s,,,, < 3 msec does the associated processor model gives better results than the master slave model.’ However, it does not seem to matter as to which of the two (an associated processor configuration or a master slave one) configurations is used. On the other hand, if the system is used in a real-time environment where response times are required to be extremely short, the associated processor model delivers better performance than the master slave configuration and it is worthwhile to have a completely parallel kernel. The computation time to solve the model was approximately 45 minutes if the CTMC method is used. Using the shadow technique, the results were generated within seconds for the entire parameter set. By contrast, when this non-product-form network was solved using DES, it took nearly 120 hours of computing time. The impact of priorities is shown for the master slave model in Table 13.17. Here pi, is varied while p&ne is constant. On the right-hand side of the table the throughput for the network without priorities (nonpre), the original values with priorities (orig = obtained from numerical solution of the CTMC), and approximate values (approx = obtained using the shadow
‘This result means that if the system worthwhile to have a parallel kernel.
will
be used
for very
time
consuming
jobs,
it is
630
APPLICATIONS
approximation) are given. As can be seen in the table, for some parameter values the impact of priorities cannot be neglected. Tab/e13.17 Impact of the Suer
Flexible
0.9 0.01 0.001 0.9 0.01 0.001 0.9 0.01 0.001 0.9 0.01 0.001 0.9 0.01 0.001
Production
@done = 0.01) Throughput
Pi0
0.50 0.50 0.50 1.00 1.00 1.00 1.50 1.50 1.50 2.00 2.00 2.00 2.50 2.50 2.50
13.1.5
priorities
Nonpre
Approx
Orig
0.6593 9.0049 9.0897 0.6593 8.2745 8.3398 0.6593 7.6517 7.7027 0.6593 7.1078 7.1489 0.6592 6.6170 6.6485
0.6594 9.8765 9.9875 0.6594 9.8736 9.9859 0.6594 9.8194 9.9395 0.6594 9.5178 9.6326 0.6593 8.8123 8.8802
0.6593 9.8765 9.9875 0.6593 9.8716 9.9832 0.6592 9.7666 9.8983 0.6592 9.4700 9.4000 0.6592 8.6144 8.5100
Systems
Queueing network models combined with numerical solution methods are also a very powerful paradigm for the performance evaluation of production systems. In this section we demonstrate how to model and analyze such systems using open and closed queueing networks. 13.1.5.1 An Open Network Model Consider a simple production system that can be modeled as an open queueing network. The system contains the following stations (Fig. 13.25): l
A load station where the workpieces are mounted on to the pallet (LO).
l
Two identical lathes (LA).
l
Three identical milling machines (M).
l
A transfer system (T) that does the transfer between the stations and consists of two automatically controlled vehicles.
l
A station to unload the pallet (U), which removes the workpieces from the production system.
The identical milling machines and lathes are modeled by multiple server nodes. With these assumptions we obtain the queueing network model shown
631
CASE STUDIES OF QUEUEING NETWORKS
Fig. 13.25
Layout of a simple production
system.
Source
Fig. 13.26
Open network model of the production
system shown in Fig. 13.25.
in Fig. 13.26. The production system produces different products and therefore the machines are used in several different ways. For example, 60% of the workpieces that leave the milling machine are passed to the station that unloads the pallet, the rest are transferred to the lathes. Table 13.18 contains the probabilities qij, i = LO, M, LA, j = M, LA, U that the transfer system T moves workpieces from a station i to station j. To obtain from the transfer probabilities qij the probabilities of routing from the transfer system T to the lathes PT,LA, milling machines PT,M, or the unload station pT,U, we have to weight the individual probabilities qij with the individual arrival rates A;, i = LO, M, LA. The arrival rate XT to the transfer system can be easily obtained from Fig. 13.26:
AT
=
ALO
+
h4
+
ALA,
(13.8)
632
APPLICATIONS
Table 13.18 Routing table for the transfer system of the production system '
\i LO M LA
MLA
U
0.5 0 0.7
0.5
0
0.4
0.6
0
0.3
and it follows that: PT,M
ALO
=
- XT
ALO
PT,LA = Xr
ALA
PT,U = -
XT
ALA * qLO,M $- x T
’ qLA,M,
AM + x'qM,LA, T
'qLO,LA * qLA,U
AM XT
i-
(13.9)
* qMJJ.
Then, using the values from Table 13.18:
PT,M= & (ALO* 0.5 -k ALA * 0.7)) PT,LA = $(A PTJJ
=
&
LO ’ 0.5 + AM - 0.4) , (ALA
*
0.3 + &f * 0.6) ,
we finally obtain the matrix of routing probabilities pij for the queueing network model (see Fig. 13.26 and Table 13.19). In order to apply Jackson’s Table 13.19 Routing matrix for the queueing model of the production system
\i
j
outside
LO LA M M u T
outside
0 0 0 0 0 1 0
theorem for open product-form following assumptions:
LO
LA
1 0 0 0 0 0 0
0 0 0 0 0 0 ?‘LA
M
UT
000 001 001 001 001 000 ?‘M
PU
0
networks (see Section 7.3.4), we make the
CASE STUDIES OF QUEUEING NETWORKS
a The service times at the stations and the interarrival station LO are exponentially distributed. l
The service discipline at each station is FCFS.
l
The system is stable.
633
times at the load
Table 13.20 Arrival rates Xoi, service rates pi (in l/h), and number of servers m, for the model of Fig. 13.26 i
XOi
Pi
LO LA M u T
15 0 0 0 0
20 10 7 20 24
mi
1 2 3 1 2
The service rates pi, the arrival rates Xei, and number of servers rni are listed in Table 13.20. Now we use Eq. (7.1) and the values of the routing probabilities from Table 13.19 to determine the arrival rates Xi: Ai = XOi + C
Xj
i, j = LO, LA, ikl, U, T,
* pjiy
(13.10)
and obtain: XL0 = XOLO = 15, ALA = XT - PT,LA = XLO - 0.5 + AM - 0.4 = 14.58, AM = XT ‘PT,M = ALO * 0.5 •k ALA - 0.7 = 17.71, Au = XT * PTJJ = ALO = 15, XT = ALO + ALA + AM = 47.29. Performance measures, which were calculated using Jackson’s theorem, are listed in Table 13.21. We see that the transfer system T is heavily loaded and has a very long queue, and that its utilization is nearly 100%. If we use an additional vehicle, the utilization and the queue length are reduced substantially to PT = 0.66,
QT
=
0.82,
WT = 0.02.
The work in progress (WIP, the mean number of workpieces in the system) and the mean response time ?II for both cases are listed in Table 13.22. To improve the performance we could increase the number of milling machines because they have a very high utilization, PM = 0.84, or the number of lathes. The results for these changes are shown in Table 13.23. A further increase in
APPLICATIONS
634
Table 13.21 Performance measures for the production system
LO LA A4 u T
2.25 1.66 3.86 2.25 65.02
0.15 0.11 0.22 0.15 1.38
0.75 0.73 0.84 0.75 0.985
Tab/e 13.22 Work in progress (WIP) and mean response time T for 2 and 3 vehicles in the transfer sys km mT
2 3
WIP
T-
82.5 18.3
5.5 1.2
the number of the vehicles in the transfer system has a negligible influence on the performance. Maximum improvement can be achieved by increasing the number of milling machines, because that station is the bottleneck (p; = 0.84, see Table 13.21). Although we have used a very simple model to approximate the behavior of the production system, it does provide insight into the system behavior and enables us to identify bottlenecks. 13.1.5.2 A Closed Network Model Now we consider the example of a production system that can be modeled as a closed multiclass queueing network [SuHi84]. The production system consists of (see Fig. 13.27): a Two load/unload stations (LU). l
Two lathes with identical tools (LA).
Tab/e 13.23 WIP and T for different numbers of vehicles, milling machines, and lathes mT
mLA
mM
3 4 3 3 3
2 2 3 2 3
3 3 3 4 4
WIP
T
18.3 17.6 16.9 15.0 13.6
1.22 1.18 1.12 1.00 0.90
CASE STUDIES OF QUEUEING NETWORKS
l
Three machine
l
A central
Fig. 13.27
centers with different
transfer
tools (M).
system with eight transporters
Layout of the production
635
(T).
system for a closed multiclass network model.
In the production system, workpieces for three different product classes are manufactured. A fixed number Ki of pallets for each class circulates through the system. In the first class the workpieces are mounted on the pallets in the LU station then they are moved by the transfer system either to the lathes LA or to the third Ms. Finally, they are shipped back to the LU station where they are unmounted. For workpieces of the other two classes, the procedure is similar, but they use machine MJ and MS, respectively. The load/unload station LU and the lathes LA can be modeled as a multiple server node and the machines Ma, Md, and n/r, as single server nodes. Since there are more transporters in the transfer system T than circulating pallets, it is modeled by an IS node. The parameters for the model are summarized in Table 13.24.
Table 13.24 Station, LU LA M3 M4
M5 T
The numbers are given by:
Parameter values for the stations mz
l/PLz
2 2 1 1 1 8
7.5 28 12 30 15 2.5
el,
e2%
2 0.5 0.5 0 0 2
2 0 0 1 0 2
of pallets Ki, which are the numbers
K1 =4,
K2 = 1,
2 0 0 0 1 2
of workpieces
in class i,
K3 =2.
With these parameters we obtain the queueing network models for these three classes, given in Fig. 13.28. Furthermore we assume that each network fulfills the product-form assumptions. Now we have all the necessary input param-
636
APPLICATIONS Class 1:
Class 2:
Class 3:
=rl’lq-@ ’ outside
Fig. 13.28
0.5
Closed queueing model of the production
system shown in Fig. 13.27.
eters and can use any solution method for closed product-form queueing networks such as MVA, convolution, or SCAT to obtain performance measures. In Table 13.25, the throughput and the mean system response time for the different classes are shown. The total throughput is 0.122 workpieces/min or 7.32 Table 13.25 Class 1 2 3
Performance measures for workpieces Throughput 0.069 0.016 0.037
Mean Response Time 58.35 63.88 53.77
workpieces/h. For other interesting performance measures of the machines, see Table 13.26. Using the queueing network model, the system can be optimized. In Table 13.26 we can see that the load/unload stations are the bottleneck and, therefore, we increase the number of these stations to three and four, respectively. The results for the modified system models are given in Table 13.27,
CASE STUDIES OF QUEUEING NETWORKS
Tab/e 13.26
Station,
Performance measures for the stations
m,
LU LA M3 M5
T
Mean number
Mean Queue
in Service
Length oi
1.82 0.96 0.41 0.47 0.56 0.61
1.68 0.12 0.19 0.00 0.16 0.00
pi
2 2 1 1 1 8
M4
637
0.91 0.48 0.41 0.47 0.56 0.08
Tab/e 13.27 Utilizations of the stations for different number of servers at the LU stations and routing probabilities of Table 13.28
LU LA M3 M4
M5 T Pmax - Pmin
1
2
3
4
4a
4b
4c
4d
0.997 0.27 0.23 0.26 0.30 0.04 0.77
0.91 0.48 0.41 0.47 0.56 0.08 0.50
0.73 0.58 0.49 0.57 0.68 0.092 0.240
0.58 0.61 0.52 0.59 0.72 0.096 0.200
0.55 0.60 0.56 0.68 0.52 0.09 0.16
0.56 0.60 0.54 0.64 0.62 0.09 0.10
0.56 0.60 0.56 0.59 0.62 0.08 0.06
0.77 0.82 0.77 0.82 0.85 0.129 0.08
columns 2, 3 and 4. In column 1, the reduced to one. In this case the system at LU, the mean response time is greater length at LU is 3.23. For rnLu = 3, the
Tab/e 13.28
I’23 E ii
0.1 0.2 0.2
number of load/unload stations is works, but with a severe bottleneck than 100 units, and the mean queue utilization of the LU station is still
Routing probabilities
I’24
P34
0.9 0.8 0.1 0.2 Kl = 0.8 8, K2 = 2, 0.1 K3 = 4
P35
0.9 0.8 0.9
,OLU = 0.73, and th e system is still unbalanced, which means that the utilization of the different stations differ considerably, as seen in the last column of Table 13.27. For rnLu = 4, the system is more balanced, but now PM5 = 0.72 is too high. To get a more balanced system, a fraction of the workpieces of class 2 are serviced at station A& (see Fig. 13.29). Now the system is better balanced especially in case 4c, but the utilizations are relatively low (pi E 0.6).
638
APPLICATIONS
outside Class 3:
$lgy@&g outside
Fig. 13.29 with different
Queueing routing
.
network model of the production for class 2 and 3.
system shown in Fig. 13.28,
Therefore, we double the number of pallets in the system (K1 = 8, E(2 = 2, and Ka = 4) and obtain the values of the utilization in column 4d. The system is now balanced at a higher level of utilization (pi z 0.8). Increasing the number of LU servers leads to a higher throughput and a lower mean response time (see Table 13.29). If we increase the number of pallets to Kr = 8, K2 = 2 and KS = 4, the mean response time increases along with the throughput (see Table 13.30). Performance
measures with four LU servers
Class
Throughput
Mean Response Time
1
0.086 0.020 0.048
46.29 50.61 41.97
Table 13.29
2 3
Problem 13.7 Verify the results of the closed queueing network model of the production system by hand computation and by using the SHARPE software package.
13.2
CASE STUDIES OF MARKOV
CHAINS
In this section we show the use of CTMC models either constructed directly or via a higher-level specification.
CASE STUDIES OF MARKOV CHAINS
639
Table 13.30 Performance measures with four LU servers and a higher number of pallets Class
Mean Response Time
Throughput
1 2 3
0.117 0.026 0.063
68.21 76.56 63.54
Notes: K1 = 8, K:! = 2, KS = 4.
13.2.1
Wafer
Production
System
In this section we formulate and solve a CTMC model of the photolithographic process of a wafer production system. The system is specified using a queueing network formalism, although for this product-form network we use the package MOSEL to automatically generate and solve the underlying CTMC. In [Rand861 the photolithographic process is divided into the following five steps: 1. Clean the wafer. 2. Put on photoresist. 3. Bake wafer and remove solvent by evaporation. 4. Align mask, and illuminate
and develop the wafer.
5. Etch the wafer and remove photoresist. The photolithographic
processing is carried out by three machine types:
l
The spinning machine does the first three steps (machine type 1).
l
The masking machine does step four (machine type 2).
l
The etching
machine
does step five (machine type 3).
From the first machine type, three machines are available. From the second machine type, we have machines from different producers. These machines are called masking 1 and masking 2. The third machine type does the steps of etching and removing the photoresist. The job of removing the photoresist is done by machine five in case the wafer has not been produced correctly. Since three layers of photoresist have to be put on, the wafer needs to be processed again after a successful masking and spinning. In Fig. 13.30, the open queueing network model of the wafer production system is shown and in Table 13.31 its parameter values are given.
640
APPLICATIONS
Etching,
Remove
Good Wafer Remove Photoresist
Fig. 13.30 13.2.1.
Open network model of the wafer production system discussed in Sec-
tion
In the representation of the queueing network model we use node 6 as a dummy node with zero service time. We use node 6 in order to easily determine the routing probabilities, but it is, of course, also possible to determine the routing probabilities without the introduction of node 6. Since it is an open network, we limit the number of wafers in the system to K so that the underlying CTMC is not infinite. The MOSEL specification of the network is shown in Fig. 13.31; note the use of the loop construct to reduce the size of the specification. WIP is the work in progress, i.e., the mean number of wafers in the system. This value needs to be as small as possible for good system performance. Results of the analysis are shown in Fig. 13.32. Here we plot the mean response time as a function of the number of machines at station 1. There are several things that we can learn from the plot: l
Having more than five machines at station 1 does not reduce the mean response time very much. Therefore it is not cost effective to have more than five machines at station 1.
l
Increasing the number of machines at stations 2 and 3 from two to three reduces the mean response time considerably, while adding more machines is not any more cost effective.
l
We get a large decrease in the mean response time if we increase the number of machines at station 1 from three to four and if we increase the number of machines at stations 2 and 3 from two to three.
CASE STUDIES OF MARKOV CHAINS
641
Tab/e 13.31 Parameter values for the wafer production system discussed in Section 13.2.1
ml m2 mg mq
= 2...4 = 2...4 = 2
m5
=1
p12 = 0.5 p13 = 0.5 p24 = 0.9
=3...10
p25
= 0.1
p34
=
0.9
= 0.1 pqg = 0.1 p35
p4t3 =
0.9
PGO = 213 PGl
It is possible to extend
13.2.2
l/3
this model to allow for machine
Problem reward
=
13.8 Formulate the wafer production net and solve it using SPN P. Polling
failures
problem
and repairs. as a stochastic
Systems
Polling systems are very important in modeling computer and communication systems. The generic polling model is a system of multiple queues visited by a single server in cyclic order (see Fig. 13.33). For an exposition and performance models of polling systems, we refer the reader to [TakaSO]. In this subsection we wish to illustrate the use of SPNs in automatically generating and solving the underlying CTMC for polling system performance models. Our treatment is based on the paper by Ibe and Trivedi [IbT’rSO]. In Fig. 13.34, the GSPN for a three-station finite population single service polling system is shown. The potential customers that can arrive at station j are indicated by tokens in place Pjr, j = 1,2,3. This place contains Mj tokens where iVj denotes the population size of customers that potentially arrive at the station. The firing rate of the transitions tlj is marking-dependent, meaning that when the number of tokens in place Pjr is Ic, then the firing rate of tlj is given by Xjlc, 0 < Ic < LV,?~.A marking-dependent firing rate is represented by the # symbol and placed next to transition tlj. A token in place E’jp represents the condition that the server is polling station j; a token in place PUB represents the condition that a customer has arrived at station j, and Pjs represents the condition that the server has arrived at station j. The server commences polling station j + 1 after finishing service at station j or found no customer there when it arrived. Whenever the timed transition tlj fires, a customer arrives at station j. The server has finished polling station j when the timed transition t,j fires. Such a GSPN can be easily input to any of the SPN tools such as SHARPE or SPNP, whereby the underlying CTMC will be automatically generated and solved for steady-state or transient
642
APPLICATIONS
Fig. 13.31
MOSEL specification of the wafer production
system.
behavior, and resulting performance measures calculated. We refer the reader to [STPSB] and [IbTrSO]. Other kinds of polling schemes have also been considered by [IbTrSO]. In the gated service polling system, only those customers are served by the server that arrive prior to the server’s arrival at station j. The SRN model of station 1 in a finite population gated service polling system is shown in Fig. 13.35. The difference between the GSPN model of the single service polling system is that now a variable multiplicity arc (indicated by the zigzag sign) is needed. Therefore, the new model is a stochastic reward net model that cannot be solved by pure GSPN solvers such as SHARPE or GreatSPN [STP96, ABC+95]. However, SPN P is capable of handling such models. For further details we refer the reader to [IbT!rSO] and [ChTr92].
CASE STUDIES OF MARKOV CHAINS
* Station
2 and
3 each
have
two
p
Station
2 and
3 each
have
three
I)-----o
Station
2 and
3 each
have
four
Number
Fig. 13.32
of machines
at Station
643
machines machiucs machines
1
Results for different number of machines.
The SRN (or GSPN) models can be solved using SPNP. The SPNP input file contains the specification of timed transitions and corresponding firing rates, immediate transitions, the input and output arc for each transition, and the initial number of tokens in each place. From this information SPNP generates the reachability graph, eliminates all vanishing markings, and constructs the underlying CTMC. In order to obtain the steady-state probability of each marking, a combination of SOR and the Gauss-Seidel method is used. It is possible to obtain the mean number of tokens in each place as well as more general reward-based measures. Furthermore, transient and sensitivity analysis can also be carried out.
.. I1o3 Fig. 13.33
A polling queue.
644
APPLICATIONS
I
L
I P2P
Y--
Fig. 13.34
GSPN Model
for the single service polling
system.
CASE STUDIES OF MARKOV CHAINS
I Q
Fig. 13.35
P2P
72
from
645
t,l-r .1
SRN Model of gated service polling system.
Problem 13.9 Specify and solve the GSPN model of the single server polling system using SHARPE. Determine the size of the underlying CTMC as a function of the number of customers &,!I = iU2 = MS = M. Determine the steady-state and the transient performance measures. Problem 13.10 Specify and solve the SRN model of the gated service polling system using SPNP. Determine the steady-state and transient performance measures. Problem 13.11 Extend the polling model of Fig. 13.34 to an Erlang-3 distributed polling time. Run the resulting model on both SHARPE and SPNP. 13.2.3
Client Server Systems
Consider a distributed computing system with one file server and N workstations, interconnected by a LAN. Because the communication medium is shared by several workstations, the server and each workstation should have access to the network before it starts transmitting request/reply messages. It is assumed that a workstation does not start generating a new request until it has received the reply of its previous request. This assumption is based on the fact that in many situations future operations at a client workstation
646
APPLICATIONS
depend on the outcome of the current operation (current service request). As long as a workstation’s request is outstanding, it can continue its local processing. Our model deals only with the aspect of a user’s processing that requires access to the server. Our exposition here is based on [ICT93]. In our system model, we assume that each time the server captures the access right to the network it transmits at most one reply message. The order of service at the server is FCFS. It is important to distinguish between the client server models here and the single buffer polling system presented in Section 13.2.2. The difference is that in a single buffer polling system the station does not wait for reply to its message. Thus in a polling system, as soon as a station transmits its message, it is ready to generate a new one after the reply to the last transmitted message has been received. The interdependencies of the access control of the network and workload between the client workstations and the server also make the analysis of client server systems different from the traditional analysis of plain LANs. We consider client server systems based on a token ring network. For another model with CSMA/CD, see [ICT93]. The first step of the analysis is to find an appropriate SRN model for the system. Since there might be a confusion between the word token in a Petri net or in a token ring network, we refer to token in a ring network as network token and to the token associated with the SRN as the PN token. We assume that the time until a client generates a request is exponentially distributed with mean l/X. The time for the network to move from one station to the next station (polling time) is also assumed to be exponentially distributed with mean l/y. The processing time of a request at a server is exponentially distributed with mean l/p, and the time required for the server to process a request is assumed to be exponentially distributed with mean l/p. The system consists of N clients and one server. We note that the assumption of exponential distribution can be realized by resorting to phase-type expansions. As can be seen in Fig. 13.36, we consider a tagged client and lump the remaining clients into one super-client. This superclient subsystem enables us to closely approximate (N - 1) multiples of the single client subsystem so that we reduce the number of states of the underlying CTMC. The part of the SRN model that deals with the tagged client is described in Fig. 13.37. The place PTI contains one PN token and represents the condition that the client is idle (in the sense the client does not generate a network-based request). The firing rate of the timed transition tt, is X. Thus, the firing of tt, represents the event that the client has generated a request. The place PTA represents the condition that the network token has arrived. Suppose that there is a PN token in PTS. Then if there is a PN token in PTA, the timed transition tt, whose mean time to fire is the mean request transmission time is enabled; otherwise the immediate transition sr fires. The place PSI represents the condition that the client’s request has arrived at the server, and the place PTW represents the condition that the client is waiting to receive a reply from the server. The place P sp represents the condition that the client
CASE STUDIES OF MARKOV CHAINS Server
fig. 13.36
A possible configuration
647
Tagged Client
of network
relative
to the tagged client.
has finished transmitting its request, if it is reached through tts, or the client has no request to transmit, if it is reached via si. The timed transition t,, represents the event that the network token is being passed to its neighbor, the server in this configuration. The place PSS represents the condition that the server has received the network token and can commence transmitting a reply, if one is ready. It should be noted that in this model, a network token is released from a station as soon as it finishes transmitting its packet. This operation of the token ring network is called the single-token operation and is used in IEEE 802.5 token ring networks [Ibe87]. Single-token operation requires a station that has completed its packet transmission to release a network token only after its packet header has returned to the station. In the case that the round-trip delay of the ring is lower than the transmission time, the two schemes become identical. Let us consider now the superclient subsystem. The corresponding SRN for this subsystem is shown in Fig. 13.38 and can be interpreted as follows: Place Par initially contains N - 1 PN tokens and represents the condition that no member of the superclient subsystem has generated a request. The number of tokens can generally be interpreted as the number of idle members of the subsystem, The firing rate of the timed transition t,, is markingdependent so that if there are 1 PN tokens in place POT, the firing rate of t,, is given by 1 . X(0 5 1 2 N - 1). The places POW, POS, PTP, and POA play the same role respectively as PTW, PTS, Psp and PTA of Fig. 13.37. Similarly, the transitions ~2, ttp, and t,, correspond respectively to si, tsp, and tt, of Fig. 13.37. Furthermore, the two additional places Pc2 and Pc1 are introduced and used to keep a count of the number of times the PN
648
APPLICATIONS
L k---l 0 SP
Fig. 13.37
0
0
pss
SRN for tagged client subsystem.
PTS
Fig. 13.38
SRN for the superclient subsystem in token ring network.
CASE STUDIES OF MARKOV CHAINS
649
token has gone through POS. As soon as each member of the superclient subsystem has been polled, a PN token is deposited in both places Pcl and Pc~. Note that P ~2 receives a PN token only after a walk time has been completed while Pcl receives the PN token immediately after a request, if one is transmitted, or prior to the commencement of the polling of the next member, if no transmission takes place. A regular arc of multiplicity N - 1 from place Pc1 is input to transition s4 and an inhibitor arc of multiplicity N - 1 is input from Pc~ to the immediate transition ss. Transition ss is enabled if a PN token is in place Pc2 and fewer than N - 1 PN tokens are in place Pci. This represents the condition that the network token leaves the superclient subsystem if fewer than N - 1 clients have been polled. If the number of PN tokens in place Pc~ reaches N - 1, implying all members of the subsystem have been polled, then with a PN token in Pc~, s4 fires immediately and a PN token is deposited in PTS. In this situation the tagged client now has access to the network after all members of the superclient subsystem had their chance to transmit their requests. ’
0
PTI
.
0 N-l
POI Kg. 13.39 SRN for the server subsystem.
The SRN for the subsystem is shown in Fig. 13.39. The places Pop, PSS, and PSA serve the same purpose as Psp, PTS, and PTA in Fig. 13.37. A token in place PSI represents the condition that the server has received a request, while the place PTw (POW) represents the condition that a client
650
APPLICATIONS
from the superclient subsystem (the tagged client) is waiting for a reply to its request. The firing rates of the timed transitions top, t,,, and t,, are given by y, p, and X respectively. The condition that the server has completed serving one request is represented by the place PSW and a determination is to be made regarding whose request was just serviced. In order to separate the requests from the superclient subsystem the server receives before that of the tagged client from those the server receives after that of the tagged client, the immediate transition s6 is used. To denote that the input arc from POW to .!96has a variable multiplicity, the zigzag sign on that arc is used. Let us assume POWcontains Ic PN tokens, 0 < Ic 5 N - 1, and PTW has no token. Then the immediate transition s6 is enabled and fires immediately by removing all Ic PN tokens from POWand deposits them in POH. The number of clients from the superclient subsystem whose requests were received by the server before that of the tagged client is therefore given by the marking of POH. The tagged client’s request cannot be replied to until requests from the superclient subsystem that arrived before it have been replied to, or stated differently. Due to the FIFO queue at the server, the immediate transition ss cannot fire as long as there is a PN token in POH. As soon as the immediate transition s7 fires, a PN token is deposited in POI, which means that a waiting member of the superclient subsystem becomes idle again. The complete SRN model for the system is shown in Fig. 13.40. Problem 13.12 Specify and run the client server SRN using the SPN P package. Compute the average response time as a function of the number of stations N. In each case, keep track of the number of states and the number of non-zero transitions in the underlying CTMC. Problem 13.13 Extend the previously described model to allow for an Erlang-2 distributed polling time. Solve the new SRN model using SPNP. 13.2.4
ISDN
Channel
13.2.4.1 The Baseline Mode/ In this section we present a CTMC modeling a simplified version of an ISDN channel. Typically, heterogeneous data are transported across an ISDN channel. In particular, discrete and continuous media are served at the same time, although different qualities of service requirements are imposed by different types of media on the transport system. It is well known that voice data is very sensitive to delay and delay variation (jitter) effects. Therefore, guarantees must be given for a maximum endto-end delay bound as far as voice data is concerned. On the other hand, continuous media streams such as voice can tolerate some fraction of data loss without suffering from perceivable quality degradation. Discrete data, in contrast, are very loss sensitive but require only weak constraints as far as delay is concerned. Trade-off analysis is therefore necessary to make effective use of limited available channel bandwidth.
CASE STUDIES OF MARKOV CHAINS
pTP
Fig. 13.40
SRN for the client server system (N > 1).
651
652
APPLICATIONS
In our simplified scenario, a single buffer scheme is assumed that is shared between a voice stream and discrete data units. For the sake of simplicity, the delay budget available for voice packets to be spent at the channel is assumed to be just enough for a voice data unit transmission. Any queueing delay would lead to a violation of the given delay guarantee. Therefore, voice packets arriving at a non-empty system are simply rejected and discarded. Rejecting voice data, on the other hand, is expected to have a positive impact on the loss rate of data packets due to sharing of limited buffer resources. Of course, the question arises to what extent the voice data loss must be tolerated under the given scenario.
Fig. 13.41
CTMC
modeling
an ISDN channel with voice and discrete data.
Table 13.32 Parameter values for the ISDN channel in Fig. 13.41 Parameter
x1 x2
1-11
cl2 k
Value 1.0 2.0 5.0 10.0 10
For the time being, we assume Poisson arrivals for discrete data packets with parameter X2 and for voice data units with parameter X1. The transmission times of data packets are exponentially distributed with mean l/p2 and for voice data with mean l/pi. Buffer capacity is limited to Ic = 10. The resulting CTMC is depicted in Fig. 13.41 and the parameters summarized in Table 13.32. States are represented by pairs (i, j) where i, 0 5 i < k denotes the number of discrete data packet!s in the channel and j, 0 5 j 5 1 indicates the presence of a voice data unit in the channel. Under these assumptions and the model shown in Fig. 13.41, some numerical computations are carried out using the SHARPE software package. First, transient and steady-state results for the rejection probability of data packets
CASE STUDIES OF MARKOV CHAlNS
653
are presented in Fig. 13.42 and for the rejection of voice data in Fig. 13.43. Comparing the results, it can be seen that the rejection probability of voice data very quickly approaches the steady-state value of a little more than 33%. Rejection probability of discrete data is much smaller, on the order of lo-“, and steady-state is also reached in coarser time scale. It was assumed that the system was initially empty.
0
t:::;;~::;;~;;::.:,;;;;I::.:,::,: 0.5 1 Fig. 13.42
1.5
Rejection
3 ! 5 Tirrlc
2.5
probability
for data pack&s.
The channel-idle probability of 66% is approached in a similar time scale as the rejection probability reaches its steady-state for voice data, as can be seen in Fig. 13.44. Often, a GSPN model specification and automated generation of the underlying CTMC is much more convenient and less error prone due to possible state space explosion. The GSPN in Fig. 13.45 is equivalent to the CTMC in Fig. 13.41 [STP96]. The GSPN was also solved using the SHARPE package. The results of Fig. 13.46 illustrate the qualitative and quantitative difference in rejection probabilities of voice vs. discrete data. Note that rejection probabilities increase for voice data as channel capacity is increased. This effect is due to the reduced loss rate of discrete data data packets as a function of increased buffer size. Figure 13.47 shows the impact that the arrivals of discrete data packets have on voice rejection probabilities as compared to rejection of discrete data packets themselves. Both steady-state and transient probabilities are presented. Note that voice data is much more sensitive to an increase in X2. 23.2.4.2 Markov Modulated Poisson Processes Markov processes (MMPP) are introduced to represent correlated
modulated Poisson arrival streams and
654
APPLICATIONS
0.32--
0.3P
0.28-*h I= 3 2 0.26-az
q ____________________
m
a
o Transient
Stationary
0.24--
0.22--
0.2--
: t-
0
I
:
:
:
0.2
Fig. 13.43
;
I
:
:
:
:
0.4
Rejection
I
:
:
:
:
0.6
probability
I
:
:
for voice data units.
0 Transient q _________..______
Fig. 13.44
Channel
idle probability
---.a
Stationary
as a function
:
1 Time
0.8
of time.
CASE STUDIES OF MARKOV CHAINS T Arrival-Voice
655
T Arrival-Data
Data
T Ser-Voice
Fig. 13.45
T Ser-Data
SPN representation of the CTMC in Fig. 13.41.
e Data Packets
Fig. 13.46
Rejection probabilities
as a function of channel capacity k: voice vs. data.
bursty traffic more precisely. An MMPP is a stochastic arrival process with an arrival rate governed by an m-state CTMC [FiMe93]. The arrival rate of an MMPP is assumed to be modulated according to an independent CTMC. The GSPN model of Fig. 13.45 is now modified so as to have two-state MMPP data arrivals as shown in Fig. 13.48 [STP96]. A token moves between the two places MMPPr and MMPP2. When transition Tarrival-datai is enabled and there are fewer than k packets transmitted across the channel, then discrete data is arriving with rate X21. Alternatively, the arrival rate is given by X22. If Xsr differs significantly from X22, bursty arrivals can be modeled this way. Note that the MMPP process is governed by the firing rates of T2r and Tr2, which are given by rates a and b, respectively.
656
APPLICATIONS
+ ____________________ +
e Voice,
*
0
Fig. 13.47
‘-’
2
‘-’
-4
Rejection probabilities
Voice
Stationary
t
= 0.5
O----------------Q
Data Stationary
o--------------------o
Data,
‘-’
t = 0.5
__-..-- ______-da------6 ‘-’ 2
sx3---
__-- *m Q 10
x2
as a function of X2 and time: voice vs. discrete
data.
The results in Fig. 13.49 indicate that for small buffer sizes a Poisson traffic model assumption leads to more pessimistic rejection probability predictions than those based on MMPP assumptions. This effect is quickly reversed for discrete data if channel capacity is increased. For larger capacities there is hardly any difference in rejection probability for voice data, regardless of Poisson- or MMPP-type traffic model. In general, though, correlated or bursty traffic can make a significant difference in performance measures as opposed to merely Poisson-type traffic. But our modeling methods based on GSPN (and SRN), w h’ic h a11ow us to compactly specify and automatically generate
Table 13.33 Parameters to the MMPP-based ISDN model in Fig. 13.48 Parameter
Value
Xl x21 x22
1.0 9.615 0.09615 5.0 10.0 8.0 2.0 10
I*1
P2 a b k
CASE STUDIES OF MARKOV CHAINS
657
T21 n
7’
Arrival-Voice
\
I
I
I
T Ser-Voice
Fig. 13.48
T Ser-Data
A GSPN including an MMPP modulating data packet arrival rates.
G------~--------~...---O
Voice
(M~pp)
t
+ ________----------
----+
Data
(Poisson)
.msm--sT _________--- ---+
Data
(MMPP)
6
Fig. 13.49 MMPP.
Rejection probabilities
+ k
10
as a function of channel capacity k: Poisson vs.
658
APPLICATIONS
and solve large CTMCs and MRMs for steady-state and transient behavior, are capable of handling such non-Poisson traffic scenarios. 13.2.5
ATM
Network
under
Overload
Consider a variant of the ATM network model studied in [WLTV96], consisting of three nodes, N 1, Nz, and N3. Two links, Ni N3 and Nz N3, have been established as shown in Fig. 13.50. Now suppose that the network connection Node 3
Node 1
Node 2
cell arrivals
Fig. 13.50
The [WLTV96] c1ueueing model before rerouting.
between Ni and Ns is down. Before the global routing table is updated, the cells destined to node N3 arriving at node Ni will be rerouted to node Nz (see Fig. 13.51). R ed irected cells from node Nr may be rejected and thus lost if the input queue of node N2 is full. Right after the rerouting starts, the cell loss probability of Ni will overshoot because node N2 is overloaded. In the original paper, the burstiness of the ATM traffic is modeled by a two-state MMPP arrival process. The cell transmission time is modeled by an Erlang-5 distribution. Here we approximate the cell arrival process by a Poisson process and the cell service time by an exponential distribution. We can easily allow MMPP arrivals and Erlangian service time, but the underlying CTMC will become larger. First, we analyze the simple case before the link Node 1
Node 3
X arrivals
I rerouting
Node 2
p-$uM
cell arrivals
Fig. 13.51
I
raster 2
The [WLTV96] queucing model after rerouting.
CASE STUDIES OF MARKOV CHAINS
659
NiN3 breaks down. Here we assume that node Ni and node N2 have the same structure. That is, their buffer sizes and the service rates are the same. Furthermore, we assume that the job arrival rates are the same for the two nodes and that the buffer size at node N3 is large enough to handle all the jobs coming from node Ni and node Nz. Thus node Ni and node Nz can be modeled by a finite-buffer M/M/l/N queue, where N is the buffer size.
Fig. 13.52
The CTMC
for the AI/AI/l/3
queue.
The CTMC and the GSPN models for the node are shown in Fig. 13.52 and Fig. 13.53 respectively. We use SHARPE to find the steady-state probabilities for each state as well as the steady-state cell loss probability before the link NiNa breaks down. Figure 13.54 shows the SHARPE input file of the CTMC q ueue together with their outputs. and the GSPN models of the M/M/1/3 Both models give the same results for the cell loss probability.
AL=-o-i
arr
Fig. 13.53
N
router
The GSPN model for the M/M/l/N
queue.
When the link N1N3 breaks down, the state transition diagram of the subnet can be given as shown in Fig. 13.55. Each state is identified by a pair (i, j) with i, j in {0, ..., 3}, representing the number of cells in the buffers of node N1 and node N2. The cell arrival rates are λ1 and λ2 and the cell transmission rates are μ1 and μ2. States 30, 31, 32, and 33 represent the buffer-full states for node N1, while states 03, 13, 23, and 33 represent the buffer-full states for node N2. The loss probability L1 of cells destined to node N3 going through node N1 is determined by two elements:

1. The probability that node N1 is full, denoted by p1.

2. The probability that the rerouted cells are dropped because node N2 is full, denoted by d.

In short: L1 = p1 + (1 - p1)d. The loss probability L2 of cells destined to node N3 going directly through node N2 is simply the probability that node N2 is full, which is denoted by p2.
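To make the structure of Fig. 13.55 concrete, the following sketch (our own illustration, not from the original study) builds the (N+1)^2-state CTMC under the stated assumption that a cell completing service at N1 joins N2's buffer and is dropped there when that buffer is full, and then computes p1, p2, d, and L1:

import numpy as np
from itertools import product

def rerouting_model(lam1, lam2, mu1, mu2, n=3):
    # State (i, j): number of cells in the buffers of N1 and N2.
    idx = {s: k for k, s in enumerate(product(range(n + 1), repeat=2))}
    q = np.zeros((len(idx), len(idx)))
    for (i, j), k in idx.items():
        if i < n: q[k, idx[(i + 1, j)]] += lam1      # arrival at N1
        if j < n: q[k, idx[(i, j + 1)]] += lam2      # arrival at N2
        if i > 0:                                    # N1 service: reroute to N2
            q[k, idx[(i - 1, j + 1) if j < n else (i - 1, j)]] += mu1
        if j > 0: q[k, idx[(i, j - 1)]] += mu2       # N2 service
    np.fill_diagonal(q, -q.sum(axis=1))
    a = np.vstack([q.T, np.ones(len(idx))])
    b = np.zeros(len(idx) + 1); b[-1] = 1.0
    pi = np.linalg.lstsq(a, b, rcond=None)[0]
    p1 = sum(pi[idx[(n, j)]] for j in range(n + 1))  # N1 full
    p2 = sum(pi[idx[(i, n)]] for i in range(n + 1))  # N2 full
    tputa1, tputa2 = lam1 * (1 - p1), lam2 * (1 - p2)
    tputr2 = mu2 * (1 - sum(pi[idx[(i, 0)]] for i in range(n + 1)))
    d = (tputa2 + tputa1 - tputr2) / tputa1          # drop prob. of rerouted cells
    return p1, p2, d, p1 + (1 - p1) * d              # last value is L1

The same throughput-balance argument (the accepted flow into N2 must equal its output flow in steady state) is what justifies the expression for d used in the SHARPE file shown later in Fig. 13.58.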
Fig. 13.54 SHARPE input file for the CTMC and GSPN models of the M/M/1/3 queue together with the results.
In Figs. 13.56, 13.57, and 13.58 the SHARPE input file for the irreducible Markov reward model from Fig. 13.55 is given. From line 1 to line 6, we assign the values to the input parameters, and from lines 10 through 62, the Markov reward model is defined. Lines 10 through 26 describe the CTMC. By using loops in the specification of the Markov chain transitions, we can generate the Markov chain automatically. Without the loop functionality, it would be cumbersome to specify the Markov chain manually, especially when the state space gets large. From lines 29 through 45, the reward rates r_{i,j} are assigned to the states (i, j). By default, any state not assigned a reward rate is assumed to have reward rate 0. In order to analyze the transient behavior of the system, the initial state probabilities are assigned from the values of the steady-state probabilities before the link N1N3 breaks down. In other words:
Fig. 13.55 The CTMC of Fig. 13.52 after rerouting.

π_{i,j} = π_{1,i} · π_{2,j} ,

where π_{i,j} is the initial state probability for state (i, j) in the transient model. The probability π_{1,i} is the steady-state probability of node N1 being in state i before the link breaks down. Similarly, the probability π_{2,j} is the steady-state probability of node N2 being in state j before the link breaks down. The fact that we wish to carry out transient analysis of an irreducible CTMC and hence need to assign initial state probabilities is indicated to SHARPE with the keyword readprobs in line 10. The assignment of the initial probabilities is done from line 46 to line 62. Reward rates for computing the average queue length of node N1 are written in a separate file called reward.quellength. In this file, we assign the reward rates to indicate the number of jobs in the first node. The contents of the file are as follows:
In line 65, the keyword include is used to read the file. From lines 67 through 69, we ask for the steady-state expected queue length of N1. This result is obtained by asking for the expected steady-state reward rate using the keyword exrss. In line 72, we assign new reward rates so that the steady-state probability that node N1 is full can be calculated. Thereafter we bind the value of p1 to p. Similarly, from lines 83 through 121, we assign different reward rates and ask for the steady-state average queue length of
Fig. 13.56 First part of input file for the Markov reward model.

Fig. 13.57 Second part of input file for the Markov reward model.
* find throughput for router 2
var tputr2 exrss(rerouting)
* find steady-state drop probability for rerouted cells
var d (tputa2+tputa1-tputr2)/tputa1
var L1 p+(1-p)*d
echo steady-state loss probability of cells destined to
echo node N3 going through node N1
expr L1
bind k 3
* start the transient analysis
loop t,3,90,3
bind
* pt is transient probability that queue1 is full
pt sum(i,0,k,tvalue(t;rerouting,3,$(i)))
* p2t is transient probability that queue2 is full
p2t sum(i,0,k,tvalue(t;rerouting,$(i),3))
* tputa2t is transient throughput for queue2
tputa2t lam2*(1 - sum(i,0,k,tvalue(t;rerouting,$(i),3)))
* tputa1t is transient throughput for queue1
tputa1t lam1*(1 - sum(i,0,k,tvalue(t;rerouting,3,$(i))))
* tputr2t is transient throughput for router2
tputr2t mu2*(1 - sum(i,0,k,tvalue(t;rerouting,$(i),0)))
* dt is transient drop probability for rerouted cells
dt (tputa2t+tputa1t-tputr2t)/tputa1t
* L1t is loss probability of cells destined to node N3
* going through node N1
L1t pt+(1-pt)*dt
end
end
end

Fig. 13.58 Third part of input file for the Markov reward model.
node N2, the steady-state probability for node N2 to be full, and the throughputs of node N1, node N2, and router 2. Finally we get the cell drop probability for the rerouted cells and the loss probability of cells destined to node N3 going through node N1. With these steps, we can see the power of the Markov reward model: By just changing the reward rates, we can get different performance measures, such as throughput, average queue length, and rejection probability for the arriving cells. From line 133 to line 163, we perform transient analysis of the model. This is achieved by using the keyword tvalue. With tvalue, we can get the probability that the system is in state (i, j) at time t. We use nested sum functions to add up the transient probabilities. From lines 138 through 150, we use the evaluated word notation. An evaluated word is made up of one
or more occurrences of $(n). Whenever the evaluated word is used, the string $(n) is expanded into an ASCII string whose digits are the number n. For example, in line 138, the state name 3-$(i) includes the states 3-0, 3-1, 3-2, and 3-3. In Figs. 13.59 and 13.60 the results of the analysis are shown.
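As a small illustration of this expansion (our own sketch, not SHARPE itself), the same effect can be mimicked in a few lines:

# Hypothetical illustration of evaluated-word expansion: the pattern
# "3-$(i)" with i = 0..3 denotes the states 3-0, 3-1, 3-2, 3-3.
names = ["3-" + str(i) for i in range(4)]
print(names)   # ['3-0', '3-1', '3-2', '3-3']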
Fig. 13.59 Steady-state results.

Fig. 13.60 Results of the transient analysis.
From the results, we can see that although router 1 and router 2 have the same cell arrival rate, the steady-state probability for node N2 to be full is more than four times larger than that of node N1. This occurs because router 2 has two cell sources: the original cell source and the rerouted cells coming from router 1. However,
the loss probability of cells destined to node N3 going directly through N2 is smaller than that of cells going through N1 first. From the transient results, we can answer questions such as:

• How long does it take to reach steady state after the link breakdown?

• What is the loss probability at a particular time before the system is in steady state?
These are important questions that need to be considered when we carry out performance and reliability analyses. Figure 13.61 is derived from the transient analysis.
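Numerically, what tvalue reports is the transient distribution π(t) = π(0) e^{Qt}; a minimal sketch (assuming a generator matrix q and an initial vector pi0 constructed as above) uses the matrix exponential:

import numpy as np
from scipy.linalg import expm

def transient_probs(q, pi0, t):
    # pi(t) = pi(0) expm(Q t); q is the generator, pi0 the initial vector.
    return pi0 @ expm(q * t)

SHARPE computes such transient measures internally; the sketch above is only meant to show the underlying mathematics, which is perfectly adequate for a CTMC of this size (16 states).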
Fig. 13.61 Cell loss probability for the two nodes in the network discussed in Section 13.2.5, plotted as a function of time t (10 ≤ t ≤ 70).
Problem 13.14 Modify the overload model described in Section 13.2.5 to allow for two-state MMPP sources and three-stage Erlangian service. You may find it convenient to use a GSPN rather than a direct specification of the CTMC.
13.3 CASE STUDIES OF HIERARCHICAL MODELS
We discuss two examples of hierarchical models in this section. Several other hierarchical models are discussed in Chapter 10. For further exposition on hierarchical models, see [MaTr93].
Fig. 13.62 The outer model for a multiprocessor system with cache protocols.

13.3.1 A Multiprocessor with Different Cache Strategies
Section 13.1.1 introduces different models for loosely and tightly coupled multiprocessor systems. Now we model multiprocessor systems with different cache coherence protocols. Our treatment is based on [YBL89]. The outer model is a product-form queueing network, some of whose parameters are obtained from several lower level CTMC models. The outer model is shown in Fig. 13.62 (where n = 2 CPUs are assumed). The model consists of open and closed job classes, where the closed job classes capture the flow of normal requests through the system and the open job classes model the additional load imposed by the cache. For the queueing network (outer model), the following notation is used:

m        Number of memory modules
n        Number of CPUs
μ_CPU    Service rate at the CPU
μ_cache  Service rate at the cache
μ_bus    Service rate at the bus
μ_mem    Service rate at the main memory modules

and for the CTMC cache models (inner models), we use:
C         Overall number of blocks in each cache
C_sc      Number of shared cache blocks (blocks shared by all caches)
C_pc      Number of private cache blocks
C_ic      Number of instruction cache blocks
S         Degree of sharing
f_w       Probability that data/instructions are written
1 - f_w   Probability that data/instructions are read
h         Probability of a given request to a private cache being a hit
U         Probability that a previously accessed block has not been modified
d         Probability that a private block, selected for replacement, is modified and needs to be written back
p_ic      Probability that instructions are accessed during a memory request
1 - p_ic  Probability that a data block is accessed during a memory request; these blocks can be either shared or private
The cache behavior is modeled by the jobs of the open job classes. These additional customers induce additional traffic:

• The traffic at the cache due to requests from other caches (loading of missed blocks).

• The requests from other processors due to state updates.

• Additional bus traffic due to invalidation signals.

Fig. 13.63 CTMC for an instruction cache block (inner model 1).
If we access blocks in a cache, we have to differentiate between accessing private blocks, shared blocks, and instructions. Private blocks can be owned by only one cache at a time; no other cache is allowed to have a copy of these blocks at the same time. Since it is possible to read and modify private blocks, they have to be written back to the main memory after being modified. In order to keep shared blocks consistent, a cache coherence protocol needs to be used, because shared blocks can be modified by any other processor that also owns these blocks. Items in the instruction cache can only be read but never be
modified. Therefore no cache coherence protocol is necessary for instruction blocks. Now we develop and solve three inner models: for an instruction cache block, for a private cache block, and for a shared cache block.

A block in the instruction cache can either be not in the cache or valid. If a block is selected for replacement, it is overwritten (no write back is necessary) since these blocks cannot be modified. This assumption leads to the CTMC shown in Fig. 13.63. The transition rates are given by:

a = ((1 - f_w) p_ic / C_ic) μ_CPU ,   b = ((1 - h) / C_ic) μ_CPU ,

with C_ic = p_ic C. Steady-state probabilities of this simple CTMC are easily obtained:

π_0^ic = b / (a + b) ,   π_1^ic = a / (a + b) .   (13.11)

Since private cache blocks can also be modified, they have to be written back after being modified. Modified blocks are called dirty. The corresponding state diagram (with states notin, valid, and dirty) is shown in Fig. 13.64. The transition rates are given by:

a = ((1 - f_w)(1 - p_ic)(1 - S) / C_pc) μ_CPU ,
b = (f_w (1 - p_ic)(1 - S) / C_pc) μ_CPU ,
c = ((1 - h) / C_pc) μ_CPU ,

with C_pc = (1 - p_ic)(1 - S) C. Steady-state probabilities of this CTMC are:

π_0^pc = c / (a + b + c) ,   π_1^pc = a c / ((a + b + c)(b + c)) ,   π_2^pc = b / (b + c) .   (13.12)

Fig. 13.64 CTMC for a private cache block (inner model 2).
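A quick numerical check of these two inner models (a sketch under the rates reconstructed above, using the parameter values given later in this section) confirms that each set of steady-state probabilities sums to one:

# Parameters as in the example on the following pages (time unit: msec).
C, S, h, fw, pic, mu_cpu = 2048, 0.1, 0.98, 0.3, 0.1, 0.25

# Inner model 1: instruction cache block (states notin, valid), Eq. (13.11).
c_ic = pic * C
a = (1 - fw) * pic / c_ic * mu_cpu
b = (1 - h) / c_ic * mu_cpu
pi_ic = (b / (a + b), a / (a + b))

# Inner model 2: private cache block (states notin, valid, dirty), Eq. (13.12).
c_pc = (1 - pic) * (1 - S) * C
a = (1 - fw) * (1 - pic) * (1 - S) / c_pc * mu_cpu
b = fw * (1 - pic) * (1 - S) / c_pc * mu_cpu
c = (1 - h) / c_pc * mu_cpu
pi_pc = (c / (a + b + c),
         a * c / ((a + b + c) * (b + c)),
         b / (b + c))

assert abs(sum(pi_ic) - 1.0) < 1e-12 and abs(sum(pi_pc) - 1.0) < 1e-12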
For private and instruction cache blocks no coherence protocol is necessary, while for shared blocks a cache coherence protocol must be used since these blocks can become inconsistent in different caches. One of the first cache
coherence protocols found in the literature is Goodman's write-once scheme [Good83]. When using this protocol, a cache block can be in one of five states:

Fig. 13.65 CTMC for shared cache blocks (inner model 3).

1. Notin: the block is not in the local cache.

2. Invalid: the block is in the local cache but not consistent with other caches.

3. Valid: the block is in the local cache and consistent with the main memory.

4. Reserved: the block is in the local cache, consistent with the main memory, and not in any other cache.

5. Dirty: the block is in the local cache, inconsistent with the main memory, and not in any other cache.

If a read miss to a shared cache block occurs and a block in another cache exists that is in state dirty, this cache supplies the block as well as writing it back to the main memory. Otherwise (no cache has a dirty copy), each cache with a copy of that block sets the block state to valid. In case of a write miss the block is loaded from memory or, if the block is dirty, from the cache that has the dirty copy (this cache then invalidates its copy). When a write hit to an already dirty or reserved cache block occurs, the write proceeds locally; if the state is valid, the block is written through. In case of a write hit to an invalid cache block, the local state of the cache block is changed. Finally, if a block is loaded into the cache, it can happen that there is not enough space in the cache and a block in the cache has to be selected for replacement. Blocks are replaced in the order invalid, dirty, reserved, and valid. The state diagram of the CTMC for shared cache blocks is shown in Fig. 13.65. Transition rates for this CTMC are:
• The processor under consideration generates a read request to a shared block:

notin → valid : (S (1 - f_w)(1 - p_ic) / C_sc) μ_CPU ,
invalid → valid : (S (1 - f_w)(1 - p_ic) / C_sc) μ_CPU .

If the cache block is in state dirty, reserved, or valid, the read request can proceed without any state change or bus transaction.

• Another processor generates a read request to a shared block:

reserved → valid : ((n - 1) S (1 - f_w)(1 - p_ic) / C_sc) μ_CPU ,
dirty → valid : ((n - 1) S (1 - f_w)(1 - p_ic) / C_sc) μ_CPU .

If the cache block is in state valid, invalid, or notin, no state change occurs.

• The processor under consideration generates a write request to a shared block:

valid → reserved : ((n - 1) S f_w (1 - p_ic) / C_sc) μ_CPU ,
reserved → dirty : ((n - 1) S f_w (1 - p_ic) / C_sc) μ_CPU ,
invalid → dirty : ((n - 1) S f_w (1 - p_ic) / C_sc) μ_CPU ,
notin → dirty : (S f_w (1 - p_ic) / C_sc) μ_CPU .

If a block is already dirty, the write request proceeds locally without any state change or bus transaction.

• Another processor generates a write request to a shared block:

dirty → invalid : ((n - 1) S f_w (1 - p_ic) / C_sc) μ_CPU ,
reserved → invalid : ((n - 1) S f_w (1 - p_ic) / C_sc) μ_CPU ,
valid → invalid : ((n - 1) S f_w (1 - p_ic) / C_sc) μ_CPU .

• A write or read miss in the local cache can result in a block replacement, which leads to the following state transitions:

reserved → notin : ((1 - h) / C_sc) μ_CPU ,
dirty → notin : ((1 - h) / C_sc) μ_CPU ,
valid → notin : ((1 - h) / C_sc) μ_CPU ,
invalid → notin : ((1 - h) / C_sc) μ_CPU .

Let:

a = ((1 - h) / C_sc) μ_CPU ,   b = (S (1 - f_w)(1 - p_ic) / C_sc) μ_CPU ,   c = (S f_w (1 - p_ic) / C_sc) μ_CPU ,

with C_sc = (1 - p_ic) S C. Solving the global balance equations of this five-state CTMC then yields the steady-state probabilities π_notin^sc, π_invalid^sc, π_valid^sc, π_reserved^sc, and π_dirty^sc as functions of a, b, c, and n.   (13.13)
Now we are ready to parameterize the outer model. To analyze the outer model given in Fig. 13.62, we need the visit ratios for each station and the arrival rates of the open job classes. We see from the following that these parameters are functions of the state probabilities of the three inner models:

Visit Ratio for a Remote Cache i: A cache miss is either supplied by a remote cache that has a dirty copy or by one of the main memory modules if no dirty copy exists. Because the probabilities that some remote cache holds a dirty copy of a shared block or a valid copy of an instruction block are 1 - (1 - π_dirty^sc)^(n-1) and 1 - (1 - π_1^ic)^(n-1), respectively, the visit ratio of the remote cache i is given by:

e_Ci = (1 - p_ic) S (π_notin^sc + π_invalid^sc)(1 - (1 - π_dirty^sc)^(n-1)) / (n - 1)
     + p_ic π_0^ic (1 - (1 - π_1^ic)^(n-1)) / (n - 1) .
Visit Ratio to a Main Memory Module i: The main memory is involved when a cache miss to a private block occurs, when a cache miss to a shared block occurs and no other cache has a dirty copy, and when a cache miss to instructions occurs and no other cache contains these instructions. Hence the visit ratio is:

e_Mi = [ (1 - p_ic)(1 - S)(1 - h) + (1 - p_ic) S (π_notin^sc + π_invalid^sc)(1 - π_dirty^sc)^(n-1)
       + p_ic π_0^ic (1 - π_1^ic)^(n-1) ] / m .

Visit Ratio to the Bus: The bus is used to broadcast invalidations and to transmit missed blocks. Hence the visit ratio is:

e_B = 2 (1 - p_ic) S (π_notin^sc + π_invalid^sc) + 2 (1 - p_ic)(1 - S)(1 - h)
    + (1 - p_ic) S f_w π_valid^sc + (1 - p_ic)(1 - S) f_w U + p_ic π_0^ic .
Once we know the visit ratios and the structure of the network, we can compute the routing probabilities using Eq. (7.5). As a result, we get for the routing probabilities:

p_cache_i→proc_i = ( e_Ci - e_B + Σ_{j=1}^{m} e_Mj ) / e_Ci ,   for all i = 1, ..., n ,

p_cache_i→bus = ( e_B - Σ_{j=1}^{m} e_Mj ) / e_Ci ,   for all i = 1, ..., n ,

p_bus→proc_i = ( e_B - Σ_{j=1}^{m} e_Mj ) / (n e_B) ,   for all i = 1, ..., n ,

p_bus→mem_j = e_Mj / e_B ,   for all j = 1, ..., m .
Next we obtain the arrival rates for each open customer class:

Derivation of λ_C: The possible customers of a local cache are the requests from its owner processor (1 - p_ic)(1 - S)(1 - h), the loading of missed blocks (1 - p_ic) S (π_notin^sc + π_invalid^sc) + p_ic π_0^ic, the requests from other processors for state updating due to a write request to a shared block (n - 1)(1 - p_ic) S f_w (π_valid^sc + π_reserved^sc), and the requests from other processors for state updating due to a read request for a reserved block (n - 1)(1 - p_ic) S (1 - f_w) π_reserved^sc. Therefore the open customer arrival rate, denoted by λ_C, is given by:

λ_C = ρ_i μ_CPU [ (1 - p_ic)(1 - S)(1 - h) + (1 - p_ic) S (π_notin^sc + π_invalid^sc) + p_ic π_0^ic
    + (n - 1)(1 - p_ic) S f_w (π_valid^sc + π_reserved^sc)
    + (n - 1)(1 - p_ic) S (1 - f_w) π_reserved^sc ] .   (13.14)
Derivation of λ_B: A selected block needs to be written back to main memory only if it is private and modified or shared and dirty. The probability for a shared block to be selected is given by C_sc (1 - π_notin^sc)/C, and the probability for a private block to be selected is given by C_pc (1 - π_0^pc)/C. The arrival rate at the bus is, therefore, given by:

λ_B = ρ_i μ_CPU n [ (1 - p_ic)(1 - S)(1 - h) + (1 - p_ic) S π_notin^sc ]
    × [ (C_pc (1 - π_0^pc)/C) π_2^pc + (C_sc (1 - π_notin^sc)/C) π_dirty^sc ] .   (13.15)
Derivation of λ_M: There are two possibilities: a shared read results in a miss and one of the remote caches has a dirty copy, (1 - p_ic) S (1 - f_w)(π_notin^sc + π_invalid^sc)(1 - (1 - π_dirty^sc)^(n-1)); or a write hit to a valid shared block occurs and a write-through operation is performed, (1 - p_ic) S f_w π_valid^sc + (1 - p_ic)(1 - S) f_w U. The arrival rate at the memory modules is, therefore, given by:

λ_M = ρ_i μ_CPU n [ (1 - p_ic) S (1 - f_w)(π_notin^sc + π_invalid^sc)(1 - (1 - π_dirty^sc)^(n-1))
    + (1 - p_ic) S f_w π_valid^sc + (1 - p_ic)(1 - S) f_w U ] .   (13.16)

Because the arrival rates λ_C, λ_B, and λ_M are functions of server utilizations that are computed from the solution of the outer model, the outer model needs to be solved using fixed-point iteration. The solution algorithm is sketched in the following:

1. Solve the inner models for shared/private/instruction cache blocks to obtain the steady-state probabilities (see Eqs. (13.11), (13.12), and (13.13)).

2. Determine the arrival rates λ_C (Eq. (13.14)), λ_B (Eq. (13.15)), and λ_M (Eq. (13.16)) of the open customer classes, assuming the initial value ρ_i = 1 and using the steady-state probabilities of the inner models.

3. Solve the outer model using the computed arrival rates to obtain values for the utilizations ρ_i.

4. The computed utilizations are used to determine the new arrival rates.

The last two steps are repeated until the results of two successive iteration steps differ by less than ε. For our example we assume the following parameter values (all time units are msec):

C = 2048    μ_CPU = 0.25    h = 0.98
S = 0.1     μ_cache = 1.0   f_w = 0.3
U = 0.1     μ_bus = 1.0     p_ic = 0.1
d = 0.4     μ_mem = 0.25
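The iteration itself has the standard successive-substitution form; the following generic sketch (with a toy update function standing in for "solve the outer model and return the new utilization") shows the loop structure, and all names in it are our own:

def fixed_point(update, x0, eps=1e-6, max_iter=1000):
    # Repeat x <- update(x) until two successive iterates differ by < eps.
    x = x0
    for _ in range(max_iter):
        x_new = update(x)
        if abs(x_new - x) < eps:
            return x_new
        x = x_new
    raise RuntimeError("fixed-point iteration did not converge")

# Toy stand-in with the same self-referential shape as steps 3 and 4:
rho = fixed_point(lambda r: 0.9 * (1 - 0.3 * r), x0=1.0)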
Table 13.34 Performance measures for the cache model with μ_bus = 1.00

No. of Processors   ρ_CPU   ρ_bus   System Power
 2                  0.475   0.356   0.950
 3                  0.454   0.426   1.278
 4                  0.438   0.495   1.754
 5                  0.425   0.562   2.127
 6                  0.413   0.619   2.478
 7                  0.402   0.679   2.817
 8                  0.391   0.734   3.131
 9                  0.377   0.785   3.395
10                  0.363   0.821   3.634
15                  0.279   0.902   4.180
20                  0.219   0.902   4.374
In Tables 13.34 and 13.35 the performance measures for different mean bus service times are given as functions of the number of processors. Here, the measure system power is the sum of all processor utilizations. In Table 13.34 we see that by increasing the number of processors from 2 to 20, the system power increases by a factor of 4.6. Because the system bottleneck is the bus, it makes more sense to use a faster bus or a dual bus instead of increasing the number of processors. In Table 13.35 the results are shown for a faster bus with μ_bus = 1.5. As a result, the system power is increased by a factor of 6.5.

Table 13.35 Performance measures for the cache model with μ_bus = 1.50

No. of Processors   ρ_CPU   ρ_bus   System Power
 2                  0.511   0.255   1.023
 3                  0.486   0.303   1.458
 4                  0.470   0.354   1.882
 5                  0.459   0.404   2.297
 6                  0.451   0.451   2.707
 7                  0.445   0.501   3.115
 8                  0.439   0.549   3.515
 9                  0.432   0.601   3.892
10                  0.427   0.643   4.269
15                  0.379   0.817   5.688
20                  0.331   0.909   6.614
13.3.2 Performability of a Multiprocessor System
We consider a model that was developed by Trivedi, Sathaye, et al. [TSI+96] to help determine the optimal number of processors needed in a multiprocessor system. A Markov reward model is used for this purpose. In this model, it is assumed that all failure events are mutually independent and that a single repair facility is shared by all the processors. Assume that the failure rate of each processor is α. A processor fault is covered with probability c and is not covered with probability 1 - c. Subsequent to a covered fault, the system comes up in a degraded mode after a brief reconfiguration delay, while after an uncovered fault, a longer reboot action is required. The reconfiguration times are assumed to be exponentially distributed with mean 1/γ, the reboot times are exponentially distributed with mean 1/δ, and the repair times are exponentially distributed with mean 1/β. It is also assumed that no other event can take place during a reconfiguration or a reboot phase. The CTMC modeling the failure/repair behavior of this system is shown in Fig. 13.66. In state i, 1 ≤ i ≤ n, the system is up with i processors functioning, and n - i processors are waiting for repair. In states c_i, i = 0, ..., n - 2, the system is undergoing a reconfiguration. In states b_i, i = 0, ..., n - 2, the system is being rebooted. In state 0, the system is down, waiting for all processors to be repaired.
Fig. 13.66 CTMC for computing the performability of a multiprocessor system.
The reward rate in a state with i processors functioning properly corresponds to some measure of performance in that configuration. Here the loss probability is used as the performance measure. An M/M/i/b queueing model is used to describe the performance of the multiprocessor system, which is represented by the PFQN shown in Fig. 13.67. This model contains two stations: station mp is the processor station with i processors; it has type ms (multiple servers), each server having service rate μ. The other station is source, which represents the job source with rate λ. Because there is a limited number b of buffers available for queueing the jobs, a closed product-form network with a fixed number b of jobs is chosen. The loss probability can be obtained via the throughput tput of station mp. Note that we have used a two-level hierarchical model. The lower level (inner model) is the PFQN (Fig. 13.67), while the outer model is the CTMC (Fig. 13.66). For state i of the CTMC, the PFQN with i processors is evaluated, and the resulting loss probability is used as the reward rate in state i. Then the expected steady-state reward rate is the overall performability measure of
Fig. 13.67 PFQN model of the M/M/i/b queue.
the system, given by

EXRSS = Σ_{i=1}^{n} r_i π_i ,

where π_i is the steady-state probability of state i, and r_i is the loss probability of station mp. We consider just the operational states i because in the down states c_i, b_i, and 0, the loss probability (reward rate) is 1.
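The reward rates r_i themselves come from the M/M/i/b inner model; a small sketch (our own, using the birth-death solution directly rather than the closed PFQN formulation of Fig. 13.67) computes the loss probability for given i and b:

import numpy as np

def mmib_loss(lam, mu, servers, capacity):
    # Birth-death steady state of an M/M/i/b queue: up-rate lam and
    # down-rate mu * min(k, servers) in state k; loss = P(k = capacity).
    p = np.ones(capacity + 1)
    for k in range(capacity):
        p[k + 1] = p[k] * lam / (mu * min(k + 1, servers))
    p /= p.sum()
    return p[-1]

# Reward rate for CTMC state i (i processors up): r_i = mmib_loss(lam, mu, i, b).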
Fig. 13.68 GSPN model of the M/M/i/b queue.
It is also possible to model the lower level using the GSPN shown in Fig. 13.68. The initial number nproc of tokens in place proc means that there are nproc processors available. Jobs arrive at the system when transition arr fires; when a new job arrives in place buffer, a token is taken from place proc. New jobs are prevented from entering a full system by the inhibitor arc from place buffer to transition arr: arr can only fire when the system is not already full. There can be only b jobs in the system altogether, nproc being served (in place serving) and b - nproc in place buffer. The firing rates are λ for transition arr and kμ for transition service, where k is the number of tokens in place serving; the notation for this marking-dependent firing rate in Fig. 13.68 is μ#. The expected reward rate in the steady state is called the total loss probability (TLP). It is defined as the fraction of jobs rejected either because the buffer is full or because the system is down. A possible SHARPE input file for this hierarchical performability model is shown in Fig. 13.69. Descriptions of the lines of the SHARPE input file are as follows:

Line 3-9: Assign the values to the input parameters.
Fig. 13.69 Input file for the multiprocessor performability model.
Line 12-37: Define the GSPN model; the model has two parameters, the number of processors nproc and the job arrival rate lambda.

Line 39-40: Define a function to use for reward rates.

Line 42-48: Specification of a Markov model for nproc = 1 processor.

Line 46-47: Assign reward rates to all states.

Line 50-63: Specification of a Markov model for nproc = 2 processors.

Line 59-62: Assign reward rates to all states.

Line 65: Input file with specification of a Markov model for 3 processors.

Line 66: Input file with specification of a Markov model for 4 processors.

Line 67: Define models for more processors.
Line 69: Let the arrival rate λ vary from 50 to 200 in increments of 50.

Line 70-75: Get the TLP for each number of processors.

In Fig. 13.70 the total loss probability and the system unavailability (TLP with λ = 0) are plotted as functions of the number of processors. The difference between the unavailability and the total loss probability is caused only by the limited buffer size b.
Fig. 13.70 TLP and unavailability as functions of the number of processors.
Glossary
#(P_i, M)  number of tokens at a particular place P_i in marking M
a_ir  factor for computing the shadow service time value
argmin  function that returns the index of the minimum
ABA  asymptotic bounds analysis
AMVA  asymmetric MVA
APU  associate processing unit
ASCAT  asymmetric SCAT
ASPA  average sub-system population algorithm
ASUM  asymmetric SUM
ATM  asynchronous transfer mode
b_ij  probability that a job at node i leaves the node after the jth phase
b_i  mean service time of the jobs at the ith node
B  normalized blocking factor
BN  blocking network
BCMP  queueing network with several job classes, different queueing strategies, and generally distributed service times, named after Baskett, Chandy, Muntz, and Palacios
BFS  Bolch, Fleischmann, and Schreppel analysis method
BJB  balanced job bounds
BOTT  bottapprox method
c_A  coefficient of variation of the interarrival time
c_A(batch)  coefficient of variation of the batch interarrival time
c_B  coefficient of variation of the service time
c_B,CO  coefficient of variation of the service time when applying the closing method
c_X  coefficient of variation of X: standard deviation divided by the mean
c_i(j)  capacity function of node i: the number of jobs completed by station i if there are j jobs in the queue
ceiling  function that rounds up to the next integer
CI  index
corr  correction factor by Kulbatzki
C_k  Cox distribution with k phases
cost  linear cost function
CCNC  coalesce computation of normalizing constants
CDF  cumulative distribution function
CPU  central processing unit
CSMA/CD  carrier sense multiple access with collision detection
CTMC  continuous-time Markov chain
D  deterministic distribution
d_s(K)  change of the contribution in the SCAT algorithm if a class-s job is removed from the network
driv  time a job executes in the driver context to set up or perform an I/O operation
D+  output arcs
D-  input arcs
DA  diffusion approximation
DAC  distribution analysis by chain
DES  discrete-event simulation
DTMC  discrete-time Markov chain
e  vector of the visit ratios, e = (e_0, e_1, ..., e_N)
e_i  visit ratio of the ith node: the mean number of visits of a job at the ith node, also known as the relative arrival rate
e_i,CO  visit ratios when applying the closing method
ec_i(k)  effective capacity function
ERG  extended reachability graph
E_k  Erlang distribution with k phases
E_ir  set of pairs (j, s) that can be reached from (i, r) in a finite number of steps
ε  stopping condition
f(c_A, c_B, ρ)  correction factor of the Kramer/Langenbach-Belz approximation
f_X(x) = dF_X(x)/dx  probability density function
f_kr  fraction of all I/Os from a primary task
F_i(k)  function corresponding to the state probabilities π_i(k_i) of the ith node
F_ir(K)  contribution of class-r jobs at the ith node to the network population
F_X(x) = P(X ≤ x)  cumulative distribution function
F_r = 1 - H_r  overlap probability
floor  function that rounds to the next lower integer
FB  full batch policy
FCFS  first-come-first-served
FDDI  fiber distributed data interface
FES  flow equivalent server method
FPI  fixed-point iteration
FTCS  symposium on fault-tolerant computing
g(θ)  score function
Γ  gamma distribution
G  general distribution; also used for the normalization constant
G(K)  normalization constant of a single class closed queueing network
G(K)  normalization constant of a multiclass closed queueing network
G_n(k)  auxiliary function for computing G(K)
G_n(k)  auxiliary function for computing G(K) of the multiclass network
G^(i)(k)  normalization constant of the network with node i removed from the network
G^(i)(k)  normalization constant of the multiclass network with node i removed from the network
GKLB  Kramer/Langenbach-Belz correction factor
GE  Gaussian elimination
GI  general distribution with independent interarrival times
GreatSPN  analysis package for generalized stochastic Petri nets
GREEDY  special case of the minimum batch policy
GTH  Grassmann, Taksar, and Heyman algorithm
GUI  graphical user interface
h  hit ratio of the cache
h(ρ, x)  factor for computing the Cosmetatos approximation
H  entropy function
H_k  hyperexponential distribution with k phases
HOL  head-of-line (non-preemptive priority) strategy
I/O  input/output
IMMD  Institut für Mathematische Maschinen und Datenverarbeitung (Universität Erlangen-Nürnberg)
I_μ = [μ-, μ+]  interval of the service rates
I_k = [k-, k+]  interval of the batch sizes
IS  infinite server
ISDN  integrated services digital network
iSPN  integrated environment for modeling using stochastic Petri nets
k_i  number of jobs at the ith node
k_ir  number of jobs of the rth class at the ith node
k = (k_1, k_2, ..., k_N)  state of the network
kern  time a job executes in the kernel context
K  overall number of jobs in a closed network
K = (K_1, ..., K_R)  population vector with the number of jobs of the various classes
K*  overall number of jobs in the fictitious network
mean k_i  mean number of jobs at node i
mean k_ir  mean number of class-r jobs at node i
mean k_is(K)  mean number of class-s jobs at node i that an arriving job sees, given the network population K
mean k  mean number of parallel-sequential jobs used for synchronization
mean K_r*  mean number of class-r jobs in the network
mean K*  overall mean number of jobs in the network
λ  overall throughput or arrival rate of the network
λ(0)  initial value of the throughput (used in bottleneck analysis)
λ_0  arrival rate of the open class of asynchronous tasks at the fictitious node
λ_i  arrival rate of jobs from outside at the ith node; also the throughput of node i, the rate at which jobs are serviced and leave the node
λ_ir  arrival rate of class-r jobs at the ith node
λ_ij  arrival rate of jobs from node i to node j
λ(batch)  arrival rate of batches
λ_max  maximum throughput for parallel processing with synchronization
λ_opt  optimistic throughput bound: the largest possible throughput
λ_pess  pessimistic throughput bound: the lowest possible throughput
L  likelihood function for parameter estimation
L  Lagrange function with the two Lagrange multipliers y_1 and y_2
L_r  load factor of the open classes
L_X  Laplace transform of the pdf; Laplace-Stieltjes transform of the CDF
LAN  local area network
LB  local balance property
LBANC  local balance algorithm for normalizing constants
LBPS  last batch processor sharing
LCFS  last-come-first-served
Little's theorem  theorem relating the mean number of jobs, the throughput, and the mean response time; also known as Little's law
LST  Laplace-Stieltjes transform
LT  Laplace transform
μ_i  service rate of the jobs at the ith node
μ_i,CO  service rate of the ith node when applying the closing method
μ_ir  service rate of the ith node for jobs of the rth class
μ = (μ_1, ..., μ_N)  vector of the service rates
μ_i(k)  conditional throughput of node i with k jobs (load-dependent service rate)
m_i  number of parallel servers at the ith node
M_0 ∈ M  initial marking of a PN
max  function that delivers the maximum value of a set of numbers
M  exponential distribution (Markov property)
mean M_r  mean number of customers of class i who arrive during the waiting time of the tagged customer and receive service before it
M → M  Markov implies Markov property
M(t)  time-averaged behavior of the CTMC
MB  minimum batch policy
MEM  maximum entropy method
MMPP  Markov-modulated Poisson process
MOSEL  modeling, specification and evaluation language
MOSES  modeling, specification and evaluation system
MRM  Markov reward model
MTTA  mean time to absorption
MTTF  mean time to failure
MTTR  mean time to repair
MVA  mean value analysis
MVALDMX  mean value analysis for load-dependent mixed networks
ν  vector of the limiting state probabilities of a discrete-time Markov chain (DTMC)
n_CD  factor of the Cosmetatos approximation
N  number of nodes in the network
n_ir  number of class-r jobs at node i in the shadow network
mean N_r  mean number of customers of class i found in the queue by the tagged (priority r) customer who receive service before it
NPFQN  non-product-form queueing network
OP  index for open job classes
estimate of θ  denoted by θ with a hat
ODE  ordinary differential equation
One-limited service  the station transmits at most one frame when the token arrives
π_B  probability that all servers are occupied
π(g)  state probability for all g ∈ G
π_i  individual state probability of node i of a non-lossy system
π_i(k)  marginal probability that the ith node contains exactly k jobs (k_i = k)
π_loss  probability of loss
π_k  unconditional state probability of the number of jobs in the system
π(s_1, ..., s_N)  state probability of a network with multiple job classes
π  state probability vector at any instant of time
dπ/dt  derivative of the state probability vector
p_0,js  probability in an open network that a job from outside enters the jth node as a job of the sth class
p_0,j  probability that a job entering the network from outside first enters the jth node
p_i0  probability that a job leaves the network just after completing service at node i
p_ij  probability that a job is transferred to the jth node after service completion at the ith node (routing probability)
p_ij(t)  simplified notation of the transition probabilities for time-homogeneous CTMCs
p_ij(u, v)  transition probability of the CTMC to travel from state i to state j during [u, v)
p_ij^(n)(k, l)  probability that the Markov chain transits from state i at time k to state j at time l in exactly n = l - k steps
p_ir,0  probability in an open network that a job of the rth class leaves the network after having been serviced at the ith node
p_ir,js  probability that a job of the rth class at node i is transferred to the sth class at node j (routing probability)
Y(t)/t  time-averaged accumulated reward
Φ(y, t)  distribution of the accumulated reward over the finite time [0, t), also called performability
φ  correction factor for computing the shadow service time in the case of a network with class switching and mixed priorities
pdf  probability density function
pmf  probability mass function
p_r(x)  mass function of a class-r job with respect to a quota node
P  set of places of a Petri net
P_W  waiting probability
dP  derivative of the matrix P
P  stochastic transition probability matrix
P(u, v)  matrix of the transition probabilities
P^(n)  matrix of the n-step transition probabilities
PANDA  Petri net analysis and design assistant
PASTA  Poisson arrivals see time averages theorem
PEPSY  performance evaluation and prediction system
PFQN  product-form queueing network
PN  Petri net
Φ(x)  CDF of the standard normal distribution
PRIOMVA  extension of MVA for the analysis of priority queueing networks
PRIOSUM  extension of the summation method for priority queueing networks
PRS  preemptive resume strategy
PS  processor sharing
q_r(t)  priority of class r at time t
Q  infinitesimal generator matrix
Q  queue length
mean Q  mean queue length
mean Q_ir  mean queue length of class-r jobs at the ith node
mean Q_2  mean number of tasks waiting for the join synchronization in queue Q_2
mean Q(batch)  mean queue length for batches
mean Q(ind)  mean queue length for individual customers in a batch system
QNAP  queueing network analysis package
ρ  utilization
ρ_bott  utilization of the bottleneck node
ρ_i  utilization of node i
ρ_ir(K)  utilization of node i by class-r jobs
ρ_k  specific utilization of an asymmetric node
ρ_i(aux)  utilization needed for calculating the utilization of a non-lossy system
ρ_ik  utilization of the kth server of a lossy system
r_i  reward rate assigned to state i
r_r(0)  starting priority of class-r jobs
RRG  reduced reachability graph
RS  reachability set
R  number of priority classes
mean R  mean remaining service time of a busy server, also called mean residual life
RECAL  recursion by chain algorithm
RESQ  research queueing package
RPS  rotational position sensing
RR  round robin
σ_X² = var(X)  variance of X, or second central moment
s_ir(virtual)  virtual service time of class-r jobs at quota node i
mean s_c  mean contention time
s_ir  service time of class-r jobs at node i
s_ir(shadow)  service time of class-r jobs at shadow node i
s_qi  service rate in chain q at node i
s_i(k)  load-dependent service time
mean s_lat  mean latency time
mean s_seek  mean seek time
mean s_tr  mean transfer time
s_i = (k_i1, ..., k_iR)  state of the ith service station with multiple job classes
S = (s_1, ..., s_N)  overall state of the network
mean s_syn  mean synchronization overhead
SB  station balance property
SCAT  self-correcting approximation technique
SHARPE  symbolic hierarchical automated reliability and performance evaluator
SIRO  service in random order
SMP  semi-Markov process
SOR  successive over-relaxation
SPN  stochastic Petri net
SPNP  stochastic Petri net package
SQL  structured query language
SRN  stochastic reward net
SUM  summation method
T  set of tangible markings
T  response time
mean T  mean response time
mean T(fj)  mean response time of parallel-sequential jobs with fork-join synchronization
mean T_i  mean response time at node i
mean T_ir  mean response time of class-r jobs at node i
mean T(seq)  mean response time of a purely sequential job
mean T_opt  optimistic bound: the lowest possible mean response time
mean T_pess  pessimistic bound: the greatest possible mean response time
mean T_d  mean time a task spends waiting for the join synchronization in queue Q_2
T_t  transfer time of a frame
TTRT  target token rotation time
Tcl/Tk  tool command language/toolkit
TLP  total loss probability
TLP(t)  transient expected total loss probability
TPM  transition probability matrix
Type 1 blocking  blocking after service, also called transfer blocking, production blocking, or non-immediate blocking
Type 2 blocking  blocking before service, also called service blocking, communication blocking, or immediate blocking
Type 3 blocking  repetitive service blocking, also called rejection blocking
u  number of chains
U  upper time limit within which jobs need to be served in a real-time system
user  time a job executes in the user context
V  set of vanishing markings
ω  relaxation parameter
ω_0  optimal relaxation parameter
ω_0*  approximation of the optimal relaxation parameter
W_rk  factor for computing the shadow service time
W(i, k)  weighting function of the extended SCAT algorithm
W  waiting time
mean W  mean waiting time
mean W_0  mean remaining service time of the job in service
mean W_i  mean waiting time at node i
mean W_ir  mean waiting time of class-r jobs at node i
mean W_ir(A)  mean waiting time of an arriving class-r customer
WAN  wide area network
WEIRDP  queueing strategy where the first job in the queue is assigned to the pth part of the processor and the rest of the jobs are assigned to the (1 - p)th part of the processor
WinPEPSY  Windows version of the queueing network package PEPSY
WIP  mean number of workpieces in the system (work in progress)
E_T  set of jobs that cannot be preempted
x_ave  average relative utilization of the individual nodes in the network
x_i  relative utilization of node i
x_ir  relative utilization of node i by class-r jobs
x_max  maximum relative utilization
x_sum  sum of the relative utilizations
{X_t : t ∈ T}  family of random variables
E[X]  mean value or expected value
E[X^n]  nth moment
XPEPSY  X-version of the queueing network package PEPSY
Y(t)  accumulated reward in the finite time horizon [0, t)
y_i(k)  number of possibilities to distribute k priority tasks to n_s + 1 nodes (including the join queue Q_2)
y(t)  instantaneous reward rate at time t
Z  number of states in the fork-join sub-network
Bibliography
AbBo97.
M. Abu El-Qomsan and G. Belch. Analyse von WarteschlangenKnoten und mehreren Aufnet zen mit asymmetrischen Technical report TR-14-97-07, Universitat tragsklassen. Erlangen-Niirnberg, IMMD IV, 1997.
ABC84.
M. Ajmone Marsan, G. Balbo, and G. Conte. A class of generalized stochastic Petri nets for the performance evaluation of multiprocessor systems. ACM Transactions on Computer Systems, 2:93-122, May 1984.
ABC+95
M. Ajmone Marsan, G. Balbo, G. Conte, S. Donatelli, and G. Franceschinis. Modelling with Generalized Stochastic Petri Nets. Wiley, New York, 1995.
Abde82.
A. Abdel-Moneim. Weak Lumpability in Finite Markov Chains. Journal of Applied Probability, 19:685-691, 1982.
AbMa93.
H. Abdalah and R. Marie. The uniformized power method for transient solutions of Markov processes. Computers and Operations Research, 20(5):515-526, June 1993.
ABP85.
I. Akyildiz, Die erweiterte G. Belch, and M. Paterok. Parametrische Analyse fur geschlossene Warteschlangennetze. Number 110 in Informatik-Fachberichte, pages 170-185, Berlin, 1985. Springer.
692
BIBLIOGRAPHY
AgBu83.
S. Agrawal and J. Buzen. The Aggregate Server Method for Analyzing Serialization Delays in Computer Systems. ACM Trunsactions on Computer Systems, 1(2):116-143, May 1983.
AgTr82.
Modeling Reentrant and NonreenJ. Agre and S. Tripathi. t)rant, Software. A CM Sigmetrics Performance Evaluation Review, 11(4):163-178, 1982.
AkBo83.
I. Akyildiz and G. Belch. Erweiterung der Mittelwertanalyse zur Berechnung der Zustandswahrscheinlichkeiten fiir geschlossene und gemischte Netze. Number 61 in Informatik-Fachberichte, pages 267-276, Berlin, 1983. Springer.
AkBo88a.
I. Akyildiz and G. Belch. Mean Value Analysis Approximation for Multiple Server Queueing Networks. Performance Evaluation, 8(2):77-91, April 1988.
AkBo88b
I. Akyildiz and G. Belch. Throughput and Response Time Optimization in Queueing Network Models of Computer Systems. In T. Hasegawa, H. Takagi, and Y. Takahashi, editors, Proc. Int. Seminar on Performance of Distributed and Parallel Systems, pages 251-269, Kyoto, Japan, Amsterdam, December 1988. Elsevier .
AKH97.
S. Allmaier, M. Kowarschik, and G. Horton. State Space Construction and Steady-State Solution of GSPNs on a Shared Memory Multiprocessor. In Proc. 7th Int. Workshop on Petri Nets and Performance Models (PNPM ‘97), pages 112-121, St. Malo, France, Los Alamitos, CA, June 1997. IEEE Computer Society Press.
Akvo89.
I. Akyildiz and H. von Brand. Exact Solutions for Open, Closed and Mixed Queueing Networks with Rejection Blocking. Theoretical Computer Science, 64(2):203-219, 1989.
Akyi87.
I. Akyildiz. Exact Product Form Solution for Queueing Networks with Blocking. IEEE Transactions on Computers, 36(1):122-125, January 1987.
Akyi88a.
I. Akyildiz. Mean Value Analysis for Blocking Queueing Networks. IEEE Transactions on Software Engineering, 14(4):418-428, April 1988.
Akyi88b.
I. Akyildiz. On the Exact and Approximate Throughput Analysis of Closed Queueing Networks with Blocking. IEEE Transactions on Software Engineering, 14( 1):62-70, January 1988.
AlDa97.
S. Allmaier and S. Dalibor. PANDA - Petri Net Analysis and Design Assistant. In Proc. 7th Int. Workshop on Petri Nets and
693
BIBLIOGRAPHY
Performance Models (PNPM ‘97), pages 58-60, St. Malo, France, Rennes, France, June 1997. INRIA.
Alle90.
0. Allen. Probability, Statistics and Queueing Theory with Computer Science Applications. Academic Press, New York, 2nd edition, 1990.
AlPe86.
T. Altiok and H. Perros. Approximate Analysis of Open Networks of Queues with Blocking: Tandem Configurations. AIIE Transactions, 12(3):450-461, March 1986.
Alti82.
T. Altiok. Approximate Analysis of Exponential Tandem Queues with Blocking. European Journal of Operational Research, 11:390398, October 1982.
AMM65.
B. Avi-Itzhak, ing priorities. 1965.
Baer85.
M. Baer. Verlustsysteme mit unterschiedlichen enungszeiten der Kanale. Wissenschaftliche Hochschule
W. Maxwell, and L. Miller. Queueing with alternatOperations Research, 13(2):306-318, March-April
fiir
Verkehrswesen-Friedrich
List,
mittleren
Bedi-
Zeitschrift
der
1985. Sonderheft
15. BaIa83.
S. Balsam0 and G. Iazeolla. Some Equivalence Properties for Queueing Networks with and without Blocking. In A. Agrawala and S. Tripathi, editors, Proc. Performance ‘83, pages 351-360, Amsterdam, 1983. North-Holland.
BaKe94.
F. Bause and P. Kemper. QPN-Tool titative
analysis
of Queueing
Petri
for the qualitative and quanNets. Number 794 in LNCS.
Springer, Berlin, 1994. Bard79.
Y. Bard. Some Extensions to Multiclass Queueing Network Analysis. In Proc. 4th Int. Symp. on Modelling and Performance Evaluation of Computer Systems, volume 1, pages 51-62, Vienna, New York, February 1979. North-Holland.
Bard80.
Y. Bard. A Model of Shared DASD and Multipathing. nications of the ACM, 23(10):564-572, October 1980.
Bard82.
Y. Bard. Modeling I/O Systems with Dynamic Path Selection and General Transmission Networks. A CM Sigme tries Performance Evaluation Review, 11(4):118-129, 1982.
BaTh94.
J. Barner and K.-C. Thielking. Entwicklung, Implementierung und Validierung von analytischen Verfahren zur Analyse von Prioritatsnetzen. Studienarbeit, Universitat Erlangen-Ntirnberg, IMMD IV, 1994.
Commu-
694
BIBLIOGRAPHY
BBA84.
S. Bruell, G. Balbo, and P. Afshari. Mean Value Analysis of Mixed, Multiple Class BCMP Networks with Load Dependent Service Stations. Performance Evaluation, 4:241-260, 1984.
BBC77.
R. Brown, J. Browne, and K. Chandy. Memory Management and Response Time. Communications of the ACM, 20(3):153165, March 1977.
BBS77.
G. Balbo, S. Bruell, and H. Schwetmann. Customer Classes and Closed Network Models-A Solution Technique. In B. Gilchrist, editor, Proc. IFIP 7th World Computer Congress, pages 559-564, Amsterdam, August 1977. North-Holland.
BCH79.
0. Boxma, J. Cohen, and N. Huffels. Approximations of the Mean Waiting Time in a M/G/s-Q ueueing System. Operations Research, 27:1115-1127, 1979.
BCMP75
F. Baskett, K. Chandy, R. Muntz, and F. Palacios. Open, Closed, and Mixed Networks of Queues with Different Classes of Customers. Journal of the ACM, 22(2):248-260, April 1975.
Beau78.
M. Beaudry. Performance-related reliability measures for computing systems. IEEE Transactions on Computers, 27(6):540-547, 1978.
Bega93.
K. Begain. Using subclasses of PH distributions for the modeling of empirical distributions. In Proc. 18th ICSCS, pages 141-152, Cairo, Egypt, 1993.
BeGo81.
P. Bernstein and N. Goodman. Concurrency Control in Distributed Database Systems. Computing Surveys, 13(2):185-221, June 1981.
BeMe78.
F. Beutler and B. Melamed. Decomposition and Customer Streams of Feedback Networks of Queues in Equilibrium. Operations Research, 26(6):1059-1072, 1978.
BFH88.
M. Bar, K. Fischer, and G. Hertel. Leistungsfahigkeit-QualitiitZuverlassigkeit, chapter 4. transpress Verlagsgesellschaft mbH, Berlin, 1988.
BFS87.
G. Belch, G. Fleischmann, and R. Schreppel. Ein funktionales Konzept zur Analyse von Warteschlangennetzen und Optimierung von Leistungsgrijfien. Informatik-Fachberichte, (154):327-342, 1987.
BGJ92.
G. Belch, M. Gaebell, and H. Jung. Analyse offener Warteschlangennetze mit Methoden fiir geschlossene In Operations Research Proc. 1992, Warteschlangennetze.
BIBLIOGRAPHY
pages 324-332, DGOR-Deutsche
695
Aachen, Germany, Berlin, September 1992. Gesellschaft fur Operations Research, Springer.
BKLC84.
R. Bryant, A. Krzesinski, M. Lakshmi, and K. Chandy. The MVA Priority Approximation. ACM Transactions on Computer Systems, 2(4):335-359, November 1984.
BMW89
H. Beilner, J. Mater, and N. WeiGenberg. Towards a Performance Modelling Environment: News on HIT. In R. Puigjaner, editor, Proc. 3rd Int. Conf. on Modelling Techniques and Tools for Computer Performance Evaluation, Palma de Mallorca, volume 1, Amsterdam, September 1989. North-Holland.
BoBr84.
G. Belch and W. Bruchner. Analytische Modelle symmetrischer Mehrprozessoranlagen mit dynamischen Prioritaten. Elektronisthe Rechenanlagen, 26(1):12-19, 1984.
BoFi93.
G. Belch and M. Fischer. Bottapprox: Eine Engpafianalyse fur geschlossene Warteschlangennetze auf der Basis der Summationsmethode. In H. Dyckhoff, U. Derigs, M. Salomon, and H. Tijms, editors, Operations Research Proc. 1993, pages 511-517, Amsterdam, Berlin, August 1993. DGOR-Deutsche Gesellschaft fur Operations Research, Springer.
BoHe96.
G. Belch and H. Herold. Markov Analyzer MOSES. versitat Erlangen-Niirnberg,
BoKi92.
G. Belch and M. Kirschnick. PEPSY-QNS - Performance Evaluation and Prediction System for Queueing Networks. Technical report TR-14-21-92, Universitat Erlangen-Niirnberg, IMMD IV, October 1992.
BoKo81
0. Boxma and A. Konheim. Approximate Analysis of Exponential Queueing Systems with Blocking. Acta Informatica, 15119-66, 1981.
Bolc83.
G. Belch. Approximation von Leistungsgrijl3en symmetrischer Mehrprozessorsysteme. Computing, 31:305-315, 1983.
Bolc89.
G. Belch. Leistungsbewertung von Rechensystemen. Leitfaden und Monographien der Informatik. B.G.Teubner Verlagsgesellschaft, Stuttgart, 1989.
Bolc91.
G. Belch. Analytische Leistungsbewertung-Methoden und Erfolge. PIK-Praxis der Informationsverarbeitung und Kommunilcation, 4~231-241, 1991.
BoRo94
G. Belch and K. Roggenkamp. Analysis of Queueing Networks with Asymmetric Nodes. In M. Woodward, S. Datta, and
MOSEL, A new Language for the Technical report TR-14-96-02, UniIMMD IV, 1996.
696
BIBLIOGRAPHY
S. Szumko, editors, Proc. Computer and Telecommunication Systems Performance Engineering Conf., pages 115-128, London, 1994. Pentech Press. BoSc91.
G. Belch and A. Scheuerer. Analytische Untersuchungen AsymIn W. Gaul, metrischer Prioritatsgesteuerter Wartesysteme. A. Bachem, W. Habenicht, W. Runge, and W. Stahl, editors, Operations Research Proc. 1991, pages 514-521, Stuttgart, Berlin, September 1991. DGOR-Deutsche Gesellschaft fur Operations Research, Springer.
BoTr86.
A. Bobbio and K. Trivedi. An Aggregation Technique for the Transient Analysis of Stiff Markov Chains. IEEE Transactions on Computers, 35(9):803-814, September 1986.
Bran82.
A. Brandwajn. Fast Approximate Solution of Multiprogramming Models. ACM Sigmetrics Performance Evaluation Review, 11(4):141-149, 1982.
BrBa80.
S. Bruell and G. Balbo. Computational Algorithms for Closed Queueing Networks. Operating and programming systems series. North-Holland, Amsterdam, 1980.
BrSe91.
I. Bronstein and K. Semendjajew. Taschenbuch der Mathematik. B.G.Teubner Verlagsgesellschaft, Stuttgart, 1991.
BRT88.
J. Blake, A. Reibman, and K. Trivedi. Sensitivity analysis of reliability and performability measures for multiprocessor systems. In Proc. 1988 ACM SIGMETRICS Int. Conf. on Measurement and Modeling of Computer Systems, pages 177-186, Santa Fee, N.M., May 1988.
Brum71.
S. Brumelle. Some Inequalities for Parallel-Server Queues. Operations Research, (19):402-413, 1971.
Bux81.
W. Bux. Local-Area Subnetworks: A Performance Comparison. IEEE Transactions on Communication, 29( 10):1465-1473, October 1981.
Buze71.
J. Buzen. Queueing Network Models of Multiprogramming. PhD thesis, Div. of Engineering and Applied Physics, Harvard University, 1971.
Buze73.
J. Buzen. Computational Algorithms for Closed Queueing Networks with Exponential Servers. Communications of the ACM, 16(9):527-531, September 1973.
BVDT88.
M. Boyd, M. Veeraraghavan, J. Dugan, and K. Trivedi. An In Proc. Approach to Solving Large Reliability Models. IEEE/AIAA DASC Symp., San Diego, 1988.
BIBLIOGRAPHY
697
CBC+92.
G. Ciardo, A. Blakemore, P. Chimento, J. Muppala, and K. Trivedi. Linear algebra, iklarkov chains and queueing models, volume 48, chapter Automated Generation and Analysis of Markov Reward Models using Stochastic Reward Nets, pages 145-191. Springer, New York, 1992.
CFMT94.
G. Ciardo, R. Fricks, J. Muppala, and K. Trivedi. SPNP Users Manual Version 4.0. Duke University, Department of Electrical Engineering, Durham, N.C., March 1994.
CGM83.
A. Chesnais, E. Gelenbe, and I. Mitrani. On the Modeling of Parallel Access to Shared Data. Communications of the ACM, 26(3):196-202, March 1983.
Chan72.
K. Chandy. The Analysis and Solutions for General Queuing Networks. In Proc. 6th Annual Princeton Conf. on Information Sciences and Systems, pages 224-228, Princeton, N.J., March 1972. Princeton University Press.
ChLa74.
A. Chang and S. Lavenberg. Work Rates in Closed Queueing Networks with General Independent Servers. Operations Research, 221838-847, 1974.
ChLa83.
K. Chandy and M. Lakshmi. An Approximation Technique for Queueing Networks with Preemptive Priority Queues. Technical report, Department of Computer Sciences, University of Texas, February 1983.
ChMa83.
K. Chandy and A. Martin. A Characterization of Product-Form Queuing Networks. Journal of the ACM, 30(2):286-299, April 1983.
ChNe82.
K. Chandy and D. Neuse. Linearizer: A Heuristic Algorithm for Queueing Network Models of Computing Systems. Communications of the ACM, 25(2):126-134, February 1982.
ChSa80.
K. Chandy and C. Sauer. Computational Algorithms for Product Form Queueing Networks. Communications of the ACM, 23( 10):573-583, October 1980.
CHT77.
K. Chandy, J. Howard, and D. Towsley. Product Form and Local Balance in Queueing Networks. Journal of the ACM, 24(2):250263, April 1977.
ChTr92.
H. Choi and K. Trivedi. Approximate Performance Models of Polling Systems Using Stochastic Petri Nets. In Proc IEEE INFOCOM ‘92, volume 3, pages 1-9, Florence, Italy, Los Alamitos, CA, May 4-8 1992. IEEE Computer Society Press.
698
BIBLIOGRAPHY
CHW75a.
K. Chandy, U. Herzog, and L. Woo. Approximate Analysis of General Queueing Networks. IBM Journal of Research and Development, 19(1):43-49, January 1975.
CHW75b.
K. Chandy, U. Herzog, and L. Woo. Parametric Analysis of Queueing Networks. IBM Journal of Research and Development, 19( 1):36-42, January 1975.
Chy186.
P. Chylla.
Zur Modellierung und approximativen LeistungsanalDissertation, Faculty yse von Vielteilnehmer-Rechensystemen.
for Mathematics Munich, 1986. ChYu83.
and
Computer
Science,
W. Chow and P. Yu. An Approximation Server Queueing Models with a Priority
Technical
University
Technique Dispatching
for Central Rule. Per-
formance Evaluation, 3~55-62, 1983. CiGr87.
Cin175.
CiTr93.
B. Ciciani and V. Grassi. tolerant satellite systems. 35(4):403-409, 1987. E. Cinlar. Englewood
Introduction
Performability
IEEE
evaluation
of fault-
Transactions on Computers,
to Stochastic Processes. Prentice-Hall,
Cliffs, N.J., 1975.
G. Ciardo and K. Trivedi. tic Reward Net Models. July 1993.
A Decomposition
Approach
Performance Evaluation,
for Stochas18( 1):37-59,
CMSTSO.
G. Ciardo, R. Marie, B. Sericola, and K. Trivedi. Performability analysis using semi-Markov reward processes. IEEE Transactions 1990. on Computers, 39(10):1251-1264,
CMTSO.
M. Calzarossa, R. Marie, and K. Trivedi. System Performance with User Behavior Graphs. Performance Evaluation, 11(3): 155165, July 1990.
CMT91.
G. Ciardo, J. Muppala, and K. Trivedi. On the Solution of GSPN Reward Models. Performance Evaluation, 12(4):237-254, July 1991.
CNT97.
C. Ciardo, D. Nicol, and K. Trivedi. Discrete-event Simulation of Fluid Stochastic Petri Nets. In Proc. 7th Int. Workshop on Petri Nets and Performance Models (PNPM ‘97), pages 217-225, St. Malo, France, Los Alamitos, CA, June 1997. IEEE Computer Society Press.
CoGe86.
A. Conway and N. Georganas. RECAL-A New Efficient Algorithm for the Exact Analysis of Multiple-Chain Closed Queuing Networks. Journal of the ACM, 33(4):768-791, October 1986.
BIBLIOGRAPHY
699
CoHo86.
E. J. Coffman and M. Hofri. Queueing models of secondary storage devices. Queueing Systems, 1(2):129-168, September 1986.
CoSe84.
P. Courtois and P. Semal. Bounds for positive eigenvectors of nonnegative matrices and for their approximations by decomposition. Journal of the ACM, 31(4):804-825, 1984.
Cosm76.
G. Cosmetatos. Some Approximate Equilibrium Results for the Multiserver Queue (M/G/r). Operations Research Quarterly, pages 615-620, 1976.
Cour75.
P. Courtois. Decomposability, Instabilities and Saturation in Multiprogramming Systems. Communications of the ACM, 18(7):371-377, July 1975.
Cour77.
P. Courtois. Decomposability: Queueing and Computer System Applications. Academic Press, New York, 1977.
Cox55.
D. Cox. A Use of Complex Probabilities in the Theory of Stochastic Processes. In Proc. Cambridge Philosophical Society, volume 51, pages 313-319, 1955.
Crom34.
C. Crommelin. Delay probability formulae. Post Office Electrical Engineer, 26:266-274, 1934.
CSL89.
A. Conway, E. de Souza e Silva, and S. Lavenberg. Mean Value Analysis by Chain of Product Form Queueing Networks. IEEE Transactions on Computers, 38(3):432-442, March 1989.
CTM89.
G. Ciardo, K. Trivedi, and J. Muppala. SPNP: Stochastic Petri Net package. In Proc. 3rd Int. Workshop on Petri Nets and Performance Models (PNPM ‘89), pages 142-151, Kyoto, Japan, Los Alamitos, CA, December 1989. IEEE Computer Society Press.
DaBh85.
C. Das and L. Bhuyan. Bandwidth availability of multiple-bus multiprocessors. IEEE Transactions on Computers, 34(10):918-926, 1985.
Dadu96.
H. Daduna. Discrete Time Queueing Networks: Recent Developments. In Proc. Performance ‘96, Lausanne, October 8 1996. Tutorial Lecture Notes, Revised Version.
DeBu78.
P. Denning and J. Buzen. The Operational Analysis of Queueing Network Models. Computing Surveys, 10(3):225-261, September 1978.
DoIy87.
L. Donatiello and B. Iyer. Analysis of a composite performance reliability measure for fault-tolerant systems. Journal of the ACM, 34(4):179-199, 1987.
Dosh90.
B. Doshi. Single Server Queues With Vacations - A Survey. Queueing Systems, 1(1):29-66, 1990.
DrLa77.
S. Dreyfus and A. Law. The art and theory of dynamic programming. Academic Press, New York, 1977.
DuCz87.
A. Duda and T. Czachorski. Performance Evaluation of Fork and Join Synchronization Primitives. Acta Informatica, 24:525-553, 1987.
Duda87.
A. Duda. Approximate Performance Analysis of Parallel Systems. In Proc. 2nd Int. Workshop on Applied Mathematics and Performance/Reliability Models and Computer/Communication Systems, pages 189-202, 1987.
EaSe83.
D. Eager and K. Sevcik. Performance Bound Hierarchies for Queueing Networks. ACM Transactions on Computer Systems, 1(2):99-115, May 1983.
EaSe86.
D. Eager and K. Sevcik. Bound Hierarchies for Multiple-Class Queueing Networks. Journal of the ACM, 33(1):179-206, January 1986.
Ehre96.
D. Ehrenreich. Erweiterung eines Kommunikationsnetzes auf der Basis von Leistungsuntersuchungen. Diplomarbeit, Universität Erlangen-Nürnberg, IMMD IV, July 1996.
FeJo90.
R. Feix and M. R. Jobmann. MAOS - Model Analysis and Optimization System. Technical report, University of Munich, FB Informatik, 1990.
Fell68.
W. Feller. An Introduction to Probability Theory and its Applications. Volume 1 of [Fell71], 3rd edition, 1968.
Fell71.
W. Feller. An Introduction to Probability Theory and its Applications, volume 2. Wiley, New York, 2nd edition, 1971.
FiMe93.
W. Fischer and K. Meier-Hellstern. The Markov-Modulated Poisson Process (MMPP) cookbook. Performance Evaluation, 18:149-171, 1993.
Flei89.
G. Fleischmann. Modellierung und Bewertung paralleler Programme. Dissertation, Universität Erlangen-Nürnberg, IMMD IV, 1989.
FoGl88.
B. Fox and P. Glynn. Computing Poisson probabilities. Communications of the ACM, 31(4):440-445, April 1988.
Förs89.
C. Förster. Implementierung und Validierung von MWA-Approximationen. Studienarbeit, Universität Erlangen-Nürnberg, IMMD IV, 1989.
FrBe83.
D. Freund and J. Bexfield. A New Aggregation Approximation Procedure for Solving Closed Queueing Networks with Simultaneous Resource Possession. ACM Sigmetrics Performance Evaluation Review, pages 214-223, August 1983. Special Issue.
GaKe79.
F. Gay and M. A. Ketelsen. Performance evaluation for gracefully degrading systems. In Proc. IEEE Int. Symp. on Fault-Tolerant Computing, FTCS, pages 51-58, Los Alamos, CA, Los Alamitos, CA, 1979. IEEE Computer Society Press.
GBB98.
S. Greiner, G. Bolch, and K. Begain. A generalized analysis technique for queueing networks with mixed priority strategy and class switching. Computer Communications, 1998.
GCS+86.
A. Goyal, W. Carter, E. de Souza e Silva, S. Lavenberg, and K. Trivedi. The System Availability Estimator (SAVE). In Proc. 16th Int. Symp. on Fault-Tolerant Computing, FTCS, pages 84-89, Vienna, Austria, Los Alamitos, CA, July 1986. IEEE Computer Society Press.
Gele75.
E. Gelenbe. On Approximate Computer System Models. Journal of the ACM, 22(2):261-269, April 1975.
GeMi80.
E. Gelenbe and I. Mitrani. Analysis and Synthesis of Computer Systems. Academic Press, London, 1980.
GePu87.
E. Gelenbe and G. Pujolle. Introduction to Queueing Networks. Wiley, Chichester, 1987.
GeTr83.
R. Geist and K. Trivedi. The Integration of User Perception in the Heterogeneous M/M/2 Queue. In A. Agrawala and S. Tripathi, editors, Proc. Performance '83, pages 203-216, Amsterdam, 1983. North-Holland.
GKZH94.
R. German, C. Kelling, A. Zimmermann, and G. Hommel. TimeNET-A Toolkit for Evaluating Non-Markovian Stochastic Petri Nets. Technical report, Technical University Berlin, 1994.
GLT87.
A. Goyal, S. Lavenberg, and K. Trivedi. Probabilistic Modeling of Computer System Availability. Annals of Operations Research, 8:285-306, March 1987.
GoNe67a.
W. Gordon and G. Newell. Closed Queuing Systems with Exponential Servers. Operations Research, 15(2):254-265, April 1967.
GoNe67b.
W. Gordon and G. Newell. Cyclic Queuing Systems with Restricted Queues. Operations Research, 15(2):266-277, April 1967.
Good83.
J. Goodman. Using cache memory to reduce processor-memory traffic. In Proc. 10th Int. Symp. on Computer Architecture, pages 124-131, New York, 1983. IEEE.
GoTa87.
A. Goyal and A. Tantawi. Evaluation of performability in acyclic Markov chains. IEEE Transactions on Computers, 36(6):738-744, 1987.
GrBo96.
S. Greiner and G. Bolch. Approximative analytische Leistungsbewertung am Beispiel eines UNIX-basierten Multiprozessor-Betriebssystems. Volume 11 of Informatik Forschung und Entwicklung, pages 112-124, Berlin, 1996. Springer.
GrHa85.
D. Gross and C. Harris. Fundamentals of Queueing Theory. Wiley, New York, 2nd edition, 1985.
GTH85.
W. Grassmann, M. Taksar, and D. Heyman. Regenerative Analysis and Steady State Distribution for Markov Chains. Operations Research, 33:1107-1116, September 1985.
Hahn88.
T. Hahn. Implementierung und Validierung der Mittelwertanalyse für höhere Momente und Verteilungen. Studienarbeit, Universität Erlangen-Nürnberg, IMMD IV, 1988.
Hahn90.
T. Hahn. Erweiterung und Validierung der Summationsmethode. Diplomarbeit, Universität Erlangen-Nürnberg, IMMD IV, 1990.
HaSp95.
T. Hanschke and T. Speck. AMS-An APL Based Modeling System for Production Planning. In Operations Research Proceedings 1994, pages 318-323, Berlin, 1995. Springer.
HaTr93.
B. Haverkort and K. Trivedi. Specification and Generation of Markov Reward Models. Discrete-Event Dynamic Systems: Theory and Applications, 3:219-247, 1993.
HaYo81.
L. Hagemann and D. Young. Applied Iterative Methods. Academic Press, New York, 1981.
HBAK86.
K. Hoyme, S. Bruell, P. Afshari, and R. Kain. A Tree-Structured Mean Value Analysis Algorithm. ACM Transactions on Computer Systems, 4(2):178-185, November 1986.
Hend72.
W. Henderson. Alternative Approaches to the Analysis of the M/G/l and G/M/l Queues. Operations Research, 15:92-101, 1972.
HeSo84.
D. Heyman and M. Sobel. Stochastic Models in Operations Research, volume 1 and 2. McGraw-Hill, New York, 1984.
HeTr82.
P. Heidelberger and K. Trivedi. Queueing Network Models for Parallel Processing with Asynchronous Tasks. IEEE Transactions on Computers, 31(11):1099-1109, November 1982.
HeTr83.
P. Heidelberger and K. Trivedi. Analytic Queueing Models for Programs with Internal Concurrency. IEEE Transactions on Computers, 32(1):73-82, January 1983.
HLM96.
G. Haring, J. Lüthi, and S. Majumdar. Mean Value Analysis for Computer Systems with Variabilities in Workload. In Proc. IEEE Int. Computer Performance and Dependability Symp. (IPDS '96), pages 32-41, Urbana-Champaign, September 1996.
HMT91.
D. Heimann, N. Mittal, and K. Trivedi. Dependability Modeling for Computer Systems. In Proc. Annual Reliability and Maintainability Symposium, pages 120-128, Orlando, FL, January 1991.
Hort96.
G. Horton. On the Multi-Level Algorithm for Markov Chains. In Proc. Copper Mountain Conf. on Iterative Methods, Copper Mountain, CO, April 1996.
Hova81.
A. Hordijk and N. van Dijk. Networks of Queues with Blocking. In F. Kylstra, editor, Proc. Performance ‘81, pages 51-65, Amsterdam, 1981. North-Holland.
Howa71.
R. Howard. Dynamic Probabilistic Systems, volume 2: Markov and Decision Processes. Wiley, New York, 1971.
HsLa87.
C. Hsieh and S. Lam. Two Classes of Performance Bounds for Closed Queueing Networks. Performance Evaluation, 7(1):3-30, February 1987.
HsLa89.
C. Hsieh and S. Lam. PAM-A Noniterative Approximate Solution Method for Closed Multichain Queueing Networks. Performance Evaluation, 9(2):119-133, April 1989.
Husl81.
R. Huslende. A combined evaluation of performance and reliability for degradable systems. In Proc. 1981 ACM SIGMETRICS Conf. on Measurements and Modeling of Computer Systems, Performance Evaluation Review, pages 157-164, Las Vegas, Nevada, September 1981.
HWFT97.
C. Hirel, S. Wells, R. Fricks, and K. Trivedi. iSPN: an Integrated Environment for modeling using Stochastic Petri Nets. In Tool Descriptions, 7th Int. Workshop on Petri Nets and Performance Models (PNPM '97), pages 17-19, St. Malo, France, Los Alamitos, CA, June 1997. IEEE Computer Society Press.
Ibe87.
O. Ibe. Performance comparison of explicit and implicit token-passing networks. Computer Communications, 10(2):59-69, 1987.
IbTr90.
O. Ibe and K. Trivedi. Stochastic Petri net models of polling systems. IEEE Journal on Selected Areas in Communications, 8(10):146-152, December 1990.
ICT93.
O. Ibe, H. Choi, and K. Trivedi. Performance Evaluation of Client-Server Systems. IEEE Transactions on Parallel and Distributed Systems, 4(11):1217-1229, November 1993.
IPM82.
J. Ignazio, D. Palmer, and C. Murphy. A Multicriteria Approach to Supersystem Architecture Definition. IEEE Transactions on Computers, (C-31):410-418, 1982.
Jack57.
J. Jackson. Networks of Waiting Lines. Operations Research, 5(4):518-521, 1957.
Jack63.
J. Jackson. Jobshop-Like Queuing Systems. Management Science, 10(1):131-142, October 1963.
Jaek91.
H. Jaekel. Analytische Untersuchung von Multiple Server Systemen mit Prioritäten. Diplomarbeit, Universität Erlangen-Nürnberg, 1991.
Jain91.
R. Jain. The Art of Computer Systems Performance Analysis. Wiley, New York, 1991.
JaLa82.
P. Jacobson and E. Lazowska. Analyzing Queueing Networks with Simultaneous Resource Possession. Communications of the ACM, 25(2):142-151, February 1982.
JaLa83.
P. Jacobson and E. Lazowska. A Reduction Technique for Evaluating Queueing Networks with Serialization Delays. In A. Agrawala and S. Tripathi, editors, Proc. Performance ‘83, pages 45-59, Amsterdam, 1983. North-Holland.
Jens53.
A. Jensen. Markoff chains as an aid in the study of Markoff processes. Skandinavisk Aktuarietidskrift, 36:87-91, 1953.
Kauf84.
J. Kaufmann. Approximation Methods for Networks of Queues with Priorities. Performance Evaluation, 4:183-198, 1984.
KeRi78.
B. Kernighan and D. Ritchie. The C Programming Language. Prentice-Hall, Englewood Cliffs, N. J., 1978.
Kero86.
T. Kerola. The Composite Bound Method for Computing Throughput Bounds in Multiple Class Environments. Performance Evaluation, 6(1):1-9, March 1986.
KeSn78.
J. Kemeny and J. Snell. Finite Markov Chains. Springer, New York, 2nd edition, 1978.
KGB87.
S. Kumar, W. Grassmann, and R. Billington. A Stable Algorithm to Calculate Steady-State Probability and Frequency of a Markov System. IEEE Transactions on Reliability, R-36(1), April 1987.
KGTA88.
D. Kouvatsos, H. Georgatsos, and N. Tabet-Aouel. A Universal Maximum Entropy Algorithm for General Multiple Class Open Networks with Mixed Service Disciplines. Technical report DDK/PHG-1.N, Computing Systems Modelling Research Group, Bradford University, England, 1988.
Kimu85.
T. Kimura. Heuristic Approximations for the Mean Waiting Time in the GI/G/s Queue. Technical report B55, Tokyo Institute of Technology, 1985.
King70.
J. Kingman. Inequalities in the Theory of Queues. Journal of the Royal Statistical Society, (Series B 32):102-110, 1970.
King90.
J. King. Computer and Communication Systems Performance Modeling. Prentice-Hall, Englewood Cliffs, N.J., 1990.
Kirs91.
M. Kirschnick. Approximative Analyse von WS-Netzwerken mit Batchverarbeitung. Diplomarbeit, Universität Erlangen-Nürnberg, IMMD IV, 1991.
Kirs93.
M. Kirschnick. XPEPSY Manual. Technical report TR-14-18-93, Universität Erlangen-Nürnberg, IMMD IV, September 1993.
Kirs94.
M. Kirschnick. The Performance Evaluation and Prediction SYstem for Queueing Networks (PEPSY-QNS). Technical report TR-14-18-94, Universität Erlangen-Nürnberg, IMMD IV, June 1994.
Klei65.
L. Kleinrock. A conservation law for a wide class of queueing disciplines. Naval Research Logistic Quart., (12):181-192, 1965.
Klei75.
L. Kleinrock. Queueing Systems, volume 1: Theory. Wiley, New York, 1975.
Klei76.
L. Kleinrock. Queueing Systems, volume 2: Computer Applications. Wiley, New York, 1976.
KnKr96.
W. Knottenbelt and P. Kritzinger. A Performance Analyzer for the Numerical Solution of General Markov Chains. Technical report, Data Network Architectures Laboratory, Computer Science Department, University of Cape Town, 1996.
KoAl88.
D. Kouvatsos and J. Almond. Maximum Entropy Two-Station Cyclic Queues with Multiple General Servers. Acta Informatica, 26:241-267, 1988.
Koba74.
H. Kobayashi. Application of the Diffusion Approximation to Queueing Networks, Part 1: Equilibrium Queue Distributions. Journal of the ACM, 21(2):316-328, April 1974.
Koba78.
H. Kobayashi. Modeling and Analysis: An Introduction to System Performance Evaluation Methodology. Addison-Wesley, Reading, MA, 1978.
Koba79.
H. Kobayashi. A Computational Algorithm for Queue Distributions via Polya Theory of Enumeration. In M. Arato and M. Butrimenko, editors, Proc. 4th Int. Symp. on Modelling and Performance Evaluation of Computer Systems, volume 1, pages 79-88, Vienna, Austria, February 1979. North-Holland.
Köll74.
J. Köllerström. Heavy Traffic Theory for Queues with Several Servers. Journal of Applied Probability, 11:544-552, 1974.
KoRe76.
A. Konheim and M. Reiser. A Queueing Model with Finite Waiting Room and Blocking. Journal of the ACM, 23(2):328-341, April 1976.
KoRe78.
A. Konheim and M. Reiser. Finite Capacity Queuing Systems with Applications in Computer Modeling. SIAM Journal on Computing, 7(2):210-299, May 1978.
Kouv85.
D. Kouvatsos. Maximum Entropy Methods for General Queueing Networks. In Proc. Int. Conf. on Modelling Techniques and Tools for Performance Analysis, pages 589-608, 1985.
KrLa76.
W. Krämer and M. Langenbach-Belz. Approximate Formulae for General Single Server Systems with Single and Bulk Arrivals. In Proc. 8th Int. Teletraffic Congress (ITC), pages 235-243, Melbourne, 1976.
Krze87.
A. Krzesinski. Multiclass Queueing Networks with State-Dependent Routing. Performance Evaluation, 7(2):125-143, June 1987.
KTK81.
A. Krzesinski, P. Teunissen, and P. Kritzinger. Mean Value Analysis for Queue Dependent Servers in Mixed Multiclass Queueing Networks. Technical report, University of Stellenbosch, South Africa, July 1981.
Kuba86.
P. Kubat. Reliability analysis for integrated networks with application to burst switching. IEEE Transactions on Communication, 34(6):564-568, 1986.
Kühn79.
P. Kühn. Approximate Analysis of General Queuing Networks by Decomposition. IEEE Transactions on Communication, 27(1):113-126, January 1979.
Kulb89.
J. Kulbatzki. Das Programmsystem PRIORI: Erweiterung und Validierung mit Hilfe von Simulationen. Diplomarbeit, 1989.
Kulk96.
V. Kulkarni. Modeling and Analysis of Stochastic Systems. Chapman & Hall, London, 1996.
LaLi83.
S. Lam and Y. Lien. A Tree Convolution Algorithm for the Solution of Queueing Networks. Communications of the ACM, 26(3):203-215, March 1983.
Lam81.
S. Lam. A Simple Derivation of the MVA and LBANC Algorithms from the Convolution Algorithm. Technical report 184, University of Texas at Austin, November 1981.
LaRe80.
S. Lavenberg and M. Reiser. Stationary State Probabilities at Arrival Instants for Closed Queueing Networks with Multiple Types of Customers. Journal of Applied Probability, 17:1048-1061, 1980.
Lave83.
S. Lavenberg. Computer Performance Modeling Handbook. Academic Press, New York, 1983.
LaZa82.
E. Lazowska and J. Zahorjan. Multiple Class Memory Constrained Queueing Networks. ACM Sigmetrics Performance Evaluation Review, 11(4):130-140, 1982.
LeWi89.
Y. Levy and P. Wirth. A unifying approach to performance and reliability objectives. Teletraffic Science for New Cost-Effective Systems, Networks and Services, ITC-12, pages 1173-1179. Elsevier, Amsterdam, 1989.
Lind94.
C. Lindemann. Stochastic Modeling using DSPNexpress. Oldenbourg, Munich, 1994.
Litt61.
J. Little. A Proof of the Queuing Formula L = λW. Operations Research, 9(3):383-387, May 1961.
Luca91.
D. Lucantoni. New Results on the single server queue with a Batch Markovian Arrival Process. Stochastic Models, 7(1):1-46, 1991.
Luca93.
D. Lucantoni. The BMAP/G/1 queue: A tutorial. In L. Donatiello and R. Nelson, editors, Models and Techniques for Performance Evaluation of Computer and Communications Systems. Springer, New York, 1993.
LZGS84.
E. Lazowska, J. Zahorjan, G. Graham, and K. Sevcik. Quantitative System Performance - Computer System Analysis Using Queueing Network Models. Prentice-Hall, Englewood Cliffs, N.J., 1984.
MAD94.
D. Menascé, V. Almeida, and L. Dowdy. Capacity Planning and Performance Modeling - From Mainframes to Client-Server Systems. Prentice-Hall, Englewood Cliffs, N.J., 1994.
MaRa95.
S. Majumdar and R. Ramadoss. Interval-Based Performance Analysis for Computer Systems. In P. Dowd and E. Gelenbe, editors, Proc. 3rd Int. Workshop on Modeling Analysis and Simulation of Computer and Telecommunication Systems, pages 345-351, Durham, N.C., Los Alamitos, CA, 1995. IEEE Computer Society Press.
Marc74.
W. Marchal. Simple Bounds and Approximations in Queueing Systems. D.sc. dissertation, George Washington University, Department of Operations Research, Washington, D.C., 1974.
Marc78.
W. Marchal. Some simpler bounds on the mean queueing time. Operations Research, 26(6), 1978.
Mari78.
R. Marie. Méthodes itératives de résolution de modèles mathématiques de systèmes informatiques. R.A.I.R.O. Informatique/Computer Science, 12(2):107-122, 1978.
Mari79.
R. Marie. An Approximate Analytical Method for General Queueing Networks. IEEE Transactions on Software Engineering, 5(5):530-538, September 1979.
Mari80.
R. Marie. Calculating Equilibrium Probabilities for λ(n)/Ck/1/N Queues. ACM Sigmetrics Performance Evaluation Review, 9(2):117-125, 1980.
Mart72.
J. Martin. System analysis for data transmission. Prentice-Hall, Englewood Cliffs, N.J., 1972.
MaSt77.
R. Marie and W. Stewart. A Hybrid Iterative-Numerical Method for the Solution of a General Queueing Network. In Proc. 3rd Symp. on Measuring, Modelling and Evaluating Computer Systems, pages 173-188, Amsterdam, 1977. North-Holland.
MaTr93.
M. Malhotra and K. Trivedi. A Methodology for Formal Specification of Hierarchy in Model Solution. In Proc. 5th Int. Workshop on Petri Nets and Performance Models (PNPM ‘93), pages 258-267, Toulouse, France, October 1993.
McKe88.
J. McKenna. A New Proof and a Tree Algorithm for RECAL. In P. Courtois and G. Latouche, editors, Proc. Performance '87, pages 3-16, Amsterdam, 1988. North-Holland.
McMi84.
J. McKenna and D. Mitra. Asymptotic Expansions and Integral Representations of Moments of Queue Lengths in Closed Markovian Networks. Journal of the ACM, 31(2):346-360, April 1984.
MCT94.
J. Muppala, G. Ciardo, and K. Trivedi. Stochastic Reward Nets for Reliability Prediction. Communications in Reliability, Maintainability and Serviceability (An International Journal published by SAE International), 1(2):9-20, July 1994.
MeDü97.
H. de Meer and O. Düsterhöft. Controlled Stochastic Petri Nets. In Proc. 16th IEEE Symposium on Reliable Distributed Systems (SRDS '97), pages 18-25, Durham, N.C., Los Alamitos, CA, October 1997. IEEE Computer Society Press.
Meer92.
H. de Meer. Transiente Leistungsbewertung und Optimierung rekonfigurierbarer fehlertoleranter Rechensysteme. Dissertation 10, Universität Erlangen-Nürnberg, IMMD IV, October 1992.
MeFi97.
H. de Meer and S. Fischer. Controlled Stochastic Petri Nets for QoS Management. In K. Irmscher, C. Mittasch, and K. Richter, editors, Proc. 9th ITG/GI Conf. on Measurement, Modeling and Performance Evaluation of Computer and Communication Systems (MMB ‘97), pages 161-171, Freiberg/Sachsen, Germany, Berlin-Offenbach, September 1997. VDE Verlag.
MeNa82.
D. Menascé and T. Nakanishi. Optimistic Versus Pessimistic Concurrency Control Mechanisms in Database Management Systems. Information Systems, 7(1):13-27, 1982.
MeSe97.
H. de Meer and H. Sevcikova. PENELOPE: dependability evaluation and the optimization of performability. In R. Marie, B. Plateau, and G. Rubino, editors, Proc. 9th Int. Conf. on Modelling Techniques and Tools for Computer Performance Evaluation, Lecture Notes in Computer Science 1245, pages 19-31, St. Malo, France, Berlin, 1997. Springer.
Meye80.
J. Meyer. On Evaluating the Performability of Degradable Computing Systems. IEEE Transactions on Computers, 29(8):720-731, August 1980.
Meye82.
J. Meyer. Closed-Form Solutions of Performability. IEEE Transactions on Computers, 31(7):648-657, July 1982.
Mitz97.
U. Mitzlaff. Diffusionsapproximation von Warteschlangensystemen. Dissertation, Technical University Clausthal, Germany, July 1997.
MMKT94.
J. Muppala, V. Mainkar, V. Kulkarni, and K. Trivedi. Numerical Computation of Response Time Distributions Using Stochastic Reward Nets. Annals of Operations Research, 48:155-184, 1994.
MMT94.
M. Malhotra, J. Muppala, and K. Trivedi. Stiffness-Tolerant Methods for Transient Analysis of Stiff Markov Chains. International Journal on Microelectronics and Reliability, 34(11):1825-1841, 1994.
MMT96.
J. Muppala, M. Malhotra, and K. Trivedi. Reliability and maintenance of complex systems, chapter Markov Dependability Models of Complex Systems: Analysis Techniques, pages 442-486. Springer, Berlin, 1996.
MMW57.
C. Mack, T. Murphy, and N. Webb. The efficiency of N Machines unidirectionally patrolled by one operative when walking time and repair time are constant. Journal of the Royal Statistical Society, Series B 19(1):166-172, 1957.
Moor72.
F. Moore. Computational Model of a Closed Queuing Network with Exponential Servers. IBM Journal of Research and Development, 16(6):567-572, November 1972.
MSA96.
M. Meo, E. de Souza e Silva, and M. Ajmone Marsan. Efficient Solution for a Class of Markov Chain Models of Telecommunication Systems. Performance Evaluation, 27/28(4):603-625, October 1996.
MSW88.
P. Martini, O. Spaniol, and T. Welzel. File Transfer in High Speed Token Ring Networks: Performance Evaluation by Approximate Analysis and Simulation. IEEE Journal on Selected Areas in Communications, 6(6):987-996, July 1988.
MTBH93.
H. de Meer, K. Trivedi, G. Bolch, and F. Hofmann. Optimal Transient Service Strategies for Adaptive Heterogeneous Queueing Systems. In O. Spaniol, editor, Proc. 7th GI/ITG Conf. on Measurement, Modelling and Performance Evaluation of Computer and Communication Systems (MMB '93), Informatik aktuell, pages 166-170, Aachen, Germany, Berlin, September 1993. Springer.
MTD94.
H. de Meer, K. Trivedi, and M. Dal Cin. Guarded Repair of Dependable Systems. Theoretical Computer Science, 128:179-210, July 1994. Special Issue on Dependable Parallel Computing.
Munt72.
R. Muntz. Network of Queues. Technical report, University of Los Angeles, Department of Computer Science, 1972. Notes for Engineering 226 C.
Munt73.
R. Muntz. Poisson Departure Process and Queueing Networks. In Proc. 7th Annual Princeton Conf. on Information Sciences and Systems, pages 435-440, Princeton, N.J., March 1973. Princeton University Press.
MüRo87.
B. Müller-Clostermann and G. Rosentreter. Synchronized Queueing Networks: Concepts, Examples and Evaluation Techniques. Informatik-Fachberichte, (154):176-191, 1987.
MuTr91.
J. Muppala and K. Trivedi. Composite performance and availability analysis using a hierarchy of stochastic reward nets. In G. Balbo, editor, Proc. 5th Int. Conf. on Modeling Techniques and Tools for Computer Performance Evaluation, pages 322-336, Torino, Amsterdam, 1991. Elsevier Science (North Holland).
MuTr92.
J. Muppala and K. Trivedi. Numerical Transient Solution of Finite Markovian Queueing Systems. In U. Bhat and I. Basawa, editors, Queueing and Related Models, pages 262-284. Oxford University Press, Oxford, 1992.
MuWo74a.
R. Muntz and J. Wong. Asymptotic Properties of Closed Queueing Network Models. In Proc. 8th Annual Princeton Conf. on Information Sciences and Systems, pages 348-352, Princeton, N.J., March 1974. Princeton University Press.
MuWo74b.
R. Muntz and J. Wong. Efficient Computational Procedures for Closed Queueing Network Models. In Proc. 7th Hawaii Int. Conf. on System Science, pages 33-36, Hawaii, January 1974.
NaFi67.
T. Naylor and J. Finger. Verification of Computer Simulation Models. Management Science, 14:92-101, October 1967.
NaGa88.
W. Najjar and J. Gaudiot. Reliability and performance modeling of hypercube-based multiprocessors. In G. Iazeolla, P. Courtois, and O. Boxma, editors, Computer Performance and Reliability, pages 305-320. North-Holland, Amsterdam, 1988.
NeCh81.
D. Neuse and K. Chandy. SCAT: A Heuristic Algorithm for Queueing Network Models of Computing Systems. ACM Sigmetrics Performance Evaluation Review, 10(3):59-79, 1981.
NeOs69.
G. Newell and E. Osuna. Properties of vehicle-actuated signals: II. One-way streets. Transportation Science, 3:99-125, 1969.
NeTa88.
R. Nelson and A. Tantawi. Approximate Analysis of Fork/Join Synchronization in Parallel Queues. IEEE Transactions on Computers, 37(6):739-743, June 1988.
Neut81.
M. Neuts. Matrix-Geometric Solutions in Stochastic Models: An Algorithmic Approach. The Johns Hopkins University Press, Baltimore, M.D., 1981.
Newe69.
G. Newell. Properties of vehicle-actuated signals: I. One-way streets. Transportation Science, 3:30-52, 1969.
Nico90.
V. Nicola. Lumpability of Markov Reward Models. Technical report, IBM T.J. Watson Research Center, Yorktown Heights, New York, 1990.
NNHG90.
V. Nicola, M. Nakayama, P. Heidelberger, and A. Goyal. Fast Simulation of Dependability Models with General Failure, Repair and Maintenance Processes. In Proc. 20th Annual Int. Symp. on Fault-Tolerant Computing Systems, pages 491-498, Newcastle upon Tyne, UK, Los Alamitos, CA, June 1990. IEEE Computer Society Press.
Noet79.
A. Noetzel. A Generalized Queueing Discipline for Product Form Network Solutions. Journal of the ACM, 26(4):779-793, October 1979.
OnPe86.
R. Onvural and H. Perros. On Equivalences of Blocking Mechanisms in Queueing Networks with Blocking. Operations Research Letters, 5(6):293-298, December 1986.
OnPe89.
R. Onvural and H. Perros. Some Equivalences between Closed Queueing Networks with Blocking. Performance Evaluation, 9(2):111-118, April 1989.
PeAl86.
H. Perros and T. Altiok. Approximate Analysis of Open Networks of Queues with Blocking: Tandem Configurations. IEEE Transactions on Software Engineering, 12(3):450-461, March 1986.
Perr81.
H. Perros. A Symmetrical Exponential Open Queue Network with Blocking and Feedback. IEEE Transactions on Software Engineering, 7(4):395-402, July 1981.
Perr94.
H. Perros. Queueing Networks with Blocking. Oxford University Press, Oxford, 1994.
PeSn89.
H. Perros and P. Snyder. A Computationally Efficient Approximation Algorithm for Feed-Forward Open Queueing Networks with Blocking. Performance Evaluation, 9(9):217-224, June 1989.
Petr62.
C. A. Petri. Kommunikation mit Automaten. Dissertation, University of Bonn, Bonn, Germany, 1962.
PGTP96.
A. Pfening, S. Garg, M. Telek, and A. Puliafito. Optimal rejuvenation for tolerating soft failures. Performance Evaluation, 27/28:491-506, October 1996.
Pitt79.
B. Pittel. Closed Exponential Networks of Queues with Saturation: The Jackson Type Stationary Distribution and Its Asymptotic Analysis. Mathematics of Operations Research, 4:367-378, 1979.
PuAi86.
G. Pujolle and W. Ai. A Solution for multiserver and multiclass open queueing networks. INFOR, 24(3):221-230, 1986.
RaBe74.
E. Rainville and P. Bedient. Elementary Differential Equations. MacMillan, New York, 1974.
Rand86.
S. Randhawa. Simulation of a Wafer Fabrication Facility Using Network Modeling. Journal of the Paper Society of Manufacturing Engineers, 1986.
RaTr95.
A. Ramesh and K. Trivedi. Semi-numerical transient analysis of Markov models. In R. Geist, editor, Proc. 33rd ACM Southeast Conf., pages 13-23, 1995.
Reis81.
M. Reiser. Mean-Value Analysis and Convolution Method for Queue-Dependent Servers in Closed Queueing Networks. Performance Evaluation, 1, 1981.
ReKo74.
M. Reiser and H. Kobayashi. Accuracy of the Diffusion Approximation for Some Queueing Systems. IBM Journal of Research and Development, 18(2):110-124, March 1974.
ReKo75.
M. Reiser and H. Kobayashi. Queueing Networks with Multiple Closed Chains: Theory and Computational Algorithms. IBM Journal of Research and Development, 19(3):283-294, May 1975.
ReLa80.
M. Reiser and S. Lavenberg. Mean-Value Analysis of Closed Multichain Queuing Networks. Journal of the ACM, 27(2):313-322, April 1980.
ReTr88.
A. Reibman and K. Trivedi. Numerical Transient Analysis of Markov Models. Computers and Operations Research, 15(1):19-36, 1988.
ReTr89.
A. Reibman and K. Trivedi. Transient analysis of cumulative measures of Markov model behavior. Stochastic Models, 5(4):683-710, 1989.
RST89.
A. Reibman, R. Smith, and K. Trivedi. Markov and Markov Reward Model Transient Analysis: An Overview of Numerical Approaches. European Journal of Operational Research, 40:257-267, 1989.
SaCh81.
C. Sauer and K. Chandy. Computer Systems Performance Modelling. Prentice-Hall, Englewood Cliffs, N.J., 1981.
SaMa85.
C. Sauer and E. MacNair. The Evolution of the Research Queueing Package. In D. Potier, editor, Proc. Int. Conf. on Modelling Techniques and Tools for Performance Analysis, pages 5-24, Paris, Amsterdam, 1985. North-Holland.
SaTr87.
R. Sahner and K. Trivedi. Performance and Reliability Analysis using Directed Acyclic Graphs. IEEE Transactions on Software Engineering, 13(10):1105-1114, October 1987.
Saue81.
C. Sauer. Approximate Solution of Queueing Networks with Simultaneous Resource Possession. IBM Journal of Research and Development, 25(6):894-903, November 1981.
Saue83.
C. Sauer. Computational Algorithms for State-Dependent Queueing Networks. ACM Transactions on Computer Systems, 1(1):67-92, February 1983.
Schr70.
L. Schrage. An alternative proof of a conservation law for the queue G/G/1. Operations Research, (18):185-187, 1970.
Schw79.
P. Schweitzer. Approximate Analysis of Multiclass Closed Networks of Queues. In Proc. Int. Conf. on Stochastic Control and Optimization, pages 25-29, Amsterdam, June 1979.
Schw84.
P. Schweitzer. Aggregation Methods for Large Markov Chains. In G. Iazeolla et al., editors, Mathematical Computer Performance and Reliability. North-Holland, Amsterdam, 1984.
ScMü90.
M. Sczittnick and B. Müller-Clostermann. MACOM - A Tool for the Markovian Analysis of Communication Systems. In Proc. 4th Int. Conf. on Data Communications Systems and their Performance, Barcelona, June 1990.
SeMi81.
K. Sevcik and I. Mitrani. The Distribution of Queuing Network States at Input and Output Instants. Journal of the ACM, 28(2):358-371, April 1981.
Sevc77.
K. Sevcik. Priority Scheduling Disciplines in Queueing Network Models of Computer Systems. In Proc. IFIP 7th World Computer Congress, pages 565-570, Toronto, Amsterdam, 1977. NorthHolland.
ShBu77.
A. Shum and J. Buzen. The EPF Technique: A Method for Obtaining Approximate Solutions to Closed Queueing Networks with General Service Times. In Proc. 3rd Symp. on Measuring, Modelling and Evaluating Computer Systems, pages 201-220, 1977.
ShYa88.
J. Shanthikumar and D. Yao. Throughput Bounds for Closed Queueing Networks with Queue-Dependent Service Rates. Performance Evaluation, 9(1):69-78, November 1988.
SLM86.
E. de Souza e Silva, S. Lavenberg, and R. Muntz. A Clustering Approximation Technique for Queueing Networks with a Large Number of Chains. IEEE Transactions on Computers, 35(5):419-430, May 1986.
SmBr80.
C. Smith and J. Browne. Aspects of Software Design Analysis: Concurrency and Blocking. ACM Sigmetrics Performance Evaluation Review, 9(2):245-253, 1980.
SMK82.
C. Sauer, E. MacNair, and J. Kurose. The Research Queueing Package: Past, Present and Future. Proc. National Computer Conf., pages 273-280, 1982.
SmTr90.
R. Smith and K. Trivedi. The Analysis of Computer Systems Using Markov Reward Processes. In H. Takagi, editor, Stochastic Analysis of Computer and Communication Systems, pages 589-629. Elsevier Science (North Holland), Amsterdam, 1990.
SoGa89.
E. de Souza e Silva and H. Gail. Calculating availability and performability measures of repairable computer systems using randomization. Journal of the ACM, 36(1):171-193, 1989.
SoLa89.
E. de Souza e Silva and S. Lavenberg. Calculating Joint QueueLength Distributions in Product-Form Queuing Networks. Journal of the ACM, 36(1):194-207, January 1989.
SOQW95.
W. Sanders, W. Obal, A. Qureshi, and F. Widjanarko. The UltraSAN modeling environment. Performance Evaluation, 24(1):89-115, October 1995.
Spir79.
J. Spirn. Queuing Networks with Random Selection for Service. IEEE Transactions on Software Engineering, 5(3):287-289, May 1979.
SRT88.
V. Sarma, A. Reibman, and K. Trivedi. Optimization Methods in Computer System Design. In R. Levary, editor, Engineering Design: Better Results Through Operations Research Methods. Elsevier Science (North Holland), Amsterdam, 1988.
Stew90.
W. Stewart. MARCA: Markov Chain Analyzer. A Software Package for Markov Modelling, Version 2.0. North Carolina State University, Department of Computer Science, Raleigh, N.C., 1990.
Stew94.
W. Stewart. Introduction to the Numerical Solution of Markov Chains. Princeton University Press, Princeton, N.J., 1994.
StGo85.
W. Stewart and A. Goyal. Matrix methods in large dependability models. Research report RC 11485, IBM T.J. Watson Research Center, Yorktown Heights, N.Y., November 1985.
STP96.
R. Sahner, K. Trivedi, and A. Puliafito. Performance and Reliability Analysis of Computer Systems - An Example-Based Approach Using the SHARPE Software Package. Kluwer Academic Publishers, Boston, M.A., 1996.
STR88.
R. Smith, K. Trivedi, and A. Ramesh. Performability Analysis: Measures, an Algorithm and a Case Study. IEEE Transactions on Computers, 37(4):406-417, April 1988.
Stre86.
J. Strelen. A Generalization of Mean Value Analysis to Higher Moments: Moment Analysis. ACM Sigmetrics Performance Evaluation Review, 14(1):129-140, May 1986.
SuDi84.
R. Suri and G. Diehl. A New Building Block for Performance Evaluation of Queueing Networks with Finite Buffers. ACM Sigmetrics Performance Evaluation Review, 12(3):134-142, August 1984.
SuDi86.
R. Suri and G. Diehl. A Variable Buffer-Size Model and its Use in Analyzing Closed Queueing Networks with Blocking. Management Science, 32(2):206-225, February 1986.
SuHi84.
R. Suri and R. Hildebrant. Modeling flexible manufacturing systems using mean value analysis. Journal of Manufacturing Systems, 3(1):27-38, 1984.
Taka75.
Y. Takahashi. A Lumping Method for Numerical Calculations of Stationary Distributions of Markov Chains. Research report B 18, Tokyo Institute of Technology, Department of Information Sciences, Tokyo, June 1975.
Taka90.
H. Takagi. Stochastic Analysis of Computer And Communication Systems. North-Holland, Amsterdam, 1990.
Taka93.
H. Takagi. Queueing Analysis: A Foundation of Performance Evaluation, volume 1-3. North-Holland, Amsterdam, 1991-1993.
Tane95.
A. Tanenbaum. Distributed Operating Systems. Prentice-Hall, Englewood Cliffs, N.J., 1995.
Tay87.
Y. Tay. Locking Performance in Centralized Databases. Academic Press, New York, 1987.
ThBa86.
A. Thomasian and P. Bay. Analytic Queueing Network Models for Parallel Processing of Task Systems. IEEE Transactions on Computers, 35(12):1045-1054, December 1986.
Thom83.
A. Thomasian. Queueing Network Models to Estimate Serialization Delays in Computer Systems. In A. Agrawala and S. Tripathi, editors, Proc. Performance '83, pages 61-81, Amsterdam, 1983. North-Holland.
ThRy85.
A. Thomasian and I. Ryu. Analysis of Some Optimistic Concurrency Control Schemes Based on Certification. ACM Sigmetrics Performance Evaluation Review, 13(2):192-203, August 1985.
Tijm86.
H. Tijms. Stochastic Modelling and Analysis: A Computational Approach. Wiley, New York, 1986.
TMWH92.
K. Trivedi, J. Muppala, S. Woolet, and B. Haverkort. Composite performance and dependability analysis. Performance Evaluation, 14:197-215, 1992.
Tows80.
D. Towsley. Queuing Network Models with State-Dependent Routing. Journal of the ACM, 27(2):323-337, April 1980.
Triv82.
K. Trivedi. Probability and Statistics with Reliability, Queuing, and Computer Science Applications. Prentice-Hall, Englewood Cliffs, N.J., 1982.
TrKi80.
K. Trivedi and R. Kinicki. A Model for Computer Configuration Design. IEEE Computer, 13(4):47-54, April 1980.
TrSi81.
K. Trivedi and T. Sigmon. Optimal Design of Linear Storage Hierarchies. Journal of the ACM, 28(2):270-288, 1981.
TSI+96.
K. S. Trivedi, A. S. Sathaye, O. C. Ibe, R. C. Howe, and A. Aggarwal. Availability and Performance-Based Sizing of Multiprocessor Systems. Communications in Reliability, Maintainability and Serviceability, 1996.
TSIH90.
K. Trivedi, A. Sathaye, O. Ibe, and R. Howe. Should I add a processor? In Proc. 23rd Annual Hawaii Int. Conf. on System Sciences, pages 214-221, Los Alamitos, CA, January 1990. IEEE Computer Society Press.
TuSa85.
S. Tucci and C. Sauer. The Tree MVA Algorithm. Performance Evaluation, 5(3):187-196, August 1985.
VePo85.
M. Veran and D. Potier. QNAP 2: A Portable Environment for Queueing Systems Modelling. In D. Potier, editor, Proc. Int. Conf. on Modelling Techniques and Tools for Performance Analysis, pages 25-63, Paris, Amsterdam, 1985. North-Holland.
Wals85.
R. Walstra. Nonexponential Networks of Queues: A Maximum Entropy Analysis. ACM Sigmetrics Performance Evaluation Review, 13(2):27-37, August 1985.
Welc95.
B. Welch. Practical Programming in Tcl and Tk. Prentice-Hall, Upper Saddle River, N.J., 1995.
Whit83a.
W. Whitt. Performance of the Queueing Network Analyzer. Bell System Technical Journal, 62(9):2817-2843, November 1983.
Whit83b.
W. Whitt. The Queueing Network Analyzer. Bell System Technical Journal, 62(9):2779-2815, November 1983.
Wilh77.
N. Wilhelm. A General Model for the Performance of Disk Systems. Journal of the ACM, 24(1):14-31, January 1977.
WLTV96.
C. Wang, D. Logothetis, K. Trivedi, and I. Viniotis. Transient Behavior of ATM Networks under Overloads. In Proc. IEEE INFOCOM ‘96, pages 978-985, San Francisco, CA, Los Alamitos, CA, March 1996. IEEE Computer Society Press.
Wolf82.
R. Wolff. Poisson arrivals see time averages. Operations Research, 30:223-231, 1982.
Wong75.
J. Wong. Queueing Network Models for Computer Systems. PhD thesis, Berkeley University, School of Engineering and Applied Science, October 1975.
Wu82.
L. Wu. Operational models for the evaluation of degradable computing systems. In Proc. 1982 ACM SIGMETRICS Conf. on Measurements and Modeling of Computer Systems, Performance Evaluation Review, pages 179-185, College Park, Maryland, New York, April 1982. Springer.
YaBu86.
D. Yao and J. Buzacott. The Exponentialization Approach to Flexible Manufacturing System Models with General Processing Times. European Journal of Operational Research, (24):410-416, 1986.
YBL89.
Q. Yang, L. Bhuyan, and B. Liu. Analysis and Comparison of Cache Coherence Protocols for a Packet-Switched Multiprocessor. IEEE Transactions on Computers, 38(8):1143-1153, August 1989.
ZaWo81.
J. Zahorjan and E. Wong. The Solution of Separable Queueing Network Models Using Mean Value Analysis. ACM Sigmetrics Performance Evaluation Review, 10(3):80-85, 1981.
ZES88.
J. Zahorjan, D. Eager, and H. Sweillam. Accuracy, Speed, and Convergence of Approximate Mean Value Analysis. Performance Evaluation, 8(4):255-270, August 1988.
ZSEG82.
J. Zahorjan, K. Sevcik, D. Eager, and B. Galler. Balanced Job Bound Analysis of Queueing Networks. Communications of the ACM, 25(2):134-141, February 1982.
Index
A
ABA, 411
Absorbing state, 46
Absorbing marking, 84
Accumulated reward, 66
Aggregation step, 168
Algorithms
  ERG, 92
  AMVA, 535
  ASPA, 500
  asymmetric SCAT, 536
  BFS, 559
  BOTT, 405
  CCNC, 312
  convolution, 313
  Courtois's approximate method, 153
  DAC, 312
  FES, 368
  Gaussian elimination, 144
  Grassmann, 119
  LBANC, 311
  MVALDMX, 352
  RECAL, 311
  RECAL method, 360
  SCAT, 385
    core algorithm, 386
    extended, 393
    multiple server, 389
    single server, 385
  SUM, 397
    multiclass network, 403
    single class networks, 400
  Takahashi, 170
Allen-Cunneen, 443
Allen-Cunneen approximation, 229-230, 234
AMVA, 535
Aperiodic, 47
Approximate lumpability, 169
Approximation algorithms, 573
Approximation networks, 379
APU, 492, 623
Arc multiplicity, 87
  marking-dependent, 90
Arcs, 84
Argmin, 405
Arrival
  overall rate, 266
  process, 214
  rate, 266
  theorem, 326, 343, 473
ASCAT, 536
Associate processing unit, 492
Associated processor model, 624
ASUM, 534
Asymmetric
  MVA, 535
  nodes, 533
  queueing systems, 250
  SCAT, 536
  SUM, 534
  system, 248, 250
    GI/G/m, 249
    M/M/m, 249
ATM networks, 658
ATM, 33

B
Bard-Schweitzer approximation, 380
Batch service, 258
Batch size, 258
BCMP theorem, 313
BCMP
  mixed networks, 352
  theorem, 300-301
    Version 1, 303
    Version 2, 303
BFS method, 559
Binomial coefficient, 297
Birth rate, 105
Birth-death process, 105, 218, 221
BJB, 414
Blocking
  after service, 548
  before service, 548
  factor, 521
  network, 547, 549
  probabilities, 553
  rejection, 549
  repetitive, 548
  types, 548
BOTT, 405, 534
Bottapprox method, 405, 463
  multiclass, 408
Bottleneck analysis, 406, 427-428
Bounds analysis, 410
  asymptotic, 410-411
  balanced job, 410, 414
  lower bound, 233, 379
  optimistic, 410
  pessimistic, 410
  upper bound, 229, 379
Boxma, Cohen and Huffels formula, 233, 235
Brumelle, 233

C
Cache
  coherence protocol, 667-668
    write-once, 669
  miss, 603
  strategies, 666
Caches, 603
CCNC algorithm, 312
CDF, 7, 9, 14, 26, 36
  conditional, 36
Cell loss probability, 658
Central limit theorem, 24
Central-server, 263
  model, 263, 331, 511
Chain, 36
  concept, 623
Chapman-Kolmogorov equation, 39, 50
Cheapernet, 610
Class switching, 295, 326, 337, 482, 621, 623
  mixed priorities, 483
Client server system, 646
Client server systems, 607
Closed network
  single class, 313
Closed queueing network, 274
Closed tandem network
  cyclic queue, 222
Closing method, 464
Coefficient of variation, 9, 11, 13-14, 16, 20, 259
Communication systems, 608
Composite node, 452
Concept of chains, 295
Conditional probability, 22
Confidence interval, 29
Conservation law, 243
Contention time, 502
Continuous parameter process, 36
Continuous random variable, 7, 30-31
Convolution, 33
  algorithm, 311, 313, 564, 573
  method, 317
  operator, 313
Correction factor, 229
Correlation coefficient, 24
Cosmetatos approximation, 231, 235, 241
Cost function, 558
  linear, 558
  non-linear, 559
Courtois's approximate method, 153
Covariance, 23
Coverage factor, 61
Cox-m, 454
Cox-2, 453
Crommelin approximation, 232
CSMA/CD network, 608
CSPL, 579
CTMC, 49, 253, 274, 276, 279, 311, 571
  ergodic, 53
  state transition diagram, 250
  time-homogeneous, 49
Cumulative distribution function, 7
Cumulative measure, 178

D
DA, 423
Death rate, 105
Decision variables, 558
Decomposition method, 439
  of Chylla, 442
  of Gelenbe, 441
  of Kuhn, 442
  of Pujolle, 441
  of Whitt, 441
Degree of multiprogramming, 263
DES, 1, 59, 573
Diffusion approximation, 423
Disaggregation step, 168
Discipline
  dynamic priorities, 211
  FCFS, 211
  IS, 211
  LBPS, 307
  LCFS, 211
  preemption, 211
  PS, 211
  queueing discipline, 242
  RR, 211
  SIRO, 211
  static priorities, 211
  WEIRDP, 307
Discrete random variable, 5, 20, 25
Discrete-event simulation, 571, 573
Discrete-parameter process, 36
Discrete-time Markov chain, 38
Distribution
  arbitrary, 10
  branching Erlang, 17
  Cox, 17, 210
  deterministic, 210
  Erlang, 210
  Erlang-k, 13
  Erlang-r, 33
  Erlang-2, 276
  exponential, 10, 210, 216, 224
  function, 7
  gamma, 15
  general independent, 210
  general, 210
  generalized Erlang, 16
  hyperexponential, 12, 210
  hypoexponential, 14
    k phases, 15
    r phases, 34
  non-exponential, 423
  normal, 9
  of sums, 31
  waiting time, 217, 227
  Weibull, 19
DTMC, 38, 296, 571

E
Embedding techniques, 106
Entropy, 431
  function, 431
Equilibrium, 274
  probability vector, 53
Equivalent network, 551
ERG, 98
ERG algorithm, 92
Ergodic chains, 326
Ergodic CTMC, 53
Ergodic Markov chain, 47
Ergodic, 47, 54, 274, 285
Erlang-m, 454
Estimate, 28
Estimator, 28
Ethernet, 607
Expectation, 20
Expected instantaneous reward rate, 68
Expected value, 6, 8
Extended BOTT, 463
Extended MVA, 470
Extended reachability graph, 86
Extended SUM, 463

F
Fast state, 191
FB, 258
FDDI, 610
FES method, 368
  multiple nodes, 373
FES node, 368
Fire, 84
Firing rate
  marking-dependent, 90
Firing time, 86
Fission-fusion, 518
Fixed-point iteration, 403
Flexible production systems, 630
Flow, 441
Flow-equivalent server method, 311, 499
Fokker-Planck equation, 423
Fork-join systems, 517
Full batch policy, 258

G
Gauss Seidel, 594
Gauss-Seidel iteration, 140
Gaussian elimination, 144
Generator matrix, 277
GI/G/1 system, 229
  approximation formula of Kulbatzki, 234
GI/G/m system, 233
GI/M/1 system, 225
GI/M/m system, 228
Global balance, 274
  equations, 54, 274, 277, 279, 290
Gordon/Newell theorem, 288, 292
Grassmann algorithm, 119
GREEDY policy, 258
GSPN, 86
Guards, 89

H
Heavy traffic approximation, 245
Hessenberg matrix, 108
Heterogeneous multiple servers, 248
Hierarchical models, 571, 665

I
I/O subsystems, 501
Immediate transition, 86
Impulse reward, 66
Infinitesimal generator matrix, 52
Inhibitor arcs, 87
Initial marking, 84
Initial probability vector, 41
Input arc, 84
Input place, 84
Instantaneous measure, 178
Instantaneous reward rate, 66
Internal concurrency, 506
Irreducible, 46, 54
ISDN channel, 650
iSPN, 585

J
Jackson's method, 467, 537
Jackson's theorem, 284
Jacobi method, 137
Jensen's method, 184
Joint cumulative distribution function, 36
Joint distribution function, 20
Joint probability density function, 36
Joint probability mass function, 20

K
Kendall's notation, 210-211
Kimura, 443
  approximation, 230, 235
Kingman, 233
Kingman-Köllerström approximation, 234
Kolmogorov's backward equation, 51
Kolmogorov's forward equation, 51
Kramer/Langenbach-Belz, 234, 443
Kramer/Langenbach-Belz formula, 229
kth-order statistic, 30

L
Lagrange
  function, 565
  multipliers, 560
LAN, 607-608
Laplace transform, 26-27, 301
Laplace-Stieltjes transform, 26, 216, 224-225
Latency time, 502
LBANC, 311
LBPS, 307
Likelihood function, 29
Linear cost constraint, 560
Little's theorem, 213, 215, 218, 221, 239, 252, 254, 271, 326
Load-dependent, 301
Local balance, 277-278
  conditions, 301
  equations, 278-279, 283
  property, 281, 283
Loss system, 250, 255
LST, 216
Lumpable, 160
  approximate, 169
  pairwise, 169

M
M/G/1 system, 223
M/G/m system, 230
M/M/∞ system, 217
M/M/1 system, 214
M/M/1/K system, 219
M/M/m system, 218
Machine repairman model, 221, 263
Macro state, 167
Marchal, 233, 444
Marginal probabilities, 272, 389
Marie's method, 452
Marking, 84
  dependent arc multiplicity, 90
  dependent firing rate, 90
  tangible, 86
  vanishing, 86
Markov
  chain, 37
    ergodic, 47
  process, 36
  property, 36, 49
  reward model, 56, 64
Martin's approximation, 230
Master slave model, 623
Matrix equation, 274
Maximization, 558
Maximum entropy method, 430
  closed networks, 433
  open networks, 431
Maximum-likelihood estimation, 29
MB, 258
Mean number
  of jobs, 219, 271, 273
  of visits, 266, 268
Mean queue length, 219, 225, 271
Mean recurrence time, 47, 54
Mean remaining service time, 239, 243
Mean residual life, 223
Mean response time, 219, 225, 271, 317
Mean time
  to absorption, 66
  to failure, 61
  to repair, 62
Mean value analysis, 311, 326, 573
Mean value, 6, 8
Mean waiting time, 219, 223, 271
MEM, 430
Memory
  constraints, 498, 504
  requirements, 382
Memoryless property, 37, 224
Merging, 440
Method
  of moments, 28
  of shadow server, 478
  of simultaneous displacement, 137
Minimization, 558
Minimum batch policy, 258
MMPP, 653, 658
Moment, 26, 227
  nth, 6, 9
  central, 7, 9
  second central, 7
Monoprocessor model, 621
Monotonicity, 398
MOSEL, 588
MOSES, 489, 588
MRM, 56, 64
  instantaneous reward rate, 66
MTTA, 66
MTTF, 61
MTTR, 62
Multiclass closed networks, 320
Multilevel models, 571
Multiple job classes, 267, 271
Multiple servers, 270
Multiprocessor system, 603
  loosely coupled, 606
  tightly coupled, 603, 666
Multiprogramming system, 263
MVA, 326, 379, 405, 535
MVALDMX algorithm, 352

N
Network
  BCMP, 295
  closed, 263, 266, 288, 297, 348
    load-dependent service, 347
  mixed, 267, 343
    BCMP, 352
  multiclass, 267, 271
    closed, 320
  open, 266, 297
  product-form, 281, 306
  separable, 281
  single class, 265, 268
  with blocking, 547
Networks
  approximation, 379
  product-form, 379
Newton-Raphson method, 404
Node types
  -/G/1-FCFS HOL, 495
  -/G/1-FCFS PRS, 497
  -/G/m-FCFS HOL, 495
  -/G/m-FCFS PRS, 497
  -/M/1-FCFS HOL, 471, 494
  -/M/1-FCFS PRS, 471, 496
  -/M/1-LBPS, 307
  -/M/1-SIRO, 306
  -/M/1-WEIRDP, 307
  -/M/m-FCFS HOL, 495
  -/M/m-FCFS PRS, 496
  product-form, 282
Node, 263
Nominal service time, 352
Non-exponential distribution, 423
Non-lossy system, 251
  asymmetric, 252
Non-product-form networks, 421
Normalization
  condition, 269, 271, 278, 280
  constant, 281, 289, 301, 311, 313-314, 320, 360
Normalized blocking factor, 521
Normalized speedup, 521
Normalized standard deviation, 7
Normalized synchronization overhead, 521
Normalizing condition, 255
Norton's theorem, 368
NPFQN, 3
n-step
  recurrence probability, 47
  transition probabilities, 39
Number of jobs, 212-213, 301

O
ODE, 189
One-limited service, 614
One-step transition probabilities, 38
Optimization, 557
  dynamic, 557
  static, 557
Ordinary differential equation, 189
Output arc, 84
Overall throughput, 270, 273

P
Pairwise lumpable, 169
Parallel processing, 507
  with asynchronous tasks, 507
PASTA theorem, 343
Pdf, 8-9, 11, 13-14, 16, 19, 33, 36
PEPSY, 572
Percentiles, 73
Performability, 66, 68, 674
Performance
  evaluation, 620
  measures, 212, 268, 277
Periodic, 47
Peripheral devices, 263
Petri net, 84
PFQN, 3, 273
Place, 84
Pmf, 5
PN, 84-85
  states, 84
Poisson
  arrival streams, 301
  process, 11, 214
Pollaczek-Khintchine
  formula, 224, 229
  transform equation, 224
Polling, 113
  asymmetric limited service, 114
  exhaustive-service systems, 114
  gated-service systems, 114
  single-service systems, 114
  symmetric limited service, 114
Population vector, 268, 300
Positive recurrent, 47
Power method, 133
Preemption, 242
Priority, 88, 621
  class, 237
  fixed, 244
  function, 244, 246
  networks, 470
  queueing, 237
  system, 237, 244, 248
  static, 248
  strategy mixed, 623
  time dependent, 243
Private cache blocks, 668
Probability
  n-step recurrence, 47
  density function, 8
  generating function, 25
  marginal, 269, 272, 315
  mass function, 5
    geometric, 214
  of loss, 251, 256
  one-step transition, 38
  routing, 265-268, 279
  steady-state, 214, 218, 221, 227, 268
  transition, 49
  unconditional state, 52
  vector, 212
    equilibrium, 53
    steady-state, 53
  waiting, 219, 231
Product-form, 283
  approximation, 429
  expression, 289
  network, 278, 311, 379
  node types, 282
  queueing network, 398
    multiclass, 399
    single, 398
  solution, 283, 311
Property
  local balance, 283
  M ⇒ M, 283
  Markov, 36, 49
  memoryless, 37, 216
  station-balance, 283
Pujolle method, 467

Q
Queue length, 213
Queueing network
  product-form, 571
Queueing
  discipline, 210-211, 301
  network, 263
    closed, 279
    closed tandem, 263
    non-product-form, 571
  system
    asymmetric, 250
    symmetric, 250
  systems
    elementary, 210
Queuing network
  central-server, 594
Quota nodes, 490

R
Race model, 86
Random selection, 250
Random variable
  Bernoulli, 5
  binomial, 6, 26
  Erlang, 20, 27
  exponential, 20, 27
  gamma, 20, 27
  geometric, 6, 26
  hyperexponential, 20, 27
  hypoexponential, 20, 27
  Poisson, 6, 26
  statistically independent, 20
  sum, 23
Random variables
  continuous, 22
Randomization, 104
Reachability graph, 98
Reachability set, 84
Reachable, 46, 54
RECAL
  algorithm, 311
  method, 360
Recurrent non-null, 47
Recurrent null, 47
Recurrent state, 47
Recurrent positive, 47
Relative arrival rate, 266
Relative utilization, 266, 411
Relaxation parameter, 140
Remaining service time, 230
Repetitive blocking, 548
Residual state holding time, 55
Response time, 213, 217, 219
RESQ, 517
Reward
  rate, 66
    expected instantaneous, 68
  specification, 90
RG, 91
Robustness, 448
Routing
  matrix, 296, 298
  probability, 265
    load-dependent, 307, 326
RS, 84

S
SCAT algorithm, 536
SCAT, 385
  core algorithm, 386
  extended algorithm, 393
  multiple server node algorithm, 389
  single server node algorithm, 385
Seek time, 502
Self-correcting approximation technique, 385
Semi-Markov process, 91
Serialization, 505
Service
  blocking, 548
  rate, 266
    load-dependent, 326, 369
  station, 263
    load-dependent, 352
    single, 263
  time
    distribution, 227
    generally distributed, 295
    remaining, 223
Shadow
  equation, 491
  model, 479
  network, 479
  node, 479
  service times, 479
  technique, 478, 491
    extended, 623
Shared cache blocks, 668
SHARPE, 500, 507, 594
Short circuit, 369, 452
Simultaneous displacement, 137
Simultaneous resource possession, 497, 501
Single class closed network, 313
Single server node, 269
Single station queueing system, 209
Single-buffer polling system, 98
SIRO, 306
Slow state, 191
SMP, 91
Sojourn time, 48
Solution technique
  numerical, 277
SOR, 140, 594
Speedup, 521
Split-merge system, 519
Splitting, 441
SPN, 579
SPNP, 579
  iSPN, 585
Squared coefficient of variation, 17-19
Standard deviation, 7
State diagram, 279
State probabilities, 251-252, 255, 277
State space, 36, 296
State transition diagram, 38, 251, 277
Stationary probability vector, 41
Stationary, 41
Steady-state
  probability, 253, 283
  probability vector, 53, 274-275
  solution, 278
Stochastic process, 36
Stochastic transition matrix, 38
Storage hierarchy, 566
Strategy
  last batch processor sharing (LBPS), 307
  WEIRDP, 307
Substitute network, 452
Successive over-relaxation, 140
SUM, 397, 405, 534
Summation method, 397, 463
Surrogate delays, 504
Symmetric system
  GI/G/m, 249
  M/M/m, 249
Synchronization overhead, 521
System
  availability, 62
  equation, 379
  performance, 63
  reliability, 63

T
Takahashi's method, 166
Tangible marking, 86
Task
  completion, 64
  precedence graph, 517
Terminal system, 265, 513
Terminals, 265
Theorem
  arrival, 326, 343
  BCMP, 300-301
  Gordon/Newell, 292
  Jackson's, 284, 632
  Little, 213
  Little's, 271
  Norton's, 368
  PASTA, 343
Theorems
  central limit, 24
Throughput, 213, 266, 270, 272
Time-average accumulated reward, 68
Time-homogeneous, 37
Timed transition, 86
Token rotation time, 614
Token, 84
TPM, 43
Traffic equations, 266, 286, 432, 614
Transfer
  blocking, 548
  time, 502
Transient state, 47
Transient state probabilities, 41
Transient uniformization, 184
Transition, 84
  enabled, 84
  fire, 84
  immediate, 86
  probability, 49
  rates, 50
  timed, 86
Tree-convolution, 326

U
Uniformization, 104
Unit vector, 53
UNIX, 620
Upper time limit, 246
Utilization, 212, 221, 269, 272

VW
Vanishing marking, 86
Variabilities in workload, 418
Variance, 7, 9, 11, 13-14, 16-20, 214, 217
Visit ratio, 266, 289, 297
Wafer production system, 639
Waiting time, 213
WAN, 610

XYZ
z-transform, 25, 27