Table of Contents

Preface

Chapter 1: The Concurrent Computing Landscape
1.1 The Essence of Concurrent Programming
1.2 Hardware Architectures
1.2.1 Processors and Caches
1.2.2 Shared-Memory Multiprocessors
1.2.3 Distributed-Memory Multicomputers and Networks
1.3 Applications and Programming Styles
1.4 Iterative Parallelism: Matrix Multiplication
1.5 Recursive Parallelism: Adaptive Quadrature
1.6 Producers and Consumers: Unix Pipes
1.7 Clients and Servers: File Systems
1.8 Peers: Distributed Matrix Multiplication
1.9 Summary of Programming Notation
1.9.1 Declarations
1.9.2 Sequential Statements
1.9.3 Concurrent Statements, Processes, and Procedures
1.9.4 Comments
Historical Notes
References
Exercises

Part 1: Shared-Variable Programming

Chapter 2: Processes and Synchronization
2.1 States, Actions, Histories, and Properties
2.2 Parallelization: Finding Patterns in a File
2.3 Synchronization: The Maximum of an Array
2.4 Atomic Actions and Await Statements
2.4.1 Fine-Grained Atomicity
2.4.2 Specifying Synchronization: The Await Statement
2.5 Producer/Consumer Synchronization
2.6 A Synopsis of Axiomatic Semantics
2.6.1 Formal Logical Systems
2.6.2 A Programming Logic
2.6.3 Semantics of Concurrent Execution
2.7 Techniques for Avoiding Interference
2.7.1 Disjoint Variables
2.7.2 Weakened Assertions
2.7.3 Global Invariants
2.7.4 Synchronization
2.7.5 An Example: The Array Copy Problem Revisited
2.8 Safety and Liveness Properties
2.8.1 Proving Safety Properties
2.8.2 Scheduling Policies and Fairness
Historical Notes
References
Exercises

Chapter 3: Locks and Barriers
3.1 The Critical Section Problem
3.2 Critical Sections: Spin Locks
3.2.1 Test and Set
3.2.2 Test and Test and Set
3.2.3 Implementing Await Statements
3.3 Critical Sections: Fair Solutions
3.3.1 The Tie-Breaker Algorithm
3.3.2 The Ticket Algorithm
3.3.3 The Bakery Algorithm
3.4 Barrier Synchronization
3.4.1 Shared Counter
3.4.2 Flags and Coordinators
3.4.3 Symmetric Barriers
3.5 Data Parallel Algorithms
3.5.1 Parallel Prefix Computations
3.5.2 Operations on Linked Lists
3.5.3 Grid Computations: Jacobi Iteration
3.5.4 Synchronous Multiprocessors
3.6 Parallel Computing with a Bag of Tasks
3.6.1 Matrix Multiplication
3.6.2 Adaptive Quadrature
Historical Notes
References
Exercises

Chapter 4: Semaphores
4.1 Syntax and Semantics
4.2 Basic Problems and Techniques
4.2.1 Critical Sections: Mutual Exclusion
4.2.2 Barriers: Signaling Events
4.2.3 Producers and Consumers: Split Binary Semaphores
4.2.4 Bounded Buffers: Resource Counting
4.3 The Dining Philosophers
4.4 Readers and Writers
4.4.1 Readers/Writers as an Exclusion Problem
4.4.2 Readers/Writers Using Condition Synchronization
4.4.3 The Technique of Passing the Baton
4.4.4 Alternative Scheduling Policies
4.5 Resource Allocation and Scheduling
4.5.1 Problem Definition and General Solution Pattern
4.5.2 Shortest-Job-Next Allocation
4.6 Case Study: Pthreads
4.6.1 Thread Creation
4.6.2 Semaphores
4.6.3 Example: A Simple Producer and Consumer
Historical Notes
References
Exercises

Chapter 5: Monitors
5.1 Syntax and Semantics
5.1.1 Mutual Exclusion
5.1.2 Condition Variables
5.1.3 Signaling Disciplines
5.1.4 Additional Operations on Condition Variables
5.2 Synchronization Techniques
5.2.1 Bounded Buffers: Basic Condition Synchronization
5.2.2 Readers and Writers: Broadcast Signal
5.2.3 Shortest-Job-Next Allocation: Priority Wait
5.2.4 Interval Timer: Covering Conditions
5.2.5 The Sleeping Barber: Rendezvous
5.3 Disk Scheduling: Program Structures
5.3.1 Using a Separate Monitor
5.3.2 Using an Intermediary
5.3.3 Using a Nested Monitor
5.4 Case Study: Java
5.4.1 The Threads Class
5.4.2 Synchronized Methods
5.4.3 Parallel Readers/Writers
5.4.4 Exclusive Readers/Writers
5.4.5 True Readers/Writers
5.5 Case Study: Pthreads
5.5.1 Locks and Condition Variables
5.5.2 Example: Summing the Elements of a Matrix
Historical Notes
References
Exercises

Chapter 6: Implementations
6.1 A Single-Processor Kernel
6.2 A Multiprocessor Kernel
6.3 Implementing Semaphores in a Kernel
6.4 Implementing Monitors in a Kernel
6.5 Implementing Monitors Using Semaphores
Historical Notes
References
Exercises

Part 2: Distributed Programming

Chapter 7: Message Passing
7.1 Asynchronous Message Passing
7.2 Filters: A Sorting Network
7.3 Clients and Servers
7.3.1 Active Monitors
7.3.2 A Self-Scheduling Disk Server
7.3.3 File Servers: Conversational Continuity
7.4 Interacting Peers: Exchanging Values
7.5 Synchronous Message Passing
7.6 Case Study: CSP
7.6.1 Communication Statements
7.6.2 Guarded Communication
7.6.3 Example: The Sieve of Eratosthenes
7.6.4 Occam and Modern CSP
7.7 Case Study: Linda
7.7.1 Tuple Space and Process Interaction
7.7.2 Example: Prime Numbers with a Bag of Tasks
7.8 Case Study: MPI
7.8.1 Basic Functions
7.8.2 Global Communication and Synchronization
7.9 Case Study: Java
7.9.1 Networks and Sockets
7.9.2 Example: A Remote File Reader
Historical Notes
References
Exercises

Chapter 8: RPC and Rendezvous
8.1 Remote Procedure Call
8.1.1 Synchronization in Modules
8.1.2 A Time Server
8.1.3 Caches in a Distributed File System
8.1.4 A Sorting Network of Merge Filters
8.1.5 Interacting Peers: Exchanging Values
8.2 Rendezvous
8.2.1 Input Statements
8.2.2 Client/Server Examples
8.2.3 A Sorting Network of Merge Filters
8.2.4 Interacting Peers: Exchanging Values
8.3 A Multiple Primitives Notation
8.3.1 Invoking and Servicing Operations
8.3.2 Examples
8.4 Readers/Writers Revisited
8.4.1 Encapsulated Access
8.4.2 Replicated Files
8.5 Case Study: Java
8.5.1 Remote Method Invocation
8.5.2 Example: A Remote Database
8.6 Case Study: Ada
8.6.1 Tasks
8.6.2 Rendezvous
8.6.3 Protected Types
8.6.4 Example: The Dining Philosophers
8.7 Case Study: SR
8.7.1 Resources and Globals
8.7.2 Communication and Synchronization
8.7.3 Example: Critical Section Simulation
Historical Notes
References
Exercises

Chapter 9: Paradigms for Process Interaction
9.1 Manager/Workers (Distributed Bag of Tasks)
9.1.1 Sparse Matrix Multiplication
9.1.2 Adaptive Quadrature Revisited
9.2 Heartbeat Algorithms
9.2.1 Image Processing: Region Labeling
9.2.2 Cellular Automata: The Game of Life
9.3 Pipeline Algorithms
9.3.1 A Distributed Matrix Multiplication Pipeline
9.3.2 Matrix Multiplication by Blocks
9.4 Probe/Echo Algorithms
9.4.1 Broadcast in a Network
9.4.2 Computing the Topology of a Network
9.5 Broadcast Algorithms
9.5.1 Logical Clocks and Event Ordering
9.5.2 Distributed Semaphores
9.6 Token-Passing Algorithms
9.6.1 Distributed Mutual Exclusion
9.6.2 Termination Detection in a Ring
9.6.3 Termination Detection in a Graph
9.7 Replicated Servers
9.7.1 Distributed Dining Philosophers
9.7.2 Decentralized Dining Philosophers
Historical Notes
References
Exercises

Chapter 10: Implementations
10.1 Asynchronous Message Passing
10.1.1 Shared-Memory Kernel
10.1.2 Distributed Kernel
10.2 Synchronous Message Passing
10.2.1 Direct Communication Using Asynchronous Messages
10.2.2 Guarded Communication Using a Clearinghouse
10.3 RPC and Rendezvous
10.3.1 RPC in a Kernel
10.3.2 Rendezvous Using Asynchronous Message Passing
10.3.3 Multiple Primitives in a Kernel
10.4 Distributed Shared Memory
10.4.1 Implementation Overview
10.4.2 Page Consistency Protocols
Historical Notes
References
Exercises

Part 3: Parallel Programming

Chapter 11: Scientific Computing
11.1 Grid Computations
11.1.1 Laplace's Equation
11.1.2 Sequential Jacobi Iteration
11.1.3 Jacobi Iteration Using Shared Variables
11.1.4 Jacobi Iteration Using Message Passing
11.1.5 Red/Black Successive Over-Relaxation (SOR)
11.1.6 Multigrid Methods
11.2 Particle Computations
11.2.1 The Gravitational N-Body Problem
11.2.2 Shared-Variable Program
11.2.3 Message-Passing Programs
11.2.4 Approximate Methods
11.3 Matrix Computations
11.3.1 Gaussian Elimination
11.3.2 LU Decomposition
11.3.3 Shared-Variable Program
11.3.4 Message-Passing Program
Historical Notes
References
Exercises

Chapter 12: Languages, Compilers, Libraries, and Tools
12.1 Parallel Programming Libraries
12.1.1 Case Study: Pthreads
12.1.2 Case Study: MPI
12.1.3 Case Study: OpenMP
12.2 Parallelizing Compilers
12.2.1 Dependence Analysis
12.2.2 Program Transformations
12.3 Languages and Models
12.3.1 Imperative Languages
12.3.2 Coordination Languages
12.3.3 Data Parallel Languages
12.3.4 Functional Languages
12.3.5 Abstract Models
12.3.6 Case Study: High-Performance Fortran (HPF)
12.4 Parallel Programming Tools
12.4.1 Performance Measurement and Visualization
12.4.2 Metacomputers and Metacomputing
12.4.3 Case Study: The Globus Toolkit
Historical Notes
References
Exercises

Glossary
Index
The Concurrent Computing Landscape
Imagine the following scenario: Several cars want to drive from point A to point B. They can compete for space on the same road and end up either following each other or competing for positions (and having accidents!). Or they could drive in parallel lanes, thus arriving at about the same time without getting in each other's way. Or they could travel different routes, using separate roads. This scenario captures the essence of concurrent computing: There are multiple tasks that need to be done (cars moving). Each can execute one at a time on a single processor (road), execute in parallel on multiple processors (lanes in a road), or execute on distributed processors (separate roads). However, tasks often need to synchronize to avoid collisions or to stop at traffic lights or stop signs.

This book is an "atlas" of concurrent computing. It examines the kinds of cars (processes), the routes they might wish to travel (applications), the patterns of roads (hardware), and the rules of the road (communication and synchronization). So load up the tank and get started.

In this chapter we describe the legend of the concurrent programming map. Section 1.1 introduces fundamental concepts. Sections 1.2 and 1.3 describe the kinds of hardware and applications that make concurrent programming both interesting and challenging. Sections 1.4 through 1.8 describe and illustrate five recurring programming styles: iterative parallelism, recursive parallelism, producers and consumers, clients and servers, and interacting peers. The last section defines the programming notation that is used in the examples and the remainder of the book. Later chapters cover applications and programming techniques in detail. The chapters are organized into three parts: shared-variable programming,
distributed (message-based) programming, and parallel programming. The introduction to each part and each chapter gives a road map, summarizing where we have been and where we are going.
1.1 The Essence of Concurrent Programming

A concurrent program contains two or more processes that work together to perform a task. Each process is a sequential program, namely a sequence of statements that are executed one after another. Whereas a sequential program has a single thread of control, a concurrent program has multiple threads of control.

The processes in a concurrent program work together by communicating with each other. Communication is programmed using shared variables or message passing. When shared variables are used, one process writes into a variable that is read by another. When message passing is used, one process sends a message that is received by another.

Whatever the form of communication, processes also need to synchronize with each other. There are two basic kinds of synchronization: mutual exclusion and condition synchronization. Mutual exclusion is the problem of ensuring that critical sections of statements do not execute at the same time. Condition synchronization is the problem of delaying a process until a given condition is true.

As an example, communication between a producer process and a consumer process is often implemented using a shared memory buffer. The producer writes into the buffer; the consumer reads from the buffer. Mutual exclusion is required to ensure that the producer and consumer do not access the buffer at the same time, and hence that a partially written message is not read prematurely. Condition synchronization is used to ensure that a message is not read by the consumer until after it has been written by the producer.

The history of concurrent programming has followed the same stages as other experimental areas of computer science. The topic arose due to opportunities presented by hardware developments and has evolved in response to technological changes. Over time, the initial, ad hoc approaches have coalesced into a collection of core principles and general programming techniques.

Concurrent programming originated in the 1960s within the context of operating systems. The motivation was the invention of hardware units called channels or device controllers. These operate independently of a controlling processor and allow an I/O operation to be carried out concurrently with continued execution of program instructions by the central processor. A channel communicates with the central processor by means of an interrupt, a hardware signal that says "stop what you are doing and start executing a different sequence of instructions."
The programming challenge (indeed the intellectual challenge) that resulted from the introduction of channels was that now parts of a program could execute in an unpredictable order. Hence, if one part of a program is updating the value of a variable, an interrupt might occur and lead to another part of the program trying to change the value of the variable. This specific problem came to be known as the critical section problem, which we cover in detail in Chapter 3.

Shortly after the introduction of channels, hardware designers conceived of multiprocessor machines. For a couple decades these were too expensive to be widely used, but now all large machines have multiple processors. Indeed, the largest machines have hundreds of processors and are often called massively parallel processors, or MPPs. Even most personal computers will soon have a few processors. Multiprocessor machines allow different application programs to execute at the same time on different processors. They also allow a single application program to execute faster if it can be rewritten to use multiple processors. But how does one synchronize the activity of concurrent processes? How can one use multiple processors to make an application run faster?

To summarize, channels and multiprocessors provide both opportunities and challenges. When writing a concurrent program one has to make decisions about what kinds of processes to employ, how many to use, and how they should interact. These decisions are affected by the application and by the underlying hardware on which the program will run. Whatever choices are made, the key to developing a correct program is to ensure that process interaction is properly synchronized.

This book examines all aspects of concurrent programming. However, we focus on imperative programs with explicit concurrency, communication, and synchronization. In particular, the programmer has to specify the actions of each process and how they communicate and synchronize. This contrasts with declarative programs (e.g., functional or logic programs), in which concurrency is implicit and there is no reading and writing of a program state. In declarative programs, independent parts of the program may execute in parallel; they communicate and synchronize implicitly when one part depends on the results produced by another. Although the declarative approach is interesting and important (see Chapter 12 for more information), the imperative approach is much more widely used. In addition, to implement a declarative program on a traditional machine, one has to write an imperative program.

We also focus on concurrent programs in which process execution is asynchronous, namely, each process executes at its own rate. Such programs can be executed by interleaving the processes on a single processor or by executing the processes in parallel on a multiple instruction stream, multiple data stream (MIMD) multiprocessor. This class of machines includes shared-memory
multiprocessors, distributed-memory multicomputers, and networks of workstations, as described in the next section. Although we focus on asynchronous multiprocessing, we describe synchronous multiprocessing (SIMD machines) in Chapter 3 and the associated data-parallel programming style in Chapters 3 and 12.
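To make the producer/consumer example from earlier in this section concrete, here is a minimal sketch in C using the Pthreads library, which the book treats as a case study in later chapters. The one-slot buffer, the item count, and all variable names are illustrative choices made only so the sketch is self-contained; they are not part of the book's notation. The mutex provides mutual exclusion on the buffer, and the condition variable provides condition synchronization.

/* One-slot producer/consumer buffer: a minimal Pthreads sketch. */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static int buffer;                /* the shared one-slot buffer */
static bool full = false;         /* true when buffer holds an unread item */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t changed = PTHREAD_COND_INITIALIZER;

#define N 5                       /* number of items; arbitrary for the sketch */

static void *producer(void *arg) {
    for (int i = 1; i <= N; i++) {
        pthread_mutex_lock(&lock);          /* mutual exclusion on the buffer */
        while (full)                        /* condition synchronization: wait */
            pthread_cond_wait(&changed, &lock);   /*   until the slot is empty */
        buffer = i;                         /* deposit the item */
        full = true;
        pthread_cond_signal(&changed);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

static void *consumer(void *arg) {
    for (int i = 1; i <= N; i++) {
        pthread_mutex_lock(&lock);
        while (!full)                       /* wait until the slot is full */
            pthread_cond_wait(&changed, &lock);
        printf("consumed %d\n", buffer);    /* fetch the item */
        full = false;
        pthread_cond_signal(&changed);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}

Because the consumer waits until full is true, a partially written item is never read; because both loops hold the lock while touching the buffer, the producer and consumer never access it at the same time.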
1.2 Hardware Architectures

This section summarizes key attributes of modern computer architectures. The next section describes concurrent programming applications and how they make use of these architectures. We first describe single processors and cache memories. Then we examine shared-memory multiprocessors. Finally, we describe distributed-memory machines, which include multicomputers and networks of machines.
1.2.1 Processors and Caches

A modern single-processor machine contains several components: a central processing unit (CPU), primary memory, one or more levels of cache memory, secondary (disk) storage, and a variety of peripheral devices (display, keyboard, mouse, modem, CD, printer, etc.). The key components with respect to the execution of programs are the CPU, cache, and memory. The relation between these is depicted in Figure 1.1.

Figure 1.1 Processor, cache, and memory in a modern machine.

The CPU fetches instructions from memory and decodes and executes them. It contains a control unit, arithmetic/logic unit (ALU), and registers. The control unit issues signals that control the actions of the ALU, memory system, and external devices. The ALU implements the arithmetic and logical instructions defined by the processor's instruction set. The registers contain instructions, data, and the machine state (including the program counter).

A cache is a small, fast memory unit that is used to speed up program execution. It contains the contents of those memory locations that have recently been used by the processor. The rationale for cache memory is that most programs exhibit temporal locality, meaning that once they reference a memory location they are likely to do so again in the near future. For example, instructions inside loops are fetched and executed many times.

When a program attempts to read a memory location, the processor first looks in the cache. If the data is there (a cache hit), then it is read from the cache. If the data is not there (a cache miss), then the data is read from primary memory into both the processor and the cache. Similarly, when data is written by a program, it is placed in the local cache and in the primary memory (or possibly just in the primary memory). In a write-through cache, data is placed in memory immediately; in a write-back cache, data is placed in memory later. The key point is that after a write, the contents of the primary memory might temporarily be inconsistent with the contents of the cache.

To increase the transfer rate (bandwidth) between caches and primary memory, an entry in a cache typically contains multiple, contiguous words of memory. These entries are called cache lines or blocks. Whenever there is a cache miss, the entire contents of a cache line are transferred between the memory and cache. This is effective because most programs exhibit spatial locality, meaning that when they reference one memory word they soon reference nearby memory words.

Modern processors typically contain two levels of cache. The level 1 cache is close to the processor; the level 2 cache is between the level 1 cache and the primary memory. The level 1 cache is smaller and faster than the level 2 cache, and it is often organized differently. For example, the level 1 cache is often direct mapped and the level 2 cache is often set associative.¹ Moreover, the level 1 cache often contains separate caches for instructions and data, whereas the level 2 cache is commonly unified, meaning that it contains both instructions and data.

To illustrate the speed differences between the levels of the memory hierarchy, registers can be accessed in one clock cycle because they are small and internal to the CPU. The contents of the level 1 cache can typically be accessed in one or two clock cycles. In contrast, it takes on the order of 20 clock cycles to access the level 2 cache and perhaps 50 to 100 clock cycles to access primary memory. There are similar size differences in the memory hierarchy: a CPU has a few dozen registers, a level 1 cache contains a few dozen kilobytes, a level 2 cache contains on the order of one megabyte, and primary memory contains tens to hundreds of megabytes.

¹ In a direct-mapped cache, each memory address maps into one cache entry. In a set-associative cache, each memory address maps into a set of cache entries; the set size is typically two or four. Thus, if two memory addresses map into the same location, only the most recently referenced location can be in a direct-mapped cache, whereas both locations can be in a set-associative cache. On the other hand, a direct-mapped cache is faster because it is easier to decide if a word is in the cache.
1.2.2 Shared-Memory Multiprocessors

In a shared-memory multiprocessor, the processors and memory modules are connected by means of an interconnection network, as shown in Figure 1.2. The processors share the primary memory, but each processor has its own cache memory.

Figure 1.2 Structure of shared-memory multiprocessors.

On a small multiprocessor with from two to 30 or so processors, the interconnection network is implemented by a memory bus or possibly a crossbar switch. Such a multiprocessor is called a UMA machine because there is a uniform memory access time between every processor and every memory location. UMA machines are also called symmetric multiprocessors (SMPs).

Large shared-memory multiprocessors (those having tens or hundreds of processors) have a hierarchically organized memory. In particular, the interconnection network is a tree-structured collection of switches and memories. Consequently, each processor has some memory that is close by and other memory that is farther away. This organization avoids the congestion that would occur if there were only a single shared bus or switch. Because it also leads to nonuniform memory access times, such a multiprocessor is called a NUMA machine.

On both UMA and NUMA machines, each processor has its own cache. If two processors reference different memory locations, the contents of these locations can safely be in the caches of the two processors. However, problems can arise when two processors reference the same memory location at about the same time.
If both processors just read the same location, each can load a copy of the data into its local cache. But if one processor writes into the location, there is a cache consistency problem: The cache of the other processor no longer contains the correct value. Hence, either the other cache needs to be updated with the new value or the cache entry needs to be invalidated. Every multiprocessor has to implement a cache consistency protocol in hardware. One method is to have each cache "snoop" on the memory address bus, looking for references to locations in its cache.

Writing also leads to a memory consistency problem: When is primary memory actually updated? For example, if one processor writes into a memory location and another processor later reads from that memory location, is the second processor guaranteed to see the new value? There are several different memory consistency models. Sequential consistency is the strongest model; it guarantees that memory updates will appear to occur in some sequential order, and that every processor will see the same order. Processor consistency is a weaker model; it ensures that the writes by each processor will occur in memory in the order they are issued by the processor, but writes issued by different processors might be seen in different orders by other processors. Release consistency is an even weaker model; it ensures only that the primary memory is updated at programmer-specified synchronization points.

The memory consistency problem presents tradeoffs between ease of programming and implementation overhead. The programmer intuitively expects sequential consistency, because a program reads and writes variables without regard to where the values are actually stored inside the machine. In particular, when a process assigns to a variable, the programmer expects the result of that assignment to be seen immediately by every other process in the program. On the other hand, sequential consistency is costly to implement and it makes a machine slower. This is because on each write, the hardware has to invalidate or update other caches and to update primary memory; moreover, these actions have to be atomic (indivisible). Consequently, multiprocessors typically implement a weaker memory consistency model and leave to programmers the task of inserting memory synchronization instructions. Compilers and libraries often take care of this so the application programmer does not have to do so.

As noted, cache lines often contain multiple words that are transferred to and from memory as a unit. Suppose variables x and y occupy one word each and that they are stored in adjacent memory words that map into the same cache line. Suppose some process is executing on processor 1 of a multiprocessor and reads and writes variable x. Finally, suppose another process is executing on processor 2 and reads and writes variable y. Then whenever processor 1 accesses variable x, the cache line on that processor will also contain a copy of variable y; a similar situation will occur with processor 2.
The above scenario causes what is called false sharing: The processes do not actually share variables x and y, but the cache hardware treats the two variables as a unit. Consequently, when processor 1 updates x, the cache line containing both x and y in processor 2 has to be invalidated or updated. Similarly, when processor 2 updates y, the cache line containing x and y in processor 1 has to be invalidated or updated. Cache invalidations or updates slow down the memory system, so the program will run much more slowly than would be the case if the two variables were in separate cache lines. The key point is that the programmer would correctly think that the processes do not share variables, when in fact the memory system has introduced sharing and overhead to deal with it.

To avoid false sharing, the programmer has to ensure that variables that are written by different processes are not in adjacent memory locations. One way to ensure this is to use padding, that is, to declare dummy variables that take up space and separate actual variables from each other. This is an instance of a time/space tradeoff: waste some space in order to reduce execution time.

To summarize, multiprocessors employ caches to improve performance. However, the presence of a memory hierarchy introduces cache and memory consistency problems and the possibility of false sharing. Consequently, to get maximal performance on a given machine, the programmer has to know about the characteristics of the memory system and has to write programs to account for them. These issues will be revisited at relevant points in the remainder of the text.
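As a small illustration of padding, the C fragment below is one common way to keep variables that are written by different processes out of the same cache line. The 64-byte line size, the structure name, and the counter array are assumptions made for the sake of the example; they are not values taken from the text, and real machines vary.

/* Per-process counters padded so that no two counters share a cache line. */
#define CACHE_LINE_SIZE 64   /* assumed line size in bytes; real machines vary */

struct padded_counter {
    long value;                                 /* the word a process actually writes */
    char pad[CACHE_LINE_SIZE - sizeof(long)];   /* dummy space so the next counter
                                                   starts in a different cache line */
};

struct padded_counter counts[8];   /* one counter per process; 8 is arbitrary */

Process i would update only counts[i].value; because of the padding, updates by different processes land in different cache lines and no longer invalidate one another's cached copies.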
1.2.3 Distributed-Memory Multicomputers and Networks

In a distributed-memory multiprocessor, there is again an interconnection network, but each processor has its own private memory. As shown in Figure 1.3, the locations of the interconnection network and memory modules are reversed relative to their locations in a shared-memory multiprocessor.

Figure 1.3 Structure of distributed-memory machines.
The interconnection network supports message passing rather than memory reading and writing. Hence, the processors communicate with each other by sending and receiving messages. Each processor has its own cache, but because memory is not shared there are no cache or memory consistency problems.

A multicomputer is a distributed-memory multiprocessor in which the processors and network are physically close to each other (in the same cabinet). For this reason, a multicomputer is often called a tightly coupled machine. A multicomputer is used by one or at most a few applications at a time; each application uses a dedicated set of processors. The interconnection network provides a high-speed and high-bandwidth communication path between the processors. It is typically organized as a mesh or hypercube. (Hypercube machines were among the earliest examples of multicomputers.)

A network system is a distributed-memory multiprocessor in which nodes are connected by a local area communication network such as an Ethernet or by a long-haul network such as the Internet. Consequently, network systems are called loosely coupled multiprocessors. Again processors communicate by sending messages, but these take longer to deliver than in a multicomputer and there is more network contention. On the other hand, a network system is built out of commodity workstations and networks, whereas a multicomputer often has custom components, especially for the interconnection network.

A network system that consists of a collection of workstations is often called a network of workstations (NOW) or a cluster of workstations (COW). These workstations execute a single application, separate applications, or a mixture of the two. A currently popular way to build an inexpensive distributed-memory multiprocessor is to construct what is called a Beowulf machine. A Beowulf machine is built out of basic hardware and free software, such as Pentium processor chips, network interface cards, disks, and the Linux operating system. (The name Beowulf refers to an Old English epic poem, the first masterpiece of English literature.)

Hybrid combinations of distributed- and shared-memory multiprocessors also exist. The nodes of the distributed-memory machine might be shared-memory multiprocessors rather than single processors. Or the interconnection network might support both message passing and mechanisms for direct access to remote memory (this is done in today's most powerful machines). The most general combination is a machine that supports what is called distributed shared memory, namely, a distributed implementation of the shared-memory abstraction. This makes the machines easier to program for many applications, but again raises the issues of cache and memory consistency. (Chapter 10 describes distributed shared memory and how to implement it entirely in software.)
1.3 Applications and Programming Styles

Concurrent programming provides a way to organize software that contains relatively independent parts. It also provides a way to make use of multiple processors. There are three broad, overlapping classes of applications (multithreaded systems, distributed systems, and parallel computations) and three corresponding kinds of concurrent programs.

Recall that a process is a sequential program that, when executed, has its own thread of control. Every concurrent program contains multiple processes, so every concurrent program has multiple threads. However, the term multithreaded usually means that a program contains more processes (threads) than there are processors to execute the threads. Consequently, the processes take turns executing on the processors. A multithreaded software system manages multiple independent activities such as the following:

window systems on personal computers or workstations;
time-shared and multiprocessor operating systems; and
real-time systems that control power plants, spacecraft, and so on.

These systems are written as multithreaded programs because it is much easier to organize the code and data structures as a collection of processes than as a single, huge sequential program. In addition, each process can be scheduled and executed independently. For example, when the user clicks the mouse on a personal computer, a signal is sent to the process that manages the window into which the mouse currently points. That process (thread) can now execute and respond to the mouse click. In addition, applications in other windows can continue to execute in the background.

The second broad class of applications is distributed computing, in which components execute on machines connected by a local or global communication network. Hence the processes communicate by exchanging messages. Examples are as follows:

file servers in a network;
database systems for banking, airline reservations, and so on;
Web servers on the Internet;
enterprise systems that integrate components of a business; and
fault-tolerant systems that keep executing despite component failures.

Distributed systems are written to off-load processing (as in file servers), to provide access to remote data (as in databases and the Web), to integrate and manage
data that is inherently distributed (as in enterprise systems), or to increase reliability (as in fault-tolerant systems). Many distributed systems are organized as client/server systems. For example, a file server provides data files for processes executing on client machines. The components in a distributed system are often themselves multithreaded programs.

Parallel computing is the third broad class of applications. The goal is to solve a given problem faster or, equivalently, to solve a larger problem in the same amount of time. Examples of parallel computations are as follows:

scientific computations that model and simulate phenomena such as the global climate, evolution of the solar system, or effects of a new drug;
graphics and image processing, including the creation of special effects in movies; and
large combinatorial or optimization problems, such as scheduling an airline business or modeling the economy.

These kinds of computations are large and compute-intensive. They are executed on parallel processors to achieve high performance, and there usually are as many processes (threads) as processors. Parallel computations are written as data parallel programs, in which each process does the same thing on its part of the data, or as task parallel programs, in which different processes carry out different tasks.

We examine these as well as other applications in this book. Most importantly, we show how to program them. The processes (threads) in a multithreaded program interact using shared variables. The processes in a distributed system communicate by exchanging messages or by invoking remote operations. The processes in a parallel computation interact using either shared variables or message passing, depending on the hardware on which the program will execute. Part 1 in this book shows how to write programs that use shared variables for communication and synchronization. Part 2 covers message passing and remote operations. Part 3 examines parallel programming in detail, with an emphasis on scientific computing.

Although there are many concurrent programming applications, they employ only a small number of solution patterns, or paradigms. In particular, there are five basic paradigms: (1) iterative parallelism, (2) recursive parallelism, (3) producers and consumers (pipelines), (4) clients and servers, and (5) interacting peers. Applications are programmed using one or more of these.

Iterative parallelism is used when a program has several, often identical processes, each of which contains one or more loops. Hence, each process is an iterative program. The processes in the program work together to solve a single problem; they communicate and synchronize using shared variables or message
passing. Iterative parallelism occurs most frequently in scientific computations that execute on multiple processors.

Recursive parallelism can be used when a program has one or more recursive procedures and the procedure calls are independent, meaning that each works on different parts of the shared data. Recursion is often used in imperative languages, especially to implement divide-and-conquer or backtracking algorithms. Recursion is also the fundamental programming paradigm in symbolic, logic, and functional languages. Recursive parallelism is used to solve many combinatorial problems, such as sorting, scheduling (e.g., traveling salesperson), and game playing (e.g., chess).

Producers and consumers are communicating processes. They are often organized into a pipeline through which information flows. Each process in a pipeline is a filter that consumes the output of its predecessor and produces output for its successor. Filters occur at the application (shell) level in operating systems such as Unix, within operating systems themselves, and within application programs whenever one process produces output that is consumed (read) by another.

Clients and servers are the dominant interaction pattern in distributed systems, from local networks to the World Wide Web. A client process requests a service and waits to receive a reply. A server waits for requests from clients and then acts upon them. A server can be implemented by a single process that handles one client request at a time, or it can be multithreaded to service client requests concurrently. Clients and servers are the concurrent programming generalization of procedures and procedure calls: a server is like a procedure, and clients call the server. However, when the client code and server code reside on different machines, a conventional procedure call (jump to subroutine and later return from subroutine) cannot be used. Instead one has to use a remote procedure call or a rendezvous, as described in Chapter 8.

Interacting peers is the final interaction paradigm. It occurs in distributed programs when there are several processes that execute basically the same code and that exchange messages to accomplish a task. Interacting peers are used to implement distributed parallel programs, especially those with iterative parallelism. They are also used to implement decentralized decision making in distributed systems. We describe several applications and communication patterns in Chapter 9.

The next five sections give examples that illustrate the use of each of these patterns. The examples also introduce the programming notation that we will be using throughout the text. (The notation is summarized in Section 1.9.) Many more examples are described in later chapters, either in the text or in the exercises.
1.4 Iterative Parallelism: Matrix Multiplication

An iterative sequential program is one that uses for or while loops to examine data and compute results. An iterative parallel program contains two or more iterative processes. Each process computes results for a subset of the data, then the results are combined.

As a simple example, consider the following problem from scientific computing. Given matrices a and b, assume that each matrix has n rows and columns, and that each has been initialized. The goal is to compute the matrix product of a and b, storing the result in the n x n matrix c. This requires computing n² inner products, one for each pair of rows and columns. The matrices are shared variables, which we will declare as follows:

double a[n,n], b[n,n], c[n,n];
Assuming n has already been declared and initialized, this allocates storage for three arrays of double-precision floating-point numbers. By default, the indices for the rows and columns range from 0 to n-1.

After initializing a and b, we can compute the matrix product by the following sequential program:

for [i = 0 to n-1] {
  for [j = 0 to n-1] {
    # compute inner product of a[i,*] and b[*,j]
    c[i,j] = 0.0;
    for [k = 0 to n-1]
      c[i,j] = c[i,j] + a[i,k]*b[k,j];
  }
}
Above, the outer loops (with indices i and j) iterate over each row and column. The inner loop (with index k) computes the inner product of row i of matrix a and column j of matrix b, and then stores the result in c[i,j]. The line that begins with a sharp character is a comment.

Matrix multiplication is an example of what is called an embarrassingly parallel application, because there are a multitude of operations that can be executed in parallel. Two operations can be executed in parallel if they are independent. Assume that the read set of an operation contains the variables it reads but does not alter and that the write set of an operation contains the variables that it alters (and possibly also reads). Two operations are independent if the write set of each is disjoint from both the read and write sets of the other. Informally, it is always safe for two processes to read variables that do not change. However, it is generally not safe for two processes to write into the same variable, or for one
process to read a variable that the other writes. (We will examine this topic in detail in Chapter 2.)

For matrix multiplication, the computations of inner products are independent operations. In particular, lines 4 through 6 in the above program initialize an element of c then compute the value of that element. The innermost loop in the program reads a row of a and a column of b, and reads and writes one element of c. Hence, the read set for an inner product is a row of a and a column of b, and the write set is an element of c. Since the write sets for pairs of inner products are disjoint, we could compute all of them in parallel. Alternatively, we could compute rows of results in parallel, columns of results in parallel, or blocks of rows or columns in parallel. Below we show how to program these parallel computations.

First consider computing rows of c in parallel. This can be programmed as follows using the co (concurrent) statement:

    co [i = 0 to n-1] {    # compute rows in parallel
        for [j = 0 to n-1] {
            c[i,j] = 0.0;
            for [k = 0 to n-1]
                c[i,j] = c[i,j] + a[i,k]*b[k,j];
        }
    }
The only syntactic difference between this program and the sequential program is that co is used in place of for in the outermost loop. However, there is an important semantic difference: The co statement specifies that its body should be executed concurrently-at least conceptually if not actually, depending on the number of processors-for each value of index variable i.

A different way to parallelize matrix multiplication is to compute the columns of c in parallel. This can be programmed as

    co [j = 0 to n-1] {    # compute columns in parallel
        for [i = 0 to n-1] {
            c[i,j] = 0.0;
            for [k = 0 to n-1]
                c[i,j] = c[i,j] + a[i,k]*b[k,j];
        }
    }

Here the outer two loops have been interchanged, meaning that the loop on i in the previous program has been interchanged with the loop on j. It is safe to interchange two loops as long as the bodies are independent and hence compute the same results, as they do here. (We examine this and related kinds of program transformations in Chapter 12.)
We can also compute all inner products in parallel. This can be programmed in several ways. First, we could use a single co statement with two indices:

    co [i = 0 to n-1, j = 0 to n-1] {    # all rows and
        c[i,j] = 0.0;                    # all columns
        for [k = 0 to n-1]
            c[i,j] = c[i,j] + a[i,k]*b[k,j];
    }
The body of the above co statement is executed concurrently for each combination of values of i and j; hence the program specifies n^2 processes. (Again, whether or not the processes execute in parallel depends on the underlying implementation.)

A second way to compute all inner products in parallel is to use nested co statements:

    co [i = 0 to n-1] {        # rows in parallel then
        co [j = 0 to n-1] {    # columns in parallel
            c[i,j] = 0.0;
            for [k = 0 to n-1]
                c[i,j] = c[i,j] + a[i,k]*b[k,j];
        }
    }
This specifies one process for each row (outer co statement) and then one process for each column (inner co statement). A third way to write the program would be to interchange the first two lines in the last program. The effect of all three programs is the same: Execute the inner loop for all n^2 combinations of i and j. The difference between the three programs is how the processes are specified, and hence when they are created.

Notice that in all the concurrent programs above, all we have done is to replace instances of for by co. However, we have done so only for index variables i and j. What about the innermost loop on index variable k? Can this for statement also be replaced by a co statement? The answer is "no," because the body of the inner loop both reads and writes variable c[i,j]. It is possible to compute an inner product-the for loop on k-using binary parallelism, but this is impractical on most machines (see the exercises at the end of this chapter).

Another way to specify any of the above parallel computations is to use a process declaration instead of a co statement. In essence, a process is a co statement that is executed "in the background." For example, the first concurrent program above-the one that computes rows of results in parallel-could be specified by the following program:
    process row[i = 0 to n-1] {    # rows in parallel
        for [j = 0 to n-1] {
            c[i,j] = 0.0;
            for [k = 0 to n-1]
                c[i,j] = c[i,j] + a[i,k]*b[k,j];
        }
    }
This declares an array of processes-row[1], row[2], etc.-one for each value of index variable i. The n processes are created and start executing when this declaration is encountered. If there are statements following the process declaration, they are executed concurrently with the processes, whereas any statements following a co statement are not executed until the co statement terminates. Process declarations cannot be nested within other declarations or statements, whereas co statements may be nested. (Process declarations and co statements are defined in detail in Section 1.9.)

The above programs employ a process per element, row, or column of the result matrix. Suppose that there are fewer than n processors, which will usually be the case, especially if n is large. There is still a natural way to make full use of all the processors: Divide the result matrix into strips-of rows or columns-and use one worker process per strip. In particular, each worker computes the results for the elements in its strip. For example, suppose there are P processors and that n is a multiple of P (namely, P evenly divides n). Then if we use strips of rows, the worker processes can be programmed as follows:

    process worker[w = 1 to P] {        # strips in parallel
        int first = (w-1) * n/P;        # first row of strip
        int last = first + n/P - 1;     # last row of strip
        for [i = first to last] {
            for [j = 0 to n-1] {
                c[i,j] = 0.0;
                for [k = 0 to n-1]
                    c[i,j] = c[i,j] + a[i,k]*b[k,j];
            }
        }
    }
The difference between this program and the previous one is that the n rows have been divided into P strips of n/P rows each. Hence, the extra lines in the above program are the bookkeeping needed to determine the first and last rows of each strip and a loop (on i) to compute the inner products for those rows.

To summarize: the essential requirement for being able to parallelize a program is to have independent computations-namely, computations that have
disjoint write sets. For matrix multiplication, the inner products are independent computations, because each writes (and reads) a different element c[i,j] of the result matrix. We can thus compute all inner products in parallel, all rows in parallel, all columns in parallel, or strips of rows in parallel. Finally, the parallel programs can be written using co statements or process declarations.
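As a point of comparison with the notation above, the following is a minimal sketch of the strip-based program in C using the POSIX threads library (which is introduced later in the book). The fixed matrix size N, the worker count P, and the initialization of a and b to 1.0 are assumptions made only for this illustration.

    #include <pthread.h>
    #include <stdio.h>

    #define N 8   /* matrix size; assumed for the sketch */
    #define P 4   /* number of worker threads; assumes P divides N */

    static double a[N][N], b[N][N], c[N][N];

    /* Each worker computes one strip of N/P rows, as in the pseudocode above. */
    static void *worker(void *arg) {
        long w = (long)arg;              /* worker index 0..P-1 */
        int first = w * (N / P);         /* first row of strip */
        int last = first + N / P - 1;    /* last row of strip */
        for (int i = first; i <= last; i++)
            for (int j = 0; j < N; j++) {
                c[i][j] = 0.0;
                for (int k = 0; k < N; k++)
                    c[i][j] += a[i][k] * b[k][j];
            }
        return NULL;
    }

    int main(void) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) { a[i][j] = 1.0; b[i][j] = 1.0; }

        pthread_t t[P];
        for (long w = 0; w < P; w++)
            pthread_create(&t[w], NULL, worker, (void *)w);
        for (int w = 0; w < P; w++)
            pthread_join(t[w], NULL);

        printf("c[0][0] = %.1f (should be %d)\n", c[0][0], N);
        return 0;
    }

The structure mirrors the worker pseudocode: creating the threads plays the role of the process declaration, and joining them plays the role of waiting for a co statement to terminate.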
1.5 Recursive Parallelism: Adaptive Quadrature

A recursive program is one that contains procedures that call themselves-either directly or indirectly. Recursion is the dual of iteration, in the sense that iterative programs can be converted to recursive programs and vice versa. However, each programming style has its place, as some problems are naturally iterative and some are naturally recursive.

Many recursive procedures call themselves more than once in the body of a procedure. For example, quicksort is a common sorting algorithm. Quicksort partitions an array into two parts and then calls itself twice: first to sort the left partition and then to sort the right partition. Many algorithms on trees or graphs have a similar structure.

A recursive program can be implemented using concurrency whenever it has multiple, independent recursive calls. Two calls of a procedure (or function) are independent if the write sets are disjoint. This will be the case if (1) the procedure does not reference global variables or only reads them, and (2) reference and result arguments, if any, are distinct variables. For example, if a procedure does not reference global variables and has only value parameters, then every call of the procedure will be independent. (It is fine for a procedure to read and write local variables, because each instance of the procedure will have its own private copy of local variables.) The quicksort algorithm can be programmed to meet these requirements. Another interesting example follows.

The quadrature problem addresses the issue of approximating the integral of a continuous function. Suppose f(x) is such a function. As illustrated in Figure 1.4, the integral of f(x) from a to b is the area between f(x) and the x axis from x equals a to x equals b.
Figure 1.4   The quadrature problem.
There are two basic ways to approximate the value of an integral. One is to divide the interval from a to b into a fixed number of subintervals, and then to approximate the area of each subinterval by using something like the trapezoidal rule or Simpson's rule.

    double fleft = f(a), fright, area = 0.0;
    double width = (b-a) / INTERVALS;
    for [x = (a + width) to b by width] {
        fright = f(x);
        area = area + (fleft + fright) * width / 2;
        fleft = fright;
    }

Each iteration uses the trapezoidal rule to compute a small area and adds it to the total area. Variable width is the width of each trapezoid. The intervals move from left to right, so on each iteration the right-hand value becomes the new left-hand value.

The second way to approximate an integral is to use the divide-and-conquer paradigm and a variable number of subintervals. In particular, first compute the midpoint m between a and b. Then approximate the area of three regions under the curve defined by function f(): the one from a to m, the one from m to b, and the one from a to b. If the sum of the two smaller areas is within some acceptable tolerance EPSILON of the larger area, then the approximation is considered good enough. If not, the larger problem-from a to b-is divided into two subproblems-from a to m and from m to b-and the process is repeated. This approach is called adaptive quadrature, because the algorithm adapts to the shape of the curve. It can be programmed as follows:

    double quad(double left, right, fleft, fright, lrarea) {
        double mid = (left + right) / 2;
        double fmid = f(mid);
        double larea = (fleft+fmid) * (mid-left) / 2;
        double rarea = (fmid+fright) * (right-mid) / 2;
        if (abs((larea+rarea) - lrarea) > EPSILON) {
            # recurse to integrate both halves
            larea = quad(left, mid, fleft, fmid, larea);
            rarea = quad(mid, right, fmid, fright, rarea);
        }
        return (larea + rarea);
    }
The integral of f(x) from a to b is approximated by calling

    area = quad(a, b, f(a), f(b), (f(a)+f(b)) * (b-a) / 2);
The function again uses the trapezoidal rule. The values of f() at the endpoints of an interval and the approximate area of that interval are passed to each call of quad to avoid computing them more than once.

The iterative program cannot be parallelized because the loop body both reads and writes the value of area. However, the recursive program has independent calls of quad, assuming that function f() has no side effects. In particular, the arguments to quad are passed by value and the body does not assign to any global variables. Thus, we can use a co statement as follows to specify that the recursive calls should be executed in parallel:

    co larea = quad(left, mid, fleft, fmid, larea);
    // rarea = quad(mid, right, fmid, fright, rarea);
    oc
This is the only change we need to make to the recursive program. Because a co statement does not terminate until both calls have completed, the values of larea and rarea will have been computed before quad returns their sum.

The co statements in the matrix multiplication programs contain lists of statements that are executed for each value of the quantifier variable (i or j). The above co statement contains two function calls; these are separated by //. The first form of co is used to express iterative parallelism; the second form is used to express recursive parallelism.

To summarize, a program with multiple recursive calls can readily be turned into a parallel recursive program whenever the calls are independent. However, there is a practical problem: there may be too much concurrency. Each co statement above creates two processes, one for each function call. If the depth of recursion is large, this will lead to a large number of processes, perhaps too many to be executed in parallel. A solution to this problem is to prune the recursion tree when there are enough processes-namely, to switch from using concurrent recursive calls to sequential recursive calls. This topic is explored in the exercises and later in this book.
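To make the pruning idea concrete, here is a hedged sketch in C with POSIX threads. The integrand f, the tolerance EPSILON, and the cutoff MAXDEPTH are assumptions chosen only for this illustration; below the cutoff the left half is evaluated in a new thread, and past it the recursion proceeds sequentially.

    #include <math.h>
    #include <pthread.h>
    #include <stdio.h>

    #define EPSILON  1e-9
    #define MAXDEPTH 4    /* prune: limit the depth at which new threads are created */

    static double f(double x) { return sin(x) * exp(x); }   /* example integrand */

    struct task { double left, right, fleft, fright, lrarea; int depth; double result; };

    static double quad(double left, double right, double fleft, double fright,
                       double lrarea, int depth);

    static void *quad_thread(void *arg) {
        struct task *t = arg;
        t->result = quad(t->left, t->right, t->fleft, t->fright, t->lrarea, t->depth);
        return NULL;
    }

    static double quad(double left, double right, double fleft, double fright,
                       double lrarea, int depth) {
        double mid = (left + right) / 2;
        double fmid = f(mid);
        double larea = (fleft + fmid) * (mid - left) / 2;
        double rarea = (fmid + fright) * (right - mid) / 2;
        if (fabs((larea + rarea) - lrarea) > EPSILON) {
            if (depth < MAXDEPTH) {
                /* still room for concurrency: do the left half in a new thread */
                struct task t = { left, mid, fleft, fmid, larea, depth + 1, 0.0 };
                pthread_t tid;
                pthread_create(&tid, NULL, quad_thread, &t);
                rarea = quad(mid, right, fmid, fright, rarea, depth + 1);
                pthread_join(tid, NULL);
                larea = t.result;
            } else {
                /* past the threshold: recurse sequentially, as the text suggests */
                larea = quad(left, mid, fleft, fmid, larea, depth + 1);
                rarea = quad(mid, right, fmid, fright, rarea, depth + 1);
            }
        }
        return larea + rarea;
    }

    int main(void) {
        double a = 0.0, b = 1.0;
        double area = quad(a, b, f(a), f(b), (f(a) + f(b)) * (b - a) / 2, 0);
        printf("approximate integral = %f\n", area);
        return 0;
    }

The depth argument bounds the number of threads that can exist at once, which is one way to realize the pruning discussed above.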
1.6 Producers and Consumers: Unix Pipes

A producer process computes and outputs a stream of results. A consumer process inputs and analyzes a stream of values. Many programs are producers and/or consumers of one form or another. The combination becomes especially
interesting when producers and consumers are connected by a pipeline-a sequence of processes in which each consumes the output of its predecessor and produces output for its successor. A classic example is Unix pipelines, which we examine here. Other examples are given in later chapters.

A Unix application process typically reads from what is called its standard input file, stdin, and writes to what is called its standard output file, stdout. Usually, stdin is the keyboard of the terminal from which an application is invoked and stdout is the display of that terminal. However, one of the powerful features introduced by Unix was the fact that the standard input and output "devices" can also be bound to different kinds of files. In particular, stdin and/or stdout can be bound to a data file or to a special kind of "file" called a pipe.

A pipe is a buffer (FIFO queue) between a producer and consumer process. It contains a bounded sequence of characters. New values are appended when a producer process writes to the pipe. Values are removed when a consumer process reads from the pipe.

A Unix application program merely reads from stdin, without concern for where the input is actually from. If stdin is bound to a keyboard, input is the characters typed at the keyboard. If stdin is bound to a file, input is the sequence of characters in the file. If stdin is bound to a pipe, input is the sequence of characters written to that pipe. Similarly, an application merely writes to stdout, without concern for where the output actually goes.

Pipelines in Unix are typically specified using one of the Unix command languages, such as csh (C shell). As a specific example, the printed pages for this book were produced by a csh command similar to the following:

    sed -f Script $* | tbl | eqn | groff Macros -
This pipeline contains four commands: (1) sed, a stream editor; (2) tbl, a table processor; (3) eqn, an equation processor; and (4) groff, a program that produces PostScript output from troff source files. A vertical bar, the C shell character for a pipe, separates each pair of commands.

Figure 1.5 illustrates the structure of the above pipeline.

Figure 1.5   A pipeline of processes: source files -> sed -> tbl -> eqn -> groff -> PostScript.

Each of the commands is a filter process. The input to the sed filter is a file of editing commands (Script) and the command-line arguments ($*), which for this book are
the appropriate source files for the text. The output from sed is passed to tbl, which passes its output to eqn, which passes its output to groff. The groff filter reads a file of Macros for the book and then reads and processes its standard input. The groff filter sends its output to the printer in the author's office.

Each pipe in Figure 1.5 is implemented by a bounded buffer: a synchronized, FIFO queue of values. A producer process waits (if necessary) until there is room in the buffer, then appends a new line to the end of the buffer. A consumer process waits (if necessary) until the buffer contains a line, then removes the first one. In Part 1 we show how to implement bounded buffers using shared variables and various synchronization primitives (flags, semaphores, and monitors). In Part 2 we introduce communication channels and the message passing primitives send and receive. We then show how to program filters using them and how to implement channels and message passing using buffers.
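The pipe mechanism itself is part of the standard Unix system-call interface, so a tiny producer/consumer pair can be written directly in C. The following is a minimal sketch using pipe(), fork(), read(), and write(); the particular lines written by the producer are invented for the example.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void) {
        int fd[2];                       /* fd[0] = read end, fd[1] = write end */
        if (pipe(fd) == -1) { perror("pipe"); return 1; }

        if (fork() == 0) {               /* child: the consumer */
            close(fd[1]);                /* consumer does not write */
            char buf[128];
            ssize_t n;
            while ((n = read(fd[0], buf, sizeof buf)) > 0)
                write(STDOUT_FILENO, buf, n);   /* copy each chunk to stdout */
            close(fd[0]);
            return 0;
        }

        /* parent: the producer */
        close(fd[0]);                    /* producer does not read */
        const char *lines[] = { "first line\n", "second line\n", "third line\n" };
        for (int i = 0; i < 3; i++)
            write(fd[1], lines[i], strlen(lines[i]));
        close(fd[1]);                    /* closing the write end signals end of stream */
        wait(NULL);
        return 0;
    }

The consumer blocks in read() when the pipe is empty and sees end of file once the producer closes its end, which is exactly the bounded-buffer behavior described above.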
1.7 Clients and Servers: File Systems

Between a producer and a consumer, there is a one-way flow of information. This kind of interprocess relationship often occurs in concurrent programs, but it has no analog in sequential programs. This is because there is a single thread of control in a sequential program, whereas producers and consumers are independent processes with their own threads of control and their own rates of progress.

The client/server relationship is another common pattern in concurrent programs; indeed, it is the most common pattern. A client process requests a service, then waits for the request to be handled. A server process repeatedly waits for a request, handles it, then sends a reply. As illustrated in Figure 1.6, there is a two-way flow of information: from the client to the server and then back.

Figure 1.6   Clients and servers.

The relationship between a client and a server is the concurrent programming analog of the relationship between the caller of a subroutine and the subroutine itself. Moreover, like a subroutine that can be called from many places, a server typically has many clients. Each client request has to be handled as an independent
unit, but multiple requests might be handled concurrently, just as multiple calls of the same procedure might be active at the same time. Client/server interactions occur within operating systems, object-oriented systems, networks, databases, and many other programs.

A common example in all these applications is reading and writing a data file. In particular, assume there is a file server module that provides two operations on a file: read and write. When a client process wants to access the file, it calls the read or write operation in the appropriate file server module.

On a single-processor or other shared-memory system, a file server would typically be implemented by a collection of subroutines (for read, write, etc.) and data structures that represent files (e.g., file descriptors). Hence, the interaction between a client process and a file would typically be implemented by subroutine calls. However, if a file is shared, it is probably important that it be written to by at most one client process at a time. On the other hand, a shared file can safely be read concurrently by multiple clients. This kind of problem is an instance of what is called the readers/writers problem, a classic concurrent programming problem that is defined and solved in Chapter 4 and revisited in later chapters.

In a distributed system, clients and servers typically reside on different machines. For example, consider a query on the World Wide Web-for example, a query that arises when a user opens a new URL within a Web browser. The Web browser is a client process that executes on the user's machine. The URL indirectly specifies another machine on which the Web page resides. The Web page itself is accessed by a server process that executes on the other machine. This server process may already exist or it may be created; in either case it reads the Web page specified by the URL and returns it to the client's machine. In fact, as the URL is being translated, additional server processes could well be visited or created at intermediate machines along the way.

Clients and servers are programmed in one of two ways, depending on whether they execute on the same or on separate machines. In both cases clients are processes. On a shared-memory machine, a server is usually implemented by a collection of subroutines; these subroutines are programmed using mutual exclusion and condition synchronization to protect critical sections and to ensure that the subroutines are executed in appropriate orders. On a distributed-memory or network machine, a server is implemented by one or more processes that usually execute on a different machine than the clients. In both cases, a server is often a multithreaded program, with one thread per client.

Parts 1 and 2 present numerous applications of clients and servers, including file systems, databases, memory allocators, disk scheduling, and two more classic problems-the dining philosophers and the sleeping barber. Part 1 shows how to implement servers as subroutines, using semaphores or monitors for synchronization. Part 2 shows how to implement servers as processes that communicate with clients using message passing, remote procedure call, or rendezvous.
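As a small illustration of the shared-memory style of server, the sketch below implements a toy "file server module" in C with POSIX threads. It uses a readers/writer lock so that many clients may read concurrently while writes are exclusive; the record variable, the client behavior, and the fixed number of clients are assumptions made only for this example, and the full readers/writers problem is treated in Chapter 4.

    #include <pthread.h>
    #include <stdio.h>

    /* A toy shared "file": one record protected by a readers/writer lock. */
    static pthread_rwlock_t lock = PTHREAD_RWLOCK_INITIALIZER;
    static int record = 0;

    /* read operation: many clients may hold the read lock at once */
    static int file_read(void) {
        pthread_rwlock_rdlock(&lock);
        int value = record;
        pthread_rwlock_unlock(&lock);
        return value;
    }

    /* write operation: at most one client at a time */
    static void file_write(int value) {
        pthread_rwlock_wrlock(&lock);
        record = value;
        pthread_rwlock_unlock(&lock);
    }

    static void *client(void *arg) {
        long id = (long)arg;
        file_write((int)id);                              /* each client writes ... */
        printf("client %ld read %d\n", id, file_read());  /* ... and then reads */
        return NULL;
    }

    int main(void) {
        pthread_t t[4];
        for (long i = 0; i < 4; i++)
            pthread_create(&t[i], NULL, client, (void *)i);
        for (int i = 0; i < 4; i++)
            pthread_join(t[i], NULL);
        return 0;
    }

Here the "server" is just a pair of synchronized subroutines, which is exactly the shared-memory organization described in the paragraph above.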
1.8 Peers: Distributed Matrix Multiplication

We earlier showed how to implement parallel matrix multiplication using processes that share variables. Here we present two ways to solve the problem using processes that communicate by means of message passing. (Chapter 9 presents additional, more sophisticated algorithms.) The first program employs a coordinator process and an array of independent worker processes. In the second program, the workers are peer processes that interact by means of a circular pipeline. Figure 1.7 illustrates the structure of these process interaction patterns. They occur frequently in distributed parallel computations, as we shall see in Part 2.

Figure 1.7   Matrix multiplication using message passing: (a) coordinator/worker interaction; (b) a circular pipeline.

On a distributed-memory machine, each processor can access only its own local memory. This means that a program cannot use global variables; instead every variable has to be local to some process or procedure and can be accessed only by that process or procedure. Consequently, processes have to use message passing to communicate with each other.

Suppose as before that we want to compute the product of matrices a and b, storing the result in matrix c. Assume that a, b, and c are n x n matrices. Also assume for simplicity that there are n processors. We can then use an array of n worker processes, place one worker on each processor, and have each worker compute one row of the result matrix c. A program for the workers follows:

    process worker[i = 0 to n-1] {
        double a[n];      # row i of matrix a
        double b[n,n];    # all of matrix b
        double c[n];      # row i of matrix c
        receive initial values for vector a and matrix b;
        for [j = 0 to n-1] {
            c[j] = 0.0;
            for [k = 0 to n-1]
                c[j] = c[j] + a[k]*b[k,j];
        }
        send result vector c to the coordinator process;
    }
Worker process i computes row i of result matrix c. To do so, it needs row i of source matrix a and all of source matrix b. Each worker first receives these values from a separate coordinator process. The worker then computes its row of results and sends them back to the coordinator. (Alternatively, the source matrices might be produced by a prior computation, and the result matrix might be input to a subsequent computation; this would be an example of a distributed pipeline.)

The coordinator process initiates the computation and gathers and prints the results. In particular, the coordinator first sends each worker the appropriate row of a and all of b. Then the coordinator waits to receive a row of c from every worker. An outline of the coordinator follows:

    process coordinator {
        double a[n,n];    # source matrix a
        double b[n,n];    # source matrix b
        double c[n,n];    # result matrix c
        initialize a and b;
        for [i = 0 to n-1] {
            send row i of a to worker[i];
            send all of b to worker[i];
        }
        for [i = 0 to n-1]
            receive row i of c from worker[i];
        print the results, which are now in matrix c;
    }
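Before looking at the send and receive primitives used above, it may help to see how this coordinator/worker scheme could look with a real message-passing library. The following is a rough sketch in C using MPI (mentioned in the Historical Notes). The fixed size N, the use of N+1 MPI processes with rank 0 as the coordinator, the message tags, and the initialization of a and b to 1.0 are all assumptions made for this illustration; they are not part of the book's notation.

    #include <mpi.h>
    #include <stdio.h>

    #define N 4   /* matrix size; run with N+1 processes, e.g. mpirun -np 5 */

    int main(int argc, char *argv[]) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {                       /* coordinator */
            double a[N][N], b[N][N], c[N][N];
            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++) { a[i][j] = 1.0; b[i][j] = 1.0; }
            for (int w = 0; w < N; w++) {      /* row w of a and all of b to worker w+1 */
                MPI_Send(a[w], N, MPI_DOUBLE, w + 1, 0, MPI_COMM_WORLD);
                MPI_Send(&b[0][0], N * N, MPI_DOUBLE, w + 1, 1, MPI_COMM_WORLD);
            }
            for (int w = 0; w < N; w++)        /* gather row w of c from worker w+1 */
                MPI_Recv(c[w], N, MPI_DOUBLE, w + 1, 2, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            for (int i = 0; i < N; i++) {
                for (int j = 0; j < N; j++) printf("%6.1f ", c[i][j]);
                printf("\n");
            }
        } else {                               /* worker: computes one row of c */
            double a[N], b[N][N], c[N];
            MPI_Recv(a, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Recv(&b[0][0], N * N, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            for (int j = 0; j < N; j++) {
                c[j] = 0.0;
                for (int k = 0; k < N; k++)
                    c[j] = c[j] + a[k] * b[k][j];
            }
            MPI_Send(c, N, MPI_DOUBLE, 0, 2, MPI_COMM_WORLD);
        }
        MPI_Finalize();
        return 0;
    }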
The send and receive statements used by the coordinator and workers are message-passing primitives. A send statement packages up a message and transmits it to another process; a receive statement waits for a message from another process, then stores it in local variables. We describe message passing in detail in Chapter 7 and use it to program numerous applications in Parts 2 and 3.

Suppose as above that each worker has one row of a and is to compute one row of c. However, now assume that each worker has only one column of b at a time instead of the entire matrix. Initially, worker i has column i of matrix b. With just this much of the source data, worker i can compute only the result for
c[i,i]. In order for worker i to compute all of row i of matrix c, it has to
acquire all columns of matrix b. It can do so if we use a circular pipeline-as illustrated in Figure 1.7 (b)-and circulate the columns among the worker processes. In particular, we have each worker execute a series of rounds; in each round it sends its column of b to the next worker and receives a different column of b from the previous worker. The program follows:

    process worker[i = 0 to n-1] {
        double a[n];          # row i of matrix a
        double b[n];          # one column of matrix b
        double c[n];          # row i of matrix c
        double sum = 0.0;     # storage for inner products
        int nextCol = i;      # next column of results
        receive row i of matrix a and column i of matrix b;
        # compute c[i,i] = a[i,*] x b[*,i]
        for [k = 0 to n-1]
            sum = sum + a[k]*b[k];
        c[nextCol] = sum;
        # circulate columns and compute rest of c[i,*]
        for [j = 1 to n-1] {
            send my column of b to the next worker;
            receive a new column of b from the previous worker;
            sum = 0.0;
            for [k = 0 to n-1]
                sum = sum + a[k]*b[k];
            if (nextCol == 0)
                nextCol = n-1;
            else
                nextCol = nextCol-1;
            c[nextCol] = sum;
        }
        send result vector c to the coordinator process;
    }
Above, the next worker is the one with the next higher index, and the previous worker is the one with the next lower index. (For worker n-1, the next worker is 0; for worker 0, the previous worker is n-1.) The columns of matrix b are passed circularly among the workers so that each worker eventually sees every column. Variable nextCol keeps track of where to place each inner product in vector c. As in the first computation, we assume that a coordinator process sends rows of a and columns of b to the workers and then receives rows of c from the workers.

The second program employs an interprocess relationship that we call interacting peers, or simply peers. Each worker executes the same algorithm and
communicates with other workers in order to compute its part of the desired result. We will see further examples of interacting peers in Parts 2 and 3. In some cases, as here, each worker communicates with just two neighbors; in other cases each worker communicates with all the others.

In the first program above, the values in matrix b are replicated, with each worker process having its own copy. In the second program, each process has one row of a and one column of b at any point in time. This reduces the memory requirement per process, but the second program will take longer to execute than the first. This is because on each iteration of the second program, every worker process has to send a message to one neighbor and receive a message from another neighbor. The two programs illustrate the classic time/space tradeoff in computing. Section 9.3 presents additional algorithms for distributed matrix multiplication that illustrate additional time/space tradeoffs.
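For readers who want to see the circular pipeline with a real library, the following hedged sketch expresses the column circulation in C with MPI's combined send/receive operation. It assumes exactly N MPI processes, fabricates each worker's row and column locally instead of receiving them from a coordinator, and initializes everything to 1.0; these are simplifications for illustration only.

    #include <mpi.h>
    #include <stdio.h>

    #define N 4   /* assumes the program is run with exactly N processes */

    int main(int argc, char *argv[]) {
        MPI_Init(&argc, &argv);
        int i;
        MPI_Comm_rank(MPI_COMM_WORLD, &i);

        double a[N], b[N], c[N];               /* row i of a, one column of b, row i of c */
        for (int k = 0; k < N; k++) { a[k] = 1.0; b[k] = 1.0; }

        int next = (i + 1) % N, prev = (i - 1 + N) % N;
        int nextCol = i;
        for (int round = 0; round < N; round++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++) sum += a[k] * b[k];
            c[nextCol] = sum;
            nextCol = (nextCol == 0) ? N - 1 : nextCol - 1;
            /* pass my column to the next worker, receive one from the previous */
            MPI_Sendrecv_replace(b, N, MPI_DOUBLE, next, 0, prev, 0,
                                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
        printf("worker %d: c[%d][0] = %.1f\n", i, i, c[0]);
        MPI_Finalize();
        return 0;
    }

Using a combined send-and-receive in each round avoids the deadlock that could occur if every peer tried to send before anyone received.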
1.9 Summary of Programming Notation

The previous five sections presented examples of recurring patterns in concurrent programming: iterative parallelism, recursive parallelism, producers and consumers, clients and servers, and interacting peers. We will see numerous instances of these patterns in the rest of the book. The examples also introduced the programming notation that we will employ. This section summarizes the notation, although it should be pretty much self-evident from the examples.

Recall that a concurrent program contains one or more processes, and that each process is a sequential program. Our programming language thus contains mechanisms for sequential as well as concurrent programming. The sequential programming notation is based on the core aspects of C, C++, and Java. The concurrent programming notation uses the co statement and process declarations to specify parts of a program that are to execute concurrently. These were introduced earlier and are defined below. Later chapters will define mechanisms for synchronization and interprocess communication.
1.9.1 Declarations

A variable declaration specifies a data type and then lists the names of one or more variables of that type. A variable can be initialized when it is declared. Examples of declarations are

    int i, j = 3;
    double sum = 0.0;
An array is declared by appending the size of each dimension of the array to the name of the array. The default subscript range for each dimension is from 0 to one less than the size of the dimension. Alternatively, the lower and upper bounds of a range can be declared explicitly. Arrays can also be initialized when they are declared. Examples of array declarations are

    int a[n];                         # same as "int a[0:n-1];"
    int b[1:n];                       # array of n integers, b[1] ... b[n]
    int c[1:n] = ([n] 0);             # vector of zeroes
    double c[n,n] = ([n] ([n] 1.0));  # matrix of ones
Each declaration is followed by a descriptive comment, which begins with the sharp character (see Section 1.9.4). The last declaration says that c is a matrix of double precision numbers. The subscript ranges are 0 to n-1 for each dimension. The initial value of each element of c is 1.0.
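For comparison, here is a hedged sketch of rough C counterparts of the declarations above. C has no 1:n bounds and no array-valued initializer expressions, so those cases have to be approximated; the constant N stands in for the symbolic bound n and is an assumption made for this sketch.

    #include <stdio.h>

    #define N 4                 /* stands in for the symbolic bound n */

    int i, j = 3;
    double sum = 0.0;
    int a[N];                   /* indices 0 .. N-1, like "int a[n];" */
    int b[N + 1];               /* C has no 1:n bounds; element 0 is simply unused */
    int cvec[N] = {0};          /* vector of zeroes, like "int c[1:n] = ([n] 0);" */
    double cmat[N][N];          /* the "matrix of ones" must be filled in a loop */

    int main(void) {
        for (int r = 0; r < N; r++)
            for (int s = 0; s < N; s++)
                cmat[r][s] = 1.0;
        printf("j = %d, sum = %f, cmat[0][0] = %.1f\n", j, sum, cmat[0][0]);
        return 0;
    }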
1.9.2 Sequential Statements

An assignment statement has a target variable (left-hand side), an equal sign, and an expression (right-hand side). We will also use short-hand forms for increment, decrement, and similar assignments. Examples are

    a[n] = a[n] + 1;      # same as "a[n]++;"
    x = (y+z) * f(x);     # f(x) is a function call
The control statements we will use are if, while, and for. A simple if statement has the form

    if (condition) statement;

where condition is a Boolean-valued expression (one that is true or false) and statement is some single statement. If more than one statement is to be executed conditionally, the statements are surrounded by braces, as in

    if (condition) {
        statement1;
        ...
        statementN;
    }

In subsequent chapters we will often use S to stand for such a list of statements. An if/then/else statement has the form
Chapter 1
The Concurrent Computing Landscape if (condition) statement 1; else
statement2;
Again, the statements could be statement lists surrounded by braces. A while statement has the general form while (condition) statementl;
{
...
staternentN; ?
If condition is true, the enclosed statements are executed, then the w h i l e statement repeats. The while statement terminates when condition is false. If there is only a single statement, we omit the braces. The if and while statements are identical, to those in C, C++, and Java. However, we will write for statements in a more coinpact way. The general form of our for statement is for [quantifierl, statementl;
. . . , quantif ierM1
(
A quantifier introduces a new index variable, gives its initial value, and specifies the range of values of the index variable. Brackets are used around quantifiers to indicate that there is a range of values, as in an array declaration. As a specific example, assume that a[n] is an array of integers. Then the following statement initializes each element a[i] to i:

    for [i = 0 to n-1]
        a[i] = i;
Here, i is the new quantifier variable; it does not have to be declared earlier in the program. The scope of i is the body of the for statement. The initial value of i is 0 and it takes on all values, in order, from 0 to n-1. Assume m[n,n] is an integer matrix. An example of a for statement with two quantifiers is

    for [i = 0 to n-1, j = 0 to n-1]
        m[i,j] = 0;
An equivalent program using nested for statements is

    for [i = 0 to n-1]
        for [j = 0 to n-1]
            m[i,j] = 0;

Both programs initialize the n^2 values of matrix m to 0. Two more examples of quantifiers are

    [i = 1 to n by 2]         # odd values from 1 to n
    [i = 0 to n-1 st i != x]  # every value except i == x
The operator st in the second quantifier stands for "such that." We write for statements using the above syntax for several reasons. First, it emphasizes that our for statements are different than those in C, C++, and Java. Second, the notation suggests that they are used with arrays-which have brackets, rather than parentheses, around subscripts. Third, our notation simplifies programs because we do not have to declare the index variable. (How many times have you forgotten to do so?) Fourth, it is often convenient to have more than one index variable and hence more than one quantifier. Finally, we use the same kinds of quantifiers in co statements and process declarations.
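For readers more comfortable in C, the two quantifier examples above translate roughly as follows; the values of n and x are assumptions chosen just so the sketch runs.

    #include <stdio.h>

    int main(void) {
        int n = 10, x = 3;
        /* [i = 1 to n by 2]: odd values from 1 to n */
        for (int i = 1; i <= n; i += 2)
            printf("%d ", i);
        printf("\n");
        /* [i = 0 to n-1 st i != x]: every value except i == x */
        for (int i = 0; i < n; i++) {
            if (i == x) continue;   /* the "such that" clause becomes a skip */
            printf("%d ", i);
        }
        printf("\n");
        return 0;
    }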
1.9.3 Concurrent Statements, Processes, and Procedures

By default, statements execute sequentially-one at a time, one after the other. The co (concurrent) statement specifies that two or more statements can execute in parallel. One form of co has two or more arms, as in

    co statement1;
    // ...
    // statementN;
    oc
Each arm contains a statement (or statement list). The arms are separated by the parallelism operator //. The above statement means the following: Start executing each of the statements in parallel, then wait for them to terminate. The co statement thus terminates after all the statements have been executed.

The second form of co statement uses one or more quantifiers as a shorthand way to express that a set of statements is to be executed in parallel for every combination of values of the quantifier variables. For example, the trivial co statement that follows initializes arrays a[n] and b[n] to zeros:

    co [i = 0 to n-1] {
        a[i] = 0.0; b[i] = 0.0;
    }
This creates n processes, one for each value of i. The scope of the quantifier variable is the body of the co statement, and each process has a different value of i.

The two forms of co can also be mixed. For example, one arm might have a quantifier (within brackets) and another arm might not.

A process declaration is essentially an abbreviation for a co statement with one arm and/or one quantifier. It begins with the keyword process. The body of a process is enclosed in braces; it contains declarations of local variables, if any, and a list of statements. As a simple example, the following declares a process foo that sums the values from 1 to 10 then stores the result in global variable x:

    process foo {
        int sum = 0;
        for [i = 1 to 10]
            sum += i;
        x = sum;
    }
A process declaration occurs at the syntactic level of a procedure declaration; it is not a statement, whereas co is. Moreover, processes execute in the background, whereas the code that contains a co statement waits for the processes created by co to terminate before executing the next statement. As a second simple example, the following process writes the values 1 to n to the standard output file:

    process bar1 {
        for [i = 1 to n]
            write(i);       # same as "printf("%d\n", i);"
    }

An array of processes is declared by appending a quantifier (in brackets) to the name of the process, as in

    process bar2[i = 1 to n] {
        write(i);
    }
Both bar1 and bar2 write the values 1 to n to standard output. However, the order in which bar2 writes the values is nondeterministic. This is because bar2 is an array of n distinct processes, and processes execute in an arbitrary
order. In fact, there are n! different orders in which the array of processes could write the values (n factorial is the number of permutations of n values).

We will declare and call procedures and functions in essentially the same way as in C. Two simple examples are

    int addOne(int v) {     # an integer function
        return (v + 1);
    }

    main() {                # a "void" procedure
        int n, sum = 0;
        read(n);            # read an integer from stdin
        for [i = 1 to n]
            sum = sum + addOne(i);
        write("the final value is", sum);
    }
If the input value n is 5, this program writes the line: the final value is 20
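The nondeterministic ordering produced by an array of processes such as bar2 is easy to observe with threads. The following is a small sketch in C with POSIX threads; the constant N is an assumption for the example, and repeated runs will generally print the values in different orders.

    #include <pthread.h>
    #include <stdio.h>

    #define N 5

    /* Each thread plays the role of one element of the process array bar2[i]. */
    static void *bar2(void *arg) {
        long i = (long)arg;
        printf("%ld\n", i);
        return NULL;
    }

    int main(void) {
        pthread_t t[N];
        for (long i = 1; i <= N; i++)
            pthread_create(&t[i-1], NULL, bar2, (void *)i);
        for (int i = 0; i < N; i++)
            pthread_join(t[i], NULL);
        return 0;
    }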
1.9.4 Comments

We will write comments in one of two ways. One-line comments begin with the sharp character, #, and terminate with the end of the line. Multiple-line comments begin with /* and end with */. We use # to introduce one-line comments because the C++/Java one-line comment symbol, //, has long been used in concurrent programming as the separator between arms of concurrent statements.

An assertion is a predicate that specifies a condition that is assumed to be true at a certain point in a program. (Assertions are described in detail in Chapter 2.) Assertions can be thought of as very precise comments, so we will specify them by lines that begin with two sharp characters, as in

    ## x > 0

This comment asserts that x is positive.
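In C one can turn such an annotation into a runtime check with the standard assert macro; the sketch below is a trivial example, with the value of x invented for illustration.

    #include <assert.h>

    int main(void) {
        int x = 1;
        assert(x > 0);   /* C counterpart of the annotation "## x > 0" */
        return 0;
    }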
Historical Notes

As noted in the text, concurrent programming originated in the 1960s after the introduction of independent device controllers (channels). Operating systems were the first software systems to be organized as multithreaded concurrent programs. The research and initial prototypes leading to modern operating systems
occurred in the late 1960s and early 1970s. The first textbooks on operating systems appeared in the early 1970s. The introduction of computer networks in the 1970s led to the development of distributed systems. The invention of the Ethernet in the late 1970s greatly accelerated the pace of activity. Almost immediately there was a plethora of new languages, algorithms, and applications. Subsequent hardware developments spurred further activity. For example, once workstations and local area networks became relatively cheap, people developed client/server computing; more recently the Internet has spawned Java, Web browsers, and a plethora of new applications.

Multiprocessors were invented in the 1970s, the most notable machines being the Illiac SIMD multiprocessors developed at the University of Illinois. Early machines were costly and specialized, however, so for many years high-performance scientific computing was done on vector processors. This started to change in the mid-1980s with the invention of hypercube machines at the California Institute of Technology and their commercial introduction by Intel. Then Thinking Machines introduced the massively parallel Connection Machine. Also, Cray Research and other makers of vector processors started to produce multiprocessor versions of their machines. For several years, numerous companies (and machines) came, and almost as quickly went. However, the set of players and machines is now fairly stable, and high-performance computing is pretty much synonymous with massively parallel processing.

The Historical Notes in subsequent chapters describe developments that are covered in the chapters themselves. Described below are several general references on hardware, operating systems, distributed systems, and parallel computing. These are the books the author has consulted most often while writing this text. (If there is a topic you wish to learn more about, you can also try a good Internet search engine.)

The now classic textbook on computer architecture is Hennessy and Patterson [1996]. If you want to learn how caches, interconnection networks, and multiprocessors work, start there. Hwang [1993] describes high-performance computer architectures. Almasi and Gottlieb [1994] cover parallel computing applications, software, and architectures. The most widely used textbooks on operating systems are Tanenbaum [1992] and Silberschatz, Peterson, and Galvin [1998]. Tanenbaum [1995] also gives an excellent overview of distributed operating systems. Mullender [1993] contains a superb collection of chapters on all aspects of distributed systems, including reliability and fault tolerance, transaction processing, file systems, real-time systems, and security. Two additional textbooks-Bacon [1998] and Bernstein and Lewis [1993]-cover multithreaded concurrent systems, including distributed database systems.
Dozens of books on parallel computing have been written in the last several years. Two competing books-Kumar et al. [1994] and Quinn [1994]-describe and analyze parallel algorithms for solving numerous numeric and combinatorial problems. Both books contain some coverage of hardware and software, but they emphasize the design and analysis of parallel algorithms.

Another four books emphasize software aspects of parallel computing. Brinch Hansen [1995] examines several interesting problems in computational science and shows how to solve them using a small number of recurring programming paradigms; the problem descriptions are notably clear. Foster [1995] examines concepts and tools for parallel programming, with an emphasis on distributed-memory machines; the book includes excellent descriptions of High Performance Fortran (HPF) and the Message Passing Interface (MPI), two of the most important current tools. Wilson [1995] describes four important programming models-data parallelism, shared variables, message passing, and generative communication-and shows how to use each to solve science and engineering problems; Appendix B of Wilson's book gives an excellent short history of major events in parallel computing. A recent book by Wilkinson and Allen [1999] describes dozens of applications and shows how to write parallel programs to solve them; the book emphasizes the use of message passing, but includes some coverage of shared-variable computing.

One additional book, edited by Foster and Kesselman [1999], presents a vision for a promising new approach to distributed, high-performance computing using what is called a computational grid. (We describe the approach in Chapter 12.) The book starts with a comprehensive introduction to computational grids and then covers applications, programming tools, services, and infrastructure. The chapters are written by the experts who are working on making computational grids a reality.
References

Almasi, G. S., and A. Gottlieb. 1994. Highly Parallel Computing, 2nd ed. Menlo Park, CA: Benjamin/Cummings.

Bacon, J. 1998. Concurrent Systems: Operating Systems, Database and Distributed Systems: An Integrated Approach, 2nd ed. Reading, MA: Addison-Wesley.

Bernstein, A. J., and P. M. Lewis. 1993. Concurrency in Programming and Database Systems. Boston, MA: Jones and Bartlett.

Brinch Hansen, P. 1995. Studies in Computational Science. Englewood Cliffs, NJ: Prentice-Hall.

Foster, I. 1995. Designing and Building Parallel Programs. Reading, MA: Addison-Wesley.

Foster, I., and C. Kesselman, eds. 1999. The Grid: Blueprint for a New Computing Infrastructure. San Francisco, CA: Morgan Kaufmann.

Hennessy, J. L., and D. A. Patterson. 1996. Computer Architecture: A Quantitative Approach, 2nd ed. San Francisco, CA: Morgan Kaufmann.

Hwang, K. 1993. Advanced Computer Architecture: Parallelism, Scalability, Programmability. New York, NY: McGraw-Hill.

Kumar, V., A. Grama, A. Gupta, and G. Karypis. 1994. Introduction to Parallel Computing: Design and Analysis of Algorithms. Menlo Park, CA: Benjamin/Cummings.

Mullender, S., ed. 1993. Distributed Systems, 2nd ed. Reading, MA: ACM Press and Addison-Wesley.

Quinn, M. J. 1994. Parallel Computing: Theory and Practice. New York, NY: McGraw-Hill.

Silberschatz, A., J. Peterson, and P. Galvin. 1998. Operating System Concepts, 5th ed. Reading, MA: Addison-Wesley.

Tanenbaum, A. S. 1992. Modern Operating Systems. Englewood Cliffs, NJ: Prentice-Hall.

Tanenbaum, A. S. 1995. Distributed Operating Systems. Englewood Cliffs, NJ: Prentice-Hall.

Wilkinson, B., and M. Allen. 1999. Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers. Englewood Cliffs, NJ: Prentice-Hall.

Wilson, G. V. 1995. Practical Parallel Programming. Cambridge, MA: MIT Press.
Exercises

1.1 Determine the characteristics of the multiple processor machines to which you have access. How many processors does each machine have and what are their clock speeds? How big are the caches and how are they organized? What is the size of the primary memory? What is the access time? What memory consistency protocol is used? How is the interconnection network organized? What is the remote memory access or message transfer time?
1.2 Many problems can be solved more efficiently using a concurrent program rather than a sequential program-assuming appropriate hardware. Consider the programs you have written in previous classes, and pick two that could be rewritten as concurrent programs. One program should be iterative and the other recursive. Then (a) write concise descriptions of the two problems, and (b) develop pseudocode for concurrent programs that solve the problems.

1.3 Consider the matrix multiplication problem described in Section 1.4.
(a) Write a sequential program to solve the problem. The size of the matrix n should be a command-line argument. Initialize each element of matrices a and b to 1.0. (Hence, the value of each element of the result matrix c will be n.)
(b) Write a parallel program to solve the problem. Compute strips of results in parallel using P worker processes. The matrix size n and the number of workers P should be command-line arguments. Again initialize each element of a and b to 1.0.
(c) Compare the performance of your programs. Experiment with different values for n and P. Prepare a graph of the results and explain what you observe.
(d) Modify your programs to multiply rectangular matrices. The size of matrix a should be p x q, and the size of matrix b should be q x r. Hence, the size of the result will be p x r. Repeat part (c) for the new programs.

1.4 In the matrix multiplication programs, computing an inner product involves multiplying pairs of elements and adding the results. The multiplications could all be done in parallel. Pairs of products could also be added in parallel.
(a) Construct a binary expression tree to illustrate how this would work. Assume the vectors have length n, and for simplicity assume that n is a power of two. The leaves of the tree should be vector elements (a row of matrix a and a column of matrix b). The other nodes of the tree should be multiplication or addition operators.
(b) Many modern processors are what are called super-scalar processors. This means that they can issue (start executing) two or more instructions at the same time. Consider a machine that can issue two instructions at a time. Write assembly-level pseudocode to implement the binary expression tree you constructed in part (a). Assume there are instructions to load, add, and multiply registers, and there are as many registers as you need. Maximize the number of pairs of instructions that you can issue at the same time.
(c) Assume that a load takes 1 clock cycle, an add takes 1 clock cycle, and a multiply takes 8 clock cycles. What is the execution time of your program?

1.5 In the first parallel matrix multiplication program in Section 1.4, the first line has a co statement for rows, and the second line has a for statement for columns. Suppose the co is replaced by for and the for is replaced by co, but that otherwise the program is the same. In particular, the program computes rows sequentially but computes columns in parallel for each row.
(a) Will the program be correct? Namely, will it compute the same results?
(b) Will the program execute as efficiently? Explain your answer.

1.6 The points on a unit circle centered at the origin are defined by the function f(x) = sqrt(1 - x^2). Recall that the area of a circle is pi*r^2, where r is the radius. Use the adaptive quadrature program in Section 1.5 to approximate the value of pi by computing the area of the upper-right quadrant of a unit circle then multiplying the result by 4. (Another function to try is the integral from 0.0 to 1.0 of f(x) = 4/(1+x^2).)

1.7 Consider the quadrature problem described in Section 1.5.
(a) Write four programs to solve the problem: a sequential iterative program that uses a fixed number of intervals, a sequential recursive program that uses adaptive quadrature, a parallel iterative program that uses a fixed number of intervals, and a recursive parallel program that uses adaptive quadrature. Integrate a function that has an interesting shape, such as sin(x)*exp(x). Your programs should have command-line arguments for the range of values for x, the number of intervals (for fixed quadrature), and the value of EPSILON (for adaptive quadrature).
(b) Experiment with your programs for different values of the command-line arguments. What are their execution times? How accurate are the answers? How fast do the programs converge? Explain what you observe.

1.8 The parallel adaptive quadrature program in Section 1.5 will create a large number of processes, usually way more than the number of processors.
(a) Modify the program so that it creates approximately T processes, where T is a command-line argument that specifies a threshold. In particular, use a global variable to keep a count of the number of processes that you have created. (Assume for this problem that you can safely increment a global variable.) Within the body of quad, if over T processes have already been created, then use sequential recursion. Otherwise use parallel recursion and add two to the global counter.
(b) Implement and test the parallel program in the text and your answer to (a). Pick an interesting function to integrate, then experiment with different values of T and EPSILON. Compare the performance of the two programs.
Exercises
37
1.9 Write a sequential recursive program to implement the quicksort algorithm for sorting an array of n values. Then modify your program to use recursive parallelism. Be careful to ensure that the parallel calls are independent. Implement both programs and compare their performance.

1.10 Gather information on Unix pipes. How are they implemented on your system? What is the maximum size of a pipe? How are read and write synchronized? See if you can construct an experiment that causes a write to a pipe to block because the pipe is full. See if you can construct a concurrent program that deadlocks (in other words, all the processes are waiting for each other).

1.11 Most computing facilities now employ server machines for electronic mail, Web pages, files, and so on. What kinds of server machines are used at your computing facility? Pick one of the servers (if there is one) and find out how the software is organized. What are the processes (threads) that it runs? How are they scheduled? What are the clients? Is there a thread per client request, or a fixed number of server threads, or what? What data do the server threads share? When and why do the threads synchronize with each other?

1.12 Consider the two programs for distributed matrix multiplication in Section 1.8.
(a) How many messages are sent (and received) by each program? What are the sizes of the messages? Don't forget that there is also a coordinator process in the second program.
(b) Modify the programs to use P worker processes, where P is a factor of n. In particular, each worker should compute n/P rows (or columns) of results rather than a single row of results.
(c) How many messages are sent by your program for part (b)? What are the sizes of the messages?

1.13 The transpose of matrix M is a matrix T such that T[i,j] = M[j,i], for all i and j.
(a) Write a parallel program using shared variables that computes the transpose of an n x n matrix M. Use P worker processes. For simplicity, assume that n is a multiple of P.
(b) Write a parallel program using message passing to compute the transpose of square matrix M. Again use P workers and assume n is a multiple of P. Figure out how to distribute the data and gather the results.
(c) Modify your programs to handle the case when n is not a multiple of P.
(d) Experiment with your programs for various values of n and P. What is their performance?
1.14 The grep family of Unix commands finds patterns in files. Write a simplified version of grep that has two arguments: a string and a filename. The program should print to stdout all lines in the file that contain the string.
(a) Modify your program so that it searches two files, one after the other. (Add a third argument to the program for the second filename.)
(b) Now change your program so that it searches the two files in parallel. Both processes should write to stdout.
(c) Experiment with your programs for (a) and (b). Their output should differ, at least some of the time. Moreover, the output of the concurrent program should not always be the same. See if you can observe these phenomena.
Part 1: Shared-Variable Programming
Sequential programs often employ shared variables as a matter of convenience-for global data structures, for example-but they can be written without them. In fact, many would argue that they should be written without them. Concurrent programs, on the other hand, absolutely depend on the use of shared components. This is because the only way processes can work together to solve a problem is to communicate. And the only way they can communicate is if one process writes into something that the other process reads. That something can be a shared variable or it can be a shared communication channel. Thus, communication is programmed by writing and reading shared variables or by sending and receiving messages.

Communication gives rise to the need for synchronization. There are two basic kinds: mutual exclusion and condition synchronization. Mutual exclusion occurs whenever two processes need to take turns accessing shared objects, such as the records in an airline reservation system. Condition synchronization occurs whenever one process needs to wait for another-for example, when a consumer process needs to wait for data from a producer process.

Part 1 shows how to write concurrent programs in which processes communicate and synchronize by means of shared variables. We examine a variety of multithreaded and parallel applications. Shared-variable programs are most commonly executed on shared-memory machines, because every variable can be accessed directly by every processor. However, the shared-variable programming model can also be used on distributed-memory machines if it is supported by a software implementation of what is called a distributed shared memory; we examine how this is done in Chapter 10.

Chapter 2 introduces fundamental concepts of processes and synchronization by means of a series of small examples. The first half of the chapter
describes ways to parallelize programs, illustrates the need for synchronization, defines atomic actions, and introduces the await statement for programming synchronization. The second half of the chapter examines the semantics of concurrent programs and introduces several key concepts, such as interference, global invariants, safety properties, and fairness, that we will encounter over and over in later chapters.

Chapter 3 examines two basic kinds of synchronization, locks and barriers, and shows how to implement them using the kinds of instructions that are found on every processor. Locks are used to solve the classic critical section problem, which occurs in most concurrent programs. Barriers are a fundamental synchronization technique in parallel programs. The last two sections of the chapter examine and illustrate applications of two important models for parallel computing: data parallel and bag of tasks.

Chapter 4 describes semaphores, which simplify programming mutual exclusion and condition synchronization (signaling). The chapter introduces two more classic concurrent programming problems, the dining philosophers and readers/writers, as well as several others. At the end of the chapter we describe and illustrate the use of the POSIX threads (Pthreads) library, which supports threads and semaphores on shared-memory machines.

Chapter 5 describes monitors, a higher-level programming mechanism that was first proposed in the 1970s, then fell somewhat out of favor, but has once again become popular because it is supported by the Java programming language. Monitors are also used to organize and synchronize the code in operating systems and other multithreaded software systems. The chapter illustrates the use of monitors by means of several interesting examples, including communication buffers, readers/writers, timers, the sleeping barber (another classic problem), and disk scheduling. Section 5.4 describes Java's threads and synchronized methods and shows various ways to protect shared data in Java. Section 5.5 shows how to program monitors using the Pthreads library.

Chapter 6 shows how to implement processes, semaphores, and monitors on single processors and on shared-memory multiprocessors. The basis for the implementations is a kernel of data structures and primitive routines; these are the core for any implementation of a concurrent programming language or subroutine library. At the end of the chapter, we show how to implement monitors using semaphores.
Chapter 2: Processes and Synchronization
Concurrent programs are inherently more complex than sequential programs. In many respects, they are to sequential programs what chess is to checkers or bridge is to pinochle: Each is interesting, but the former is more intellectually intriguing than the latter.

This chapter explores the "game" of concurrent programming, looking closely at its rules, playing pieces, and strategies. The rules are formal tools that help one to understand and develop correct programs; the playing pieces are language mechanisms for describing concurrent computations; the strategies are useful programming techniques.

The previous chapter introduced processes and synchronization and gave several examples of their use. This chapter examines processes and synchronization in detail. Section 2.1 takes a first look at the semantics (meaning) of concurrent programs and introduces five fundamental concepts: program state, atomic action, history, safety property, and liveness property. Sections 2.2 and 2.3 explore these concepts using two simple examples, finding a pattern in a file and finding the maximum value in an array; these sections also explore ways to parallelize programs and show the need for atomic actions and synchronization. Section 2.4 defines atomic actions and introduces the await statement as a means for expressing atomic actions and synchronization. Section 2.5 shows how to program the synchronization that arises in producer/consumer programs. Section 2.6 presents a synopsis of the axiomatic semantics of sequential and concurrent programs. The fundamental new problem introduced by concurrency is the possibility of interference. Section 2.7 describes four methods for avoiding interference: disjoint variables, weakened assertions, global invariants, and synchronization. Finally, Section 2.8 shows how to prove safety properties and defines scheduling policies and fairness.
Many of the concepts introduced in this chapter are quite detailed, so they can be hard to grasp on first reading. But please persevere, study the examples, and refer back to this chapter as needed. The concepts are important because they provide the basis for developing and understanding concurrent programs. Using a disciplined approach is important for sequential programs; it is imperative for concurrent programs, because the order in which processes execute is nondeterministic. In any event, on with the game!
2.1 States, Actions, Histories, and Properties

The state of a concurrent program consists of the values of the program variables at a point in time. The variables include those explicitly declared by the programmer as well as implicit variables, such as the program counter for each process, that contain hidden state information. A concurrent program begins execution in some initial state. Each process in the program executes independently, and as it executes it examines and alters the program state.

A process executes a sequence of statements. A statement, in turn, is implemented by a sequence of one or more atomic actions, which are actions that indivisibly examine or change the program state. Examples of atomic actions are uninterruptible machine instructions that load and store memory words. The execution of a concurrent program results in an interleaving of the sequences of atomic actions executed by each process. A particular execution of a concurrent program can be viewed as a history: s0 → s1 → ... → sn. In a history, s0 is the initial state, the other si are subsequent states, and the transitions are made by atomic actions that alter the state. (A history is also called a trace of the sequence of states.) Even parallel execution can be modeled as a linear history, because the effect of executing a set of atomic actions in parallel is equivalent to executing them in some serial order. In particular, the state change caused by an atomic action is indivisible and hence cannot be affected by atomic actions executed at about the same time.

Each execution of a concurrent program produces a history. For all but the most trivial programs, the number of possible histories is enormous (see below for details). This is because the next atomic action in any one of the processes could be the next one in a history. Hence, there are many ways in which the actions can be interleaved, even if a program is always begun in the same initial state. In addition, each process will most likely contain conditional statements and hence will take different actions as the state changes.

The role of synchronization is to constrain the possible histories of a concurrent program to those histories that are desirable. Mutual exclusion is concerned with combining atomic actions that are implemented directly by hardware
into sequences of actions called critical sections that appear to be atomic, i.e., that cannot be interleaved with actions in other processes that reference the same variables. Condition synchronization is concerned with delaying an action until the state satisfies a Boolean condition. Both forms of synchronization can cause processes to be delayed, and hence they restrict the set of atomic actions that can be executed next.

A property of a program is an attribute that is true of every possible history of that program and hence of all executions of the program. There are two kinds of properties: safety and liveness. A safety property is one in which the program never enters a bad state, i.e., a state in which some variables have undesirable values. A liveness property is one in which the program eventually enters a good state, i.e., a state in which variables have desirable values.

Partial correctness is an example of a safety property. A program is partially correct if the final state is correct, assuming that the program terminates. If a program fails to terminate, it may never produce the correct answer, but there is no history in which the program has terminated without producing the correct answer. Termination is an example of a liveness property. A program terminates if every loop and procedure call terminates, and hence if the length of every history is finite. Total correctness is a property that combines partial correctness and termination: A program is totally correct if it always terminates with a correct answer.

Mutual exclusion is an example of a safety property in a concurrent program. The bad state in this case would be one in which two processes are executing actions in different critical sections at the same time. Eventual entry to a critical section is an example of a liveness property in a concurrent program. The good state for each process is one in which it is executing within its critical section.

Given a program and a desired property, how might one go about demonstrating that the program satisfies the property? A common approach is testing or debugging, which can be characterized as "run the program and see what happens." This corresponds to enumerating some of the possible histories of a program and verifying that they are acceptable. The shortcoming of testing is that each test considers just one execution history, and a limited number of tests are unlikely to demonstrate the absence of bad histories.

A second approach is to use operational reasoning, which can be characterized as "exhaustive case analysis." In this approach, all possible execution histories of a program are enumerated by considering all the ways the atomic actions of each process might be interleaved. Unfortunately, the number of histories in a concurrent program is generally enormous (hence the approach is "exhaustive"). For example, suppose a concurrent program contains n processes and that each executes a sequence of m atomic actions. Then the number of different histories
of the program is (n·m)! / (m!)^n. In a program containing only three processes, each of which executes only two atomic actions, this is a total of 90 different histories! (The numerator in the formula is the number of permutations of the n·m actions. But each process executes a sequence of actions, so there is only one legal order of the m actions in each process; the denominator rules out all the illegal orders. The formula is the same as the number of ways to shuffle n decks of m cards each, assuming that the cards in each deck remain in the same order relative to each other.)

A third approach is to employ assertional reasoning, which can be characterized as "abstract analysis." In this approach, formulas of predicate logic called assertions are used to characterize sets of states, for example, all states in which x > 0. Atomic actions are then viewed as predicate transformers, because they change the state from satisfying one predicate to satisfying another predicate. The virtue of the assertional approach is that it leads to a compact representation of states and state transformations. More importantly, it leads to a way to develop and analyze programs in which the work involved is directly proportional to the number of atomic actions in the program.

We will use the assertional approach as a tool for constructing and understanding solutions to a variety of nontrivial problems. We will also use operational reasoning to guide the development of algorithms. Finally, many of the programs in the text have been tested, since that helps increase confidence in the correctness of a program. One always has to be wary of testing alone, however, because it can reveal only the presence of errors, not their absence. Moreover, concurrent programs are extremely difficult to test and debug since (1) it is difficult to stop all processes at once in order to examine their state, and (2) each execution will in general produce a different history.
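As a quick check of the formula, the following short C program (an illustrative aside, not part of the original text; the helper name factorial and the chosen values of n and m are ours) computes (n·m)! / (m!)^n with exact integer arithmetic. For n = 3 and m = 2 it prints 90, the number of histories cited above.

#include <stdio.h>

/* factorial of k as an unsigned long long; adequate only for small k */
static unsigned long long factorial(int k) {
    unsigned long long f = 1;
    for (int i = 2; i <= k; i++)
        f *= i;
    return f;
}

int main(void) {
    int n = 3, m = 2;                       /* processes and actions per process */
    unsigned long long histories = factorial(n * m);
    for (int i = 0; i < n; i++)
        histories /= factorial(m);          /* divide by (m!)^n one factor at a time */
    printf("%d processes of %d atomic actions each: %llu histories\n",
           n, m, histories);
    return 0;
}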
2.2 Parallelization: Finding Patterns in a File

Chapter 1 examined several kinds of applications and showed how they could be solved using concurrent programs. Here, we examine a single, simple problem and look in detail at ways to parallelize it.

Consider the problem of finding all instances of a pattern in filename. The pattern is a string; filename is the name of a file. This problem is readily solved in Unix by using one of the grep-style commands at the Unix command level, such as

    grep pattern filename
Executing this command creates a single process. The process executes something similar to the following sequential program:
string line;
read a line of input from stdin into line;
while (!EOF) {    # EOF is end of file
  look for pattern in line;
  if (pattern is in line)
    write line;
  read next line of input;
}

We now want to consider two basic questions: Can the above program be parallelized? If so, how?

The fundamental requirement for being able to parallelize any program is that it contains independent parts, as described in Section 1.4. Two parts are dependent on each other if one produces results that the other needs; this can only happen if they read and write shared variables. Hence, two parts of a program are independent if they do not read and write the same variables. More precisely:
(2.1) Independence of Parallel Processes. Let the read set of a part of a program be the variables it reads but does not alter. Let the write set of a part be the variables it writes into (and possibly also reads). Two parts of a program are independent if the write set of each part is disjoint from both the read and write sets of the other part.
A variable is any value that is read or written atomically. This includes simple variables, such as integers, that are stored in single words of storage, as well as individual elements of arrays or structures (records). From the above definition, two parts are independent if both only read shared variables, or if each part reads different variables than the ones written into by the other part. Occasionally, it is possible that two parts can safely execute in parallel even if they write into common variables. However, this is feasible only if the order in which the writes occur does not matter. For example, when two (or more) processes are periodically updating a graphics display, it may be fine for the updates to occur in any order.

Returning to the problem of finding a pattern in a file, what are the independent parts, and hence, what can be parallelized? The program starts by reading the first input line; this has to be done before anything else. Then the program enters a loop that looks for the pattern, outputs the line if the pattern is found, then reads a new line. We cannot output a line before looking for a pattern in that line, so the first two lines in the loop body cannot be executed in parallel with each other. However, we can read the next line of input while looking for a pattern in the previous line and possibly printing that line. Hence, consider the following concurrent version of the above program:
string line;
read a line of input from stdin into line;
while (!EOF) {
  co look for pattern in line;
     if (pattern is in line)
       write line;
  // read next line of input into line;
  oc;
}
Note that the first arm of co is a sequence of statements. Are the two processes in this program independent? The answer is no, because the first process reads line and the second process writes into it. Thus, if the second process runs faster than the first, it will overwrite the line before it is examined by the first process.

As noted, parts of a program can be executed concurrently only if they read
and write different variables. Suppose the second process reads into a different variable than the one that is examined by the first process. In particular, consider the following program:

string line1, line2;
read a line of input from stdin into line1;
while (!EOF) {
  co look for pattern in line1;
     if (pattern is in line1)
       write line1;
  // read next line of input into line2;
  oc;
}
Now, the two processes are working on different lines, which are stored in variables line1 and line2. Hence, the processes can execute concurrently. But is the above program correct? Obviously not, because the first process continuously looks at line1, while the second process continuously reads into line2, which is never examined.

The solution is relatively straightforward: Swap the roles of the lines at the end of each loop iteration, so that the first process always examines the last line read and the second process always reads into a different variable from the one being examined. The following program accomplishes this:

string line1, line2;
read a line of input from stdin into line1;
while (!EOF) {
  co look for pattern in line1;
     if (pattern is in line1)
       write line1;
  // read next line of input into line2;
  oc;
  line1 = line2;
}
In particular, at the end of each loop iteration, and after each process has finished, copy the contents of line2 into line1. The processes within the co statement are now independent, but their actions are coupled because of the last statement in the loop, which copies line2 into line1.

The above concurrent program is correct, but it is quite inefficient. First, the last line in the loop copies the contents of line2 into line1. This is a sequential action not present in the first program, and it in general requires copying dozens of characters; thus, it is pure overhead. Second, the loop body contains a co statement, which means that on each iteration of the while loop, two processes will be created, executed, then destroyed. It is possible to do the "copying" much more efficiently by using an array of two lines, having each process index into a different line in the array, and then merely swapping array indices in the last line. However, process creation overhead would still dominate, because it takes much longer to create and destroy processes than to call procedures, and much, much longer than to execute straight-line code (see Chapter 6 for details).

So, we come to our final question for this section: Is there another way to parallelize the program that avoids having a co statement inside the loop? As you no doubt guessed, the answer is yes. In particular, instead of having a co statement inside the while loop, we can put while loops inside each arm of the co statement. Figure 2.1 contains an outline of this approach. The program is in fact an instance of the producer/consumer pattern introduced in Section 1.6. Here, the first process is the producer and the second process is the consumer. They communicate by means of the shared buffer. Note that the declarations of line1 and line2 are now local to the processes because the lines are no longer shared.

We call the style of the program in Figure 2.1 "while inside co" as opposed to the "co inside while" style of the earlier programs in this section. The advantage of the "while inside co" style is that processes are created only once, rather than on each loop iteration. The down side is that we have to use two buffers, and we have to program the required synchronization. The statements that precede and follow access to the shared buffer indicate the kind of synchronization that is required. We will show how to program this synchronization in Section 2.5, but first we need to examine the topics of synchronization in general and atomic actions in particular.
string buffer;         # contains one line of input
bool done = false;     # used to signal termination

co  # process 1: find patterns
    string line1;
    while (true) {
      wait for buffer to be full or done to be true;
      if (done) break;
      line1 = buffer;
      signal that buffer is empty;
      look for pattern in line1;
      if (pattern is in line1)
        write line1;
    }
 // # process 2: read new lines
    string line2;
    while (true) {
      read next line of input into line2;
      if (EOF) { done = true; break; }
      wait for buffer to be empty;
      buffer = line2;
      signal that buffer is full;
    }
oc;
Figure 2.1   Finding patterns in a file.
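For readers who want to run something concrete, here is a minimal sketch of the structure of Figure 2.1 in C using the Pthreads library. It is an added illustration, not the text's own notation: the mutex and condition variable stand in for the informal "wait" and "signal" steps (those mechanisms are developed properly in later chapters), and the function names reader and finder as well as the default pattern are invented for the example.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAXLINE 1024

char buffer[MAXLINE];            /* holds one line of input */
bool full = false, done = false; /* buffer status and termination flag */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t changed = PTHREAD_COND_INITIALIZER;
const char *pattern;             /* string to search for */

void *reader(void *arg) {        /* process 2 in Figure 2.1: read new lines */
    char line2[MAXLINE];
    while (fgets(line2, MAXLINE, stdin) != NULL) {
        pthread_mutex_lock(&lock);
        while (full)                         /* wait for buffer to be empty */
            pthread_cond_wait(&changed, &lock);
        strcpy(buffer, line2);
        full = true;                         /* signal that buffer is full */
        pthread_cond_signal(&changed);
        pthread_mutex_unlock(&lock);
    }
    pthread_mutex_lock(&lock);
    done = true;                             /* end of input */
    pthread_cond_signal(&changed);
    pthread_mutex_unlock(&lock);
    return NULL;
}

void *finder(void *arg) {        /* process 1 in Figure 2.1: find patterns */
    char line1[MAXLINE];
    while (true) {
        pthread_mutex_lock(&lock);
        while (!full && !done)               /* wait for buffer full or done */
            pthread_cond_wait(&changed, &lock);
        if (!full && done) {                 /* no more lines: quit */
            pthread_mutex_unlock(&lock);
            break;
        }
        strcpy(line1, buffer);
        full = false;                        /* signal that buffer is empty */
        pthread_cond_signal(&changed);
        pthread_mutex_unlock(&lock);
        if (strstr(line1, pattern) != NULL)
            fputs(line1, stdout);
    }
    return NULL;
}

int main(int argc, char *argv[]) {
    pattern = (argc > 1) ? argv[1] : "the";  /* default pattern is arbitrary */
    pthread_t t1, t2;
    pthread_create(&t1, NULL, finder, NULL);
    pthread_create(&t2, NULL, reader, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}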
2.3 Synchronization: The Maximum of an Array

Now consider a different problem, one that requires synchronization between processes. The specific problem is to find the maximum element of array a[n]. We will assume that n is positive and that all elements of a are positive integers.

Finding the maximal element in array a is an example of an accumulation (or reduction) problem. In this case, we are accumulating the maximum seen so far, or equivalently reducing all values to their maximum. Let m be the variable that is to be assigned the maximal value. The goal of the program can then be expressed in predicate logic as

    (∀ j: 0 <= j < n: m >= a[j]) ∧
    (∃ j: 0 <= j < n: m == a[j])
The first line says that when the program terminates, the value of m is to be at least as large as every value in array a. The second line says that m is to be equivalent to some value in array a.

To solve the problem, we can use the following sequential program:

int m = 0;
for [i = 0 to n-1] {
  if (a[i] > m)
    m = a[i];
}
This program iteratively looks at all values in array a; if one is found that is larger than the current maximum, it is assigned to m. Since we assume that all values in a are positive, it is safe to initialize m to 0.

Now consider ways to parallelize the above program. Suppose we fully parallelize the loop by examining every array element in parallel:

int m = 0;
co [i = 0 to n-1]
  if (a[i] > m)
    m = a[i];
This program is incorrect because the processes are not independent: each one both reads and writes variable m. In particular, suppose that each process executes at the same rate, and hence that each compares its a[i] to m at the same time. All processes will see that the comparison is true (because all elements of a are positive and the initial value of m is 0). Hence, all processes will try to update m. The memory hardware will cause the updates to be done in some serial order, because writing into a memory word is atomic, and the final value of m will be the value of a[i] assigned by the last process that gets to update m.

In the above program, reading and writing m are separate actions. One way to deal with having too much parallelism is to use synchronization to combine separate actions into a single atomic action. The following program does so:

int m = 0;
co [i = 0 to n-1]
  ⟨if (a[i] > m)
     m = a[i];⟩
The angle brackets in the above code specify that each if statement is to be executed as an atomic action, namely, that each if statement both examines the current value of m and conditionally updates it as a single, indivisible action. (We describe the angle-bracket notation in detail in the next section.)

Unfortunately, the last program is almost the same as the sequential program. In the sequential program, the elements of a are examined in a fixed
order, from a[0] to a[n-1]. In the last program the elements of a are examined in an arbitrary order, because processes execute in an arbitrary order, but they are still examined one at a time due to the synchronization.

The key issues in this application are to ensure that updates of m are atomic and that m is indeed the maximum. Suppose that we execute the comparisons in parallel but the updates one at a time, as in the following program:

int m = 0;
co [i = 0 to n-1]
  if (a[i] > m)
    ⟨m = a[i];⟩
Is this version correct? No, because this program is actually the same as the first concurrent program: Every process could compare its value of a to m and then update the value of m. Although this program specifies that updating m is atomic, the memory hardware of a machine will in fact ensure that.

So, what is the best way to solve this problem? The answer is to combine the last two programs. It is safe to do comparisons in parallel, because they are read-only actions. But it is necessary to ensure that when the program terminates, m is indeed the maximum. The following program accomplishes this:

int m = 0;
co [i = 0 to n-1]
  if (a[i] > m)          # check the value of m
    ⟨if (a[i] > m)       # recheck the value of m
       m = a[i];⟩
The idea is first to do the comparison, and then, if it is true, to double check before doing the update. This may seem like a lot of wasted effort, but in fact often it is not. Once some process has updated m, it is probable that half the other processes will find their value of a to be less than the new value of m; hence, those processes will not execute the body of the if statement. After a further update, even fewer processes would find the first check to be true. Thus, if the checks themselves occur somewhat at random rather than concurrently, it is increasingly likely that processes will not have to do the second check.

This specific problem is not one that is going to benefit from being solved by a concurrent program, unless the program is executed on a SIMD machine, which is built to execute fine-grained programs efficiently. However, this section has made three key points. First, synchronization is required to get correct answers whenever processes both read and write shared variables. Second, angle brackets are used to specify actions that are to be atomic; we explore this topic in detail in the next section and later show how to implement atomic actions, which are in fact instances of critical sections. Third, the technique of double checking
before updating a shared variable is quite useful, as we shall see in later examples, especially if it is possible that the first check is false and hence that the second check is not needed.
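To make the double-check idea concrete, here is a small C program using the Pthreads library (an added sketch with made-up array contents, not code from the text). The variable m is declared atomic so that the unlocked first check is well defined in C, and a mutex provides the atomicity of the recheck and update.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define N 8

int a[N] = {3, 17, 5, 42, 9, 28, 14, 7};
atomic_int m = 0;                           /* maximum found so far */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg) {
    int i = *(int *)arg;
    if (a[i] > atomic_load(&m)) {           /* check the value of m */
        pthread_mutex_lock(&lock);
        if (a[i] > atomic_load(&m))         /* recheck while holding the lock */
            atomic_store(&m, a[i]);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t[N];
    int idx[N];
    for (int i = 0; i < N; i++) {
        idx[i] = i;
        pthread_create(&t[i], NULL, worker, &idx[i]);
    }
    for (int i = 0; i < N; i++)
        pthread_join(t[i], NULL);
    printf("maximum is %d\n", atomic_load(&m));  /* prints 42 */
    return 0;
}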
2.4 Atomic Actions and Await Statements

As mentioned earlier, we can view execution of a concurrent program as an interleaving of the atomic actions executed by individual processes. When processes interact, not all interleavings are likely to be acceptable. The role of synchronization is to prevent undesirable interleavings. This is done by combining fine-grained atomic actions into coarse-grained (composite) actions or by delaying process execution until the program state satisfies some predicate. The first form of synchronization is called mutual exclusion; the second, condition synchronization. This section examines aspects of atomic actions and presents a notation for specifying synchronization.
2.4.1 Fine-Grained Atomicity

Recall that an atomic action makes an indivisible state transformation. This means that any intermediate state that might exist in the implementation of the action must not be visible to other processes. A fine-grained atomic action is one that is implemented directly by the hardware on which a concurrent program executes.

In a sequential program, assignment statements appear to be atomic since no intermediate state is visible to the program (except possibly if there is a machine-detected fault). However, this is not generally the case in concurrent programs, since an assignment statement might be implemented by a sequence of fine-grained machine instructions. For example, consider the following program, and assume that the fine-grained atomic actions are reading and writing the variables:

int y = 0, z = 0;
co x = y + z;  //  y = 1;  z = 2;  oc;
If x = y + z is implemented by loading a register with y and then adding z to it, the final value of x could be 0, 1, 2, or 3. This is because we could see the initial values for y and z, their final values, or some combination, depending on how far the second process has executed. A further peculiarity of the above program is that the final value of x could be 2, even though one could never stop the program and see a state in which y + z is 2.
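The claim that all four values are possible can be checked mechanically. The following C program (an added illustration; the breakdown of process 1 into a load of y, an add of z, and a store into x is an assumption matching the discussion above) enumerates every interleaving of the two processes' fine-grained actions and prints the set of final values of x.

#include <stdio.h>
#include <stdbool.h>

/* Fine-grained actions assumed for process 1:  r = y;  r = r + z;  x = r;
   and for process 2:  y = 1;  z = 2;  (both in program order).           */

bool seen[4];

void run(int i1, int i2, int y, int z, int r, int x) {
    if (i1 == 3 && i2 == 2) { seen[x] = true; return; }
    if (i1 < 3) {                           /* next action of process 1 */
        int nr = r, nx = x;
        if (i1 == 0)      nr = y;           /* load y into a register    */
        else if (i1 == 1) nr = r + z;       /* add z to the register     */
        else              nx = r;           /* store the register into x */
        run(i1 + 1, i2, y, z, nr, nx);
    }
    if (i2 < 2) {                           /* next action of process 2 */
        if (i2 == 0) run(i1, i2 + 1, 1, z, r, x);   /* y = 1 */
        else         run(i1, i2 + 1, y, 2, r, x);   /* z = 2 */
    }
}

int main(void) {
    run(0, 0, 0, 0, 0, 0);
    printf("possible final values of x:");
    for (int v = 0; v < 4; v++)
        if (seen[v]) printf(" %d", v);
    printf("\n");                           /* prints 0 1 2 3 */
    return 0;
}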
We assume that machines have the following realistic characteristics:

- Values of the basic types (e.g., int) are stored in memory elements (e.g., words) that are read and written as atomic actions.

- Values are manipulated by loading them into registers, operating on them there, then storing the results back into memory.

- Each process has its own set of registers. This is realized either by having distinct sets of registers or by saving and restoring register values whenever a different process is executed. (This is called a context switch since the registers constitute the execution context of a process.)

- Any intermediate results that occur when a complex expression is evaluated are stored in registers or in memory private to the executing process, e.g.,
on a private stack.

With this machine model, if an expression e in one process does not reference a variable altered by another process, expression evaluation will appear to be atomic, even if it requires executing several fine-grained atomic actions. This is because (1) none of the values on which e depends could possibly change while e is being evaluated, and (2) no other process can see any temporary values that might be created while the expression is being evaluated. Similarly, if an assignment x = e in one process does not reference any variable altered by another process, for example, if it references only local variables, then execution of the assignment will appear to be atomic.

Unfortunately, most statements in concurrent programs that reference shared variables do not meet the above disjointness requirement. However, a weaker requirement is often met.

(2.2) At-Most-Once Property. A critical reference in an expression is a reference to a variable that is changed by another process. Assume that any critical reference is to a simple variable that is stored in a memory element that is read and written atomically. An assignment statement x = e satisfies the at-most-once property if either (1) e contains at most one critical reference and x is not read by another process, or (2) e contains no critical references, in which case x may be read by other processes.
This is called the At-Most-Once Property because there can be at most one shared variable, and it can be referenced at most one time. A similar definition applies to expressions that are not in assignment statements. Such an expression satisfies the At-Most-Once Property if it contains no more than one critical reference.
If an assignment statement meets the requirements of the At-Most-Once Property, then execution of the assignment statement will appear to be atomic. This is because the one shared variable in the statement will be read or written just once. For example, if e contains no critical references and x is a simple variable that is read by other processes, they will not be able to tell whether the expression is evaluated atomically. Similarly, if e contains just one critical reference, the process executing the assignment will not be able to tell how that variable is updated; it will just see some legitimate value.

A few examples will help clarify the definition. Both assignments in the following program satisfy the property:

int x = 0, y = 0;
co x = x+1;  //  y = y+1;  oc;
There are no critical references in either process, so the final values of x and y are both 1. Both assignments in the following program also satisfy the property:

int x = 0, y = 0;
co x = y+1;  //  y = y+1;  oc;
The first process references y (one critical reference), but x is not read by the second process, and the second process has no critical references. The final value of x is either 1 or 2, and the final value of y is 1. The first process will see y either before or after it is incremented, but in a concurrent program it can never know which value it will see, because execution order is nondeterministic.

As a final example, neither assignment below satisfies the At-Most-Once Property:

int x = 0, y = 0;
co x = y+1;  //  y = x+1;  oc;
The expression in each process contains a critical reference, and each process assigns to a variable read by the other. Indeed, the final values of x and y could be 1 and 2, 2 and 1, or even 1 and 1 (if the processes read x and y before either assigns to them). However, since each assignment refers only once to only one variable altered by another process, the final values will be those that actually existed in some state. This contrasts with the earlier example in which y + z referred to two variables altered by another process.
2.4.2 Specifying Synchronization: The Await Statement

If an expression or assignment statement does not satisfy the At-Most-Once Property, we often need to have it executed atomically. More generally, we often need to execute sequences of statements as a single atomic action. In both cases, we need to use a synchronization mechanism to construct a coarse-grained atomic action, which is a sequence of fine-grained atomic actions that appears to be indivisible.

As a concrete example, suppose a database contains two values x and y, and that at all times x and y are to be the same in the sense that no process examining the database is ever to see a state in which x and y differ. Then, if a process alters x, it must also alter y as part of the same atomic action.

As a second example, suppose one process inserts elements on a queue represented as a linked list. Another process removes elements from the list, assuming there are elements on the list. One variable points to the head of the list and another points to the tail of the list. Inserting and removing elements requires manipulating two values; e.g., to insert an element, we have to change the link of the previous element so it points to the new element, and we have to change the tail variable so it points to the new element. If the list contains just one element, simultaneous insertion and removal can conflict, leaving the list in an unstable state. Thus, insertion and removal must be atomic actions. Furthermore, if the list is empty, we need to delay execution of a remove operation until an element has been inserted.

We will specify atomic actions by means of angle brackets ⟨ and ⟩. For example, ⟨e⟩ indicates that expression e is to be evaluated atomically. We will specify synchronization by means of the await statement:

(2.3)    ⟨await (B) S;⟩
Boolean expression B specifies a delay condition; S is a sequence of sequential statements that is guaranteed to terminate (e.g., a sequence of assignment statements). An await statement is enclosed in angle brackets to indicate that it is executed as an atomic action. In particular, B is guaranteed to be true when execution of S begins, and no internal state in S is visible to other processes. For example,

    ⟨await (s > 0) s = s-1;⟩
delays until s is positive, then decrements s. The value of s is guaranteed to be positive before s is decremented.

The await statement is a very powerful statement since it can be used to specify arbitrary, coarse-grained atomic actions. This makes it convenient for
expressing synchronization, and we will therefore use await to develop initial solutions to synchronization problems. This expressive power also makes await very expensive to implement in its most general form. However, as we shall see in this and the next several chapters, there are many special cases of await that can be implemented efficiently. For example, the last await statement above is an example of the P operation on semaphore s, a topic in Chapter 4.

The general form of the await statement specifies both mutual exclusion and condition synchronization. To specify only mutual exclusion, we will abbreviate an await statement as follows:

    ⟨S;⟩
For example, the following increments x and y atomically:

    ⟨x = x+1; y = y+1;⟩

The internal state, in which x has been incremented but y has not, is by definition not visible to other processes that reference x or y. If S is a single assignment statement and meets the requirements of the At-Most-Once Property (2.2), or if S is implemented by a single machine instruction, then S will be executed atomically; thus, ⟨S;⟩ has the same effect as S.

To specify only condition synchronization, we will abbreviate await as

    ⟨await (B);⟩
For example, the following delays the executing process until count > 0:

    ⟨await (count > 0);⟩
If B meets the requirements of the At-Most-Once Property, as in this example, then ⟨await (B);⟩ can be implemented as

    while (not B);
This is an instance of what is called a spin loop. In particular, the while statement has an empty body, so it just spins until B becomes true.

An unconditional atomic action is one that does not contain a delay condition B. Such an action can execute immediately, subject of course to the requirement that it execute atomically. Hardware-implemented (fine-grained) actions, expressions in angle brackets, and await statements in which the guard is the constant true or is omitted are all unconditional atomic actions.

A conditional atomic action is an await statement with a guard B. Such an action cannot execute until B is true. If B is false, it can only become true as
the result of actions taken by other processes. Thus a process waiting to execute a conditional atomic action could wait for an arbitrarily long time.
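As a concrete, if simplistic, illustration of a conditional atomic action implemented by spinning, the following C program (an added sketch; the thread names and the single increment are invented for the example) uses a C11 atomic counter. The producer makes the condition count > 0 true, and the consumer spins until it observes that.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

atomic_int count = 0;

void *producer(void *arg) {
    /* ... produce something, then make the condition true ... */
    atomic_fetch_add(&count, 1);
    return NULL;
}

void *consumer(void *arg) {
    while (!(atomic_load(&count) > 0))
        ;                      /* spin loop standing in for <await (count > 0);> */
    printf("condition became true, count = %d\n", atomic_load(&count));
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}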
2.5 Producer/Consumer Synchronization

The last solution in Section 2.2 to the problem of finding patterns in a file employs a producer process and a consumer process. In particular, the producer repeatedly reads input lines, determines those that contain the desired pattern, and passes them on to the consumer process. The consumer then outputs the lines it receives from the producer. Communication between the producer and consumer is conducted by means of a shared buffer. We left unspecified how to synchronize access to the buffer. We are now in a position to explain how to do so.

Here we solve a somewhat simpler producer/consumer problem: copying all elements of an array from the producer to the consumer. We leave to the reader the task of adapting this solution to the specific problem at the end of Section 2.2 (see the exercises at the end of this chapter).

Two processes are given: Producer and Consumer. The Producer has a local array a[n] of integers; the Consumer has a local array b[n] of integers. We assume that array a has been initialized. The goal is to copy the contents of a into b. Because the arrays are not shared, the processes have to use shared variables to communicate with each other. Let buf be a single shared integer that will serve as a communication buffer.

The Producer and Consumer have to alternate access to buf. To begin, the Producer deposits the first element of a in buf, then the Consumer fetches it, then the Producer deposits the second element of a, and so on. Let shared variables p and c count the number of items that have been deposited and fetched, respectively. Initially, these values are both zero. The synchronization requirements between the producer and consumer can then be expressed by the following predicate:

    PC:  c <= p <= c+1

In particular, the values of c and p can differ by at most one, meaning the producer has deposited at most one more element than the consumer has fetched.

The actual code for the two processes is shown in Figure 2.2. The Producer and Consumer use p and c as shown in Figure 2.2 to synchronize access to buf. In particular, they use await statements to wait until the buffer is empty or full. When p == c the buffer is empty (the previously deposited element has been fetched). When p > c the buffer is full.
int buf, p = 0, c = 0;

process Producer {
  int a[n];
  while (p < n) {
    ⟨await (p == c);⟩
    buf = a[p];
    p = p+1;
  }
}

process Consumer {
  int b[n];
  while (c < n) {
    ⟨await (p > c);⟩
    b[c] = buf;
    c = c+1;
  }
}

Figure 2.2   Copying an array from a producer to a consumer.
When synchronization is implemented in this way, a process is said to be busy waiting or spinning. This is because the process is busy checking the condition in its await statement, but all it does is spin in a loop until that condition is true. This kind of synchronization is common, indeed necessary, at the lowest levels of software systems, such as operating systems and network protocols. Chapter 3 examines busy waiting in detail.
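A direct transliteration of Figure 2.2 into runnable C is sketched below (an added illustration, with made-up array size and contents). The two await statements become spin loops on C11 atomic counters p and c; the sequentially consistent atomic operations also ensure that each value written to buf by the producer is visible to the consumer before it is fetched.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define N 8

int a[N], b[N];
int buf;                       /* one-slot communication buffer */
atomic_int p = 0, c = 0;       /* items deposited and fetched so far */

void *producer(void *arg) {
    for (int i = 0; i < N; i++) {
        while (atomic_load(&p) != atomic_load(&c))
            ;                  /* await (p == c): buffer is empty */
        buf = a[i];
        atomic_fetch_add(&p, 1);
    }
    return NULL;
}

void *consumer(void *arg) {
    for (int i = 0; i < N; i++) {
        while (atomic_load(&p) <= atomic_load(&c))
            ;                  /* await (p > c): buffer is full */
        b[i] = buf;
        atomic_fetch_add(&c, 1);
    }
    return NULL;
}

int main(void) {
    for (int i = 0; i < N; i++) a[i] = i * i;
    pthread_t tp, tc;
    pthread_create(&tp, NULL, producer, NULL);
    pthread_create(&tc, NULL, consumer, NULL);
    pthread_join(tp, NULL);
    pthread_join(tc, NULL);
    for (int i = 0; i < N; i++)
        printf("%d ", b[i]);   /* prints 0 1 4 9 16 25 36 49 */
    printf("\n");
    return 0;
}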
2.6 A Synopsis of Axiomatic Semantics

At the end of Section 2.1, we described how assertional reasoning can help us understand the properties of a concurrent program. More importantly, it can help us develop correct programs. Consequently, we will be using assertional reasoning frequently in the remainder of the text. In this and the next two sections, we introduce the formal basis for it. Later chapters will apply the concepts informally.

The basis for assertional reasoning is what is called a programming logic, a formal logical system that facilitates making precise statements about program execution. This section summarizes the topic and introduces key concepts. The Historical Notes at the end of this chapter describe sources of more detailed information, including many more examples.
2.6.1 Formal Logical Systems

Any formal logical system consists of rules defined in terms of:

- a set of symbols,
- a set of formulas constructed from these symbols,
- a set of distinguished formulas called axioms, and
- a set of inference rules.

Formulas are well-formed sequences of symbols. The axioms are special formulas that are a priori assumed to be true. Inference rules specify how to derive additional true formulas from axioms and other true formulas. Inference rules have the form

    H1, ..., Hn
    -----------
         C
Each Hi is a hypothesis; C is a conclusion. The meaning of an inference rule is as follows: If all the hypotheses are true, then we can infer that the conclusion is also true. Both the hypotheses and conclusion are formulas or schematic representations of formulas.

A proof in a formal logical system is a sequence of lines, each of which is an axiom or can be derived from previous lines by application of an inference rule. A theorem is any line in a proof. Thus, theorems are either axioms or are obtained by applying an inference rule to other theorems.

By itself, a formal logical system is a mathematical abstraction, a collection of symbols and relations between them. A logical system becomes interesting when the formulas represent statements about some domain of discourse and the formulas that are theorems are true statements. This requires that we provide an interpretation of the formulas.

An interpretation of a logic maps each formula to true or false. A logic is sound with respect to an interpretation if all its axioms and inference rules are sound. An axiom is sound if it maps to true. An inference rule is sound if its conclusion maps to true, assuming all the hypotheses map to true. Thus, if a logic is sound, all theorems are true statements about the domain of discourse. In this case, the interpretation is called a model for the logic.

Completeness is the dual of soundness. A logic is complete with respect to an interpretation if every formula that is mapped to true is a theorem, that is, every true formula is provable in the logic. Thus, if FACTS is the set of true statements that are expressible as formulas in a logic and THEOREMS is the set of theorems of the logic, soundness means that THEOREMS ⊆ FACTS and completeness means
that FACTS ⊆ THEOREMS. A logic that is both sound and complete allows all true statements expressible in the logic to be proved. Any logic that includes arithmetic cannot be complete, as shown by German mathematician Kurt Gödel in his famous incompleteness theorem. However, a logic that extends another one can be relatively complete, meaning that it does not introduce any incompleteness beyond that inherent in the logic it extends. Fortunately, relative completeness is good enough for the programming logic we present below, since the arithmetic properties that we will employ are certainly true.
2.6.2 A Programming Logic

A programming logic is a formal logical system that allows one to state and prove properties of programs. This section summarizes a specific one that we call PL (Programming Logic). As with any formal logical system, PL contains symbols, formulas, axioms, and inference rules. The symbols of PL are predicates, braces, and programming language statements. The formulas of PL are called triples. They have the following form:

    {P}  S  {Q}

Predicates P and Q specify relations between the values of program variables; S is a statement or statement list. The purpose of a programming logic is to facilitate proving properties of program execution. Hence, the interpretation of a triple characterizes the relation between predicates P and Q and statement list S.
This inlerpretation is called partial con-ectness, which is a safety property as defined in Section 2.1.. It says that, if rlle initial program state satisfies P, then the final state will satisfy Q, assuming s terminates. Tlle related liveness propelty is total corr-ectrzess, which is partial correctness plus termination-that is, all histories are finite.
(Footnote: Predicates in triples are surrounded by braces, because that is the way they have traditionally been used in programming logics. However, braces are also used in our programming notation to enclose sequences of statements. To avoid possible confusion, we will use ## to specify a predicate in a program. Recall that the sharp character # is used to introduce a one-line comment. Think of a predicate as a very precise, hence very sharp, comment.)
In a triple, predicates P and Q are often called assertions, since they assert that the program state must satisfy the predicate in order for the interpretation of the triple to be true. Thus, an assertion characterizes an acceptable program state. Predicate P is called the precondition of S; it characterizes the condition that the state must satisfy before execution of S begins. Predicate Q is called the postcondition of S; it characterizes the state that results from executing S, assuming S terminates. Two special assertions are true, which characterizes all program states, and false, which characterizes no program state.

In order for interpretation (2.4) to be a model for our programming logic, the axioms and inference rules of PL must be sound with respect to (2.4). This will ensure that all theorems provable in PL are sound. For example, the following triple should be a theorem:

    { x == 0 }  x = x+1;  { x == 1 }

However, the following should not be a theorem, because assigning a value to x cannot miraculously set y to 1:

    { x == 0 }  x = x+1;  { y == 1 }
In addition to being sound, the logic should be (relatively) complete so that all triples that are true are in fact provable as theorems. The most important axiom of a programming logic such as PL is the one relating to assignment:
    Assignment Axiom:    { P[x ← e] }  x = e;  { P }

The notation P[x ← e] specifies textual substitution; it means "replace all free occurrences of variable x in predicate P by expression e." (A variable is free in a predicate if it is not captured by a bound variable of the same name in an existential or universal quantifier.) The Assignment Axiom thus says that if one wants an assignment to result in a state satisfying predicate P, then the prior state must satisfy P with variable x textually replaced by expression e. As an example, the following triple is an instance of the axiom:

    { 1 == 1 }  x = 1;  { x == 1 }

The precondition simplifies to the predicate true, which characterizes all states. Thus, this triple says that no matter what the starting state, when we assign 1 to x we get a state that satisfies x == 1.

The more common way to view assignment is by "going forward." In particular, start with a predicate that characterizes what is true of the current state,
    Composition Rule:        {P} S1 {Q},  {Q} S2 {R}
                             -----------------------
                             {P} S1; S2 {R}

    If Statement Rule:       {P ∧ B} S {Q},  (P ∧ ¬B) ⇒ Q
                             -----------------------------
                             {P} if (B) S; {Q}

    While Statement Rule:    {I ∧ B} S {I}
                             -----------------------------
                             {I} while (B) S; {I ∧ ¬B}

    Rule of Consequence:     P' ⇒ P,  {P} S {Q},  Q ⇒ Q'
                             -----------------------------
                             {P'} S {Q'}

Figure 2.3   Inference rules in programming logic PL.
and then produce the predicate that is true of the state after the assignment takes place. For example, if we start in a state in which x == 0 and add 1 to x, then x will be 1 in the resulting state. This is captured by the triple:

    { x == 0 }  x = x+1;  { x == 1 }

The Assignment Axiom describes how the state changes. The inference rules in a programming logic such as PL allow theorems resulting from instances of the Assignment Axiom to be combined. In particular, inference rules are used to characterize the effects of statement composition (statement lists) and of control statements such as if and while. They are also used to modify the predicates in triples.

Figure 2.3 gives four of the most important inference rules. The Composition Rule allows one to glue together the triples for two statements when the statements are executed one after the other. The first hypothesis in the If Statement Rule characterizes the effect of executing S when B is true; the second hypothesis characterizes what is true when B is false; the conclusion combines the two cases. As a simple example of the use of these two rules, the following program sets m to the maximum of x and y:

    m = x;
    if (y > m)
      m = y;
Whatever the initial state, the first assignment produces a state in which m == x. After the if statement has executed, m is equal to x and at least as large as y, or it is equal to y and greater than x.

The While Statement Rule requires a loop invariant I. This is a predicate that is true before and after each iteration of the loop. If I and loop condition B are true before the loop body S is executed, then execution of S must again make I true. Thus, when the loop terminates, I will still be true, but now B will be false. As an example, the following program searches array a for the first occurrence of the value of x. Assuming that x occurs in a, the loop terminates with variable i set to the index of the first occurrence.

    i = 1;
    { i == 1 ∧ (∀ j: 1 <= j < i: a[j] != x) }
    while (a[i] != x)
      i = i+1;
    { (∀ j: 1 <= j < i: a[j] != x) ∧ a[i] == x }
The loop invariant here is the quantified predicate. It is true before the loop because the range of the quantifier is empty. It is also true before and after each execution of the loop body. When the loop terminates, a[i] is equal to x and x does not occur earlier in a.

The Rule of Consequence allows preconditions to be strengthened and/or postconditions to be weakened. As an example, the following triple is true:

    { x == 0 }  x = x+1;  { x == 1 }

Thus, from the Rule of Consequence, the following triple is also true:

    { x == 0 }  x = x+1;  { x >= 0 }

The postcondition in the second triple is weaker than the one in the first triple because it characterizes more states; even though x might indeed be exactly 1, it is also the case that it is nonnegative.
2.6.3 Semantics of Concurrent Execution

The concurrent statement co, or equivalently a process declaration, is a control statement. Hence, its effect is described by an inference rule that captures the effect of parallel execution. Processes are comprised of sequential statements and synchronization statements such as await. With respect to partial correctness, the effect of an await statement

    ⟨await (B) S;⟩
is much like an if statement for which the guard B is true when execution of S begins. Hence, the inference rule for await is similar to the inference rule for if:

    Await Statement Rule:    {P ∧ B} S {Q}
                             ------------------------
                             {P} ⟨await (B) S;⟩ {Q}
The hypothesis says "if execution of S begins in a state in which both P and B are true, and S terminates, then Q will be true." The conclusion allows one to infer that it is then the case that the await statement yields state Q if begun in state P, assuming that the await statement terminates. (Inference rules say nothing about possible delays, as delays affect liveness properties, not safety properties.)

Now consider the effects of concurrent execution, such as that specified by the following statement:

    co S1;  //  S2;  //  ...  //  Sn;  oc;

Suppose that the following is true for every statement:

    {Pi}  Si  {Qi}

According to the Interpretation of Triples (2.4), this means that, if Si is begun in a state satisfying Pi, and Si terminates, then the state will satisfy Qi. For this interpretation to hold when the processes are executed concurrently, the processes must be started in a state satisfying the conjunction of the Pi. If all the processes terminate, the final state will satisfy the conjunction of the Qi. Thus, we get the following inference rule:
C o Statement Rule:
(P,) s, {I?,
A
co S,; {QI
A
{Qi)
... //
-.-
A
are interference free pnl
. .. A
/ / S,;
oc
Qn)
However, note the phrase "are interference free" in the hypothesis. For the conclusion to be true, the processes and their proofs must not interfere with each other.

One process interferes with another if it executes an assignment that invalidates an assertion in the other process. Assertions characterize what a process assumes to be true before and after each statement. Thus, if one process assigns to a shared variable and thereby invalidates an assumption of another process, the proof of the other process is not valid.
As an example, consider the following simple program:

    co  x = x+1;  //  x = x+2;  oc;

If the program is started in a state in which x is 0, then when the program terminates, x will be 3. But what is true about each process? Neither can assume that x is still 0 when it starts, because the order of execution is nondeterministic. In particular, if a process assumes that x is 0 when it begins, that assertion will be interfered with if the other process executes first. However, what is true is captured by the following:

    { x == 0 }
    co  { x == 0 ∨ x == 2 }  x = x+1;  { x == 1 ∨ x == 3 }
     // { x == 0 ∨ x == 1 }  x = x+2;  { x == 2 ∨ x == 3 }
    oc
    { x == 3 }

The assertions in each process account for the two possible execution orders. Note also that the conjunction of the preconditions is indeed x == 0 and that the conjunction of the postconditions is indeed x == 3.

The above display is an example of what is called a proof outline. There is an assertion before and after each statement, and all of the resulting triples are true. (There are three triples: one for each process and one for the co statement.) Hence, the proof outline captures all the key parts that would exist in a formal proof of the correctness of the above program.

The formal definition of noninterference follows. An assignment action is an assignment statement or an await statement that contains one or more assignments. A critical assertion is a precondition or postcondition that is not within an await statement.
(2.5) Noninterference. Let a be an assignment action in one process and let pre(a) be its precondition. Let c be a critical assertion in another process. If necessary, rename local variables in c so their names are different from the names of local variables in a and pre(a). Then a does not interfere with c if the following is a theorem in programming logic PL:

    { c ∧ pre(a) }  a  { c }
In short, critical assertion c is invariant with respect to execution of assignment action a. The precondition of a is included in (2.5) because a can be executed only if the process is in a state satisfying pre(a).

As an example of the use of (2.5), consider the last program above. The precondition of the first process is a critical assertion. It is not interfered with by the assignment statement in the second process because the following triple is true:

    { (x == 0 ∨ x == 2) ∧ (x == 0 ∨ x == 1) }  x = x+2;  { x == 0 ∨ x == 2 }
The first predicate simplifies to x == 0, so after adding 2 to x, the value of x is 2, and hence it is still the case that x is either 0 or 2. What this triple expresses is the fact that if the second process executes before the first one, then the value of x will be 2 when the first process begins execution. There are three more critical assertions in the above program: the postcondition in the first process and the pre- and postconditions in the second process. The noninterference proofs are all similar to the one above.
2.7 Techniques for Avoiding Interference

The processes in a concurrent program work together to compute results. The key requirement for having a correct program is that the processes not interfere with each other. A collection of processes is interference-free if no assignment action in one process interferes with any critical assertion in another. This section describes four basic techniques for avoiding interference, and hence four techniques that can be used to develop correct concurrent programs: (1) disjoint variables, (2) weakened assertions, (3) global invariants, and (4) synchronization. These techniques are employed extensively throughout the remainder of the book. All involve putting assertions and assignment actions in a form that ensures that noninterference formulas (2.5) are true.
2.7.1 Disjoint Variables

Recall that the write set of a process is the set of variables that it assigns to (and possibly also reads), and the read set of a process is the set of variables that it reads but does not alter. The reference set of a process is the set of variables that appear in the assertions in a proof of that process. The reference set of a process will often be the same as the union of the read and write sets, but it might not be. With respect to interference, the critical variables are those in assertions.
If the write set of one process is disjoint from the reference set of a second, and vice versa, then the two processes cannot interfere. Formally, this is because the Assignment Axiom employs textual substitution, which has no effect on a predicate that does not contain a reference to the target of the assignment. (Local variables in different processes are different variables, even if they happen to have the same name; thus they can be renamed before applying the Assignment Axiom.) As an example, consider the following program:
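A program consistent with the discussion below, in which each process increments only its own variable, is:

    int x = 0, y = 0;
    co x = x + 1;  //  y = y + 1;  oc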
If x and y are initially 0, then from the Assignment Axiom, both of the following are theorems:
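In terms of the assertions used in each process, the two theorems are:

    {x == 0}  x = x + 1;  {x == 1}
    {y == 0}  y = y + 1;  {y == 1}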
Each process contains one assignment statement and two assertions; hence there are four noninterference theorems to prove. Each one is trivially true, because the two processes reference different variables, and hence the substitutions that result from the Assignment Axiom are vacuous. Disjoint write/reference sets provide the basis for most parallel iterative algorithms, such as the matrix multiplication algorithm described in Section 1.4. As another example, different branches of the tree of possible moves in a game-playing program can be searched in parallel. Or, multiple transactions can examine a database in parallel, or they can update different relations.
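For instance, a sketch of the matrix multiplication computation along the lines of Section 1.4 is shown below; each (i,j) process writes only c[i,j] and reads only a and b, so the write and reference sets of distinct processes are disjoint.

    double a[n,n], b[n,n], c[n,n];
    co [i = 0 to n-1, j = 0 to n-1] {   # one process per result element
        c[i,j] = 0.0;
        for [k = 0 to n-1]
            c[i,j] = c[i,j] + a[i,k] * b[k,j];
    }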
2.7.2 Weakened Assertions

Even when the write and reference sets of processes overlap, we can sometimes avoid interference by weakening assertions to take into account the effects of concurrent execution. A weakened assertion is one that admits more program states than another assertion that might be true of a process in isolation. We saw an example in Section 2.6 using the following program:
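This is the two-increment program shown earlier; its proof outline, repeated here for reference, is:

    {x == 0}
    co  {x == 0 ∨ x == 2}   <x = x + 1;>   {x == 1 ∨ x == 3}
    //  {x == 0 ∨ x == 1}   <x = x + 2;>   {x == 2 ∨ x == 3}
    oc
    {x == 3}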
Here, the preconditions and postconditions in each process are weaker than they could be in isolation. In particular, each process could assert in isolation that if x is initially 0, then upon termination the value of x is 1 (first process) or 2 (second process). However, these stronger assertions would be interfered with.

Weakened assertions have more realistic applications than the simplistic problem above. For example, assume a process schedules operations on a moving-head disk. Other processes insert operations into a queue; when the disk is idle, the scheduler examines the queue, selects the best operation according to some criteria, and starts that operation. Although the scheduler may have selected the best operation at the time it examined the queue, it is not the case that at all times the disk is performing the best operation, or even that at the time an operation is started it is still the best one. This is because a process might have inserted another, better operation into the queue just after the selection was made, and even before the disk started to execute the selected operation. Thus, "best" in this case is a time-dependent property; however, it is sufficient for scheduling problems such as this.

As another example, many parallel algorithms for approximating solutions to partial differential equations have the following form. (See Chapter 11 for a specific example.) The problem space is approximated by a finite grid of points, say grid[n,n]. A process is assigned to each grid point (or, more commonly, to a block of grid points), as in the following program outline:

    double grid[n,n];
    process PDE[i = 0 to n-1, j = 0 to n-1] {
        while (not converged) {
            grid[i,j] = f(neighboring points);
        }
    }

The function f computed on each iteration might, for example, be the average of the four neighboring points in the same row and column. For many problems, the value assigned to grid[i,j] on one iteration depends on the values of the neighbors from the previous iteration. Thus, the loop invariant would characterize this relation between old and new values for grid points.

In order to ensure that the loop invariant in each process is not interfered with, the processes must use two matrices and must synchronize after each iteration. In particular, on each iteration each PDE process reads values from one matrix, computes f, and then assigns the result to the second matrix.
Each process then waits until all processes have computed the new value for their grid points. (The next chapter shows how to implement this kind of synchronization, which is called a barrier.) The roles of the matrices are then switched, and the processes execute another iteration. A second way to synchronize the processes is to execute them in lockstep, with each process executing the same actions at the same time. Synchronous multiprocessors support this style of execution. This approach avoids interference since every process reads old values from grid before any process assigns a new value.
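One possible sketch of the two-matrix approach is shown below; it copies new values back rather than swapping the roles of the matrices, the barriers are written simply as comments (their implementation is the subject of the next chapter), and the convergence test is left abstract.

    double grid[n,n], newgrid[n,n];
    process PDE[i = 0 to n-1, j = 0 to n-1] {
        while (not converged) {
            newgrid[i,j] = f(neighboring points of grid);
            # barrier: wait until every process has updated newgrid
            grid[i,j] = newgrid[i,j];
            # barrier: wait until every process has copied its point back
        }
    }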
2.7.3 Global Invariants

Another, very powerful technique for avoiding interference is to employ a global invariant to capture the relations between shared variables. In fact, as we shall see starting in Chapter 3, one can use a global invariant to guide the development of a solution to any synchronization problem.

Suppose I is a predicate that references global variables. Then I is a global invariant with respect to a set of processes if: (1) I is true when the processes begin execution, and (2) I is preserved by every assignment action. Condition 1 is satisfied if I is true in the initial state of every process. Condition 2 is satisfied if, for every assignment action a, I is true after executing a assuming that I is true before executing a. In short, the two conditions are an instance of the use of mathematical induction.

Suppose predicate I is a global invariant. Further suppose that every critical assertion c in the proof of every process Pi has the form I ∧ L, where L is a predicate about local variables. In particular, each variable referenced in L is either local to process Pi or it is a global variable that only Pi assigns to. If all critical assertions can be put in this form I ∧ L, then the proofs of the processes will be interference-free. This is because (1) I is invariant with respect to every assignment action a, and (2) no assignment action in one process can interfere with a local predicate L in another process, since the target (left-hand side) of a is different from the variables in L.

When all assertions use a global invariant and local predicate as above, noninterference requirement (2.5) is met for every pair of assignment actions and critical assertions. Moreover, we have to check only the triples in each process to verify that each critical assertion has the above form and that I is a global invariant; we do not even have to consider assertions or statements in other processes. In fact, for an array of identical processes, we only have to check one of them. In any case, we only have to check a linear number of statements and assertions. Contrast this to having to check (or test) an exponential number of program histories, as described in Section 2.1.
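For instance, the following small program, a sketch added here for illustration, has critical assertions of the required form. Variables y1 and y2 are global, but each is assigned by only one process, so each assertion is the global invariant conjoined with a predicate about a variable only that process writes.

    int x = 0, y1 = 0, y2 = 0;      # global invariant I: x == y1 + y2
    co <x = x + 1; y1 = y1 + 1;>    # postcondition: I ∧ y1 == 1
    // <x = x + 1; y2 = y2 + 1;>    # postcondition: I ∧ y2 == 1
    oc
    {x == 2}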
We will employ the technique of global invariants extensively in the remainder of the text. We illustrate the usefulness and power of the technique at the end of this section after first introducing the fourth technique for avoiding interference: synchronization.
2.7.4 Synchronization

Sequences of assignment statements that are within await statements appear to other processes to be an indivisible unit. Hence we can ignore the effects of the individual statements when considering whether one process interferes with another. It is sufficient to consider only whether the entire sequence of statements might cause interference. For example, consider the following atomic action:
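One such action, consistent with the discussion below, updates two shared variables together:

    <x = x + 1; y = y + 1;>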
Neither assignment by itself can cause interference, because no other process can see a state in which x has been incremented but y has not yet been incremented. Only the pair of assignments might cause interference. Internal states of program segments within angle brackets are also invisible. Hence, no assertion about an internal state can be interfered with by another process. For example, the assertion in the middle of the atomic action below is not a critical assertion:
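For instance (the particular internal assertion here is only illustrative):

    <x = x + 1; {x > 0} y = y + 1;>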
These two attributes of atomic actions lead to two ways to use synchronization to avoid interference: mutual exclusion and condition synchronization. Consider the following:
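The situation described next can be pictured as follows, where the statement labels and the assertion are those used in the discussion:

    co  ...;  a;  ...                 # process P1
    //  ...;  S1;  {c}  S2;  ...      # process P2
    oc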
Here, a is an assignment statement in process P1, and S1 and S2 are statements in process P2. Critical assertion c is the precondition of S2. Suppose that a interferes with c. One way to avoid interference is to use mutual exclusion to "hide" assertion c from a. This is done by combining statements S1 and S2 in the second process into a single atomic action:
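In the notation of atomic actions, the second process then executes

    <S1; S2;>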
This executes S1 and S2 atomically and hence makes state c invisible to other processes. Another way to avoid interference is to use condition synchronization to strengthen the precondition of a. In particular, we can replace a by the following conditional atomic action:

    <await (!c or B) a;>
Here, B is a predicate characterizing a set of states such that executing a will make c true. Hence the above statement avoids interference either by waiting until c is false (and hence statement S2 could not possibly be about to execute) or by ensuring that executing a will make c true (and hence that it would then be fine to execute S2).
2.7.5 An Example: The Array Copy Problem Revisited

Most concurrent programs employ a combination of the above techniques. Here we illustrate all of them in a single, simple program: the array copy problem shown in Figure 2.2. Recall that the program uses a shared buffer buf to copy the contents of array a in the producer process to array b in the consumer process.

The Producer and Consumer processes in Figure 2.2 alternate access to buf. First Producer deposits the first element of a in buf, then Consumer fetches it, then Producer deposits the second element of a, and so on. Variables p and c count the number of items that have been deposited and fetched, respectively. The await statements are used to synchronize access to buf. When p == c the buffer is empty (the previously deposited element has been fetched); when p > c the buffer is full.

Suppose the initial contents of a[n] is some collection of values A[n]. (The A[i] are called logical variables; they are simply placeholders for whatever the values actually are.) The goal is to prove that, upon termination of the above program, the contents of b[n] are the same as A[n], the values in array a. This goal can be proved by using the following global invariant:
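The invariant, written as it appears at the top of the proof outline in Figure 2.4, is:

    PC:  c <= p <= c+1  ∧  a[0:n-1] == A[0:n-1]  ∧
         ((p == c+1) ⇒ (buf == A[p-1]))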
Since the processes alternate access to buf, at all times p is equal to c, or p is one more than c. Array a is not altered, so a[i] is always equal to A[i]. Finally, when the buffer is full (i.e., when p == c+1), it contains A[p-1]. Predicate PC is true initially, because both p and c are initially zero. It is maintained by every assignment statement, as illustrated by the proof outline in
Figure 2.4. In the figure, IP is an invariant for the loop in the Producer process, and IC is an invariant for the loop in the Consumer process. Predicates IP and IC are related to predicate PC as indicated. Figure 2.4 is another example of a proof outline, because there is an assertion before and after every statement, and each triple in the proof outline is true. The triples in each process follow directly from the assignment statements in each process. Assuming each process continually gets a chance to execute, the
    int buf, p = 0, c = 0;
    {PC: c <= p <= c+1  ∧  a[0:n-1] == A[0:n-1]  ∧
         ((p == c+1) ⇒ (buf == A[p-1]))}

    process Producer {
        int a[n];                   # assume a[i] is initialized to A[i]
        {IP: PC ∧ p <= n}
        while (p < n) {
            {PC ∧ p < n}
            <await (p == c);>       # delay until buffer empty
            {PC ∧ p < n ∧ p == c}
            buf = a[p];
            {PC ∧ p < n ∧ p == c ∧ buf == A[p]}
            p = p+1;
            {IP}
        }
        {PC ∧ p == n}
    }

    process Consumer {
        int b[n];
        {IC: PC ∧ c <= n ∧ b[0:c-1] == A[0:c-1]}
        while (c < n) {
            {IC ∧ c < n}
            <await (p > c);>        # delay until buffer full
            {IC ∧ c < n ∧ p > c}
            b[c] = buf;
            {IC ∧ c < n ∧ p > c ∧ b[c] == A[c]}
            c = c+1;
            {IC}
        }
        {IC ∧ c == n}
    }

Figure 2.4   Proof outline for the array copy program.
await statements terminate since first one guard is true, then the other, and so on. Hence, each process terminates after n iterations. When the program terminates, the postconditions of both processes are true. Hence the final program state satisfies the predicate
    PC ∧ p == n ∧ IC ∧ c == n
Consequently, array b contains a copy of array a.

The assertions in the two processes do not interfere with each other. Most of them are a combination of the global invariant PC and a local predicate. Hence they meet the requirements for noninterference described at the start of the subsection on global invariants. The four exceptions are the assertions that specify relations between the values of shared variables p and c. These are not interfered with because of the await statements in the program.

The role of the await statements in the array copy program is to ensure that the producer and consumer processes alternate access to the buffer. This plays two roles with respect to avoiding interference. First, it ensures that the processes cannot access buf at the same time; this is an instance of mutual exclusion. Second, it ensures that the producer does not overwrite items (overflow) and that the consumer does not read an item twice (underflow); this is an instance of condition synchronization.

To summarize, this example, even though it is simple, illustrates all four techniques for avoiding interference. First, many of the statements and many parts of the assertions in each process are disjoint. Second, we use weakened assertions about the values of the shared variables; for example, we say that buf == A[p-1], but only if p == c+1. Third, we use the global invariant PC to express the relationship between the values of the shared variables; even though each variable changes as the program executes, this relationship does not change. Finally, we use synchronization, expressed using await statements, to ensure the mutual exclusion and condition synchronization required for this program.
2.8 Safety and Liveness Properties

Recall from Section 2.1 that a property of a program is an attribute that is true of every possible history of that program. Every interesting property can be formulated as safety or liveness. A safety property asserts that nothing bad happens during execution; a liveness property asserts that something good eventually happens. In sequential programs, the key safety property is that the final state is correct, and the key liveness property is termination. These properties are equally important for concurrent programs. In addition, there are other interesting safety and liveness properties that apply to concurrent programs.
Two important safety properties in concurrent programs are mutual exclusion and absence of deadlock. For mutual exclusion, the bad thing is having more than one process executing critical sections of statements at the same time. For deadlock, the bad thing is having all processes waiting for conditions that will never occur. Examples of liveness properties of concurrent programs are that a process will eventually get to enter a critical section, that a request for service will eventually be honored, or that a message will eventually reach its destination. Liveness properties are affected by scheduling policies, which determine which eligible atomic actions are next to execute. In this section we describe two methods for proving safety properties. Then we describe different kinds of processor scheduling policies and how they affect liveness properties.
2.8.1 Proving Safety Properties

Every action a program takes is based on its state. If a program fails to satisfy a safety property, there must be some "bad" state that fails to satisfy the property. For example, if the mutual exclusion property fails to hold, there must be some state in which two (or more) processes are simultaneously in their critical sections. Or if processes deadlock, there must be some state in which deadlock occurs. These observations lead to a simple method for proving that a program satisfies a safety property. Let BAD be a predicate that characterizes a bad program state. Then a program satisfies the associated safety property if BAD is false in every state in every possible history of the program. Given program S, to show that BAD is not true in any state requires showing that it is not true in the initial state, the second state, and so on, where the state is changed as a result of executing atomic actions.

Alternatively, and more powerfully, if a program is never to be in a BAD state, then it must always be in a GOOD state, where GOOD is equivalent to the negation of BAD. Hence, an effective way to ensure a safety property is to specify BAD, then negate BAD to yield GOOD, then ensure that GOOD is a global invariant, that is, a predicate that is true in every program state. Synchronization can be used, as we have seen and will see many more times in later chapters, to ensure that a predicate is a global invariant.

The above is a general method for proving safety properties. There is a related, but somewhat more specialized, method that is also very useful. Consider the following program fragment:
    co  # process 1
        ...;  {pre(S1)}  S1;  ...
    //  # process 2
        ...;  {pre(S2)}  S2;  ...
    oc
There are two statements, one per process, and two associated preconditions (predicates that are true before each statement is executed). Assume that the predicates are not interfered with. Now suppose that the conjunction of the preconditions is false:

    pre(S1) ∧ pre(S2) == false
This means that the two processes cannot be at these statements at the same time! This is because a predicate that is false characterizes no program state (the empty set of states, if you will). This method is called exclusion of configurations, because it excludes the program configuration in which the first process is in state pre(S1) at the same time that the second process is in state pre(S2).

As an example, consider the proof outline of the array copy program in Figure 2.4. The await statement in each process can cause delay. The processes would deadlock if they were both delayed and neither could proceed. The Producer process is delayed if it is at its await statement and the delay condition is false; in that state, the following predicate is true:

    PC ∧ p < n ∧ p != c
Hence, p > c when the Producer is delayed. Similarly, the Consumer process is delayed if it is at its await statement and the delay condition is false; that state satisfies

    PC ∧ c < n ∧ p <= c
Because p > c and p <= c cannot be true at the same time, the processes cannot simultaneously be in these states. Hence, deadlock cannot occur.
2.8.2 Scheduling Policies and Fairness

Most liveness properties depend on fairness, which is concerned with guaranteeing that processes get the chance to proceed, regardless of what other processes do. Each process executes a sequence of atomic actions. An atomic action in a process is eligible if it is the next atomic action in the process that could be executed. When there are several processes, there are several eligible atomic
actions. A scheduling policy determines which one will be executed next. This section defines three degrees of fairness that a scheduling policy might provide.

Recall that an unconditional atomic action is one that does not have a delay condition. Consider the following simple program, in which the processes execute unconditional atomic actions:

    bool continue = true;
    co while (continue);
    // continue = false;
    oc
Suppose a scheduling policy assigns a processor to a process until that process either terminates or delays. If there is only one processor, the above program will not terminate if the first process is executed first. However, the program will terminate if the second process eventually gets a chance to execute. This is captured by the following definition.
(2.6) Unconditional Fairness. A scheduling policy is unconditionally fair if every unconditional atomic action that is eligible is executed eventually.

For the above program, round-robin would be an unconditionally fair scheduling policy on a single processor, and parallel execution would be an unconditionally fair policy on a multiprocessor. When a program contains conditional atomic actions, that is, await statements with Boolean conditions B, we need to make stronger assumptions to guarantee that processes will make progress. This is because a conditional atomic action cannot be executed until B is true.
(2.7) Weak Fairness. A scheduling policy is weakly fair if (1) it is unconditionally fair, and (2) every conditional atomic action that is eligible is executed eventually, assuming that its condition becomes true and then remains true until it is seen by the process executing the conditional atomic action.

In short, if <await (B) S;> is eligible and B becomes true, then B remains true at least until after the conditional atomic action has been executed. Round-robin and time slicing are weakly fair scheduling policies if every process gets a chance to execute. This is because any delayed process will eventually see that its delay condition is true.

Weak fairness is not, however, sufficient to ensure that any eligible await statement eventually executes. This is because the condition might change value, from false to true and back to false, while a process is delayed. In this case, we need a stronger scheduling policy.
(2.8) Strong Fairness. A scheduling policy is strongly fair if (1) it is unconditionally fair, and (2) every conditional atomic action that is eligible is executed eventually, assuming that its condition is infinitely often true.
A condition is infinitely often true if it is true an infinite number of times in every execution history of a (nonterminating) program. To be strongly fair, a scheduling policy cannot happen only to select an action when the condition is false; it must sometimes select the action when the condition is true. To see the difference between weak and strong fairness, consider

    bool continue = true, try = false;
    co while (continue) { try = true; try = false; }
    // <await (try) continue = false;>
    oc
With a strongly fair policy, this program will eventually terminate, because try is infinitely often true. However, with a weakly fair policy, the program might not terminate, because try is also infinitely often false.

Unfortunately, it is impossible to devise a processor scheduling policy that is both practical and strongly fair. Consider the above program again. On a single processor, a scheduler that alternates the actions of the two processes would be strongly fair, since the second process would see a state in which try is true; however, such a scheduler is impractical to implement. Round-robin and time slicing are practical, but they are not strongly fair in general, because processes execute in unpredictable orders. A multiprocessor scheduler that executes the processes in parallel is also practical, but it too is not strongly fair. This is because the second process might always examine try when it is false. This is unlikely, of course, but it is theoretically possible.

To further clarify the different kinds of scheduling policies, consider again the array copy program in Figures 2.2 and 2.4. As noted earlier, that program is deadlock-free. Thus, the program will terminate as long as each process continues to get a chance to make progress. Each process will make progress as long as the scheduling policy is weakly fair. This is because, when one process makes the delay condition of the other true, that condition remains true until the other process continues and changes shared variables.

Both await statements in the array copy program have the form <await (B);>, and B refers to only one variable altered by the other process. Consequently, both await statements can be implemented by busy-waiting loops. For example, <await (p == c);> in the Producer can be implemented by

    while (p != c);
The program will terminate if the scheduling policy is unconditionally fair, because now there are no conditional atomic actions and the processes alternate access to the shared buffer. It is not generally the case, however, that an unconditionally fair scheduling policy will ensure termination of a busy-waiting loop. This is because an unconditionally fair policy might always schedule the atomic action that examines the loop condition when the condition is true, as shown in the example program above.

If all busy-waiting loops in a program spin forever, a program is said to suffer from livelock: the program is alive, but the processes are not going anywhere. Livelock is the busy-waiting analog of deadlock, and absence of livelock, like absence of deadlock, is a safety property; the bad state is one in which every process is spinning and none of the delay conditions is true. On the other hand, progress for any one of the processes is a liveness property; the good thing is that the spin loop of an individual process eventually terminates.
Historical Notes

One of the first, and most influential, papers on concurrent programming was by Edsger Dijkstra [1965]. That paper introduced the critical section problem and the parbegin statement, which later became called the cobegin statement. Our co statement is a generalization of the cobegin statement. Another Dijkstra paper [1968] introduced the producer/consumer and dining philosophers problems, as well as others that we examine in the next few chapters.

Art Bernstein [1966] was the first to specify the conditions that are sufficient to ensure that two processes are independent and hence can be executed in parallel. Bernstein's conditions, as they are still called, are expressed in terms of the input and output sets of each process; the input set contains variables read by a process, and the output set contains variables written by a process. Bernstein's three conditions for independence of two processes are that the intersections of the (input, output), (output, input), and (output, output) sets are all disjoint. Bernstein's conditions also provide the basis for the data dependency analysis done by parallelizing compilers, a topic we describe in Chapter 12. Our definition of independence (2.1) employs read and write sets for each process, and a variable is in exactly one set for each process. This leads to a simpler, but equivalent, condition.

Most operating systems texts show how implementing assignment statements using registers and fine-grained atomic actions leads to the critical section problem in concurrent programs. Modern hardware ensures that there is some base level of atomicity, usually a word, for reading and writing memory. Lamport [1977b] contains an interesting discussion of how to implement atomic reads and writes of "words" if only single bytes can be read and written atomically; his
solution does not require using a locking mechanism for mutual exclusion. (See Peterson [1983] for more recent results.)

The angle bracket notation for specifying coarse-grained atomic actions was also invented by Leslie Lamport. However, it was popularized by Dijkstra [1977]. Owicki and Gries [1976a] used await B then S end to specify atomic actions. The specific notation used in this book combines angle brackets and a variant of await. The terms unconditional and conditional atomic actions are due to Fred Schneider [Schneider and Andrews 1986].

There is a wealth of material on logical systems for proving properties of sequential and concurrent programs. Schneider [1997] provides excellent coverage of all the topics summarized in Sections 2.6 through 2.8, as well as many more, and gives extensive historical notes and references. The actual content of these sections was condensed and revised from Chapters 1 and 2 of my prior book [Andrews 1991]. That material, in turn, was based on earlier work [Schneider and Andrews 1986]. A brief history of formal logical systems and its application to program verification is given below; for more information and references, consult Schneider [1997] or Andrews [1991].

Formal logic is concerned with the formalization and analysis of systematic reasoning methods. Its origins go back to the ancient Greeks, and for the next two millennia logic was of interest mostly to philosophers. However, the discovery of non-Euclidean geometries in the nineteenth century spurred renewed, widespread interest among mathematicians. This led to a systematic study of mathematics itself as a formal logical system and hence gave rise to the field of mathematical logic, which is also called metamathematics. Indeed, between 1910 and 1913, Alfred North Whitehead and Bertrand Russell published the voluminous Principia Mathematica, which presented what was claimed to be a system for deriving all of mathematics from logic. Shortly thereafter, David Hilbert set out to prove rigorously that the system presented in Principia Mathematica was both sound and complete. However, in 1931 Kurt Gödel demonstrated that there were valid statements that did not have a proof in that system or in any similar axiomatic system. Gödel's incompleteness result placed limitations on formal logical systems, but it certainly did not stop interest in or work on logic. Quite the contrary. Proving properties of programs is just one of many applications.

Formal logic is covered in all standard textbooks on mathematical logic or metamathematics. A lighthearted, entertaining introduction to the topic can be found in Douglas Hofstadter's Pulitzer Prize-winning book Gödel, Escher, Bach: An Eternal Golden Braid [Hofstadter 1979], which explains Gödel's result and also describes how logic is related to computability and artificial intelligence.

Robert Floyd [1967] is generally credited with being the first to propose a technique for proving that programs are correct. His method involves associating
a predicate with each arc in a flowchart in such a way that, if the arc is traversed, the predicate is true. Inspired by Floyd's work, Hoare [1969] developed the first formal logic for proving partial correctness properties of sequential programs. Hoare introduced the concept of a triple, the interpretation for triples, and axioms and inference rules for sequential statements (in his case, a subset of Algol). Any logical system for sequential programming that is based on this style has since come to be called a "Hoare Logic." Programming logic PL is an example of a Hoare Logic.

The first work on proving properties of concurrent programs was also based on a flowchart representation [Ashcroft and Manna 1971]. At about the same time, Hoare [1972] extended his partial correctness logic for sequential programs to include inference rules for concurrency and synchronization, with synchronization specified by conditional critical regions (which are similar to await statements). However, that logic was incomplete, since assertions in one process could not reference variables in another, and global invariants could not reference local variables.

Susan Owicki, in a dissertation supervised by David Gries, was the first to develop a complete logic for proving partial correctness properties of concurrent programs [Owicki 1975; Owicki and Gries 1976a, 1976b]. The logic covered cobegin statements and synchronization by means of shared variables, semaphores, or conditional critical regions. That work described the at-most-once property (2.2), introduced and formalized the concept of interference freedom, and illustrated several of the techniques for avoiding interference described in Section 2.7. On a personal note, the author had the good fortune to be at Cornell while this research was being done.

The Owicki-Gries work addresses only three safety properties: partial correctness, mutual exclusion, and absence of deadlock. Leslie Lamport [1977a] independently developed an idea similar to interference freedom, called monotone assertions, as part of a general method for proving both safety and liveness properties. His paper also introduced the terms safety and liveness. The method in Section 2.8 for proving a safety property using a global invariant is based on Lamport's method. The method of exclusion of configurations is due to Schneider [Schneider and Andrews 1986]; it is in turn a generalization of the Owicki-Gries method.

Francez [1986] contains a thorough discussion of fairness and its relation to termination, synchronization, and guard evaluation. The terminology for unconditionally fair, weakly fair, and strongly fair scheduling policies comes from that book, which contains an extensive bibliography. These scheduling policies were first defined and formalized in Lehmann et al. [1981], although with somewhat different terminology.
Programming logic (PL) is a logic for proving safety properties. One way to construct formal proofs of liveness properties is to extend PL with two temporal operators: henceforth and eventually. These enable one to make assertions about sequences of states and hence about the future. Such a temporal logic was first introduced in Pnueli [1977]. Owicki and Lamport [1982] show how to use temporal logic and invariants to prove liveness properties of concurrent programs. Again, see Schneider [1997] for an overview of temporal logic and how to use it to prove liveness properties.
References

Andrews, G. R. 1991. Concurrent Programming: Principles and Practice. Menlo Park, CA: Benjamin/Cummings.
Ashcroft, E., and Z. Manna. 1971. Formalization of properties of parallel programs. Machine Intelligence 6: 17-41.

Bernstein, A. J. 1966. Analysis of programs for parallel processing. IEEE Trans. on Computers EC-15, 5 (October): 757-62.

Dijkstra, E. W. 1965. Solution of a problem in concurrent programming control. Comm. ACM 8, 9 (September): 569.

Dijkstra, E. W. 1968. Cooperating sequential processes. In F. Genuys, ed. Programming Languages. New York: Academic Press, pp. 43-112.

Dijkstra, E. W. 1977. On two beautiful solutions designed by Martin Rem. EWD 629. Reprinted in E. W. Dijkstra, Selected Writings on Computing: A Personal Perspective. New York: Springer-Verlag, 1982, pp. 313-18.

Floyd, R. W. 1967. Assigning meanings to programs. Proc. Amer. Math. Soc. Symp. in Applied Mathematics 19: 19-31.

Francez, N. 1986. Fairness. New York: Springer-Verlag.

Hoare, C. A. R. 1969. An axiomatic basis for computer programming. Comm. ACM 12, 10 (October): 576-80, 583.

Hoare, C. A. R. 1972. Towards a theory of parallel programming. In C. A. R. Hoare and R. H. Perrott, eds. Operating Systems Techniques. New York: Academic Press.

Hofstadter, D. J. 1979. Gödel, Escher, Bach: An Eternal Golden Braid. New York: Vintage Books.

Lamport, L. 1977a. Proving the correctness of multiprocess programs. IEEE Trans. on Software Engr. SE-3, 2 (March): 125-43.
Lamport, L. 1977b. Concurrent reading and writing. Comm. ACM 20, 11 (November): 806-11.

Lehmann, D., A. Pnueli, and J. Stavi. 1981. Impartiality, justice, and fairness: The ethics of concurrent termination. Proc. Eighth Colloq. on Automata, Languages, and Programming, Lecture Notes in Computer Science Vol. 115. New York: Springer-Verlag, 264-77.

Owicki, S. S. 1975. Axiomatic proof techniques for parallel programs. TR 75-251. Doctoral dissertation, Ithaca, NY: Cornell University.

Owicki, S. S., and D. Gries. 1976a. An axiomatic proof technique for parallel programs. Acta Informatica 6: 319-40.

Owicki, S. S., and D. Gries. 1976b. Verifying properties of parallel programs: An axiomatic approach. Comm. ACM 19, 5 (May): 279-85.

Owicki, S., and L. Lamport. 1982. Proving liveness properties of concurrent programs. ACM Trans. on Prog. Languages and Systems 4, 3 (July): 455-95.

Peterson, G. L. 1983. Concurrent reading while writing. ACM Trans. on Prog. Languages and Systems 5, 1 (January): 46-55.

Pnueli, A. 1977. The temporal logic of programs. Proc. 18th Symp. on the Foundations of Computer Science, November, 46-57.

Schneider, F. B. 1997. On Concurrent Programming. New York: Springer.

Schneider, F. B., and G. R. Andrews. 1986. Concepts for concurrent programming. In Current Trends in Concurrency, Lecture Notes in Computer Science Vol. 224. New York: Springer-Verlag, 669-716.
Exercises

2.1 Consider the outline of the program in Figure 2.1 that prints all the lines in a file that contain pattern.
(a) Develop the missing code for synchronizing access to buffer. Use the await statement to program the synchronization code.

(b) Extend your program so that it reads two files and prints all the lines that contain pattern. Identify the independent activities and use a separate process for each. Show all synchronization code that is required.
2.2 Consider the solution to the array copy problem in Figure 2.2. Modify the code so that p is local to the producer process and c is local to the consumer. Hence, those variables cannot be used to synchronize access to buf. Instead, use two
new Boolean-valued shared variables, empty and full, to synchronize the two processes. Initially, empty is true and full is false. Give the new code for the producer and consumer processes. Use await statements to program the synchronization.

2.3 The Unix tee command is invoked by executing

    tee filename
The command reads the standard input and writes it to both the standard output and to file filename. In short, it produces two copies of the input.

(a) Write a sequential program to implement this command.
(b) Parallelize your sequential program to use three processes: one to read from standard input, one to write to standard output, and one to write to file filename. Use the "co inside while" style of program.
(c) Change your answer to (b) so that it uses the "while inside co" style. In particular, create the processes once. Use double buffering so that you can read and write in parallel. Use the await statement to synchronize access to the buffers.

2.4 Consider a simplified version of the Unix diff command for comparing two text files. The new command is invoked as

    differ filename1 filename2
It examines the two files and prints out all lines in the two files that are different. In particular, for each pair of lines that differ, the command writes two lines to standard output:
line from file 1 line from f i l e 2
If one file is longer than the other, the command also writcs one line of output for each extra line in the longer file.
(a) Write a sequential prograrn to j mplerne~ltd i f f e r . (b) Parallelize your sequential prograrn to use three processes: two to read the files. and one to wrile lo slandarcl ourput. Use the "co inside while'' style of program.
(c) Change your answer to (b) so that it uses the " w h i l e jnsjde cox style. I n ~~articular, create Lhe processes once. Use double bul'fering for each file, so thal you can read and write in parallel. Use [he a w a i t statement Lo synchronize access to the bufters.
Exercises
83
2.5 Given integer arrays a [ ~ : m ]and b [ l : n l , assume that cach array is sorted in ascending order, and that the values in each array are distinct. (a) Uevclop a sequential program to compute the number oT different values ~ h a l appcar in both a atid b. (b) [denci'fy the independent operations rn you^ sequc~~tial program and then inudify ttie program lo execute the independent operat ions i 11 pal-allel. Store the answer in a shared variable. Use tlzc co statement to spccify concurrency and use the a w a i t starement to specify any syncIi.ronizi~tio~i that ~nighlbe required.
2.6 Assume you have a tree that is reprzsellted using a linked structure. In pal.ticular, each node of the lree is a structure (record) lhat has three fields: a value ancl pointers to tlze leit and right subtrces. Assume a tiull pointer i s represeoted by the corlstant nu11. Write a recursive parallel program to cornpule the sun] of the values of all nodes in the tree. The tocal execution tiwe of lhe computarion sllould be o n the order of the height of Ihe tre.e.
2.7 Assume that tlie integer array a L 1 :n] has been initialized. (a) Write an iterative parallel program to compute the sum o f the elements of a using PR processes. Each process should work 011 a sNip of the array. Asstrme that PR is a filctor of n.
(b) Write a recursive parallel program Lo compute tlie sum of Lhe elene~ltso l the m a y . Use a divide-ancl-conquer scrategy t o cut the size of the problem in half for each recursive step. Stop recursing when the proble~zzsize is less than or' equal to some Lhreshold T. Use the scquenlial iterative algorithm Tor the base case.
2.8 A queue js often represented using a linked list. Assurnc rl~attwo variables, head and tail, point to the first and last elements of the list. Each ejernenl contains a data f eld and a link to the 11exlelement. Assume (hat ;I null link is represenled by die constant n u l l . (a) Write routines lo (1) search the list h ~ the r first elernenl (if any) that contains data value a, (2) insert a new element at the end of the lisl, and (3) dclele Lhe element from tlie frol~tof the list. The s e i ~ c hand delete routines should relurn n u l l if U~eycannot succeed. (b) Now assu~nethat several processes access the linked list. Identify the read
and write sets of each mutine, as defincd in (2.1). Which combina1io1-r~ of roilli ties car1 be executed in para1 lel? M'hicll combinations of rourines must execute one at a lime ( i t . , atom:ically)?
(c) Add synchron,izadon code to the thrce routines lo enforce the synchronizatjon you iderlcified in yoiu- answer to (b). h4ake your atomic aclions as slnall as
84
Chapter 2
Processes and Synchronization
possible, and do not delay a muline unnecessalily. Use [he await statement to prograrrj tbe synchronization code.
2.9 Consider the code Fragmenl in Seclion 2.6 h a t sels m lo rhe maximuin of x and y . The kiple for the i f statemenl in [hat code frngmenl uses (lie I F Stateme111Rule f If from Figure 2.3. What are the predicates P, Q, and B in chis application c ~ the St alemen t Ru l c'? 2.10 Cocisider the followitlg program: int x = 0 , y = 0; c o x = x + l ;x = x + 2 ;
/ / x = x + 2 ; y = y - x ; OC
(a) Suppose each assign~nenlstalemen1 is ilnplemel~tedby a single machine instruction and hence is atomic. How Inany possihle histories ;Ire there? What are the possible final values of x atld y?
(I)) Suppose each assignment slatelnenr is impleme~ltedby three atomic actions that load a register, add or sublract a value from that regislei-, then store d ~ result. e 1-low Inany possible histories are the)-e now'? What are Lhe possible final values (SF x and y'? 3,. 1 1
Consider (he l'ollowing program: int u = 0, v = 1, w = 2 , x; c o x = u + v + w ; / / u = 3; / / v = 4; / / w = 5; 0C
Assume tbat h e atomic actions are reading and writing ilidividual variables. What are the possible final \/slues orx, assuming hat expression u + is evaluated left to right?
(;L)
v
+ w
(b) What are the possible final, values of x jf expression u + v + w can be eval-
uated in ally order? 2.12 Consider (he following program: int x = 2, y = 3 ; co ( x = x + y ; ) / / ( y = x (a) What arc the possible filial values
*
y;)
of x and Y?
Exercises
85
(b) Suppose the angle brackets are removed and each assignment shlelnenr is now irnplement.cd by three atomic actions: read a variable, add or multiply, and wrile to a variable. Now what are the possible final values of x and y?
2.1 3 Consider the following lhree statements: S,:
S,: S,:
x = x + y; y = x - y; x = x - Y;
Assume tltal x is initi.ally 2 and that y is initially 5 . For each o-l' the following, whar are Lhe possible final values of x and y? Explain you^ answers. (a) s,;
s,;
S3;
2.14 Consider the following program: int x = I, y = 1; co ( x = x + Y ; ) / / y = 0; / / x = x - y ; OC
(a) Does the progl.;un meel the requiren1enl.s of the Ac-Most-Once Property (2.2)? Explain. (b) What ;uc the final values o i x ancl y? Explait1 your answer. 2.15 Consider the following program: i n t x = 0, y = 10; co while ( x ! = y ) x = x + l ; / / while (x ! = y )
y = y - I ;
OC
(a) Does the pl-og.racn meet the requii-einents of the A t-Most-Once Property (2.2)? Explain . (b) Will Lhe program terminate? Always'! Sometimes'! Never? Explain your answer.
86
Chapter 2
Processes and Synchronization
2.16 Consider the following program: int x = 0 ;
co (await (x ! = 0 ) / / (await (x != 0 ) / / (await (x == 0 )
x = x - 2;) x = x - 3;) x = x + 5;)
OC
Devclop a proo.i' outline rliat demonstrates that the final value of x is 0. Use the lechnjque of weakened asse~lions. ldenrify wh.ich assertions are critical assert i o ~ ~as s , defined in (2.5). and show that (hey ai-e not interfered with. 2.17 Consider the following program: c o (await ( x > = 3 ) / / (await ( x > = 2 ) / / (await (x == 1)
x = x x = x x = x
-
+
3;
)
2; 5;
) )
OC
For what initial values of x does Ihe program termiuale, assuming scheduling is weakly f ~ i r ?Whar are the correspondir~gfinal values'? Explain your answer.
2.18 Considel- Lhe fol(owing program: co (await ( x > 0 ) x = x - 1;) / / (await ( x < 0 ) x = x + 2;) / / (await ( x == 0 ) x = x - I ; ) OC
For what initial values ol' x does the program ~ e ~ m i n a tasalrr~it~g c, scheduling is
weakly (air? What are the corresponding final values? Explain your answer. 2.1 9 Consitler the following progrun: int x = 10, y = 0; while (x != y ) x = x - 1; y = y + 1; (await ( x == y ) ; ) x = 8; y = 2;
co // OC
i t ntkes [or the program to (el-~ninate.When the program does lerminate, what are the f i n d vali~esfor x and y'?
Explaj~)what
2.20 Let a [ 1 :m] tund b [ 1 :n] be inceger arrays, m lo cxpress the follo\ving properties.
> 0
and n > 0. W ~ i t epredicates
(a) All elemeiiLs of a are less than a1.l elements of b.
(b) Either a 01. b contains a single zero, bur not both.
Exercises
87
i
(c) Ir is not the case chat bolh a and b contain zeros.
(d) The values in b are t l ~ esame as the values i n order. (Assu~nefor this part that m == n.)
a, except they are
in the reverse
(e) Every element of a is an element of b. (f) Some element of a is larger lhan some elemeilt of b, and vice versa
2.21 The if-then-else statement I I ~ S the form: if (B) S 1; else 52 ;
I f B js true. sl is executed; othe~wise,s2 is executed. Give an inference rule [or this statement. Look at the If Stale~nentRule for ideas. 2.22 C~.nsiderChe fo1lowing f o r statement: f o r [i = 1 t o n ]
s; Rewritc the stalement using a w h i l e loop plus explicit assigninen1 state~nelirsto the quantifier variable i. Then use the Assignment Axiom and While Statement Rule (Figure 2.3) to develop an inference rule 'for such a for statement.
2.23 The repeat statement repeat
s; until
(8);
repeatedly executes stat.ement s until Boolean expression B is true at the end of some iteration. (a) Develop an ilifercrlce rule for repeat.
(b) Using your answer to (a). develop a proo€ outline that demot~stratesthat repeat is equjvalenl Lo S;
while (!B) S;
2.24 Consider the following precondition and assignment staternenc.
I > I
88
Chapter 2
Processes and Synchronization
For each of the following triples, show whether [lie above statement interferes with the triple. (a) ( x > = 0) ( x = x + 5 ; ) (x >= 5)
(b) (x >= 0) ( x = x + 5;)
(X > =
0)
(c) { x >= 10) ( x =
X
+ 5;) {x >= 11)
(d)
x
+
{X >= 10) ( X =
5;)
{X
>= 12)
(g) {Y isodd) ( y = Y + 1;) (Y iseven)
(h)
{X
i s a multiple of 3 ) y = x; {y is a multiple of 3 )
2.25 Consider the following progratn: int x = V1, y = v2; x = x + y ; y = x - y ; x = x - y ;
Add assertions belore and alter each stalemen1 to characterize [lie etfects 01' this prugranl. In particular, what are the final values o f x and y'?
2.26 Consider the folJowing program: int x, y ; c o x = x - l ; x = x + l ; / / y = y + l ; y = y - 1 ; OC
Sllow hat (X == y) s Cx == y ) is n theorem. where Show a1l nonin~ecferenceproofs in detaj I..
S is
Ihe co statement.
2.27 Consider the following pi-ograrn: i n t x = 0; ( x = x+2;) / / ( x = x+3;) / / ( x = x+4;) oc
co
Prove that ( x == 0 ) s (x == 9 ) is a tlieol-em, where s is the co statemen[. Use the technique of weakened asscrlions. 2.28 (a) Write a parallcl program (hat sets boo lea^^ variable allzero to true if integer array a [I:n] contains all zeros; otherwise, the program should set a l l z e r o to false. Use a co statement to exanline all m a y elements in parallel.
(b) Develop a proof outline that sliuws that your solution is correct. Shou~thar the processes are interference-Free.
in integer anay a [ 1 :n] by 2.29 (a) Develop a prograin to find the maximum ~~tllue searching even and odd subscripts of a in parallel. (b) Develop a proof outl.ine tllal shows thai your solutjoll is correct. Show that lhe processes are inlerference-free.
2.30 You are given three inleger-valued functions: f ( i), g ( jI . and h (k). The doroain of each function is the nonnegalive integers. The siunge of each function ) all i. Tliere is a1 least one value i s increasing: for example, f (i) < f ( i + lfor common to the range o.F the three f~lnctjoos. (This has been callecl the earliest comnlon meeting time problem, with thc ranges of the functions being the titneb at which lhree people call meet.)
(a) Write a colicurrellt program lo set i, j. and k to the sm;~llestjoteget-s such Lha~f (i) == g (j) == h(k) . Use co to do comparisons in pa~:allel. (This program uses fine-grained parallelism.) (b) Develop a proof oulline that shows that your solulion is con-ect. Sbow thar the processes are intel.feretlce-free. 2.31 Assume that the triples t },?I S, {Q1) and {P,) S 2 {Q,) nl-e both rrue and chat they are interrerence-free. Assume that s, contains an await stateillelit (await ( B ) T). Let s, be S, wilh the a w a i t hlatelnenl replaced by while T;
(
!B);
Answer r l ~ efollowi~lgas independent questions. (a) Will {P,} S, (Q,) still be true? Carefully explain your answer. 1s it rrue always? So~nelimes'! Never? (b) Will (P,} s,' C Q , } and {P,) s, {Q,) scill b e i n r c r l e r e ~ c -Again. c:u-efi~lly explain your answer. Will they be inlerfcsencc-free alwny s'! Soniet i mes? Never? 2.32 Consider he following concurrent program: int s = 1; process f o o l i = 1 to 21 ( while ( t r u e ) { (await (s > 0 ) s = s-1;) Sf; ( s = s+l;)
1 1
90
Processes and Synchronization
Chapter 2
Assurne si is a statemen1 1is1 diat cloes not modify shared variable s. (a) Develop proof outlines for Lhe two processes. Detnonstrate thal the proofs of (he processes are interference-ll-ee. Then use t l ~ cproof outlj.ties and the method ol' exclusio~iof configurations it) Seclion 2.8 to show rhat s, and s, cannot execule a1 the same tj111eand tliat the program is deadlock-free. (Mint: You will need to introduce ausiliary variables to the program and proof outlines. These v;iriables should keep [rack of the location of each process.) (13)
What scheduling policy is required to ellsure that a process delayed at its first
await sralcment w j 11 evel~tuallybe able to pprocecd?
Explain.
2.33 Consider the following program: int x = 10, c
=
true;
co (await x == 0); c = false; / / while (c) ( x = x - 1); OC
(a) Wi l l the progl'aln terminate i.f scheduling is weakJg fair? Explain. (b) Will lhe program ter~niaateif scheduling is strongly fair? Explain. (c) Add h e follocving as ;I third arm while ( c ) [if
i
( X
< 0)
(X
oT the co statement: = 10);)
Repeat parts (a) and (b) for rhis three-process prograln. 2.34 The 8-queens problem is concerned with placing 8 queens on a chess board in such a way that no one queer) car1 attack another. One can atlack another jf both are in the same row or colurnn or are on the same diagonal. to LIK 8-queens Write a parallel recursive program to generate all 92 solurio~~s proble~n.(Hint: Use a recursive procedure to try queen placements and a second procedure to check whelher a giver) placerrlent is acceptable.)
2.35 The stable marriage problem is the following. Let Man [ 1 :n] and WomanCl :n] be al-rays ol processes. Each man ranks the woinen h o ~ n1 to n, and each woman ~-ailks(he Inen frorn 1 to n. (A tanking is a per~nuralionof tlie integers from 1 to n.) A pc~irirzg is a one-to-one correspondence of men and women. A pairing i s ,stczhle if, for Lwo rnen Man [ i l and Man [ j 1 and their paired women woman [pl and woman I s ] , bofli of the following condjtions are satisfied: I.
Man [ i ] ranks woman [ p J higher than Woman [ql , or Woman [ql ranks Man [ j I higher than Man [i]; and
2.
.
Man [ j 1 ranks woman [ql 11igIie1- than woman [p] or Woman [ p ] ranks Man [i) higher than Man [ j I .
Pur differently, a pairing is ~rnscablei f a ma) and wornan would boll1 prefer. each other lo I-heil-currenl pair. A solution to the stable marriage pro\>lemi s a set ot'n pairings, all of which are stable. (a) Give a predicate ihar speci hes rhe goal for the stable marriage problem. I
(b) Write a parallel program to solvc the slnble marriage problem. Be sure ro explain your bolution strategy. Also give pre- ant1 pos~conditionsFor each process and invariants for each loop.
Locks and Barriers
Recall chat concunrnt programs e~nploytwo basic kinds of sy~~chronizarion: murual exclusjorl and condition synchronization. This chapter examines two importan1 problems-cri tical sections and barriers-chat illustrate how to program these kinds of syuchronization. The critical section problem i s concerned with implementing atomic actions in software: the problem arises in mosl concurrenr programs. A barrier is a synchronization point that all processes must reach before any process js allowed ro proceed; barciers a e needed in Inany parallel programs. Mutual exclusion is typically implernenled by means of lock.^ lhal protect critical sectiolis of code. Section 3.1 cleGnes rlie critical section problem and presents a coarse-grained soli~tionthat uses the a w a i t scatemel~tto implemenl a lock. Seclion 3.2 develops fine-grained st~lutionsusing whac arc called spin locks. Section 3.3 presents three fail- solutions: the tie-breaker algorjrhm, the ticket algorithm, and Lhe balteiy olgoritbm. The val.ious solurions illustra~edifferent ways to approach the problem and have different performance and fairness attributes. The solutions to the critical section proble~nare also impo~t;int, because they call be used to implement await staleinents and hence arbitrary ato~rlicactions. We show how to do this at the end oPSectio11 3.2. The last half of (he chapter introduces three rechniqucs for parallel computing: barrier synchraniza~ion,data parallel algorilhms, and what is called a bag or tasks. As noletl earlicr. many problems can be solved by parallel iteralive algorithms i n whjch several identical processek repealed ly man jpulate n shared array. This kill(! of algorithm i s called a clcitu purdlel nigorithm since Lhe shared clata is manipulated in parallel. In such an algorithm, each iteration typically depends on the resihrs of the previous iteration. Hence. at the end of an iteration, faster processes need Lo wait for the slower ones befa-e beginning the nexl iteration. This
kind of synchronization point is called a barrier. Section 3.4 describes various ways to implement barrier synchronization and discusses the performance tradeoffs between them. Section 3.5 gives several examples of data parallel algorithms that use barriers and also briefly describes the design of synchronous multiprocessors (SIMD machines), which are especially suited to implementing data parallel algorithms. This is because SIMD machines execute instructions in lock step on every processor; hence, they provide barriers automatically after every machine instruction. Section 3.6 presents another useful technique for parallel computing called a bag of tasks (or a work farm). This approach can be used to implement recursive parallelism and to implement iterative parallelism when there is a fixed number of independent tasks. An important attribute of the bag-of-tasks paradigm is that it facilitates load balancing, namely, ensuring that each processor does about the same amount of work. The bag-of-tasks paradigm employs locks to implement the bag and barrier-like synchronization to detect when a computation is done.

The programs in this chapter employ busy waiting, which is an implementation of synchronization in which a process repeatedly checks a condition until it becomes true. The virtue of busy-waiting synchronization is that we can implement it using only the machine instructions available on modern processors. Although busy waiting is inefficient when processes share a processor (and hence their execution is interleaved), it is efficient when each process executes on its own processor. Large multiprocessors have been used for many years to support high-performance scientific computations. Small-scale (two to four CPU) multiprocessors are becoming common in workstations and even personal computers. The operating system kernels for multiprocessors employ busy-waiting synchronization, as we shall see later in Section 6.2. Moreover, hardware itself employs busy-waiting synchronization, for example, to synchronize data transfers on memory busses and local networks. Software libraries for multiprocessor machines include routines for locks and sometimes for barriers; those library routines are implemented using the techniques described in this chapter. Examples of such libraries include Pthreads.
3.1 The Critical Section Problem

The critical section problem is one of the classic concurrent programming problems. It was the first problem to be studied extensively and remains of interest since most concurrent programs have critical sections of code. Moreover, a
solution to the problem can be used to implement arbitrary await statements. This section defines the problem and develops a coarse-grained solution. The next two sections develop a series of fine-grained solutions that illustrate various ways to solve the problem and that use different kinds of machine instructions.

In the critical section problem, n processes repeatedly execute a critical then a noncritical section of code. The critical section is preceded by an entry protocol and followed by an exit protocol. Thus, we assume here that the processes have the following form:

(3.1)  process CS[i = 1 to n] {
         while (true) {
           entry protocol;
           critical section;
           exit protocol;
           noncritical section;
         }
       }
Each critical section is a sequence of statements that access some shared object. Each noncritical section is another sequence of statements. We assume that a process that enters its critical section will eventually exit; thus, a process may terminate only outside its critical section. Our task is to design entry and exit protocols that satisfy the following four properties.

(3.2)  Mutual Exclusion. At most one process at a time is executing its critical section.

(3.3)  Absence of Deadlock (Livelock). If two or more processes are trying to enter their critical sections, at least one will succeed.

(3.4)  Absence of Unnecessary Delay. If a process is trying to enter its critical section and the other processes are executing their noncritical sections or have terminated, the first process is not prevented from entering its critical section.
(3.5)  Eventual Entry. A process that is attempting to enter its critical section will eventually succeed.

The first three are safety properties; the fourth is a liveness property. For mutual exclusion, the bad state is one in which two processes are in their critical section. For absence of deadlock, the bad state is one in which all the processes are waiting to enter, but none is able to do so. (This is called absence of livelock in a busy-waiting solution, because the processes are alive but looping forever.) For absence of unnecessary delay, the bad state is one in which the one process that
wants to enter cannot do so, even though no other process is in the critical section. Eventual entry is a liveness property since it depends on the scheduling policy, as we shall see.

A trivial way to solve the critical section problem is to enclose each critical section in angle brackets, that is, to use unconditional await statements. Mutual exclusion follows immediately from the semantics of angle brackets. The other three properties would be satisfied if scheduling is unconditionally fair, since that scheduling policy ensures that a process attempting to execute the atomic action corresponding to its critical section would eventually get to do so, no matter what the other processes did. However, this "solution" begs the question of how to implement angle brackets.

While all four properties above are important, mutual exclusion is the most essential. Thus, we focus on it first and then consider how also to achieve the other properties.

To specify the mutual exclusion property, we need a way to indicate whether a process is in its critical section. To simplify notation, we develop a solution for two processes, CS1 and CS2; the solution generalizes readily to one for n processes. Let in1 and in2 be Boolean variables that are initially false. When process CS1 (CS2) is in its critical section, we will set in1 (in2) true. The bad state we want to avoid is one in which both in1 and in2 are true. Thus, we want every state to satisfy the negation of the bad state:

MUTEX:  ¬(in1 ∧ in2)
Predicate MUTEX needs to be a global invariant, as defined in Section 2.7. For MUTEX to be a global invariant, it has to be true in the initial state and after each assignment to in1 or in2. In particular, before process CS1 enters its critical section, and hence sets in1 true, it needs to make sure that in2 is false. This can be implemented by using a conditional atomic action:

⟨await (!in2) in1 = true;⟩
The processes are symmetric, so we use the same kind of conditional atomic action for the entry protocol in process CS2. What then about the exit protocols? It is never necessary to delay when leaving a critical section, so we do not need to guard the assignments that set in1 and in2 false.

The solution is shown in Figure 3.1. By construction, the program satisfies the mutual exclusion property. Deadlock is avoided because if each process were blocked in its entry protocol, then both in1 and in2 would have to be true, but this contradicts the fact that both are false at this point in the code. Unnecessary delay is avoided because one process blocks only if the other one is not in its
bool in1 = false, in2 = false;
## MUTEX: ¬(in1 ∧ in2) -- global invariant

process CS1 {
  while (true) {
    ⟨await (!in2) in1 = true;⟩     /* entry */
    critical section;
    in1 = false;                    /* exit */
    noncritical section;
  }
}

process CS2 {
  while (true) {
    ⟨await (!in1) in2 = true;⟩     /* entry */
    critical section;
    in2 = false;                    /* exit */
    noncritical section;
  }
}

Figure 3.1   Critical section problem: Coarse-grained solution.
critical section. (All three of these properties can be proven formally using the method of exclusion of configurations introduced in Section 2.8.)

Finally, consider the liveness property that a process trying to enter its critical section eventually is able to do so. If CS1 is trying to enter but cannot, then in2 is true, so CS2 is in its critical section. By the assumption that a process in its critical section eventually exits, in2 will eventually become false and hence CS1's entry guard will become true. If CS1 is still not allowed entry, it is either because the scheduler is unfair or because CS2 again gains entry to its critical section. In the latter situation, the above scenario repeats, so eventually in2 becomes false. Thus, in2 becomes false infinitely often, or CS2 halts, in which case in2 becomes and remains false. A strongly fair scheduling policy is required to ensure that CS1 eventually gains entry in either case. (The argument for CS2 is symmetric.) However, recall that a strongly fair scheduler is impractical. We will address this issue in Section 3.3.
3.2 Critical Sections: Spin Locks

The coarse-grained solution in Figure 3.1 employs two variables. To generalize the solution to n processes, we would have to use n variables. However, there
bool lock = false;

process CS1 {
  while (true) {
    ⟨await (!lock) lock = true;⟩   /* entry */
    critical section;
    lock = false;                   /* exit */
    noncritical section;
  }
}

process CS2 {
  while (true) {
    ⟨await (!lock) lock = true;⟩   /* entry */
    critical section;
    lock = false;                   /* exit */
    noncritical section;
  }
}

Figure 3.2   Critical sections using locks.
are only two interesting states: some process is in its critical section, or no process is. One variable is sufficient to distinguish between these two states, independent of the number of processes.

Let lock be a Boolean variable that indicates whether a process is in a critical section. That is, lock is true when either in1 or in2 is true, and it is false otherwise. Thus we have the following requirement:

lock == (in1 ∨ in2)
By using lock in place of in1 and in2, the entry and exit protocols in Figure 3.1 can be implemented as shown in Figure 3.2. A virtue of the entry and exit protocols in Figure 3.2 relative to those in Figure 3.1 is that they can be used to solve the critical section problem for any number of processes, not just two. In particular, any number of processes could share lock and execute the same protocols.
3.2.1 Test and Set

The significance of the above change of variables is that almost all machines, especially multiprocessors, have some special instruction that can be used to implement the conditional atomic actions in Figure 3.2. Here we use one called
Test and Set. We use another one, Fetch and Add, in the next section. Additional special instructions are described in the exercises.

The Test-and-Set (TS) instruction takes a shared lock variable as an argument and returns a Boolean result. As an atomic action, TS reads and saves the value of lock, sets lock to true, then returns the saved initial value of lock. The effect of the instruction is captured by the following function:
(3.6)  bool TS(bool lock) {
         ⟨ bool initial = lock;   /* save initial value */
           lock = true;           /* set lock */
           return initial; ⟩      /* return initial value */
       }
Using TS, we can implement the coarse-grained solution in Figure 3.2 by the algorithm in Figure 3.3. In particular, the conditional atomic actions in Figure 3.2 are replaced by loops that do not terminate until lock is false, and hence TS returns false. Since all processes execute the same protocols, the solution as shown works for any number of processes.

When a lock variable is used as in Figure 3.3, it is typically called a spin lock. This is because the processes keep looping (spinning) while waiting for the lock to be cleared. The program in Figure 3.3 has the following properties. Mutual exclusion (3.2) is ensured because, if two or more processes are trying to enter their critical section, only one can succeed in being the first to change the value of lock from false to true; hence, only one will terminate its entry protocol. Absence of deadlock (3.3) results from the fact that, if both processes are in their entry protocols, lock is false, and hence one of the processes will succeed in entering its critical section. Unnecessary delay (3.4) is avoided because, if both processes are outside their critical section, lock is false, and hence one can successfully enter if the other is executing its noncritical section or has terminated.
bool lock = false;               /* shared lock */

process CS[i = 1 to n] {
  while (true) {
    while (TS(lock)) skip;         /* entry protocol */
    critical section;
    lock = false;                  /* exit protocol */
    noncritical section;
  }
}

Figure 3.3   Critical sections using Test and Set.
Chapter 3
Locks and Barriers
On rhe olher hand, eventual entry (3.5) is not necessarily guai-anteed. If scheduling is strongly fair, n process trying to enter its ccitical section will eventi~allysucceed, because lock will become false infinitely often. If scl~eduliiigis only weakly fair-which is most comlnonly the case-[hen ;I process could spin I'orever in its entry protocol. However, this can happen only if there are always other pcocessss trying and s~~cceedj~lg to enter their crilical sections, which sho~iltlnot be rhe case in practice. Hence, the solurion in Figure 3.3 is likely LO be fair. A solution co the critical section problem siniilal. to the one in Figure 3.3 can be employed on any lnachine [hat has some instruclion lhac tests and alters a shat-ed variable as a s.ingle atomic action. For example, sotne machines have an incremenl it~structionthar inclainents an integer value and also sets n condition code indicating whether the result is positive or nonnegative. Using this instructio11, tlze enrry prolocol can be based on the transition from zero to one. 'The exercises coiisider several representative instructions. (This kind of quesrio~lis a favorite of exam writers!) The spirl-Jock solutions we have dcvcloped-and the oncs you ]nay llave to conslrucl-all have [he l'ollowing attribute. which is worth remembering:
(3.7) Exit Protocols in Spin-Lock Solutions. Io a spin-lock solution ro the crilical seclion problem, the exi 1 protocol should himply reset the shared variables to their inilial values. In Figurc 3.1, this is a state in which in1 and kn2 are both false. In Figures 3.2 and 3.3, this i s a slate in which lock is false.
3.2.2 Test and Test and Set Altl~ougl~ the solution in Figure 3.3 is correct, experi~nenlson aultjprocessors have sliown that it can lead lo poor yerfor~nailceif several processes are cornpetjng for access to a critical section. This i s because l o c k i s il s bared variable and every delayed process continuously references it. This "hat spot" causes molzory contenlion, which degrades h e perfosmal)ce of memory units and processormemoly interconnectio~inetworks. I n addition, the TS instruction writes into lock every time ii is executed, even when the value or lock does nol change. Since shared-memory multiprocessors ernploy caches to reduce traffic to primary memory, this makes TS signif cantly more expensive than an instruction that merely reads a shared variable. (Whcn a variable is written by one processor, t.he caches on other processors need to be invalidated or altered if cbe)! contain a copy of the variable.)
3.2 Critical Sections: Spin Locks
101
/ * shared lock * /
boo1 lock = false;
process CS[i = 1 to nl { while (true) C while (lock) skip; while (TS(1ock)) ( while (lock) skip; 1
/ * entry protocol * /
critical section; lock
=
/*
false;
exit protocol
*/
rioncrilical section; 1 1
Figure 3.4
Critical sections using Test and Test and Set.
Memory contention and cache invalidation overhead can both be reduced by modifying the entry protocol. Instead of simply spinning until a TS instruction succeeds, we can increase the likelihood that it succeeds by using the following entry protocol:

while (lock) skip;                 /* spin while lock set */
while (TS(lock)) {                 /* try to grab the lock */
  while (lock) skip;               /* spin again if fail */
}
This is called a Test-and-Test-and-Set protocol because a process merely tests lock until there is the possibility that TS can succeed. In the two additional loops, lock is only examined, so its value can be read from a local cache without affecting other processors. Memory contention is thus reduced, but it does not disappear. In particular, when lock is cleared, at least one and possibly all delayed processes will execute TS, even though only one can proceed. Below we describe ways to reduce memory contention further.

Figure 3.4 presents the full solution to the critical section problem using the Test-and-Test-and-Set entry protocol. The exit protocol merely clears lock, just as before.
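As a concrete illustration, here is a hedged C sketch of a Test-and-Test-and-Set lock using C11 atomics; the names are ours, and atomic_load/atomic_exchange stand in for the plain read and the TS instruction.

    #include <stdatomic.h>
    #include <stdbool.h>

    static atomic_bool lock = false;

    void cs_enter(void) {
        for (;;) {
            while (atomic_load(&lock))          /* spin while lock set (read only) */
                ;
            if (!atomic_exchange(&lock, true))  /* try to grab the lock */
                return;                          /* success: lock was clear */
            /* otherwise another process won the race; spin again */
        }
    }

    void cs_exit(void) {
        atomic_store(&lock, false);              /* exit protocol: clear lock */
    }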
3.2.3 Implementing Await Statements

Any solution to the critical section problem can be used to implement an unconditional atomic action ⟨S;⟩ by hiding internal control points from other
processes. Let CSenter be a critical section entry protocol, and let CSexit be the corresponding exit protocol. Then ⟨S;⟩ can be implemented by

CSenter;
S;
CSexit;
This assumes that all code sections in all processes that reference or alter variables altered by S, or that alter variables referenced by S, are also protected by similar entry and exit protocols. In essence, ⟨ is replaced by CSenter, and ⟩ is replaced by CSexit.

The above code skeleton can also be used as a building block to implement ⟨await (B) S;⟩. Recall that a conditional atomic action delays the executing process until B is true, then executes S. Also, B must be true when execution of S begins. To ensure that the entire action is atomic, we can use a critical section protocol to hide intermediate states in S. We can then use a loop to repeatedly test B until it is true:

CSenter;
while (!B) { ??? }
S;
CSexit;
Here we assume that critical sections in all processes that alter variables referenced in B or S, or that reference variables altered in S, are protected by similar entry and exit protocols.

The remaining concern is how to implement the above loop body. If the body is executed, B was false. Hence, the only way B will become true is if some other process alters a variable referenced in B. Since we assume that any statement in another process that alters a variable referenced in B must be in a critical section, we have to exit the critical section while waiting for B to become true. But to ensure atomicity of the evaluation of B and execution of S, we must reenter the critical section before reevaluating B. Hence a candidate refinement of the above protocol is

(3.8)  CSenter;
       while (!B) { CSexit; CSenter; }
       S;
       CSexit;

This implementation preserves the semantics of conditional atomic actions, assuming the critical section protocols guarantee mutual exclusion. If scheduling is weakly fair, the process executing (3.8) will eventually terminate the loop,
assuming B eventually becomes true and remains true. If scheduling is strongly fair, the loop will terminate if B becomes true infinitely often.

Although (3.8) is correct, it is inefficient. This is because a process executing (3.8) is spinning in a "hard" loop, continuously exiting, then reentering its critical section, even though it cannot possibly proceed until at least some other process alters a variable referenced in B. This leads to memory contention since every delayed process continuously accesses the variables used in the critical section protocols and the variables in B.

To reduce memory contention, it is preferable for a process to delay for some period of time before reentering the critical section. Let Delay be some code that slows a process down. Then we can replace (3.8) by the following protocol for implementing a conditional atomic action:

(3.9)  CSenter;
       while (!B) { CSexit; Delay; CSenter; }
       S;
       CSexit;
The Delay code might, for example, be an empty loop that iterates a random number of times. (To avoid memory contention in this loop, the Delay code should access only local variables.) This kind of "back-off" protocol is also useful within the CSenter protocols themselves; e.g., it can be used in place of skip in the delay loop in the simple test-and-set entry protocol given in Figure 3.3.

If S is simply the skip statement, protocol (3.9) can of course be simplified by omitting S. If B also satisfies the requirements of the At-Most-Once Property (2.2), then ⟨await (B);⟩ can be implemented as

while (!B) skip;
As mentioned at the start of the chapter, busy-waiting synchronization is often used within hardware. In fact, a protocol similar to (3.9) is used to synchronize access to an Ethernet, a common local-area communication network. To transmit a message, an Ethernet controller first sends it on the Ethernet, then listens to see if the message collided with another message sent at about the same time by another controller. If no collision is detected, the transmission is assumed to have been successful. If a collision is detected, the controller delays a bit, then attempts to resend the message. To avoid a race condition in which two controllers repeatedly collide because they always delay about the same amount of time, the delay is randomly chosen from an interval that is doubled each time a collision occurs. Hence, this is called the binary exponential back-off protocol. Experiments have shown that this kind of back-off protocol is also useful in (3.9) and in critical section entry protocols.
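To make protocol (3.9) concrete, the following C sketch implements an await-like action on top of any critical section protocol, with a randomized, roughly exponential delay between attempts. Everything named here (cs_enter, cs_exit, cond, action, the delay bound) is a placeholder of our own choosing, not code from the book.

    #include <stdlib.h>

    extern void cs_enter(void);   /* any entry protocol, e.g., a spin lock */
    extern void cs_exit(void);    /* the matching exit protocol            */
    extern int  cond(void);       /* stands for the condition B            */
    extern void action(void);     /* stands for the statement S            */

    static void delay(int rounds) {
        volatile int i;
        for (i = 0; i < rounds; i++)   /* busy loop on local data only */
            ;
    }

    /* implements (await (B) S;) following protocol (3.9) */
    void await_then(void) {
        int backoff = 1;
        cs_enter();
        while (!cond()) {
            cs_exit();
            delay(rand() % (backoff *= 2));   /* roughly exponential back-off */
            cs_enter();
        }
        action();      /* B holds here, and we are inside the critical section */
        cs_exit();
    }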
3.3 Critical Sections: Fair Solutions

The spin-lock solutions to the critical section problem ensure mutual exclusion, are deadlock (livelock) free, and avoid unnecessary delay. However, they require a strongly fair scheduler to ensure eventual entry (3.5). As observed in Section 2.8, practical scheduling policies are only weakly fair. Although it is unlikely that a process trying to enter its critical section will never succeed, it could happen if two or more processes are always contending for entry. In particular, the spin-lock solutions do not control the order in which delayed processes enter their critical sections when two or more are trying to do so.

This section presents three fair solutions to the critical section problem: the tie-breaker algorithm, the ticket algorithm, and the bakery algorithm. They depend only on a weakly fair scheduler such as round-robin, which merely ensures that each process keeps getting a chance to execute and that delay conditions, once true, remain true. The tie-breaker algorithm is fairly simple for two processes and depends on no special machine instructions, but it is complex for n processes. The ticket algorithm is simple for any number of processes, but it requires a special machine instruction called Fetch-and-Add. The bakery algorithm is a variation on the ticket algorithm that requires no special machine instructions, but consequently it is more complex (although still simpler than the n-process tie-breaker algorithm).
3.3.1 The Tie-Breaker Algorithm

Consider the critical section solution for two processes shown in Figure 3.1. The shortcoming of this solution is that if each process is trying to enter its critical section, there is no control over which one will succeed. In particular, one process could succeed, execute its critical section, race back around to the entry protocol, and then succeed again. To make the solution fair, the processes should take turns when both are trying to enter. The tie-breaker algorithm, also called Peterson's algorithm, is a variation on the critical section protocol in Figure 3.1 that "breaks the tie" when both processes are trying to enter. It does so by using an additional variable to indicate which process was last to enter its critical section.

To motivate the tie-breaker algorithm, consider again the coarse-grained program in Figure 3.1. The goal now is to implement the conditional atomic actions in the entry protocols using only simple variables and sequential statements. As a starting point, consider implementing each await statement by using a loop that delays until the guard is true and then executing the assignment. Then the entry protocol for process CS1 would be
while (in2) skip;
in1 = true;

Similarly, the entry protocol for CS2 would be

while (in1) skip;
in2 = true;
The corresponding exit protocol for CS1 would set in1 to false, and that for CS2 would set in2 to false.

The problem with this "solution" is that the two actions in the entry protocols are not executed atomically. Consequently, mutual exclusion is not ensured.
For example, the desired postcondition for the delay loop in CS1 is that in2 is false. Unfortunately, this is interfered with by the assignment in2 = true;, because it is possible for both processes to evaluate their delay conditions at about the same time and to find that they are true.

Since each process wants to be sure that the other is not in its critical section when the while loop terminates, consider switching the order of the statements in the entry protocols. Namely, the entry protocol in CS1 becomes
in1 = true;
while (in2) skip;

Similarly, the entry protocol in CS2 becomes

in2 = true;
while (in1) skip;
This helps but still does not solve the problem. Mutual exclusion is ensured, but deadlock can now result: if in1 and in2 are both true, neither delay loop will terminate. However, there is a simple way to avoid deadlock: use an additional variable to break the tie if both processes are delayed.

Let last be an integer variable that indicates which of CS1 or CS2 was last to start executing its entry protocol. Then if both CS1 and CS2 are trying to enter their critical sections, namely, when in1 and in2 are true, the last process to start its entry protocol delays. This yields the coarse-grained solution shown in Figure 3.5.

The algorithm in Figure 3.5 is very close to a fine-grained solution that does not require await statements. In particular, if each await statement satisfied the requirements of the At-Most-Once Property (2.2), then we could implement them by busy-waiting loops. Unfortunately, the await statements in Figure 3.5 reference two variables altered by the other process. However, in this case it is not necessary that the delay conditions be evaluated atomically. The reasoning follows.
bool in1 = false, in2 = false;
int last = 1;

process CS1 {
  while (true) {
    in1 = true; last = 1;               /* entry protocol */
    ⟨await (!in2 or last == 2);⟩
    critical section;
    in1 = false;                         /* exit protocol */
    noncritical section;
  }
}

process CS2 {
  while (true) {
    in2 = true; last = 2;               /* entry protocol */
    ⟨await (!in1 or last == 1);⟩
    critical section;
    in2 = false;                         /* exit protocol */
    noncritical section;
  }
}

Figure 3.5   Two-process tie-breaker algorithm: Coarse-grained solution.
Suppose process CS1 evaluates its delay condition and finds that it is true. If CS1 found in2 false, then in2 might now be true. However, in that case process CS2 has just set last to 2; hence the delay condition is still true even though in2 changed value. If CS1 found last == 2, then that condition will remain true, because last will not change to 1 until after CS1 executes its critical section. Thus, in either case, if CS1 thinks the delay condition is true, it is in fact true. (The argument for CS2 is symmetric.)

Since the delay conditions need not be evaluated atomically, each await can be replaced by a while loop that spins until the await's condition is true. This yields the fine-grained tie-breaker algorithm shown in Figure 3.6.

The program in Figure 3.6 solves the critical section problem for two processes. We can use the same basic idea to solve the problem for any number of processes. In particular, if there are n processes, the entry protocol in each process consists of a loop that iterates through n - 1 stages. In each stage, we use instances of the two-process tie-breaker algorithm to determine which processes get to advance to the next stage. If we ensure that at most one process at a time is
bool in1 = false, in2 = false;
int last = 1;

process CS1 {
  while (true) {
    in1 = true; last = 1;               /* entry protocol */
    while (in2 and last == 1) skip;
    critical section;
    in1 = false;                         /* exit protocol */
    noncritical section;
  }
}

process CS2 {
  while (true) {
    in2 = true; last = 2;               /* entry protocol */
    while (in1 and last == 2) skip;
    critical section;
    in2 = false;                         /* exit protocol */
    noncritical section;
  }
}

Figure 3.6   Two-process tie-breaker algorithm: Fine-grained solution.
allowed to get through all n - 1 stages, then at most one at a time can be in its critical section.

Let in[1:n] and last[1:n] be integer arrays. The value of in[i] indicates which stage CS[i] is executing; the value of last[j] indicates which process was the last to begin stage j. These variables are used as shown in Figure 3.7. The outer for loop executes n - 1 times. The inner for loop in process CS[i] checks every other process. In particular, CS[i] waits if there is some other process in a higher or equal numbered stage and CS[i] was the last process to enter stage j. Once another process enters stage j or all processes "ahead" of CS[i] have exited their critical section, CS[i] can proceed to the next stage. Thus, at most n - 1 processes can be past the first stage, n - 2 past the second stage, and so on. This ensures that at most one process at a time can complete all n stages and hence be executing its critical section.

The n-process solution is livelock-free, avoids unnecessary delay, and ensures eventual entry. These properties follow from the fact that a process delays only if some other process is ahead of it in the entry protocol, and from the assumption that every process eventually exits its critical section.
int in[1:n] = ([n] 0), last[1:n] = ([n] 0);

process CS[i = 1 to n] {
  while (true) {
    for [j = 1 to n] {                   /* entry protocol */
      /* remember process i is in stage j and is last */
      in[i] = j; last[j] = i;
      for [k = 1 to n st i != k] {
        /* wait if process k is in higher numbered stage and
           process i was the last to enter stage j */
        while (in[k] >= in[i] and last[j] == i) skip;
      }
    }
    critical section;
    in[i] = 0;                            /* exit protocol */
    noncritical section;
  }
}

Figure 3.7   The n-process tie-breaker algorithm.
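For reference, the two-process tie-breaker (Peterson's) algorithm of Figure 3.6 can be written with C11 atomics roughly as follows. This is a sketch of our own: sequentially consistent atomic loads and stores stand in for the book's assumption that reads and writes of simple shared variables are atomic.

    #include <stdatomic.h>
    #include <stdbool.h>

    static atomic_bool in[2] = { false, false };
    static atomic_int  last  = 0;

    /* self is 0 or 1; the other process is 1 - self */
    void enter(int self) {
        int other = 1 - self;
        atomic_store(&in[self], true);
        atomic_store(&last, self);
        while (atomic_load(&in[other]) && atomic_load(&last) == self)
            ;   /* spin: the other process is interested and we were last */
    }

    void leave(int self) {
        atomic_store(&in[self], false);
    }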
3.3.2 The Ticket Algorithm

The n-process tie-breaker algorithm is quite complex and consequently is hard to understand. This is in part because it was not obvious how to generalize the two-process algorithm to n processes. Here we develop an n-process solution to the critical section problem that is much easier to understand. The solution also illustrates how integer counters can be used to order processes. The algorithm is called a ticket algorithm since it is based on drawing tickets (numbers) and then waiting turns.

Some stores, such as ice cream stores and bakeries, employ the following method to ensure that customers are serviced in order of arrival. Upon entering the store, a customer draws a number that is one larger than the number held by any other customer. The customer then waits until all customers holding smaller numbers have been serviced. This algorithm is implemented by a number dispenser and by a display indicating which customer is being served. If the store has one employee behind the service counter, customers are served one at a time in their order of arrival.

We can use this idea to implement a fair critical section protocol. Let number and next be integers that are initially 1, and let turn[1:n] be an array of integers, each of which is initially 0. To enter its critical section, process CS[i] first sets turn[i] to the current value of number and then
int number = 1, next = 1, turn[1:n] = ([n] 0);
## predicate TICKET is a global invariant (see text)

process CS[i = 1 to n] {
  while (true) {
    ⟨turn[i] = number; number = number + 1;⟩   /* entry protocol */
    ⟨await (turn[i] == next);⟩
    critical section;
    ⟨next = next + 1;⟩                          /* exit protocol */
    noncritical section;
  }
}

Figure 3.8   The ticket algorithm: Coarse-grained solution.
increments number. These are a single atomic action to ensure that customers draw unique numbers. Process CS[i] then waits until the value of next is equal to the number it drew. Upon completing its critical section, CS[i] increments next, again as an atomic action. This protocol results in the algorithm shown in Figure 3.8.

Since number is read and incremented as an atomic action and next is incremented as an atomic action, the following predicate is a global invariant:
TICKET:  next > 0  ∧
         (∀ i: 1 <= i <= n:
            (CS[i] in its critical section) ⇒ (turn[i] == next)  ∧
            (turn[i] > 0) ⇒ (∀ j: 1 <= j <= n, j != i: turn[i] != turn[j]))
The last line says that nonzero values of turn are unique. Hence, at most one turn[i] is equal to next. Hence, at most one process can be in its critical section. Absence of deadlock and absence of unnecessary delay also follow from the fact that nonzero values in turn are unique. Finally, if scheduling is weakly fair, the algorithm ensures eventual entry, because once a delay condition becomes true, it remains true.

Unlike the tie-breaker algorithm, the ticket algorithm has a potential shortcoming that is common in algorithms that employ incrementing counters: the values of number and next are unbounded. If the ticket algorithm runs for a very long time, incrementing a counter will eventually cause arithmetic overflow. This is extremely unlikely to be a problem in practice, however.

The algorithm in Figure 3.8 contains three coarse-grained atomic actions. It is easy to implement the await statement using a busy-waiting loop since the Boolean expression references only one shared variable. The last atomic action,
which increments next, can be implemented using regular load and store instructions, because at most one process at a time can execute the exit protocol. Unfortunately, it is hard in general to implement the first atomic action, which reads number and then increments it. Some machines have instructions that return the old value of a variable and increment or decrement it as a single indivisible operation. This kind of instruction does exactly what is required for the ticket algorithm. As a specific example, Fetch-and-Add is an instruction with the following effect:

FA(var, incr):  ⟨int tmp = var; var = var + incr; return(tmp);⟩
This cnsures that number is increrllented correctly, but i l does not ensure tliar processes draw uriique numbers. Tn pardcular, every process could execute the first assignment ahove at about the same ~ . i m eand draw 1 . h ~same number! Thus ir is essential that both assignments he executed as a single atomic action. We h;l\le already seen two other ways to solve rhe criticd section probjejn: spin locks arld Ihe tie-breaker algorithm. Eilher of heae could be used wilhin rlie
int number = 1, next = 1, turn (1:nl
-
=
(
In1 0
;
process C S l i 1 to n] { while (true) ( t u r n [ i ] = FA(nurnber,l); / * entry p r o t o c o l * / while ( t u r n [ i ] ! = next) s k i p ;
critical section; next = n e x t + 1;
/ * exit protocol
noncritical section; 1
1 Figure 3.9
The ticket algorithm: Fine-grained solution.
*/
3.3 Critical Sections: Fair Solutions
Ill
ticket algorithttl to make r~umberdrawing atomic. I n par~icular,suppose CSenter i s a critical section entry prolocol and CSexit is the coi-responding exit protocol. Then we could replace the Fetcli-and-Add statement in Figure 3.9 by
(3.10) CScntes; turn[i]
= number; number = numbercl;
CSexit;
Although this might seem like a curious approach, in practice it would actually work quile well, especially if an instruction like Test-and-Set is available Lo j~nplemenrCSenter and CSexit. With Test-and-Set, processes m i g h ~not draw numbers in exactly the order they attempt to-and theorclically a process could spin forever-bul with very high p~obabilityevery process would draw a ~iumber. and most would be drawn in order. This i s because die crirical section within (3.10) is very short, and hence a process is not likely to delay in CSenter. The major source of delay in the ticket algorithm is waiting for t u r n [i] to be equal to next.
3.3.3 The Bakery A Igorithm The ticket algoritlim cam be ilnplemented directly on machines that have an instruction like Felcli-and-Add. If such an instruction is not available, we can e algorithm using (3.10). But (liar sitnulace the number drawing part of d ~ licket might not be fair. requires ~~silig anoLher critical section protocol, and the sol~~tion H e ~ ewe present a ticket-like algorithm--callecl the bukely algoi.ithrn-thal is fair and that does not require any special machine instruclions. The algorjllim is consequently more cort~plexuhan the ticket algorithm in Figure 3.9. In the ticket algorit)\m, each customer clraws a unique nu~nbel.and then waits for its number to be eqi~alto next. TJzcbakery algorithm takes a different approach. In particular, when customers enter the store, they first look around at all o~hercustomers acid draw a number one larger than any otliel: All customers tnust then wait for their number to be called. As in the ticket algorithm, the customer with he s~nallestnutnber is the one that gets serviced nexl. The difference is that custolners check with each other rather than wirh a central next counte~.lo decide on the order of service. As in [lie tickel algorithm, let t u r n [ l : n ] be an axray of integers, each OF which is initially rero. To enter its critical section, process cs [i] first sets t u r n [i] to one more than rhe maximum of all the curxenl values in t u r n . Then CS [i] waits until turn[i] is the smallest o f Ihe notizero values of t u r n . Thus the bakery algorill~mkeeps the following predicate invariant:
112
Chapter 3
Locks and Barriers
BAKERY: ( C S [i]
( V i : 1 <= i < = n: in i l s critical seclion) * (turn [i]
(turn[i] > 0 )
> 0) A ( V j: 1 < = j < = n, j != i: turn[jl == 0 v turnri] < turntjl) )
+
Upon cornpleling its critical section, c s [i] resets turnti] to zwo. Figure 3.1 0 conrains a coarse-grained bakery algorit11111meeting Lliese specifications. The first alomic action guarantees thal nonzero values oT turn axe unique. The f o r statement ensures thal Lhe consequenl in predicate BAKERY is t~wewhen P [ i ] is executing its cri~icalsection. The algorithm satisfies Lhe inuti~al exclusior~ property because not all 0.1' t u r n [i] ! = 0, turn [ j 1 I = 0, and BAKERY call be true at once. Deadlock cannot result since tlonzero vaJues o f t u r n are ilnique, and as usual we assume that every process eventually exits ils critical geclion. Processes are not delayed unnecessarily sirice turn[i] is zero when cs [i1 is outside its critical section. Finally, the bakery algorithm ensul-es eventual entry if scheduling is weakly fair since once a clclay conditic)~~ becolnes true, it remains true. (The values of turn in the bakery algorithm can gel arbibarily large. I-Iowevel; the turn [i] continue to get larger only i f rhcrc i s nL\vuy.s at least one process trying 10 get into its crilical section. This is no1 likely to be a pi-aclical problem.) The bakery algo~ithmin Figure 3.10 cannot he irnplc~nented&I-eclly on contelnporary machines. The assign~nenrlo turn[i] requires computing Lhc lnaximum. of n values. and [he await statenienl references a shared variable (turn[j 1 ) twice. Thcse ;~ctjonscould be inlpleinenred ato~nicallyby using anotI1e1-critical section p~.orocolsuch as Lhe tie-breaker algorithm, but thal would be qi~jleinefficient. FOI-tunately, there is a sirnplcr approach.
int turnl1:nl = ([nl 0 ) ; # # predicate BAKERY is a global invariant
--
see text
process CS [i = 1 to n] ( while (true) ( (turn[i] = max(turn[l:nJ) + 1;) for [j = 1 to n st j ! = i l (await ( t u r n l j ] == 0 or turn[il < turn[j]);)
cri~icalsectio~l; turnril = 0;
>
noncritical section;
1 Figure 3.10
The bakery algorithm: Coarse-grained solution.
3.3 Critical Sections: Fair Solutions
113
When n processes need to synchronize, it is oAen usef~~l first to develop a rwo-process solution and then lo generalize that solution. This was the case earlier for [he tie-breaker algorithm and is again usel:i~l here since i~ helps illustrate the problelns that Ilave to be solved. Thus, consider tbe followjng enlry protocol lor process cS 1: t u r n l = turn2
+
w h i l e (turn2 I =
1; 0 and t u r n l > turn21 skip;
The corresponding entry prolocol for process cs2 is t u r n 2 = t u r n l + lj w h i l e ( t u r n l ! = 0 and t u r n 2 > t u r n l ) s k i p ;
Each process ahove sets its value of t u r n by an opti~nizedversion of (3.1 O), and [he await state~nenlsare terltativel y implelnented by a busy-waiting loop. The problem with the above "solulioi~"is that tlejthel- the assignlncnt statelnenls nor the w h i l e loop guards will be evaluated atomically. Consequently, the processes could start thejl- entty prolocols at about the same time, and both could sel t u r n l and t u r n 2 to 1. If this happens, both processes could be in their CI-itical section at the same time. Tlie two-process tie-breaker algorithm iri Figure 3.6 suggests a parrial solution to Lhe above problem: II both t u r n l and turn2 arc 1, let one of the processes proceed and have the other delay. For example, lel the lower-nurrtbered process proceed by strengthening the second conjunct in the delay loop in cs2 to turn2 >= t u r n l .
Unfortunately, it is still possible for both processes to enter their- critical section. For example. suppose csl reads turn2 81ld ge1.s back 0. Then suppose cs2 stark its ently protocol, sees that t u r n l is still 0, sets turn2 lo 1, and then enters ils critical section. At lhis point, c s l can corltinue its elltry protocol, set t u r n l to 1, and then proceed into its critical section since both t u r n 1 and t u r n 2 are 1 and C S lakes ~ precedence in [his case. This kind of sjrr~atiorli s called a race corzdition since cs2 "raced by" csl and hence csl missed seeing that C S was ~ changing t u r n 2 . To avoid this race cond,i.tion,we can have each process set its value of t u r n Lo 1 (or ally nonzero value) a t the start ol' the entry protocol. Then it examines the other's value of t u r n ant1 resets its own. In particular, the entry protocol for process CSZ is t u r n l = 1; turnl = turn2 + 1; while (turn2 ! = 0 and t u r n l > t u r n 2 ) s k i p ;
Sim ilarl y, the entry prolocol for process cs2 is
114
Chapter 3
Locks and Barriers turn2 = 1; turn2 = turnl + 1; while (turnl != 0 and turn2 >= turnl) s k i p ;
One process cannot now exit jts while loop uoril the orber llas finished setling i d value of turn iP it is in the m.idst of doi~lgso. The sol.litio~gives csl precedence over cs2 in case both processes have thc same (nonzero) value foi- turn. The encry protocols above arc not quite symmclric, because Lhe delay conditions in the. second loop are slightly different. However, we can rewrite the~vjo a s)ltnmetsic form as follows. Let (a,b) and ( c ,d) be pairs of integers, and define the glraler than relation between such pairs as follows: (a,b) > ( e , d ) == true == f a l s e
if a
z c
or if a == c alld b
>
d
otherwise
Then we can I-ewrite turnl > t u r n 2 in C S as ~ (turnl,1 ) > (turn2,2) and turn2 >= turn1 in C S as ~ (turn2,a) > (turnl,1). The virtue of a s)~mmetricspecification i s that it i s rlow easy to generalize the two-process balcery algorithm to ao n-pcoccss algorithl~i,as shown in Figure 3.11. Each process first indicates chat it wanis Lo enter by setling its value of t u r n Lo 1. Then it computes the maxi~numof all the turn[i] and adds onc co the result. Finally, (he process employs a f o r loop as in the coarsc-grained solution so that it delays uotil it has precedence over all other proccsscs. Note that the rnaxirnu~~~ is approximated by reading each eletnenl of turn and selecting the lurgesl. It i s no! colnputed atolnically and hence is not necessa~ilyaccurale. However, if two or more processes compute the samc value, tliey arc 01-deretl as descr-ibed a bovc.
int turn[l:n] = ([n] 0 ) ; process CS[i = 1 to nl { while (true) { turn[i] = 1; turnti] = max(turn[l:n] ) + 1; for [ j = 1 to n st j ! = i] while (turn[j] != 0 and (turn[i],i) > (turn[ j] j) ) skip;
.
crilical section; turn[il = 0;
noncritical seclion;
I 1 Figure 3.1 1
Bakery algorithm: Fine-grained solution.
3.4 Barrier Synchronization
115
3.4 Barrier Synchronization Many problems can be solved using i~erativealgorithms that successively compute better approxiniatiolis lo an answer, cerlninating whet] either- the firlal atiswer has been compi~tedor-in the case of many numerical algorithms-wheo (lie linal answer has converged. Such an algoriihm Lyyically nmiipu1ate.s an array of values, and each iteration perlom~s(.he satne computalion on all array elemenls. Hence, we can often use mulliplc processes to compute disjoint paw of the sulution in parallel. Wc have already seen a feu/ examples aod will see many rnore in che next section and ill later cl~apters. A key atlribute of most parallel it.eralive algorilhrns is that each ilel-ation typically depends on tbe results of the previous ileration. One way lo structure such an algorithm is to i~nplementthc hody o f each iteration using o n e or more co statements. Ignoring tenninatiou, and assuming thcre are n parallel casks on each ilel-ation, this approach l ~ a the s general fornl while (true) { co [i = 1 to n]
code to implemetlt task i; OC
}
Unfol-mnately. the above approach is quire inefficient since co spawns n processes on each iteration. 11 is rn~lclimore cosrly lo create and destroy processes than to i~nplenentprocess syncluonization. Thus an alternative st~-ucru~.c wilI result in a more efficient algorithm. I n particular, create the processes once al the begini~ingof the computation, Lhen have tliero syncl~ronizeat the erld of eac 1 iteration: process ~ o r k e r r i= 1 to nl { while (true) { code to implement task i; wait for all n hsks EO complete;
1
1
This is called barrier syr~ch.t~)t~izutio~~, because the delay poini ac the et>d of each itcration represenls a barrier that all processes 11:ive Lo arrive at bel'ore any are allowed to pass. Barriers can be needed both a1 the ends of loops, as above, atid 3~intermediate stages, as we shall see. Below we develop several implementations of barricr synchronization. Each ernldoys a dieerenl pmcess ince~actiontechnique. We a l s o describe when each Irind of' barrier- is appropriate lo usc.
116
Chapter 3
Locks and Barriers
3.4.1 Shared Counter The simplest way lo speciCy the ~.equiremenlsfor a barrier js to employ a shal-ed inlegel; count,which is inilially zero. Assu~nediere are n worker processes that ~)eedco meet a1 a barrier. When a process arrives a1 the barrier, it increments count; when count: i s n, all processes can proceed. This specification leads Lo thc following code outline: (3.1 1 )
i n t count
= 0;
process Worker [i = 1 to nl { while (true) C code to jmplemenl task i; (count = count + 1;) (await (count = = n);)
1
1
We CiIll implelrient ehe await sraternent by a busy-waiting loop. I F we also have all indivisible incre~nerit inslruction. such as the Ferch-and-Add instruclion dclinecl in Section 0.3, \ve can implement lhe above barrier by FA(count. 1 ) ;
while ( c o u n t ! = n ) skip;
The above code is not fully adequate, lirrwever. The difficulty is lhar count must be 0 at the start o f e x h iteratioti, which means that count needs to be reset to o each Lime all processes have passed rhe barrier. Moreover, count has to he reset before any process again tries LO incrernc~~t count. It is possible to solve this reset proble~ilby employing two counters. one Iha~ counts up LO n and another that counts down to 0, with the roles of thc cou11Lei-s be; ng switched arler each stage. However. there are addi lional, pragmatic problems with using s l i a ~ ~counters. d First, Lhey have to be incrementecl andlor decremented as atomic actions. Second, when a process is delayed in (3.1 I ) , it is continuously examining count. 111 the, worst case, n-1 processes might be d Lo delayed waiting for the lasr process to arrive aL the barrier. 'L'hjs c o ~ ~ llead severe tnelnory contenliotl, except on mul~iprocessorswilh colierent caches. Rut even then, the value of count i s continuously cllanging, so evely cache needs to be updated. Thus it is appropriate to ilnplerner~ta barrier using counters only if Ihe target machine has atornic increment instructjons, coherent caches, and e f f cienr cache update. Moreover, n should be relatively small.
3.4 Barrier Synchronization
117
3.4-2 Flags and Coordinators One way co avoid the memory conlention prob1e.m i s lo distribute the implelaentation of count by using n variables that sum lo h e same value. I n particular, let a r r i v e [ l : n ] be an array o[: integers inilializccl to zeros. Then replace the incremenl of count in (3.11) by arrive 1 i] = 1. Wit11 this change. the €01lowing predicate is a global invariant: UFPE CcFW
count == ( a r r i v e [ l l +
... +
M IEI
arrive In] )
BIBLIOTMLA
Memory contention would be avoided as long as elements of arrive are stored i1-1 different cache lines (to avoid contenrion whdn they are written by the processes). With the above change, the remaining problems are irnplemenling rhe await statement in (3.1 1) and rcsetlit~gthe elernenls of arrive at t,he end of be each iteration. Using the above relalion, Lhe a w a i t stalemen1 could obvio~~sly i~nplementcdas (await ( (arrive [l] +
. ..
+ arrive l n l ) = = n);)
However. this reinlrtduces rrlernory contention, and, moreovel; it i s inefficient s.ince the sum of the arrive [i] is conlinually being computed by every wailing Worker. We can solve both l l ~ ememory contenrion and rebel PI-ohlemsby using :un additional set of sl~ared\~aluesarid by employing an additional process. Coordinator. Instead of having each Worker sum a11d lest the values of arrive, let each worker wait for a single value lo become true. In parlicular, let continue [I:n ] be another array of integers, initialized to zeros. After setting arrive [ i] Lo 1,Worker [ i l delays by wailiiig tbr continue [i] to be set 10 1.
(3.12) a r r i v e [ i l = 1; (await (continue[ i l == 1);)
The coordinator pmcess waits for all elenlenrs of
a r r i v e Lo become 1,then
sets all elen~entsof continue to 1: (3.13) for [i = 1 ta n] (await (arrive[il = = for [i = 1 to n] continueti] = 1;
1);)
The await statements in (3.12) and (3.13) can be implemented by w h i l e loops since each references a single shared variable. Ttle coordinator can uxe a for statement 10wait Ifor each elernen1 of arrive to be set; Jnoreover. because all arrive 1-lags n i ~ ~ be s t set bel'ote any Worker is allowed 1.0 coutinue, he Coordinator can test the arriveri] in ally 01-cler. Finally: inemory
118
Chapter 3
Locks and Barriers
con~entioriis not a problem since [Ire processes wai.t For different variables to be set and Il~csevariables could be slored in different cache lirles. Variables arrive and c o n t i n u e in (3.12) and (3.13) are examples of what is callecl a ,@OR variable-a variable that is raised by one process to signal that a synchronization condirio~~ is true. The rerxraining problem i s augmenting (3.1 2) and (3.13) with code to clear the Rags by resetting hem to o in preparalion for the next i~eralion.Here ~ w general o principles apply. (3.J4) Flag Syncl~ror)izationPrinciples. (a) The process that waits for a synchronization Rag io be set js the one that sl~ouldclear that flag. (b) A flag should not be set unlil it is known that i t is clear.
The firs1 princjple ensules rhat a flag is not cleared before it has been seen to be set. Thus in (3.12) Worker [ i l ~Iiouldclear c o n t i n u e [ i ]and in (3.13) c o o r d i n a t o r should clear all elernerlts of a r r i v e . The second principle ensures that another process does not again set the sarnc flag before it is cleared, which could lead to deadlock if rhc first process later wails for the flag ro be set again. In (3.13) this means C o o r d i n a t o r should clear arrive [i) hefore setting cont i n u e [i]. The C o o r d i n a t o r can do [his by executing another for statement aftel the iirsl one irr (3.13). Alternatively, C o o r d i n a t o r can clear arrive [i] immediately after it has waitec! for it to be set. Adding flag-clexrjng code, we get Lhe coordinalor barrier shown in Figure 3.12. Although Fjgul-e 3.12 implements barrier sytichro~iization in a way lhat avoids Iiielnory contentjon, the solution has two undesi~.ableattributes. First, it requiscs an extra process. Busy-waiting synchronization is ineflicient ~ ~ n l eeach ss process exea~teson its own processor, so the c o o r d i n a t o r shoilld execute on its own processor. However, it would probably be bertcr to use that processor for another worker process. The seconcl shortco~ning01using a coordinator is that the execution time of eacl~iteration of c o o r a i n a t or-and hence each instance of barrier synchronization--is proportional to the number of Worker processes. In iterative algorithms. each Worker often executes identical code. Henee, each is likely to arrive at the barrier ar about the same titne, assuming every Worker is executed on its own processor. Thus, all a r r i v e flags will get set at about rhe same lime. However, Coordinator cycles through tlxe Aags, wailing for each one lo be set ill turn. \Ve can overcome both problerns by combin,ing the actions of the coordinator ancl workers so that each worker is also a coordinator. In particular, we can organize the worker's into a tree, as shown in Figure 3.13. Workers can then send arrival signals up (he tree and continue signals back down the bee. In pattic~~ku, a worker node firs1 waits for its children to arrive, then tells its parent node thal il too has arrived. When the root node learns rhat its children have arrived,
3.4 Barrier Synchronization
int aurive[l:n] = ([n] 0).
119
continue[l:n] = ([nl 0);
process Workerti = 1 to nl ( while ( t r u e ) { code to impletnenl task i i arrive [il = 1; (await (continue [i] == 1);) continue[ij = 0 ; 1
1 process Coordinator ( w h i l e (true) { for [i = 1 to n] f (await (arrive[i] -= 1);) arrive [il = 0; 1 for [i = 1 to n] continue[il = 1;
Figure 3.12
Barrier synchronization
using a coordinator process.
it knows that all other wo~kershave also arrived. Hence the cool can tell its children to continue; they in turll can tell their childl-en to continue, and so on. 'The specific actions of each kind of worker process are listed i n Fjgure 3.14. The await statements can, of course, be implemented by spin loops. The implementatio~iin Figure 3.14 is called a combining tree h u r r i ~ r .This is because each process combines the results of its children, the11passes thern on to its parent. This barrier uses tlie same number of variables as the cenl~alized
Worker
Figure 3.13   Tree-structured barrier (a tree of Worker processes).
Chapter 3
Locks and Barriers
leatnode L;
interiornode
arrive[^] = 1; (await (continue[L] continue[ll = 0 ;
I:
==
1);)
(await (arrivefleft] = = I);) arrive[leftl = 0;
(await (arrive[right] == 1 );) arrive[right] = 0; arriveC11 = 1; (await (continue (I] == 1) ;} continue [II = 0 ; continue[left] = 1; continue[right] = 1;
toot node
Figure 3.14
R:
(await (arrive[left] == 1) ;) arrivefleft] = 0; (await (arrive [right] = = 1) ; ) arrive[rightJ = 0; continueCleft] = 1; continueCright1 = 1;
Barrier synchronization using a combining tree.
coordinator, but it is much more efficient for large n, because the height of the tree is proportional to log2 n.

We can make the combining tree barrier even more efficient by having the root node broadcast a single message that tells all other nodes to continue. In particular, the root sets one continue flag, and all other nodes wait for it to be set. This continue flag can later be cleared in either of two ways. One is to use double buffering: use two continue flags and alternate between them. The other way is to alternate the sense of the continue flag: on odd-numbered rounds wait for it to be set to 1, and on even-numbered rounds wait for it to be set to 0.
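To make the sense-alternation idea concrete, here is a minimal sketch of a sense-reversing counter barrier written in C with C11 atomics. It applies the flag-alternation technique to the simple counter barrier rather than to the combining tree; the fixed worker count N and the names (count, sense, barrier) are illustrative assumptions, not code from the text.

    /* sketch of a sense-reversing counter barrier for N worker threads */
    #include <stdatomic.h>
    #include <stdbool.h>

    #define N 8                        /* number of workers (assumed) */

    atomic_int  count = 0;             /* how many workers have arrived            */
    atomic_bool sense = false;         /* shared "continue" flag, flips each round */

    void barrier(bool *local_sense) {  /* each worker passes its own local sense */
        *local_sense = !*local_sense;            /* sense expected this round       */
        if (atomic_fetch_add(&count, 1) == N - 1) {
            atomic_store(&count, 0);             /* last arrival resets the counter */
            atomic_store(&sense, *local_sense);  /* ... and releases the others     */
        } else {
            while (atomic_load(&sense) != *local_sense)
                ;                                /* spin until the flag flips       */
        }
    }

Because each worker waits only for the shared flag to take on the value it expects for the current round, no waiting process ever has to reset the flag, which is exactly the point of alternating its sense.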
3.4.3 Symmetric Barriers

In the combining-tree barrier, processes play different roles. In particular, those at interior nodes of the tree execute more actions than those at the leaves or the root. Moreover, the root node needs to wait for arrival signals to propagate up the tree. If every process is executing the same algorithm and every process is executing on a different processor, then all processes should arrive at the barrier at about the same time. Thus, if all processes take the exact same sequence of actions when
they reach a barrier, then all might be able to proceed through the barrier at the same rate. This section presents two symmetric barriers. These are especially suitable for shared-memory multiprocessors with nonuniform memory access time.
A symmetric n-process barrier is constructed from pairs of simple, two-process barriers. To construct a two-process barrier, we could use the coordinator/worker technique. However, the actions of the two processes would then be different. Instead, we can construct a fully symmetric barrier as follows. Let each process have a flag that it sets when it arrives at the barrier. It then waits for the other process to set its flag and finally clears the other's flag. If W[i] is one process and W[j] is the other, the symmetric two-process barrier is then implemented as follows:

    (3.15)  /* barrier code for worker process W[i] */
            (await (arrive[i] == 0);)   /* key line -- see text */
            arrive[i] = 1;
            (await (arrive[j] == 1);)
            arrive[j] = 0;

            /* barrier code for worker process W[j] */
            (await (arrive[j] == 0);)   /* key line -- see text */
            arrive[j] = 1;
            (await (arrive[i] == 1);)
            arrive[i] = 0;
The last three lines in each process serve the roles described above. The existence of the first line in each process may at first seem odd, as it just waits for a process's own flag to be cleared. However, it is needed to guard against the possible situation in which a process races back to the barrier and sets its own flag before the other process from the previous use of the barrier has cleared the flag. In short, all four lines are needed in order to follow the Flag Synchronization Principles (3.14).

The question now is how to combine instances of two-process barriers to construct an n-process barrier. In particular, we need to devise an interconnection scheme so that each process eventually learns that all others have arrived. The best we can do is to use some sort of binary interconnection, which will have a size proportional to log2 n. Let Worker[1:n] be the array of processes. If n is a power of 2, we could combine them as shown in Figure 3.15. This kind of barrier is called a butterfly barrier due to the shape of the interconnection pattern, which is similar to the butterfly interconnection pattern for the Fourier transform. As shown, a butterfly barrier has log2 n stages. Each worker synchronizes with a different
Figure 3.15  Butterfly barrier for eight processes. (The figure shows workers 1 through 8 and the pairings used in stages 1, 2, and 3.)
other worker at each stage. In particular, in stage s a worker synchronizes with a worker at distance 2^(s-1) away. When every worker has passed through all the stages, all workers must have arrived at the barrier and hence all can proceed. This is because every worker has directly or indirectly synchronized with every other one. (When n is not a power of 2, a butterfly barrier can be constructed by using the next power of 2 greater than n and having existing worker processes substitute for the missing ones at each stage. This is not very efficient, however.)

A different interconnection pattern is shown in Figure 3.16. This pattern is better because it can be used for any value of n, not just those that are powers of 2. Again there are several stages, and in stage s a worker synchronizes with one at distance 2^(s-1) away. However, in each two-process barrier a process sets the arrival flag of a worker to its right (modulo n) and waits for, then clears, its own arrival flag. This structure is called a dissemination barrier, because it is based on a technique for disseminating information to n processes in log2 n rounds. Here, each worker disseminates notice of its arrival at the barrier.

A critical aspect of correctly implementing an n-process barrier, independent of which interconnection pattern is used, is to avoid race conditions that can result from using multiple instances of the basic two-process barrier.
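Concretely, with workers numbered from 0 and n a power of 2, the partner of worker i at stage s of the butterfly differs from i only in bit s-1. The following illustrative C helper (the 0-based numbering and the function name are assumptions, not from the text) computes that partner:

    /* partner of worker i at stage s of a butterfly barrier (workers 0..n-1) */
    int butterfly_partner(int i, int s) {
        return i ^ (1 << (s - 1));   /* flip bit s-1: the worker 2^(s-1) away */
    }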
Figure 3.16  Dissemination barrier for six processes. (The figure shows workers 1 through 6 and the pairings used in stages 1, 2, and 3.)
Consider the butterfly barrier pattern in Figure 3.15. Assume there is only one array of flag variables and that they are used as shown in (3.15). Suppose that process 1 arrives at its first stage and sets its flag arrive[1]. Further suppose process 2 is slow, so it has not yet arrived. Now suppose processes 3 and 4 arrive at the first stage of the barrier, set and wait for each other's flags, clear them, and proceed to the second stage. In the second stage, process 3 wants to synchronize with process 1, so it waits for arrive[1] to be 1. It already is, so process 3 clears arrive[1] and merrily proceeds to stage 3, even though process 1 had set arrive[1] for process 2. The net effects are that some processes will exit the barrier before they should and that some processes will wait forever to get to the next stage. The same problem can occur with the dissemination barrier in Figure 3.16.

The above synchronization anomaly results from using only one set of flags per process. One way to solve the problem is to use a different set of flags for each stage of an n-process barrier. A better way is to have the flags take on more values. If the flags are integers, as above, they can be used as incrementing counters that record the number of barrier stages each process has executed. The initial value of each flag is 0. Any time worker i arrives at a new barrier stage, whether it is the first stage of a new barrier or an intermediate stage of a barrier, it increments arrive[i]. Then worker i determines the appropriate partner j for the current stage and waits for the value of arrive[j] to be at least arrive[i]. The actual code follows:

    # barrier code for worker process i
    for [s = 1 to num_stages] {
        arrive[i] = arrive[i] + 1;
        # determine neighbor j for stage s
        while (arrive[j] < arrive[i]) skip;
    }
In this way, worker i is assured that worker j has also proceeded at least as far. This approach of using incrementing counters and stages removes the above race condition, removes the need for a process to wait for its own flag to be reset (the first line in (3.15)), and removes the need for a process to reset another's flag (the last line in (3.15)). Thus every stage of every barrier is just three lines of code. The only downside is that the counters always increase, so they could theoretically overflow. However, this is extremely unlikely to occur.

To summarize the topic of barriers, there are many possible choices. A counter barrier is the simplest and is reasonable for a small number of processes when there is an atomic Fetch-and-Add instruction. A symmetric barrier provides the most general and efficient choice for a shared-memory machine,
becai~seeach process will execute the sane code, ideally a t about the same rate. (On a distributed-mernory machine, a tree-structured barrier is often more efficien t hecause it results in fewer interprocess communications.) W hatcvcr the str-ucture of the barrier, however, the crjtical codilig problem i s to avojd race conditjons. This is acllieved either by using n~ulljplcflags-one per stage for each pair of processes-or by using increine~ltingcounters as above.
3.5 Data Parallel Algorithms

In a data parallel algorithm, several processes execute the same code and work on different parts of shared data. Barriers are used to synchronize the execution phases of the processes. This kind of algorithm is most closely associated with synchronous multiprocessors, i.e., single instruction stream, multiple data stream (SIMD) machines that support fine-grained computations and barrier synchronization in hardware. However, data parallel algorithms are also extremely useful on asynchronous multiprocessors as long as the granularity of the processes is large enough to more than compensate for barrier synchronization overhead.

This section develops data parallel solutions for three problems: partial sums of an array, finding the end of a linked list, and Jacobi iteration for approximating the solution to a partial differential equation. These illustrate the basic techniques that arise in data parallel algorithms and illustrate uses of barrier synchronization. At the end of the section, we describe SIMD multiprocessors and show how they remove many sources of interference and hence remove the need for programming barriers. An entire chapter later in the text (Chapter 11) shows how to implement data parallel algorithms efficiently on shared-memory multiprocessors and distributed machines.
3.5.1 Parallel Prefix Computations

It is frequently useful to apply an operation to all elements of an array. For example, to compute the average of an array of values a[n], we first need to sum all the elements, then divide by n. Or we might want to know the averages for all prefixes a[0:i] of the array, which requires computing the sums of all prefixes. Because of the importance of this kind of computation, the APL language provides special operators called reduce and scan. Massively parallel SIMD machines, such as the Connection Machine, provide reduction operators in hardware for combining values in messages.

In this section we show how to compute in parallel the sums of all prefixes of an array. This is thus called a parallel prefix computation. The basic
algorithm can be used for any associative binary operator, such as addition, multiplication, logic operators, or maximum. Consequently, parallel prefix computations are useful in many applications, including image processing, matrix computations, and parsing a regular language. (See the exercises at the end of this chapter.)

Suppose we are given array a[n] and are to compute sum[n], where sum[i] is to be the sum of the first i elements of a. The obvious way to solve this problem sequentially is to iterate across the two arrays:

    sum[0] = a[0];
    for [i = 1 to n-1]
        sum[i] = sum[i-1] + a[i];

In particular, each iteration adds a[i] to the already computed sum of the previous i-1 elements.

Now consider how we might parallelize this approach. If our task were merely to find the sum of all elements, we could proceed as follows. First, add pairs of elements in parallel; for example, add a[0] and a[1] in parallel with adding other pairs. Second, combine the results of the first step, again in parallel; for example, add the sum of a[0] and a[1] to the sum of a[2] and a[3] in parallel with computing other partial sums. If we continue this process, in each step we would double the number of elements that have been summed. Thus in log2 n steps we would have computed the sum of all elements. This is the best we can do if we have to combine all elements two at a time.

To compute the sums of all prefixes in parallel, we can adapt this technique of doubling the number of elements that have been added. First, set all the sum[i] to a[i]. Then, in parallel add sum[i-1] to sum[i], for all i >= 1. In particular, add elements that are distance 1 away. Now double the distance and add sum[i-2] to sum[i], in this case for all i >= 2. If you continue to double the distance, then after ⌈log2 n⌉ rounds you will have computed all partial sums. As a specific example, the following table illustrates the steps of the algorithm for a six-element array:
    initial values of a       1    2    3    4    5    6
    sum after distance 1      1    3    5    7    9   11
    sum after distance 2      1    3    6   10   14   18
    sum after distance 4      1    3    6   10   15   21
Figure 3.17 gives an implementation of this algorithm. Each process first initializes one element of sum. Then it repeatedly computes partial sums. In the
    int a[n], sum[n], old[n];

    process Sum[i = 0 to n-1] {
      int d = 1;
      sum[i] = a[i];              /* initialize elements of sum */
      barrier(i);
      ## SUM: sum[i] == (a[i-d+1] + ... + a[i])
      while (d < n) {
        old[i] = sum[i];          /* save old value */
        barrier(i);
        if ((i-d) >= 0)
          sum[i] = old[i-d] + sum[i];
        barrier(i);
        d = d + d;                /* double the distance */
      }
    }

Figure 3.17  Computing all partial sums of an array.
algorithm, barrier(i) is a call to a procedure that implements a barrier-synchronization point; argument i is the identity of the calling process. The procedure call returns after all n processes have called barrier. The body of the procedure would use one of the algorithms in the previous section. (For this problem, the barriers can be optimized since only pairs of processes need to synchronize in each step.)

The barrier points in Figure 3.17 are needed to avoid interference. For example, all elements of sum need to be initialized before any process examines them. Also, each process needs to make a copy of the old value in sum[i] before it updates that value. Loop invariant SUM specifies how much of the prefix of a each process has summed on each iteration.

As mentioned, we can modify this algorithm to use any associative binary operator. All that we need to change is the operator in the statement that modifies sum. Since we have written the expression in the combining step as old[i-d] + sum[i], the binary operator need not be commutative. We can also adapt the algorithm in Figure 3.17 to use fewer than n processes. In this case, each process would be responsible for computing the partial sums of a slice of the array.
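As an illustration of that last adaptation, the following C sketch gives the code one of P workers might execute when each owns a contiguous slice of the array (assuming P <= n). It relies on a barrier procedure like the ones in Section 3.4; the names (slice_total, prefix_worker) and the slice layout are assumptions for the sketch, not part of Figure 3.17.

    /* worker w of P computes prefix sums for its slice of a[0..n-1] */
    extern int n, P;
    extern int a[], sum[];
    extern int slice_total[];            /* one per worker: total of that slice */
    extern void barrier(int w);          /* any barrier from Section 3.4        */

    void prefix_worker(int w) {          /* w = 0 .. P-1 */
        int lo = w * n / P, hi = (w + 1) * n / P;   /* this worker's slice */
        sum[lo] = a[lo];
        for (int i = lo + 1; i < hi; i++)           /* sequential prefix within the slice */
            sum[i] = sum[i-1] + a[i];
        slice_total[w] = sum[hi-1];
        barrier(w);                                 /* all slice totals are now written */
        int offset = 0;
        for (int p = 0; p < w; p++)                 /* total of all earlier slices */
            offset += slice_total[p];
        for (int i = lo; i < hi; i++)
            sum[i] += offset;
    }

Only one barrier is needed here, because each worker writes only its own slice of sum and its own element of slice_total.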
3.5.2 Operations on Linked Lists

When working with linked data structures such as trees, programmers often use balanced structures such as binary trees in order to be able to search for and insert items in logarithmic time. When data parallel algorithms are used, however, even with linear lists many operations can be implemented in logarithmic time. Here we show how to find the end of a serially linked list. The same kind of algorithm can be used for other operations on serially linked lists, for example, computing all partial sums of data values, inserting an element in a priority list, or matching up elements of two lists.

Suppose we have a linked list of up to n elements. The links are stored in array link[n], and the data values are stored in array data[n]. The head of the list is pointed to by another variable, head. If element i is part of the list, then either head == i or link[j] == i for some j between 0 and n-1. The link field of the last element on the list is a null pointer, which we will represent by null. We also assume that the link fields of elements not on the list are null pointers and that the list is already initialized. The following is a sample list:
(diagram of a sample linked list, with head pointing to its first element)
The problem is to find the end of the list. The standard sequential algorithm starts at list element head and follows links until finding a null link; the last element visited is the end of the list. The execution time of the sequential algorithm is thus proportional to the length of the list. However, we can find the end of the list in time proportional to the logarithm of the length of the list by using a data parallel algorithm and the technique of doubling that was introduced in the previous section.

We assign a process Find to each list element. Let end[n] be a shared array of integers. If element i is a part of the list, the goal of Find[i] is to set end[i] to the index of the last element on the list; otherwise Find[i] should set end[i] to null. To avoid special cases, we will assume that the list contains at least two elements.

Initially each process sets end[i] to link[i], that is, to the index of the next element on the list (if any). Thus end initially reproduces the pattern of links in the list. Then the processes execute a series of rounds. In each round, a process looks at end[end[i]]. If both it and end[i] are not null pointers, then the process sets end[i] to end[end[i]]. Thus after the first round end[i] will point to a list element two links away (if there is one). After two
    int link[n], end[n];

    process Find[i = 0 to n-1] {
      int new, d = 1;
      end[i] = link[i];           /* initialize elements of end */
      barrier(i);
      ## FIND: end[i] == index of the end of the list or of a node
      ##       at most d links away from node i
      while (d < n) {
        new = null;               /* see if end[i] should be updated */
        if (end[i] != null and end[end[i]] != null)
          new = end[end[i]];
        barrier(i);
        if (new != null)          /* update end[i] */
          end[i] = new;
        barrier(i);
        d = d + d;                /* double the distance */
      }
    }

Figure 3.18  Finding the end of a serially linked list.
rounds, end[i] will point to a list element four links away (again, if there is one). After ⌈log2 n⌉ rounds, every process will have found the end of the list.

Figure 3.18 gives an implementation of this algorithm. Since the programming technique is the same as in the parallel prefix computation, the structure of the algorithm is also the same. Again, barrier(i) indicates a call to a procedure that implements barrier synchronization for process i. Loop invariant FIND specifies what end[i] points to before and after each iteration. If the end of the list is no more than d links away from element i, then end[i] will not change on further iterations.

To illustrate the execution of this algorithm, consider a six-element list linked together as follows:
(diagram of the six-element example list)
At the start of the while loop in Find, the end pointers contain these links. After the first iteration of the loop, end will contain the following links:
Notice that the end links for the last two elements have not changed, since they are already correct. After the second round, the end links will be as follows:
After the third and final round, the end links will have their final values:
As with the parallel prefix computation, we can adapt this algorithm to use fewer than n processes. In particular, each would then be responsible for computing the values for a subset of the end links.
3.5.3 Grid Computations: Jacobi Iteration

Many problems in image processing or scientific modeling can be solved using what are called grid or mesh computations. The basic idea is to employ a matrix of points that superimposes a grid or mesh on a spatial region. In an image processing problem, the matrix is initialized to pixel values, and the goal is to do something like find sets of neighboring pixels having the same intensity. Scientific modeling often involves approximating solutions to partial differential equations, in which case the edges of the matrix are initialized to boundary conditions, and the goal is to compute an approximation for the value of each interior point. (This corresponds to finding a steady state solution to an equation.) In either case, the basic outline of a grid computation is

    initialize the matrix;
    while (not yet terminated) {
        compute a new value for each point;
        check for termination;
    }
On each iteration, the new values of points can typically be computed in parallel.

As a specific example, here we present a simple solution to Laplace's equation in two dimensions, ∇²Φ = 0. (This is a partial differential equation; see Section 11.1 for details.) Let grid[0:n+1,0:n+1] be a matrix of points. The edges
of grid, the left and right columns and the top and bottom rows, represent the boundary of a two-dimensional region. The n x n interior elements of grid correspond to a mesh that is superimposed on the region. The goal is to compute the steady state values of interior points. For Laplace's equation, we can use a finite difference method such as Jacobi iteration. In particular, on each iteration we compute a new value for each interior point by taking the average of the previous values of its four closest neighbors.

Figure 3.19 presents a grid computation that solves Laplace's equation using Jacobi iteration. Once again we use barriers to synchronize steps of the computation. In this case, there are two main steps per iteration: update newgrid and check for convergence, then move the contents of newgrid into grid. Two matrices are used so that new values for grid points depend only on old values. We can terminate the computation either after a fixed number of iterations or when values in newgrid are all within some tolerance EPSILON of those in grid. If differences are used, they can be checked in parallel, but the results need to be combined. This can be done using a parallel prefix computation; we leave the details to the reader (see the exercises at the end of this chapter).

The algorithm in Figure 3.19 is correct, but it is simplistic in several respects. First, it copies newgrid back into grid on each iteration. It would be far more efficient to "unroll" the loop once and have each iteration first update points going from grid to newgrid and then back from newgrid to grid. Second, it would be better still to use an algorithm that converges faster, such as red-black successive over-relaxation (SOR). Third, the algorithm in Figure 3.19 is too fine grain for asynchronous multiprocessors. For those, it would be far
    real grid[0:n+1,0:n+1], newgrid[0:n+1,0:n+1];
    bool converged = false;

    process Grid[i = 1 to n, j = 1 to n] {
      while (not converged) {
        newgrid[i,j] = (grid[i-1,j] + grid[i+1,j] +
                        grid[i,j-1] + grid[i,j+1]) / 4;
        check for convergence as described in the text;
        barrier(i);
        grid[i,j] = newgrid[i,j];
        barrier(i);
      }
    }

Figure 3.19  Grid computation for solving Laplace's equation.
better to partition the grid into blocks and to assign one process (and processor) to each block. All these points are considered in detail in Chapter 11, where we show how to solve grid computations efficiently on both shared-memory multiprocessors and distributed machines.
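As a rough sketch of that coarser-grained structure, the following C fragment shows how one of P workers might update a block of rows between barriers. Convergence checking is omitted, and the grid size N, the row partitioning, and the names (jacobi_worker, barrier) are assumptions for illustration, not code from Figure 3.19 or Chapter 11.

    #define N 1024                          /* interior points per dimension (assumed) */
    extern double grid[N+2][N+2], newgrid[N+2][N+2];
    extern int P;                           /* number of worker processes    */
    extern void barrier(int w);             /* any barrier from Section 3.4  */

    void jacobi_worker(int w, int iters) {  /* w = 0 .. P-1 */
        int lo = 1 + w * N / P;             /* first row of this worker's block */
        int hi = 1 + (w + 1) * N / P;       /* one past its last row            */
        for (int t = 0; t < iters; t++) {
            for (int i = lo; i < hi; i++)
                for (int j = 1; j <= N; j++)
                    newgrid[i][j] = (grid[i-1][j] + grid[i+1][j] +
                                     grid[i][j-1] + grid[i][j+1]) / 4.0;
            barrier(w);                     /* every block's new values are ready      */
            for (int i = lo; i < hi; i++)
                for (int j = 1; j <= N; j++)
                    grid[i][j] = newgrid[i][j];
            barrier(w);                     /* everyone has copied before the next update */
        }
    }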
3.5.4 Synchronous Multiprocessors

On an asynchronous multiprocessor, each processor executes a separate process, and the processes execute at potentially different rates. Such multiprocessors are called MIMD machines, because they have multiple instruction streams and multiple data streams; that is, they have multiple independent processors. This is the execution model we have assumed.

Although MIMD machines are the most widely used and flexible multiprocessors, synchronous multiprocessors (SIMD machines) have recently been available, e.g., the Connection Machine in the early 1990s and Maspar machines in the mid to late 1990s. An SIMD machine has multiple data streams but only a single instruction stream. In particular, every processor executes exactly the same sequence of instructions, and they do so in lock step. This makes SIMD machines especially suited to executing data parallel algorithms. For example, on an SIMD machine the algorithm in Figure 3.17 for computing all partial sums of an array simplifies to

    int a[n], sum[n];

    process Sum[i = 0 to n-1] {
      int d = 1;
      sum[i] = a[i];                 /* initialize elements of sum */
      while (d < n) {
        if ((i-d) >= 0)              /* update sum */
          sum[i] = sum[i-d] + sum[i];
        d = d + d;                   /* double the distance */
      }
    }
132
Chapter 3
Locks and Barriers
n~acliine.This makes SIMD machines atlracrive [or large problertis that can be solved using data parallel algorithms. On the other hand. SmlD m?ch'lnes are special purpose in the sense that the entire machine executes one program a1 a time. (This is the main reason they have fa1len out of favor.) Moreover, i l is a cha.llenge for the programmer to keep the processors busy doing useful work. Jn the above algorithm, for example, fewer and fewer processors upclate s u m [ i] on each iteration, yct each has to evaluate the condition in the i f statement and then, if the condition is false, has to delay until all other processes have updated sum. In general, the executiot~time of' an i f staten~etltis the cola1 of the execution time lor e v e n branch, even if Ihe branch is not iaken. For example, Ulc lime to exccule an i f / t h e n / e l s e slatcment on each processor is the suin of the limes to evaluale the condition. executc the then part, and execute the else part.
3.6 Parallel Computing with a Bag of Tasks Many iterative problcrns can be solved using a data parallel y~-ogram~i-~ing scyle,
as shown in Lhe previous scclion. Recursive problems Lhal arise from (he divide and conquer paradigm can be parallelized by executing ~.ecul-sive calls in parallel rnther than sequentially, as was shown in Section 1.5. This section presenls a dil'l'erenl way to implemenl parallel computations using whai is called a hag nftcrslts. A task is an independent unit of work. Tasks are placed in a bag rhal is shared by two or more worker processes. Each worker execules the following basic code: while (true)
get a task from the bag; i f (no more tasks) break;
# exit the while loop
execute the task. possibly generating new orles;
I This approach can be used to implement recursive parallelism, in which case the tasks are the recursive calls. It call also be used Lo solve iterative problems havi11g a fixed number of independent taslts. The bag-of-tasks paradigm has severd usefill attributes. First, it is quite easy to use. One has only Lo define the representation for a task, impletr~entthe bag, program the code Lo execute a task. and figure out how to detect tzl-mination. Second. programs thal use a bag of tasks are scnfable, meaning that one can use any number of processors merely by varying the number of workers. (However, the perl'ormance 0.1' the program might no1 scale.) Finally, the paradigm makes i t
3.6 Parallel Computing with a Bag of Tasks
133
easy Lo i~nplementload ba1nncin.g. II: tasks lake different amounts oC lime to execute, s o w workern will probably execute more than others. However, as long as tsl~ereare enough more tasks chan workers (e.g., two or Ltlree times as many), the total arnounl of cornpi~tationexecuted by each worker should be approxirnately the same. Below we show how to use 21 bag o l tasks to implement matrix multiplicalion and adaptive quadratwe. I n matrix mulliplication there is a fixed number of
tasks. With adaptive quadrature, tasks are created dynamically. Botb exa~nples use critical sections to prolect access to the bag and use balrier-like synchronization to detect krm.ination.
3.6.1 Matrix Mu1fiplicafion Consider again the problem of multiplying two n x n matrices a and b. Th.is requires computing n2inner produc~s,one for each combination of a row of a and a column of b. Each inner product is an independent computation, so each could be execrlled in parallel. Ho\ve\ler. suppose we are going Lo execute the program on a machine wilh PR processors. Then we woi~ldlike to employ PR worlter processes, one per processor. To balance the colnputational load. each should compute about the same number of inner products. In Section 1.4 we statically assigned parts of the compulation t o each worker. Here, we will use a bag o t tasks. and have each worker grab a task whenever it is needed. 1I' PR is much smallel- than n, a good size for a task would be one or a few rows of the result matrix c. (This leads co reasonable data locality for matrices a and c. assuming matrices are sto~.edin row-~najurordel-.) For simplicity, we wjll use single ~'ows. Initially, the bag contains n iasks, one per row. Since these can be ordered in any way we wjsh, we can represent the bag by simply counti~ig rows : int nextRow = 0 ;
A worker gets a task out of die bag by executing h e atomic action: ( row = nextRow; nextRow++; )
tvl~erer o w is a local variable. The bag is empty ic r o w i s at least n. Tlie i~tomic action above is another instance of drawing a ticket. I1 can be implemented usj~ig a Fetch-and-Add instruction., it one js available, 01.by usirbg locks to protecr rl~e critical seclion. Figure 3.20 contains an outline of the full progran. LVe assume that the malrices have been initialized. Workers compute inner products in the usual way. The program terminates when each worker has broken out of its while loop. [f
134
Chapter 3
Locks and Barriers
int n e x t R o w = 0; # the bag oE tasks double a [ n , n l , b [ n , n l , c [ n , n l ; process Worker[w = 1 to PI ( int row; double sum; # for inner products
while (true) I # get a task ( r o w = n e x t R o w ; nextRow++; ) if (TOW >= n) break; coinpure jnner products lor c [row,* I ; 1 1 Figure 3.20
Matrix multiplication using a bag of tasks.
we wanr to deiect this, cve could employ a shared counler, done, ll'lal is initially zero and ihal is incremeoted atonlically before a worker executes break. Then if we want Lo have the last worker print the results, we could add the following code lo h e end of each worker: if (done == n) print rnalrix c ;
This i~scsdone like a counter barrier.
3.6.2 Adapfive Quadrature Recall that the quadrature problem is to approximate the integral ol' a function f ( x ) fi'on~a t o b. With adaptive quadrature, o w first computes the midpoint m between a and b. Then one approximaces three areas: from a co m, From m to b, and froin a to b. If tile sum oP be two smnl.ler areas is within some acceptable lolcrance of the larger area, tllen the approximation is considered good enough. if not, the larger problem is divided into ~ w osubproblems and the process is
repeated. We can use the bag-of-tasks paradigm as followb to implemenl adaptive quadrature. Here, a task is an interval 10 examine; i t js defined by the end points of' (lie interval, lhe value of the function at those endpoints, and the approxitnatiun of the area for that interval. lnidally there i s j u s t one task, for the entire interval a to b.
Historical Notes
135
A worker repeatedly ta.kes a task from the bag and executes it. In contrast to the matrix m~~ltjylication program, howevt?r, the bag may (temporarily) be empty and a worker might thus have to delay bel'o1.e getting task. Moreover, executing one task will usually lead to producing two smaller casks, which a worker needs to put back into the bag. Finally, it is tnol-e difficult Lo determine when all the work has bcen done because it js not sufficient simply to wail until the bag is empty. (Indeed, the bag will be empty rl~e~ninutethe first task is taken.) Rather, a11 rhc work i s done only when borh (1) the bag is elnpry itnd (2) each wol-ker is waiting to get another task. Figure 3.21 contains a program for adaptive quadrature using a bag of tasks. The hag is <epre.sentedby a queue and a counter. Arlolher counter keeps track oT the number of idle workers. All h e work has been finished when size is 0 i ~ n d i d l e is n. Note thac the progl-am contains several atomic aclions. These are needed to protect the critical sections that access the shared v;uiables. All but one of the arornic actions are unconditional. so they can be protectecl using locks. has to be implementcct using the rnore elaboHowever, the one await state~ne~it rate pro~ocolin Section 3.2 01-usjng a morc powcrl'ul sync11rouit;ition mechanism such as semaphores 01-mot~ilors. The prograrr) in Figure 3.21 (makes excessive use o f the bag of hsks. In particular, when a worker decides to generate two tasks, it pilts both of them i n the bag, then loops around and takes a new task out of thc bag (perhaps even one that it just pur in). Tostcacl, wc could have a worker put one Lilsk jli the bag ood keep the other. Once the bag contains enough work to lead lo a balanced computational load, a f~lrlheroptimjzation would be to havc a workel- execute a rssk fully, using sequerltial recursion, rather than putling Inore Lasks into the bag.
Historical Notes 'The critical section problern was first clescribed by Edsger Dijkstra [J965). Because t.he problem is lundarue~ilal,it h:is bcen studied by scores ol' people who have published 1itel;~llg~ hundrecls of papers on the topic. This chapter h:~s presented four of the lnost i~nl>ortantsolulions. (Raynal [I9861 has written an enlire book on lnuti~alexclusiol~algorithms.) AJtIioi~ghdevisjog busy-wai ting solutions was early on mostly an academic exercise-sioce busy waiting is illefficient on a single processor-the advent o f multiprocessors has spurre.d renewed inlel.cst in such solutjons. Indeed, all m~~ltiprocessors now provide instructions that support at least one busy-waiting solution. The exercises at the end of chis chapter describe most of these inslructions. Dijkstra's paper [I9651 presenl-edthe li rst n-process software solution. It is an exlension of the lirst two-process soluriotl designed by Dutch mathematician T. Dekker (see the exercises). However, Dijkstra's original formulation of the
136
Chapter 3
Locks and Barriers type task = (double left, right, fleft, fright, lrarea); queue bag (task); # the bag of tasks int size; # number of tasks in bag int idle = 0; # number of idle workers double total = 0 . 0 ; # the total. area
compute ;ipproximal-e area from a to b; inserl task ( a , b, f (a), f ( b ) , area) in the bag; count = 1; process Workertw = 1 to PR] { double l e f t , right, f lef t , f r i g h t , l r a r e a ; double m i d , fmid, larea, rarea; while (true) ( # check for termination
( idle++; i f ( i d l e == n && size == 0) break; ) # get a task from the bag ( await (size > 0)
remove n task from the bag; size--; idle--; ) mid = (left+right) / 2 ; fmid = f (mid); larea = (fleft+fmid) * (mid-left) / 2; rarea = (fmid+fright) * (right-mid) / 2; if (abs((larea+rarea) - lrarea) > EPSILON) ( ( put (left, mid, fleft, fmid, larea) inthebag; put (mid, right, fmid, fright, rarea) in the bag; size = size c 2 ; ) 1 else ( t o t a l = total + lrarea; )
1 if
( w == 1) # worker 1 prints the result grintf("the total is % f \ n M ,total);
1 Figure 3.21
Adaptive quadrature using
a bag of tasks.
problem (lid nor require (he eventual entry properly (3.5). Donald Kllulh [I9661 \\/cis the f r s ~ LO publisll a soluliun that also ensures event.ua1 entl-y. The tie-breaker algorithm was discovered by Gary Pelerson [I9811; it is often callcd Peterson's algorilhm as a result. This algorjthrn is particularly simple fo~.two processes, unlike the earlier solutio~lsof Dekker, Dijkstra, and others.
Historical Notes
137
l'eterson's algal-illim also generalizes readily lo an n-process solution, as shown in Figure. 3.7, That solution requires that a process go through all n-1 stages, even if no other process is tryi.ng co enter its critical section. Block and Woo (1990J present a variation hat requires only m stages if just m processes are contending for entry (see the exercises). The bakery algorithm was devised by Leslie Lamport 119741. (Figure 3.1 1 contains an improved version that appeared in Lamport 1.19791.) I n addition lo being more intuitive than earlier critical sectiorl solulions, the bakery algorilh~n allows processes to enter in essentially FJFO order. It also has thc interesting property that it is tolerant of some he~dwarc:failures. First, if one process reads turnti] while another process is setting it in the entry protocol, the read can relurn any value between 1 and the val~icbeing wr-ilten. Sccond, a process cs [i] may fail at any Lime, ass~~niing it im.mediately sets t u r n [i] to 0. However, the value of t u r n ti] cat] become unbounded in Lamport's solu~ionil' there is always a( I.east one process it) its critical seclion. The tickel algorith!n i s the sirnplest of tke n-process solulions to the crilici~l ,secrin~-lprohlen~;it also allows processes to enter in lhe order in which they draw number^. The cicket algorirlim can it1 fact be viewed as an optimization of the bake131 algorithm. However. it requires a Fetch-and-Add j n s ~ u c l i o n ,so~aehing that few ~nachincsprovide (see Alnlasi ancl Goldieb [I 9941 for machines that (lo). Hany Jordan [I9781 was one of the firs[ lo recognjzc the importance of barrier syl~ct~ronization in parallel iterati vc algorithms; he i? creditetl wit11 coining the term bar~ier. Altliough barriers axe uol neatly as commonplace its crjtical sections. they loo have been studied by dore~lsof people who have cleveloped several dil'l'erent i mple~ncniations. The cumhining tree barrier iu Figures 3.13 a11d 3.14 is sirnilax to one devised hy Yew, Tzeng, ant1 Lawric 11 9871. The butterBy barrier was devised by Brooks [1986]. Hensgen, Finkel, arrd Manber [I9881 developed the dissemination barrier; their payer :ilso describes a tourn;iment barrier, which js similar in structure to a combining tree. Gupta [I9891 describes a "fuzzy barrier," ujhich includes a rcgion of statements a 131-ocesscan execute while i t waits for a barrier; his paper also describes a hardware implementation or
fuzzy barriers. As described in seve.ral places i n the chapter, busy-wailing jmplemenrations of locks and baxxiers can lead to m.etnory and interconnection-networlc conrenlion. Thus it is iinport,mt for delayed processes lo spill on the contents of local memory, e.g., cached copies of variables. Chapter 8 of Wennessy and Pdtterson's a~cbitecturebook [I9961 discusses synchronization issues in multiprocessors, includes an intexesling historical perspeclive, and provides an exlensive set of I-el'erences. An excellent paper by Mellor-Cmrrrtny 2nd Scot1 [I9911 arlalyzes [lie performance of several lock and barrier protocols, including most of those described
138
Chapter 3
Locks and Barriers
in Lhe iexc. That paper also presents a new, list-based lock protocol and a scalable, tree-based ban-ier with only local spinning. A recent paper IMcKenney 1996Il describes clickrent kinds of locking patterns in parallel programs and gives guidelines for selecting which locking primitives to use. Another recent paper [Savage et al. 19971 describes a tool called Eraser that can detect data races due l o synchronization mistakes in Jock-based rnultitlueaded programs. On ~nultiprogramsned(time-sliced) systems; performance can su t'l'er greatly i f n process is preempted while holding a lock. One way to overcolne this problem is to use noizblockbzg-or lock fi-ce-algorithms inslead of using mutual exclusion locks. An irnpleinentation o i a data structure is nonblocking if some process will cornplele an operation in a finite number of steps, regardless of the execution speed of other processes. I-lerljhy and Wing [I9901 int~oducesthis concept and presents a nonblocking implementatjon of concurrent queues. 111 I-lerlihy and Wing's implementation, Lhe insert operation is also wait flee--i.e., every inserr operation will complere in a finite number of steps, again regardless of execution speeds. Herlihy [I9901 presents a general methodology for i1np1e.inenting highly concurrent data stnlctures, and Herlilzy 119911 contains a comptehensive discussion of wait-free synchronizatjon. A recent paper by Michael and Scott [I9981 evaluates the performance of nortblocking implementations of several data slruclures; they also considel- the alternative of lock-based impletnenta(ions lliat u e modified to be preemption safe. They cot~cluderhar nonblocking implemen~ations are si~perjorfor those data structures for which clwy exist (e.g., rli~eues),and lhat preemptjon-safe locks outpel-form normal locks on inultiprogrammed syscems. Data parallel algorithms are no st closely associated wirh ~nassivelyparallel machines since those m;lcliines pcmit thousands of dala elelmerits ro be operaled on in p:rallel. Many of the earliest examples of data parallel algori(1ims were designed for Lhe NYU Ul tricornputer [Schwartz 19801; al I have the characterjstic attribute that execution tjrt~eis logarithmic jn the size of the data. The Ultracomputel. is a MIMD machine: and i t s designers realized the importance of cfficienl critical section and barrier synchronization protocols. Consequently. they implein an early hardware prototype [Gottlieb et cnented a Replace-and-Add operi~tio~l al. I9K31; 111is jnstluction adds a value to a memory word and hen relurns the I-esull. I-lowever. more recent versions of the Uluacomputer provide a Fetch-andAdd operation instead [Al~nasi& Gotllieb 19941, which returns che value of a memory word bel'ore addjog ro it. This change was made because Fetch-anclAdcl, n~lilceReplace-aad-Add, generalizes 1.0 any binary combining opcratorI'or example, Fetch-and-Max or Fetch-and-And. This general kind of instruction i s called a FeLch-and-0 operation. The introduction of the Connection Machine in the micl-1980s spun-etl renewed interest it1 data pal-allel slgoritl~rns. That tnachine war designed by
References
139
Daniel Hillis [I9851 as part of his doctoral djsse~faiionat MIT. (Precioi~sfew dissertations liave been as influential!) The Conneclion Machine was ;I SINID machine, so barrier synchronization was au tolnalically provided after every machine instr.uct.ion. The Con~lectionMachine also had ~liousandsof processing elements; tli.is enables a data parallel algorillim to liave a very fine granulai.ityfor example, a processor can be assignecl t o every element of an array. I-lillis and Steele [I9861 give an overview d the oligiiial Col)r~ecfonMacliine and describe severi~linteresting data parallel algorithms. was introcluced by CarThe bag-of-tasks paradigm for parallel cornpuli~~g r i e r ~ .Gebtlter, and Leichter r19861. Thar papei- shows hoihl to ilnplernent thc task bag in the Linda prograrnmin,g notation, wliich we describe in Chapter 7. They called [lie associated proprnmtning n~oclclr-qlicrrtcd woi-keys. because any uuiiiber of workers can share the bag. Soine now call this the work j i ~ r n ln.zodel. because the tasks in the bag are farmed out to [he wol-ltel-x. We prefer the phrasc "bag of tasks'' because il characterizes the essence OF the approach: a sharecl bag or tasks.
References Alruasj. G. S., and A. Gobtlieb. 1994. l-lighly P.rrallel Conrprrling. 2nd ed. Menlo Park, CA: Benjarnilz/Conmni~igs.
of' Peterson's Block, K., and T.-K. Mroo. 1990. A mol'e cfticicnl ge~~eralization Proc~ssirlg L c ! t ~ ~35s (August): ~nutiialexclusion algorithm. Jn.ji>nn~rtio~~
219-22. Brooks, E. D.. 1 11. 1986. The bi~tterflybartier. 1111. Jn~~~r~lill c?f Purnllel Prog. 15, 4 (Aug~ltit):295-307. Can-iel-o,N., D. Gelei-nler, and 1. Leichter. 1986. Distributed data slnlcrures in Linda. Th.i~.teenthACIV Synzp. on Principles of l'rog. Lnuzg.~.. Jani~aql,pp. 23642. Dijkstra, E. W. 1965. Solution of a problem in conculrent progra~~~rning control. Comm. Act11 8 , 9 (September): 569. Gottlieb, A,, B. D. Lubachevsky, and L. Rudolph. 1983. Basic techniques for the efficient coordination of very large ilumbers of cooperating sequential pro. and Sjtslrms 5 , 2 (April): 164-89. cessors. ACM Pans,on P ~ o g Lnn.guage.r Gupta. R. 1 989. The fuzzy barrier: a lnechanism for high speed sy lichronizalion of processors. Tizird Inr. Conj: oop~ Arc:hitecturul Shipport f i r PIVQ. LangLlages and Operatilzg Systems, April, pp. 54-63,
140
Cha~ter3
Locks and Barriers
I-leanessy, J . L.. and D. A. Patterson. 1996. Cor~~pw/c.t.ilrc~hifectura: A Qua~iriralive Approt~ch,2nd ed. San Francisco: Morgan Kau fmann.
D., R. Finkel. and U. Manber. 1988. Two algorithms for barrier synclironi z a t i o ~ ,In/.Jo~lrnc~l of Pc~ru.lle1frog. 17, 1 (January): 1-1 7.
Hensgen.
I-Ie~.lihy,M. P. 1990. A methodology for irnpleme~iri~~g highly concuo-ent data slruclures. Puoc. Second ACM Symp. o,z Principles & Practice ~f Parallel Pmg., March, pp. 197-206. Hcrlihy, M. P. 1991. Wait-free synchronization. ACM fic~rzs. 011 Prog. Lunpuage.es ar1.d Sy.c/ems I I , 1 (January): 124-49.
M.P., and 1. M. Wing. 1990. Linearizability: a correctness condition and Systenzs 12, 3 for cotlcilrrent objects. ACM X-tr~ir.on P m g . Lnng~~ages (July): 463-92.
Herlihy,
l-iillis, W. D. 1 985. Tlze Conneclion Muchine. Cambridge, MA: MIT Press. I-Iillis,W. D., and G . L. Sleele., Jr. 1986. Data parallel algorithms. Conzm. ACM 29, 12 (December): Il70-83. Jordan, H. F. 1978. A special purpose architecture for finite element analysis. Proc. I978 I ~ z f .Conj: on Parallel Processir~g, pp. 263-6.
problem in concurrent prograinming control. Cnnsm.ACM 9, 5 (May): 321-2.
Knutll, D. E. 1966. Additional comlnents on a
L?unporl, L. 1974. A new solution of Dijkstra's concurrent progl-ammjng problen~.Con7nl. ACM 1 7, R (August): 453-5.
Lalnporl, L. 1979. A new ;~pproachto proving the correctness of multiproccss programs. ACM Trans. ow Prog. La)iguages und Sysrons 1 , I (July): 84-97. Lamport, L. 1987. A fast mutual exclusion algoritlinl. ACM T m s . on Conzp~.rterSystems 5, 1 (February): 1 -I I. McKenney, P. E. 1996. Selecting locl
uter'.S,ys~ern.c 9. 1 (February): 21-65. JL.
Mich;lel, M. M., and M . L. Scotl. 1998. Nonblocking algorilhlns and preernption-sd'c locking 011 ~nultiprogrammed sharccl rnemory mu1tiprocessors. Journal of PCII-trllel ancl Di.strilm.tcd Computer 5 l,1-26. Pctcrson, G. L. 1981. Myths aboul the r n u a ~ a lexclusion probletn. Info1nzation Processing Lef1er.s 12. 3 (June): 1 15-61.
Exercises
Raynal, bI. 1986. Algoraithrns for
~M~rllial Exclusion.
141
Cambridge, MA: MIT
Press.
M. Burrows, G. Nelson, P. Sobalvarro, and T. Anderson. 1997. Eraser: a dynamic data race detector for multitlueaded progratns. ACiW. Trczns. on Computer Syslems 15,4 (November): 391-41 1.
Savage, S.,
Schwartz, J . T. 1980. UItracomputers. ACM Traizs. on Prog. Lutzguages and Systems 2 , 4 (October): 484-52 1. Yew, P.-C.. N.-F. Tzeng, u l d D. H. Lawlje. 1987. Dishibuting hot-spot addressing in large-scale multiprocessors. IEEE Pans. on Computers C-36, 4
(April): 388-95.
Exercises 3.1 Following is Dekker's algol-llhm, the first solution to the critical section problem for two processes: boo1 enterl = false, enter2 = f a l s e ; int turn = 1; process P1 I while ( t r u e )
{
enterl = true; w h i l e (enter2) i f ( t u r n == 2 ) { enterl = false; while (turn == 2) skip; enterl = true;
1 crjtical section; enterl = false; t u r n = 2; noricrilical section; 1
1 p r o c e s s P2 { while (true) { e n t e r 2 = true; w h i l e (enterl) if (turn == 1) { enter2 = f a l s e ; w h i l e (turn == 1) skip; enter2 = true;
1
142
Chapter 3
Locks and Barriers
ci-itical section : enter2 = false; turn
=
1;
noncrirical section;
1
I
Explain clearly how the program ensures mutual exclusion. avoids deadlock. ovoids unneccssary delay, ant1 el~sureseventual enlsy. For the eventual enu-y properly, how many times can one process that wanls to enter its critical secljon be bypassed by the oLher before the first gets in? Explair~. 3.2 Suppose a computer ha$ atomic dec~crnentDEC and increment I N C instruclions (hat alho return the value of the sign bil of thc rcsulc. In particular. Lhe decrement ~nsiruclionhas the following effect: DEC ( v a r , sign) : ( var = var - 1; if (var >= 01 sign = 0; else sign = 1;
)
INC i s si~uilar.the only difference being that it adds 1 to v a r .
Using DEc andloi. INc, develop a solution to rhe critical scctiorl pcoble~nTo]- n proccsscs. Do not wony abouc the e\~entualentry property. l>esc~.ibcclearly how your solulion works ancl why i t is correct. 3.3 Suppose a computes has an ;~to~nic swap instruction, detinecl as l'ollows; Swap(var1, var2) : ( tmp = v a r l ; v a r l = var2; var2 = tmp; )
Irr the above, tmp is a n internal register. (a) Using Swap. develop a solution to the critical section problen~lor n processes. Do not worry about the evenlual e n 0 property. Describe clearly how
your solutio~iworks 2und why
is corrcct.
(b) l\/lodify your answer to (a) so that it will perl'osm well on a mul~iprocessor system with caches. Explain what changes you make (jf any) and why. ( c ) Using Swap, develop a fail. solulion lo the critical sec~ionpi-oblem-namely, one that also ensures eventual enlry lo each waicing process. The key is lo order the processes, where first-come, first-served is the most obvious ordering. Note that you cannot j u s t use your answer to (a) lo imple~nelltthe alomic action in the ticket algorithm, because you cannot get a fai.r solution using unfair componenls. You may assume chat each process has a unique identily, say, the integers from 1 to n. Explain your solution, and give convincing arguments why it is correct and fair.
Exercises
143
3.4 Suppose a computer has an aroo~icCompare-nad-Swap instruction, which has the following ei'fect: CSW(a, b, c ) : ( if (a == C ) ( c = b ; return ( 0 ) ; 1
else ( a = c;
return (1); 1 )
Para~netersa,b, and c are sjrl.iplevariables, such as integers. Using csw, develop a solutioc~to the critical secrion problem for n processes. Do not won-y about the eventual entry property. Describe clearly Ilow your sojution works and why it is colrec t. 3.5 Some RISC (reduced instruction set computer) machines provide the following two jnstructions: LL(register, variable) # load locked ( register = variable; location = &variable; ) SC (variable, value) # store conditional ( if ( l o c a t i o n = = &variable) ( variable = value; return (1); ) else return (0); )
The (LL) (load locked) instructiotl atomically loads variable into register and saves the address of the variable in a special register. Thc location register is shared by all processors, and il is changed onlj. as a result 0.E executing LL instructions. The sc (store condilional) instruction atomically checks LO see i f the address of variable is the same as the address cu~senclystored in location. If i t is, sc scores value in variable and returns a l: orherwise sc returns a O.
(a) Using these instructions, develop a solution to the critical section problem. Do not worry about the eventual entry property. (b) Are these instructions powerful enough to develop a fair solution to Lhe critical section problem? If so, give one. If not, explajn why not.
3.6 Consider the load Locked (LL) and store condilional (SC)instructions defined in the previous exercise. These instructions can be used to implemenl an a~omic Fetch-and-Add (FA) instruction. Show how to do so. (Hint:l'ou will need to use a spin loop.)
144
Chapter 3
Locks and Barriers
3.7 Considel the following critical section protocol [Lampor1 1987): i n t lock = 0; process CS[i
= 1 to n] { while ( t r u e ) ( (await (lock = = 0 )); lock = i; Delay; while ( l o c k ! = i ) ( (await (lock == 0)); lock = i; Delay;
1
crilical section; lock
= 0;
noncl-ilical section; 1 1
(a) Suppose the Delay code is deleted, Does the prolocnl ensu~.emutual cxclusion? Does it avoid deadlock? Does the protocol avoid unnecessary clelay? Does it ensure eventual enuy'? Cal~eFullyexplain each of your answers. (b) Suppose elie pl-ocesseh execute with true concurrency on a multiprocessor. Suppose the Delay code spins for long enough to e~is~lre that every process i tha~ cvaits for lock to be o has r i ~ n eto execule lhe assignment stalemenl Lhac sets lock to i. Does the p~.otocolnow ensul-e ~ n u r ~ ~ exclusion, al avoicl deadlock, avoid unnecessary delay, and ensure eevetual entry? Again, carefully explain each of your answers.
3.8 Suppose your machine has the followi~lgatomic instl-uclion: flip ( l o c k ) ( lock = (lock + 1 ) % 2; return (lock); )
# flip the lock # return the new value
Someone suggests the followjng solulion lo Ihe critical section problem for t ~ ) o processes: i n t lock = 0;
# shared variable
process CS[i = 1 to 21 { while ( t r u e ) { while (flig(1ock) != 1) while (lock 1 = 0) skip;
critical section; lock
= 0;
noncritical section; 1 1
Exercises
145
(a) Explain why this solulion will not work-in other words, give an execution order that results in both processes being in their critical sections at the same time. (b) Suppose Ulnt the first line in the body o.1 f l i p is changed to do addition n~odulo3 rather than modulo 2. Will h e solution now work for two processes? Explain your answer.
3.9 Consider the following variation on the n-process tie-breaker algori~hm1Block and Woo 19901: int in = 0, last[l:n];
# shared variables
process CS [I 1 to nl ( int stage; while (true) { (in = in + 1;): stage = 1; last[stagel = i; ( a w a i t (last[stagel ! = i or in <= stage);) while (last [stage] ! = il { # go to next stage stage = stage + 1; last (stage] = i; (await (last [stage] I = i or in c = stage) ;)
1
critical section; (in = in - 1;) noncritical section;
1
1
(a) Explain clearly how this prograrn ensures mutual exclus.ion, avoids deadlock, and ensures evenrual entry. (b) Compare the perfmnance of this algorithm to that of the tie-breaker algori th~nin Figure 3.7. In particula~;which is faster if only one process is hying to enter the critical section? How tnuch faster? Which is faster if all n processes arc wyjng to enter the crilical section'? How much faster?
(c) Coiiverl the coarse-grained solution above to a fine-grained solulion in which tbe only atomic actions are reading and writing variables. Do not assume increment and decrement are atomic. (Hint:Change i n to an array,)
3.10 (a) Modify the coarse-graincd ticket algori.thm in Figure 3.8 so next and number do not overflow. (Hint:Pick a constant MAX greater than n, and use modular arithmetic.) (b) Using your answer to (a), m o d f y the fine-grained licket algorithm in Figure 3.9 so next and number do not ovelerflow. Assutne the Fetctl-and-Add instruc[ion is available.
146
Chapter 3
Locks and Barriers
3.1 I 111the bakery algorjthrn (Figure 3.1 I), the values of t u r n are unbounded if there i s always at least one process in its critical section. Assitlne reads and writes are alomic. 1s i t possible to rnodify the algorithm so that values ot t u r n are always bounrled'' IF so, give a modi6ed algorithm. If riot, explain why not.
3.12 In the critical section protocols in che text, every process executes the same algorillim. It is also possible to solve the problem using a coordillator process. In particular, when a regular process CS [i] want5 Lo enter its critical section, il tells the coordinator, then waits for Lhe coordinator Lo grant pel-missiotl. (;I) Develop protocols for the regular processes and the coordinator. Do not. worry about the eventual entry property. (Hint: See the cool-dinator ban-ier in Figure 3.12 I'or ideas). (b) Modify your 'answer ro (a) so that it also ensures eventual entry.
3.13 Displi~y(3.1 I ) shows how Lo use a shared counter for bmier synchroniza~ion:but chat solution does not solve the problem of reselting the counter to zero. Develop
a cornplete solution by using two counters. First develop a coal-se-grained solu[ion l~siilgawait statements. Then develop a fine-graitled solution; assu~nethe Felch-and-Add instruction is a\/ailable. Be carcIuJ about a process corning back around to Ihe barrier before all others have left it. ( H i r r ~ :The "increment" in Fetch-a~itlAdd can be negative.) 3.14 A to~u.nnntcn/harrier has he same kind of tree srructurc as i n Figure 3.13, bur the worker processes interact di fl'erenlly Llian in llie combining-tree barrier of Figure 3.14. In particular, each. worker is a leaf node. Pairs of adjacent workers wait lor each other to arrive. One "wins" and proceeds to the next level u p the lree; (lie olher waits. The winncr of the '-toucnacnent" at the top of the tree announces that all workers have reached Ule barrier-i.e., it tells all of rl~emchat
they can continue. (a) M'riie programs for the workers, showing all derails of how they synchronize. Asatme that the number of workers, n, is a power of 2. Either use two sets o.l' variables or reset their values so that the worker processes can use the tourniunenl barrier again on their next loop iteration. (b) Co~npareyour answel- Lo (a) with [lie combining-tree barrier in Figure 3.14. How many variables are required h11-each kind or barrier? I f each assignment and await statement takes one unit of time, whal is the rolal time required for barrier synchronization in each algorithm? What is the lotal lime Tor the cornhining-tree barrier if it is modified so that the root broadcasts a single continue nlessage, as described it1 the text?
3.15 (a) Give complete details for a butterfly barrier for eight processes. Show all variables that are needed, and give the code that each process would execute. The barrier should be reusable. (b) Repeat (a) for a dissemination barrier for eight processes.
(c) Compare your answers to (a) and (b). How many variables are required for each kind of barrier? If each assignment and await statement takes one unit of time, what is the total time required for barrier synchronization in each algorithm?
(d) Repeat (a), (b), and (c) for a six-process barrier.

(e) Repeat (a), (b), and (c) for a 14-process barrier.

3.16 Consider the following implementation of a single n-process barrier:

int arrive[1:n] = ([n] 0);    # shared array

code executed by Worker[1]:
  arrive[1] = 1;
  (await (arrive[n] == 1);)

code executed by Worker[i = 2 to n]:
  (await (arrive[i-1] == 1);)
  arrive[i] = 1;
  (await (arrive[n] == 1);)
(a) Explain how this barrier works. (b) What is the time complexity of the barrier?
(c) Extend the above code so that the barrier is reusable.

3.17 Assume there are n worker processes, numbered from 1 to n. Also assume that your machine has an atomic increment instruction. Consider the following code for an n-process barrier that is supposed to be reusable:

int count = 0, go = 0;    # shared variables

code executed by Worker[1]:
  (await (count == n-1);)  count = 0;  go = 1;

code executed by Worker[2:n]:
  (count++;)  (await (go == 1);)
(a) Explain what is wrong with the above code.
(b) Fix the code so that it works. Do not use any more shared variables, but you may introduce local variables.
(c) Suppose the above code were correct. Assume all processes arrive at the barrier at the same time. How long does it take before every process can leave the barrier? Count each assignment statement as 1 time unit, and count each await statement as 1 time unit once the condition becomes true.
3.18 In the parallel prefix algorithm in Figure 3.17, there are three barrier synchronization points. Some of these can be optimized, since it is not always necessary for all processes to arrive at a barrier before any can proceed. Identify the barriers that can be optimized, and give the details of the optimization. Use the smallest possible number of two-process barriers.

3.19 Modify each of the following algorithms to use only k processes instead of n processes. Assume that n is a multiple of k.

(a) The parallel prefix computation in Figure 3.17.
(b) The linked-list computation in Figure 3.18.
(c) The grid computation in Figure 3.19.
3.20 One way to sort n integers is to use an odd/even exchange sort (also called an odd/even transposition sort). Assume there are n processes P[1:n] and that n is even. In this kind of sorting method, each process executes a series of rounds. On odd-numbered rounds, odd-numbered processes P[odd] exchange values with P[odd+1] if the values are out of order. On even-numbered rounds, even-numbered processes P[even] exchange values with P[even+1], again if the values are out of order. (P[1] and P[n] do nothing on even-numbered rounds.)

(a) Determine how many rounds have to be executed in the worst case to sort n numbers. Then write a data parallel algorithm to sort integer array a[1:n] into ascending order.

(b) Modify your answer to (a) to terminate as soon as the array has been sorted (e.g., it might initially be in ascending order).

(c) Modify your answer to (a) to use k processes; assume n is a multiple of k.
3.21 Assume there are n processes P[1:n] and that P[1] has some local value v that it wants to broadcast to all the others. In particular, the goal is to store v in every entry of array a[1:n]. The obvious sequential algorithm requires linear time. Write a data parallel algorithm to store v in a in logarithmic time.

3.22 Assume P[1:n] is an array of processes and b[1:n] is a shared Boolean array.

(a) Write a data parallel algorithm to count the number of b[i] that are true.
(b) Suppose the answer to (a) is count, which will be between 0 and n. Write a data parallel algorithm that assigns a unique integer index between 1 and count to each P[i] for which b[i] is true.
3.23 Suppose you are given two serially linked lists. Write a data parallel algorithm that matches corresponding elements. In particular, when the algorithm terminates, the ith elements on each list should point to each other. (If one list is longer than the other, extra elements on the longer list should have null pointers.) Define the data structures you need. Do not modify the original lists; instead store the answers in additional arrays.

3.24 Suppose that you are given a serially linked list and that the elements are linked together in ascending order of their data fields. The standard sequential algorithm for inserting a new element in the proper place takes linear time (on average, half the list has to be searched). Write a data parallel algorithm to insert a new element into the list in logarithmic time.
3.25 Consider a simple language for expression evaluation with the following syntax:

expression ::= operand | expression operator operand
operand    ::= identifier | number
operator   ::= + | *
An identifier is, as usual, a sequence of letters or digits, beginning with a letter. A number is a sequence of digits. The operators are + and *. An array of characters ch[1:n] has been given. Each character is a letter, a digit, a blank, +, or *. The sequence of characters from ch[1] to ch[n] represents a sentence in the above expression language.
Write a data parallel algorithm that determines for each character in ch[1:n] the token (nonterminal) to which it belongs. Assume you have n processes, one per character. The result for each character should be one of ID, NUMBER, PLUS, TIMES, or BLANK. (Hints: A regular language can be parsed by a finite-state automaton, which can be represented by a transition matrix. The rows of the matrix are indexed by states, the columns by characters; the value of an entry is the new state the automaton would enter, given the current state and next character. The composition of state transition functions is associative, which makes this amenable to a parallel prefix computation.)
3.26 The following region-labeling problem arises in image processing. Given is integer array image[1:n,1:n]. The value of each entry is the intensity of a pixel. The neighbors of a pixel are the four pixels to the left, right, above, and below it. Two pixels belong to the same region if they are neighbors and they have the same value. Thus, a region is a maximal set of pixels that are connected and that all have the same value.
The problem is to find all regions and assign every pixel in each region a unique label. In particular, let label[1:n,1:n] be a second matrix, and assume that the initial value of label[i,j] is n*i + j. The final value of label[i,j] is to be the largest of the initial labels in the region to which pixel [i,j] belongs. Write a data parallel grid computation to compute the final values of label. The computation should terminate when no label changes value.
3.27 Using a grid computation to solve the region-labeling problem of the previous exercise requires worst-case execution time of O(n^2). This can happen if there is a region that "snakes" around the image. Even for simple images, the grid computation requires O(n) execution time.

The region-labeling problem can be solved as follows in time O(log n). First, for each pixel, determine whether it is on the boundary of a region and, if so, which of its neighbors are also on the boundary. Second, have each boundary pixel create pointers to its neighbors that are on the boundary; this produces doubly linked lists connecting all pixels that are on the boundary of a region. Third, using the lists, propagate the largest label of any of the boundary pixels to the others that are on the boundary. (The pixel with the largest label for any region will be on its boundary.) Finally, use a parallel prefix computation to propagate the label for each region to the pixels in the interior of the region.

Write a data parallel program that implements this algorithm. Analyze its execution time, which should be O(log n).
3.28 Consider the problem of generating all prime numbers up to some limit L using the bag-of-tasks paradigm. One way to solve this problem is to mimic the sieve of Eratosthenes, in which you have an array of L integers and repeatedly cross out multiples of primes: 2, 3, 5, and so on. In this case the bag would contain the next prime to use. This approach is easy to program, but it uses lots of storage because you need to have an array of L integers. A second way to solve the problem is to check all odd numbers, one after the other. In this case, the bag would contain all odd numbers. (They do not have to be stored, because you can generate the next odd number from the current one.) For each candidate, see whether it is prime; if it is, add it to a growing list of known primes. The list of primes is used to check future candidates. This second approach requires far less space than the first approach, but it has a tricky synchronization problem.

Write parallel programs that implement each of these approaches. Use w worker processes. At the end of each program, print the last 10 primes that you have found and the execution time for the computational part of the program.
Compare the time and space requirements of these two programs for various values of L and w.
3.29 Consider the problem of determining the number of words in a dictionary that contain unique letters, namely, the number of words in which no letter appears more than once. Treat upper- and lowercase versions of a letter as the same letter. (Most Unix systems contain one or more online dictionaries, for example in /usr/dict/words.)

Write a parallel program to solve this problem. Use the bag-of-tasks paradigm and w worker processes. At the end of the program, print the number of words that contain unique letters, and also print all those that are the longest. You may read the dictionary file into shared variables before starting the parallel computation.
3.30 A concurrent queue is a queue in which insert and delete operations can execute in parallel. Assume the queue is stored in array queue[1:n]. Two variables front and rear point to the first full element and the next empty slot, respectively. The delete operation delays until there is an element in queue[front], then removes it and increments front (modulo n). The insert operation delays until there is an empty slot, then puts a new element in queue[rear] and increments rear (modulo n).

Design algorithms for queue insertion and deletion that maximize parallelism. In particular, except at critical points where they access shared variables, inserts and deletes ought to be able to proceed in parallel with each other and with themselves. You will need additional variables. You may also assume the Fetch-and-Add instruction is available.
As we saw in the last chapter, most busy-waiting protocols are quite complex. Also, there is no clear distinction between variables that are used for synchronization and those that are used for computing results. Consequently, one has to be very careful when designing or using busy-waiting protocols. A further deficiency of busy waiting is that it is inefficient in most multithreaded programs. Except in parallel programs in which the number of processes matches the number of processors, there are usually more processes than processors. Hence, a processor executing a spinning process can usually be more productively employed executing another process.

Because synchronization is fundamental to concurrent programs, it is desirable to have special tools that aid in the design of correct synchronization protocols and that can be used to block processes that must be delayed. Semaphores were the first such tool and remain one of the most important synchronization tools. They make it easy to protect critical sections and can be used in a disciplined way to implement signaling and scheduling. Consequently, they are included in all threads and parallel programming libraries of which the author is aware. Moreover, semaphores can be implemented in more than one way: using busy-waiting techniques from the previous chapter, or using a kernel, as described in Chapter 6.

The concept of a semaphore, and indeed the very term, is motivated by one of the ways in which railroad traffic is synchronized to avoid train collisions. A railroad semaphore is a signal flag that indicates whether the track ahead is clear or is occupied by another train. As a train proceeds, semaphores are set and cleared; they remain set long enough to allow another train time to stop if necessary. Thus railroad semaphores can be viewed as mechanisms that signal conditions in order to ensure mutually exclusive occupancy of critical sections of track.
Semaphores in concurrent programs are similar: They provide a basic signaling mechanism and are used to implement mutual exclusion and condition synchronization. This chapter defines the syntax and semantics of semaphores and then illustrates how to use them to solve synchronization problems. For comparison purposes, we reexamine some of the problems considered in previous chapters, including critical sections, producers and consumers, and barriers. In addition, we introduce several interesting new problems: bounded buffers, dining philosophers, readers and writers, and shortest-job-next resource allocation. Along the way, we introduce three useful programming techniques: (1) changing variables, (2) split binary semaphores, and (3) passing the baton, a general technique that can be used to control the order in which processes execute.
4.1 Syntax and Semantics

A semaphore is a special kind of shared variable that is manipulated only by two atomic operations, P and V. Think of a semaphore as an instance of a semaphore class and of P and V as the two methods of the class, with the additional attribute that the methods are atomic. The value of a semaphore is a nonnegative integer. The V operation is used to signal the occurrence of an event, so it increments the value of a semaphore. The P operation is used to delay a process until an event has occurred, so it waits until the value of a semaphore is positive, then decrements the value. The power of semaphores results from the fact that P operations might have to delay.

A semaphore is declared as follows:

sem s;
The default initial value is zero. Alternatively, a semaphore can be initialized to any nonnegative value, as in

sem lock = 1;
Arrays of semaphores can also be declared, and optionally initialized, in the usual fashion, as in

sem forks[5] = ([5] 1);

The letters P and V are mnemonic for Dutch words, as described in the historical notes at the end of the chapter. Think of P as standing for "pass," and think of the upward shape of V as signifying increment. Some authors use wait and signal instead of P and V, but we will reserve the use of the terms wait and signal for the next chapter on monitors.
If there were no initialization clause in the above declaration, the initial values of the semaphores would be zero.

The operations on a semaphore s are defined as follows:

P(s):  (await (s > 0) s = s - 1;)
V(s):  (s = s + 1;)
The V operation atomically increments the value of s. The P operation decrements the value of s, but to ensure that s is never negative, the P operation waits until s is positive. The delay and decrement in the P operation are a single atomic action.

Suppose s is a semaphore with current value 1. If two processes try at the same time to execute P(s) operations, only one process will succeed. However, if one process tries to execute P(s) at the same time that another process tries to execute V(s), the two operations will both succeed, in an unpredictable order, and the final value of s will again be 1.

A general semaphore is one that can take on any nonnegative value. A binary semaphore is one whose value is always either 0 or 1. In particular, a V operation on a binary semaphore is executed only when the semaphore has value 0. (One could define the V operation on a binary semaphore to be one that waits until the value of the semaphore is less than 1, then does the increment.)

Since the semaphore operations are defined in terms of await statements, their formal semantics follow directly from applications of the Await Statement Rule given in Section 2.6. In particular, inference rules for P and V result directly from specializing the Await Statement Rule by the actual await statements used above. Fairness attributes of semaphore operations also follow from the fact that they are defined in terms of await statements. Using the terminology of Section 2.8, if s > 0 becomes and then remains true, execution of P(s) will terminate if the underlying scheduling policy is weakly fair. If s > 0 is infinitely often true, execution of P(s) will terminate if the underlying scheduling policy is strongly fair. Since the V operation on a general semaphore is an unconditional atomic action, V will terminate if the underlying scheduling policy is unconditionally fair.

As will be shown in Chapter 6, implementations of semaphores usually ensure that when processes are delayed while executing P operations, they are awakened in the order in which they were delayed. Consequently, as long as other processes execute an adequate number of V operations, a process waiting at a P operation will eventually be able to proceed.
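For readers who want to experiment with these operations on a real system, the POSIX semaphore interface provides a close analogue: sem_wait corresponds to P and sem_post to V. The following C sketch is illustrative only; the thread bodies, the counter, and the loop bound are assumptions made for the example, not part of the text.

/* Minimal sketch of P and V using POSIX unnamed semaphores. */
#include <semaphore.h>
#include <pthread.h>
#include <stdio.h>

sem_t mutex;              /* plays the role of a binary semaphore */
int shared_count = 0;     /* variable protected by the semaphore  */

void *worker(void *arg) {
    for (int i = 0; i < 1000; i++) {
        sem_wait(&mutex); /* P(mutex): delay until positive, then decrement */
        shared_count++;   /* critical section */
        sem_post(&mutex); /* V(mutex): increment, possibly awakening a waiter */
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    sem_init(&mutex, 0, 1);   /* initial value 1, like "sem lock = 1;" */
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("count = %d\n", shared_count);   /* should print 2000 */
    sem_destroy(&mutex);
    return 0;
}

Compiled with a pthreads-aware compiler (e.g., cc -pthread), the two threads interleave but never interfere, because every path through the critical section starts with a P and ends with a V.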
4.2 Basic Problems and Techniques

Semaphores directly support the implementation of mutual exclusion, as in the critical section problem. They also directly support simple forms of condition synchronization in which semaphores are used to signal the occurrence of events. These two uses of semaphores can also be combined to solve more complex synchronization problems. This section illustrates the use of semaphores for mutual exclusion and condition synchronization by presenting solutions to four problems: (1) critical sections, (2) barriers, (3) producers/consumers, and (4) bounded buffers. The solutions to the last two problems also illustrate the important programming technique of split binary semaphores. Later sections show how to use the techniques introduced here to construct solutions to more complex synchronization problems.
4.2.1 Critical Sections: Mutual Exclusion

Recall that in the critical section problem, each of n processes repeatedly executes a critical section of code, in which it requires exclusive access to some shared resource, and then executes a noncritical section, in which it computes using only local objects. In particular, in its critical section each process requires mutually exclusive access to the shared resource. Semaphores were conceived in part to make the critical section problem easy to solve.

Figure 3.2 presented a solution using lock variables in which variable lock is true when no process is in its critical section and false otherwise. Let true be represented by 1 and false by 0. Then a process enters its critical section by first waiting for lock to be 1 and then setting lock to 0. A process leaves its critical section by resetting lock to 1. These are exactly the operations supported by semaphores. Hence, let mutex be a semaphore that has initial value 1. Execution of P(mutex) is the same as waiting for lock to be 1 then setting lock to 0. Similarly, execution of V(mutex) is the same as setting lock to 1 (assuming lock is set to 1 only when it is known to be 0). These observations lead to the solution to the critical section problem shown in Figure 4.1.
sem mutex = 1;

process CS[i = 1 to n] {
  while (true) {
    P(mutex);
    critical section;
    V(mutex);
    noncritical section;
  }
}

Figure 4.1  Semaphore solution to the critical section problem.

4.2.2 Barriers: Signaling Events

We introduced barrier synchronization in Section 3.4 as a means to synchronize stages of parallel iterative algorithms, such as the data parallel algorithms in Section 3.5. The busy-waiting implementations of barriers used flag variables that processes set and cleared as they arrived at and left a barrier. As was the case with the critical section problem, semaphores make it relatively easy to implement barrier synchronization. The basic idea is to use one semaphore for each synchronization flag. A process sets a flag by executing a V operation; a process waits for a flag to be set and then clears it by executing a P operation. (If each process in a parallel program executes on its own processor, then delays at barriers should be implemented using spin loops, not by blocking processes. Thus, we would want to use a busy-waiting implementation of semaphores.)

Consider first the problem of implementing a two-process barrier. Recall that two properties are required. First, neither process can get past the barrier until both have arrived. Second, the barrier must be reusable, since in general the same processes will need to synchronize after each stage of the computation. For the critical section problem, we need to use only one semaphore as a lock, because the only concern is whether a process is inside or outside its critical section. However, barrier synchronization requires using two semaphores as signals, because we need to know each time a process arrives at or departs from the barrier.

A signaling semaphore s is one that is (usually) initialized to 0. A process signals an event by executing V(s); other processes wait for that event by executing P(s). For a two-process barrier, the two significant events are the processes arriving at the barrier. Hence, we can solve the problem using two signaling semaphores, arrive1 and arrive2. Each process signals its arrival by executing a V operation on its own semaphore, then waits for the other process to arrive by executing a P operation on the other process's semaphore. The solution is shown in Figure 4.2. Because barrier synchronization is symmetric, each process takes the same actions; each just signals and waits on different semaphores. When semaphores are used in this way, they are much like flag variables, and their use follows the Flag Synchronization Principles (3.14).
sem arrive1 = 0, arrive2 = 0;

process Worker1 {
  ...
  V(arrive1);    /* signal arrival */
  P(arrive2);    /* wait for other process */
  ...
}

process Worker2 {
  ...
  V(arrive2);    /* signal arrival */
  P(arrive1);    /* wait for other process */
  ...
}

Figure 4.2  Barrier synchronization using semaphores.
We can use the two-process barrier as shown in Figure 4.2 to implement an n-process butterfly barrier having the structure shown in Figure 3.15, or we can use the same idea to implement an n-process dissemination barrier having the structure shown in Figure 3.16. In both cases, we would employ an array of arrive semaphores. At each stage, a process i first signals its arrival by executing V(arrive[i]), then waits for another process by executing P on that process's instance of arrive. Unlike the situation with flag variables, only one array of arrive semaphores is required. This is because V operations are "remembered," whereas the value of a flag variable might be overwritten.

Alternatively, we can use semaphores as signal flags to implement n-process barrier synchronization using a central coordinator process (Figure 3.12) or a combining tree (Figure 3.14). Again, because V operations are remembered, we would need fewer semaphores than flag variables; for example, only one semaphore would be needed for the Coordinator in Figure 3.12.
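As a concrete illustration, the following C sketch mirrors the two-process barrier of Figure 4.2 using POSIX semaphores. The thread bodies, the number of stages, and the printed messages are invented for the example; they are not part of the original program.

/* Two-thread reusable barrier built from two signaling semaphores. */
#include <semaphore.h>
#include <pthread.h>
#include <stdio.h>

sem_t arrive1, arrive2;      /* both initialized to 0 */

void *worker1(void *arg) {
    for (int stage = 0; stage < 3; stage++) {
        printf("worker1: stage %d\n", stage);
        sem_post(&arrive1);  /* signal my arrival          */
        sem_wait(&arrive2);  /* wait for the other process */
    }
    return NULL;
}

void *worker2(void *arg) {
    for (int stage = 0; stage < 3; stage++) {
        printf("worker2: stage %d\n", stage);
        sem_post(&arrive2);  /* signal my arrival          */
        sem_wait(&arrive1);  /* wait for the other process */
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    sem_init(&arrive1, 0, 0);
    sem_init(&arrive2, 0, 0);
    pthread_create(&t1, NULL, worker1, NULL);
    pthread_create(&t2, NULL, worker2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

Because V operations are remembered by the semaphore's counter, the same two semaphores can be reused on every loop iteration without being reset.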
4.2.3 Producers and Consumers: Split Binary Semaphores

This section reexamines the producers/consumers problem introduced in Section 1.6 and revisited in Section 2.5. There we assumed that there was one producer and one consumer; here we consider the general situation in which there are multiple producers and consumers. The solution illustrates another use of semaphores as signaling flags. It also introduces the important concept of a split binary semaphore, which provides another way to protect critical sections of code.
In the producers/consumers problem, producers send messages that are received by consumers. The processes communicate using a single shared buffer, which is manipulated by two operations: deposit and fetch. Producers insert messages into the buffer by calling deposit; consumers receive messages by calling fetch. To ensure that messages are not overwritten and that they are received only once, execution of deposit and fetch must alternate, with deposit executed first.

The way to program the required alternation property is again to use signaling semaphores. Such semaphores can be used either to indicate when processes reach critical execution points or to indicate changes to the status of shared variables. Here, the critical execution points are starting and ending the deposit and fetch operations. The corresponding changes to the shared buffer are its becoming empty or full. Because there might be multiple producers or consumers, it is simpler to associate a semaphore with each of the two states of the buffer rather than with the execution points of the processes.

Let empty and full be two semaphores indicating whether the buffer is empty or full. Initially, the buffer is empty, so the initial value of empty is 1 (i.e., the "make the buffer empty" event has already occurred). The initial value of full is 0. When a producer wants to execute deposit, it must first wait for the buffer to be empty; after a producer deposits an item, the buffer becomes full. Similarly, when a consumer wants to execute fetch, it must first wait for the buffer to be full, and then it makes the buffer empty. A process waits for an event by executing P on the appropriate semaphore and signals an event by executing V, so we get the solution shown in Figure 4.3.

type T buf;    /* a buffer of some type T */
sem empty = 1, full = 0;

process Producer[i = 1 to M] {
  while (true) {
    ...  /* produce data, then deposit it in the buffer */
    P(empty);
    buf = data;
    V(full);
  }
}

process Consumer[j = 1 to N] {
  while (true) {
    /* fetch result, then consume it */
    P(full);
    result = buf;
    V(empty);
    ...
  }
}

Figure 4.3  Producers and consumers using semaphores.

In Figure 4.3, empty and full are both binary semaphores. Together they form what is called a split binary semaphore, because at most one of empty or full is 1 at a time. The term split binary semaphore comes from the fact that empty and full can be viewed as a single binary semaphore that has been split into two binary semaphores. In general, a split binary semaphore can be formed from any number of binary semaphores.

Split binary semaphores are important because they can be used as follows to implement mutual exclusion. Suppose that one of the binary semaphores has an initial value of 1 (hence the others are initially 0). Further suppose that, in the processes that use the semaphores, every execution path starts with a P operation on one of the semaphores and ends with a V operation on one of the semaphores. Then all statements between any P and the next V execute with mutual exclusion. In particular, whenever any process is between a P and a V, the semaphores are all 0, and hence no other process can complete a P until the first process executes a V. The solution to the producers/consumers problem in Figure 4.3 illustrates this use of split binary semaphores: each Producer alternately executes P(empty) then V(full), and each Consumer alternately executes P(full) then V(empty). In Section 4.4, we will use this property of split binary semaphores to construct a general method for implementing await statements.
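A minimal C rendering of Figure 4.3 appears below. The int payload, the item count, and the thread names are assumptions made for the sketch, not part of the original program.

/* Single-slot producer/consumer using a split binary semaphore. */
#include <semaphore.h>
#include <pthread.h>
#include <stdio.h>

#define ITEMS 5

sem_t empty_slot, full_slot;   /* empty_slot = 1, full_slot = 0 initially */
int buf;                       /* the single communication buffer */

void *producer(void *arg) {
    for (int i = 1; i <= ITEMS; i++) {
        sem_wait(&empty_slot); /* P(empty): wait until the buffer is empty */
        buf = i;               /* deposit */
        sem_post(&full_slot);  /* V(full): the buffer is now full */
    }
    return NULL;
}

void *consumer(void *arg) {
    for (int i = 1; i <= ITEMS; i++) {
        sem_wait(&full_slot);  /* P(full): wait until the buffer is full */
        printf("fetched %d\n", buf);
        sem_post(&empty_slot); /* V(empty): the buffer is empty again */
    }
    return NULL;
}

int main(void) {
    pthread_t p, c;
    sem_init(&empty_slot, 0, 1);
    sem_init(&full_slot, 0, 0);
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}

Every execution path starts with a P on one of the two semaphores and ends with a V on the other, so deposits and fetches strictly alternate.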
4.2.4 Bounded Buffers: Resource Counting

The last example showed how to synchronize access to a single communication buffer. If data are produced at approximately the same rate at which they are consumed, a process would not generally have to wait very long to access the single buffer. Commonly, however, producer and consumer execution is bursty. For example, a producer might produce several items in quick succession, then do more computation before producing another set of items. In such cases, a buffer capacity larger than one can significantly increase performance by reducing the number of times processes block. (This is an example of the classic time/space tradeoff in computing.)
Here we develop a solution to what is called the bounded buffer problem. In particular, a bounded buffer is a multislot communication buffer. The solution builds upon the solution in the previous section. It also illustrates the use of general semaphores as resource counters.

Assume for now that there is just one producer and one consumer. The producer deposits messages in a shared buffer; the consumer fetches them. The buffer contains a queue of messages that have been deposited but not yet fetched. This queue can be represented by a linked list or by an array. We will use the array representation here, because it is simpler to program. In particular, let the buffer be represented by buf[n], where n is greater than 1. Let front be the index of the message at the front of the queue, and let rear be the index of the first empty slot past the message at the rear of the queue. Initially, front and rear are set to the same value, say 0.

With this representation for the buffer, the producer deposits a message with value data by executing

buf[rear] = data;  rear = (rear+1) % n;

Similarly, the consumer fetches a message into its local variable result by executing

result = buf[front];  front = (front+1) % n;

The modulo operator (%) is used to ensure that the values of front and rear are always between 0 and n-1. The queue of buffered messages is thus stored in slots from buf[front] up to but not including buf[rear], with buf treated as a circular array in which buf[0] follows buf[n-1]. (A figure here shows one possible configuration of buf: a contiguous band of shaded, full slots lying between the front index and the rear index; the shaded slots are full and the blank ones are empty.)

When there is a single buffer, as in the producers/consumers problem, execution of deposit and fetch must alternate. When there are multiple buffers, deposit can execute whenever there is an empty slot, and fetch can execute whenever there is a stored message. In fact, deposit and fetch can execute concurrently if there is both an empty slot and a stored message, because they will then access different slots and hence will not interfere with each other. However, the synchronization requirements are identical for both a single-slot and a bounded buffer. In particular, the use of the P and V operations is the same. The only difference is that semaphore empty is initialized to n rather than 1, as there are initially n empty slots. The solution is shown in Figure 4.4.

type T buf[n];              /* an array of some type T */
int front = 0, rear = 0;
sem empty = n, full = 0;    /* n-2 <= empty+full <= n */

process Producer {
  while (true) {
    ...  /* produce message data and deposit it in the buffer */
    P(empty);
    buf[rear] = data;  rear = (rear+1) % n;
    V(full);
  }
}

process Consumer {
  while (true) {
    /* fetch message result, then consume it */
    P(full);
    result = buf[front];  front = (front+1) % n;
    V(empty);
    ...
  }
}

Figure 4.4  Bounded buffer using semaphores.

In Figure 4.4, the semaphores serve as resource counters: each counts the number of units of a resource. In this case, empty counts the number of empty buffer slots, and full counts the number of full slots. When neither process is executing deposit or fetch, the sum of the values of the two semaphores is n, the total number of buffer slots. Resource-counting semaphores are useful whenever processes compete for access to multiple-unit resources such as buffer slots or memory blocks.

We have assumed that there is only one producer and only one consumer in Figure 4.4; this ensures that deposit and fetch execute as atomic actions. Suppose, however, that there are two (or more) producers. Then each could be executing deposit at the same time, assuming there are at least two empty slots. In that case, both producers could try to deposit their message into the same slot! (This would happen if both assign to buf[rear] before either
increments rear.) Similarly, if there are two (or more) consumers, both could execute fetch at the same time and retrieve the same message. In short, deposit and fetch become critical sections. Both operations must be executed with mutual exclusion, but they can execute concurrently with each other, because empty and full are used in such a way that producers and consumers access different buffer slots. We can implement the required exclusion using the solution to the critical section problem shown in Figure 4.1, with separate semaphores being used to protect each critical section. The complete solution is shown in Figure 4.5.

type T buf[n];              /* an array of some type T */
int front = 0, rear = 0;
sem empty = n, full = 0;    /* n-2 <= empty+full <= n */
sem mutexD = 1, mutexF = 1; /* for mutual exclusion */

process Producer[i = 1 to M] {
  while (true) {
    ...  /* produce message data and deposit it in the buffer */
    P(empty);
    P(mutexD);
    buf[rear] = data;  rear = (rear+1) % n;
    V(mutexD);
    V(full);
  }
}

process Consumer[j = 1 to N] {
  while (true) {
    /* fetch message result, then consume it */
    P(full);
    P(mutexF);
    result = buf[front];  front = (front+1) % n;
    V(mutexF);
    V(empty);
    ...
  }
}

Figure 4.5  Multiple producers and consumers using semaphores.

We solved the two problems above separately: first the synchronization between one producer and one consumer, then the synchronization between multiple producers and multiple consumers. This made it easy to combine the solutions to the two subproblems to get a solution to the full problem. We will use the same idea in solving the readers/writers problem in Section 4.4. In general, whenever there are multiple kinds of synchronization, it is useful to implement them separately and then to combine the solutions.
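For readers who want to run the multiple-producer, multiple-consumer version, here is a compact C sketch in the spirit of Figure 4.5. The buffer size, item counts, and thread counts are arbitrary choices for the example.

/* Bounded buffer with counting semaphores plus two mutex semaphores. */
#include <semaphore.h>
#include <pthread.h>
#include <stdio.h>

#define N 4          /* buffer slots        */
#define ITEMS 10     /* items per producer  */

int buf[N];
int front = 0, rear = 0;
sem_t empty, full;           /* resource counters: empty = N, full = 0 */
sem_t mutexD, mutexF;        /* protect deposit and fetch, respectively */

void *producer(void *arg) {
    int id = *(int *)arg;
    for (int i = 0; i < ITEMS; i++) {
        sem_wait(&empty);                 /* wait for an empty slot        */
        sem_wait(&mutexD);                /* deposit is a critical section */
        buf[rear] = id * 100 + i;  rear = (rear + 1) % N;
        sem_post(&mutexD);
        sem_post(&full);                  /* one more full slot            */
    }
    return NULL;
}

void *consumer(void *arg) {
    for (int i = 0; i < ITEMS; i++) {
        sem_wait(&full);                  /* wait for a stored message     */
        sem_wait(&mutexF);                /* fetch is a critical section   */
        int result = buf[front];  front = (front + 1) % N;
        sem_post(&mutexF);
        sem_post(&empty);                 /* one more empty slot           */
        printf("got %d\n", result);
    }
    return NULL;
}

int main(void) {
    pthread_t p[2], c[2];
    int id[2] = {1, 2};
    sem_init(&empty, 0, N);  sem_init(&full, 0, 0);
    sem_init(&mutexD, 0, 1); sem_init(&mutexF, 0, 1);
    for (int i = 0; i < 2; i++) pthread_create(&p[i], NULL, producer, &id[i]);
    for (int i = 0; i < 2; i++) pthread_create(&c[i], NULL, consumer, NULL);
    for (int i = 0; i < 2; i++) pthread_join(p[i], NULL);
    for (int i = 0; i < 2; i++) pthread_join(c[i], NULL);
    return 0;
}

The counting semaphores handle the empty/full bookkeeping, while mutexD and mutexF keep deposits and fetches from colliding on the shared indices.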
4.3 The Dining Philosophers

The last section showed how to use semaphores to solve the critical section problem. This section and the next build upon that solution to implement selective forms of mutual exclusion in two classic synchronization problems: the dining philosophers and readers and writers. The solution to the dining philosophers illustrates how to implement mutual exclusion between processes that compete for access to overlapping sets of shared variables. The solution to the readers/writers problem illustrates how to implement a combination of concurrent and exclusive access to shared variables. The exercises contain additional examples of selective mutual exclusion problems.

Although the dining philosophers problem is more whimsical than practical, it is similar to realistic problems in which a process requires simultaneous access to more than one resource. Consequently, the problem is often used to illustrate and compare different synchronization mechanisms.

(4.1) Dining Philosophers Problem. Five philosophers sit around a circular table. Each philosopher spends his life alternately thinking and eating. In the center of the table is a large platter of spaghetti. Because the spaghetti is long and tangled, and the philosophers are not mechanically adept, a philosopher must use two forks to eat a helping. Unfortunately, the philosophers can afford only five forks. One fork is placed between each pair of philosophers, and they agree that each will use only the forks to the immediate left and right. The problem is to write a program to simulate the behavior of the philosophers. The program must avoid the unfortunate (and eventually fatal) situation in which all philosophers are hungry but none is able to acquire both forks; for example, each holds one fork and refuses to give it up.
The setting for this problem is depicted in Figure 4.6. Clearly, two neighboring philosophers cannot eat at the same time. Also, with only five forks, at most two philosophers at a time can be eating. We will simulate the actions of the philosophers as shown below. We assume that the lengths of the thinking and eating periods vary; this could, for example, be simulated in a program by using random delays.
Figure 4.6  The dining philosophers. (The figure shows the five philosophers seated around a circular table, with one fork placed between each pair of neighbors.)
process Philosopher[i = 0 to 4] {
  while (true) {
    think;
    acquire forks;
    eat;
    release forks;
  }
}
Solving the problem requires programming the actions "acquire forks" and "release forks." Because the forks are the shared resource, we will focus upon acquiring and releasing them. (Alternatively, we could solve the problem by considering whether or not philosophers are eating; see the exercises at the end of this chapter.) Each fork is like a critical section lock: it can be held by at most one philosopher at a time. Hence, we can represent the forks by an array of semaphores initialized to ones. Picking up a fork is then simulated by executing a P operation on the appropriate semaphore, and putting down a fork is simulated by executing a V operation on the semaphore.

The processes are essentially identical, so it is natural to think of having them execute identical actions. For example, we could have each process first pick up its left fork and then its right fork. However, this could lead to deadlock. In particular, suppose all philosophers have picked up their left fork; then all would wait forever trying to pick up their right fork. A necessary condition for deadlock is that there is circular waiting, i.e., one process is waiting for a resource held by a second, which is waiting for a resource held by a third, and so on up to some process that is waiting for a resource held by the first process. Thus, to avoid deadlock it is sufficient to ensure that circular waiting cannot occur. For this problem, we can have one of the processes, say the last one, Philosopher[4], pick up its right fork first. Figure 4.7 gives this solution. Alternatively, we could have odd-numbered philosophers pick up forks in one order and even-numbered philosophers pick them up in the other order.

sem fork[5] = (1, 1, 1, 1, 1);

process Philosopher[i = 0 to 3] {
  while (true) {
    P(fork[i]); P(fork[i+1]);    # get left fork then right
    eat;
    V(fork[i]); V(fork[i+1]);
    think;
  }
}

process Philosopher[4] {
  while (true) {
    P(fork[0]); P(fork[4]);      # get right fork then left
    eat;
    V(fork[0]); V(fork[4]);
    think;
  }
}

Figure 4.7  Dining philosophers solution using semaphores.
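The asymmetric solution of Figure 4.7 translates directly into C with POSIX semaphores. In the sketch below the number of meals and the printing are invented for illustration.

/* Dining philosophers: the last philosopher picks up its forks in the
   opposite order, which breaks circular waiting. */
#include <semaphore.h>
#include <pthread.h>
#include <stdio.h>

#define MEALS 3

sem_t fork_sem[5];            /* one semaphore per fork, all initialized to 1 */

void *philosopher(void *arg) {
    int i = *(int *)arg;
    int first = (i < 4) ? i : 0;         /* philosophers 0..3: left fork first */
    int second = (i < 4) ? i + 1 : 4;    /* philosopher 4: other order         */
    for (int m = 0; m < MEALS; m++) {
        sem_wait(&fork_sem[first]);
        sem_wait(&fork_sem[second]);
        printf("philosopher %d eats\n", i);   /* eat */
        sem_post(&fork_sem[first]);
        sem_post(&fork_sem[second]);
        /* think */
    }
    return NULL;
}

int main(void) {
    pthread_t t[5];
    int id[5] = {0, 1, 2, 3, 4};
    for (int i = 0; i < 5; i++) sem_init(&fork_sem[i], 0, 1);
    for (int i = 0; i < 5; i++) pthread_create(&t[i], NULL, philosopher, &id[i]);
    for (int i = 0; i < 5; i++) pthread_join(t[i], NULL);
    return 0;
}

Because the five threads do not all acquire their two forks in the same rotational order, no cycle of waiting threads can form, so the program cannot deadlock.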
4.4 Readers and Writers

The readers and writers problem is another classic synchronization problem. Like the dining philosophers, it is often used to compare and contrast synchronization mechanisms. It is also an eminently practical problem.
(4.2) Readers/Writers Problem. Two kinds of processes, readers and writers, share a database. Readers execute transactions that examine database records; writers execute transactions that both examine and update the database. The database is assumed initially to be in a consistent state (i.e., one in which relations between data are meaningful). Each transaction, if executed in isolation, transforms the database from one consistent state to another. To preclude interference between transactions, a
writer process must have exclusive access to the database. Assuming no writer is accessing the database, any number of readers may concurrently execute transactions.

The above definition implies a shared database, but this could also be a shared file, linked list, table, and so on. The readers/writers problem is another example of selective mutual exclusion. In the dining philosophers problem, pairs of processes compete for access to forks. Here, classes of processes compete for access to the database. In particular, reader processes compete with writers, and individual writer processes compete with each other. Readers and writers is also an example of a general condition synchronization problem: Reader processes must wait until no writers are accessing the database; writer processes must wait until no readers or other writers are accessing the database.

This section develops two distinct solutions to the readers/writers problem. The first approaches it as a mutual exclusion problem. This solution is short and quite straightforward to develop. However, it gives preference to readers over writers, as explained fully below, and it cannot easily be modified, for example, to be fair. The second solution approaches readers/writers as a condition synchronization problem. This solution is longer and appears to be more complex, but it too is quite easy to develop. Moreover, the second solution can readily be modified to implement different scheduling policies between readers and writers. Most importantly, the second solution introduces a powerful programming technique we call passing the baton, which can be used to solve any condition synchronization problem.
4.4.1 Readers/Writers as an Exclusion Problem

Writer processes need mutually exclusive access to the database. Reader processes, as a group, also need exclusive access with respect to any writer process. A useful way to approach any selective mutual exclusion problem is first to overconstrain the solution, implementing more exclusion than is required, and then to relax the constraints. In particular, start by treating the problem as an instance of the critical section problem. Here, the obvious overconstraint is to ensure that each reader and writer process has exclusive access to the database. Let rw be a mutual exclusion semaphore, and hence let it be initialized to 1. The result is the overconstrained solution shown in Figure 4.8.

process Reader[i = 1 to M] {
  while (true) {
    ...
    P(rw);               # grab exclusive access lock
    read the database;
    V(rw);               # release the lock
  }
}

process Writer[j = 1 to N] {
  while (true) {
    ...
    P(rw);               # grab exclusive access lock
    write the database;
    V(rw);               # release the lock
  }
}

Figure 4.8  An overconstrained readers/writers solution.

Now consider how to relax the solution in Figure 4.8 so that reader processes can execute concurrently. Readers, as a group, need to lock out writers, but only the first reader needs to grab the mutual exclusion lock by executing P(rw). Subsequent readers can directly access the database. Similarly, when a reader finishes, it should release the lock only if it is the last active reader. This leads to the solution outline shown in Figure 4.9.

int nr = 0;    # number of active readers
sem rw = 1;    # lock for reader/writer exclusion

process Reader[i = 1 to M] {
  while (true) {
    ...
    (nr = nr+1;
     if (nr == 1) P(rw);)    # if first, get lock
    read the database;
    (nr = nr-1;
     if (nr == 0) V(rw);)    # if last, release lock
  }
}

process Writer[j = 1 to N] {
  while (true) {
    ...
    P(rw);
    write the database;
    V(rw);
  }
}

Figure 4.9  Outline of readers and writers solution.

In Figure 4.9, variable nr counts the number of active readers. The entry protocol for reader processes first increments nr and then tests whether the value of nr is 1. The increment and test have to be executed as a critical section to avoid interference with other reader processes; hence, we have used angle brackets to specify the atomic action in the entry protocol for readers. Similarly, the decrement and test of nr in the reader's exit protocol have to be executed atomically, so the exit protocol is also enclosed by angle brackets.

To refine the solution outline in Figure 4.9 into a complete solution using semaphores, we merely have to implement the atomic actions using semaphores. Each is just a critical section, and we saw how to implement critical sections in Figure 4.1. Here, let mutexR be a semaphore used to provide mutual exclusion between reader processes. It is used as shown in Figure 4.10, which provides a complete solution to the readers/writers problem. Note that mutexR is initialized to 1, that the start of each atomic action is implemented by P(mutexR), and that the end of each atomic action is implemented by V(mutexR).

The algorithm in Figure 4.10 implements what is called a readers' preference solution to the readers/writers problem. In particular, if some reader is accessing the database and both another reader and a writer arrive at their entry protocols, then the new reader gets preference over the writer. Hence, this solution is not fair, because a continual stream of readers can permanently prevent writers from accessing the database. It is quite difficult to modify the solution in Figure 4.10 to make it fair (see the Historical Notes), but below we develop a different solution that can readily be changed into one that is fair.
i
',
,)
t l l
170
Chapter 4
Semaphores int nr = 0; s e m rw = 1; sem mutexR = 1;
# number of a c t i v e readers # lock for access to the database # lock for reader access to n r
process R e a d e r [i = 1 to ml while (true) {
I
... P (rnutexR); nr = nr+l; if (nr == 1) P(rw); V(mutexR);
reacl the database; P (mutexR); nr = nr-1; if (nr == 0) V(rw); V (mutexR)
# if f i r s t , get lock
# if last, release l o c k
;
1
I process Writerrj = 1 to n] ( while (true) (
.- .
P(rw);
write the database; V(rw) ; }
1 Figure 4.1 0
Readers and writers exclusion using semaphores.
We now develop a different solution to Lhe problem by starting froin a different-and simpler-specification of the required synchronization. The new solution introduces a general programming technique called pa.rsing zhe baton, which employs split binary semaphores both to provide exclusion and to signal delayed processes. The technique of passing the baton can be used to implement arbitrary await statements and thus to implement arbitruy coadition synchronization. The technique can aiso be used to control precisely the order in wh,ich delayed processes are awakened. As defined in (4.2), readers examine a shared database, and writers both exaini~ieand alter i r . To preselve database consistency, a writer requires txclusive access, but any number of readers [nay execute concurrently. A simple way Lo specify this synchronization is Lo count the number of each kind of process trying to access the database, then to constrain the values of the counters. In
4.4 Readers and Writers
171
particular, let nw and n w be nonnegative inlegers tbal respectively record h e number ol-' readers and wrjrers accessjry he dn~abase. The bad stales to bc avoided are ones in which both n r and nw ilre positive or in which n w is greater than one: (nr > 0
A
nw > 0 ) v nw > 1
The comple~nenraryset of good states is thus characterized by the negation of the above preclicale. which simplifies to (nr == 0 v nw == 0 )
RW:
A
nw < = 1
'The first cerrn says readers and writers cemnot access the database at {lie salne Lime; the second says Lhere is at (nost one active writel-. With this specjficnlion of Ihe problem, an oulJi11eo.F Lhe Ley part of reatb- processes is
(nr = nr+l;) read (he database; (nr
=
nr-I;)
The cosrcspondi~~g outli~lefor writel- PI-ucesscsis (nw = nw+l;)
write the dalabase; ( n w = nw-1;)
To refine this outline into a coal-se-grained solution, we need lo guard [he assignments Lo the shared vari.ablesto etisure rhat p~.cdicateRW i s a global invariant, In reader processes, this means guarding [be increment of n r by ( n w == O), because if n r is going to be incrernented, then nw had berler be zero. Tn writer processes, the required guard j s ( n r = = 0 and n w == O), because if n w is going Lo be incremented, both n r arid nw had better be zcro. We do not, however, need to guard either decrement, because it is never nccessnry 1.0 delay a process that is giving up use of a resource. Inserling the required guards yields the coarse-grained solutio~lin Figure 4.1 1 .
4.4.3 The Technique of Passing the Bafon . Sometimes, await stalernents can be implemented directly using semaphores or other priini tive operations. However, in general they cannot. Consider the two guards in the await statements in Figure 4.1 1 . They overlap in that the guard in the entry protocol for writer processes requires rhat both nw and n r be zero, whereas the guard in the entry protocol in reader processes requires only that nw be zero. No one semaphore could discriminate between these conditions, so we
I
172
I
Chapter 4
Semaphores i n k nr = 0, n w = 0; ## RW: (nr == 0 v nw == 0 )
A
nw <= 1
process Reader [i = 1 to m] while (true) {
-.(await (nw == 0 ) nr = nr+l;) read the database; (nr = nr-1;)
1 1 process Writer[j = 1 to nl {
while (true) .
a
{
.
(await (nr == 0 and nw == 0 ) n w = nw+l;) write h e database; (nw = nw-1;)
1 1 1
Figure 4.1 1
A coarse-grained readerslwriters solution.
recluire a gcneral technique for implen~entingawait state~nentssuch as these. The one introduccd here i s called passOzg thp baton, for reasons explained below. A s we shall see, this rechnique is powerful enough Lo implement any await staletneot. There are h u atomic ~ slaternents jn Figure 4.1 1. The first ones in each of the reader and wriler processes have the form (await (B) s ; )
where, as usual, B siands for a Boolean expression and s stands for a statement list. Tlle last atomic statements in the processes have the form
I
As observed in Section 2.4, all condition synchronization can be represented by the first fonn, and the second is merely an abbreviation for a special case of the first in which ;he guard B is the constant true. Hence, if we know how to jmplemen t await slatements using semaphores, we can solve any condition synchronizat.ion problem. Split binary semaphoi-es can be used as follows to impleinenr the await sralelnents i n Figure 4.1 1 . First, let e be a binary semaphore whose initial value
4.4 Readers and Writers
173
is one. 11 will be used to control entry illto each of the atomic statements, hence the e for ''entr-y." Second, associate one sernaplzore and one counter wjlh each differenr guard B; each of these semaphores and coilnters is jnitialjzed to ze~'o.The semaphore will be used to delay processes that have to wait for the g u a ~ dto become true; the counter will record rile number of delayed processes. Above, we have two different guards-jn the entry protocols for each OF the readers and writers-so we need two semaphores and Lwo counters. Let r be the semaphore associated with the guard in reader processes, and let d r be (he associated number of delayed readers. Similarly, let w be Lhe semaphore associated with the guard in wrirer processes. and l e l dw be the number of delayed writers. InitiaJly, no readers or writers are waiting for entry, so all of r. dr, w, and dw are zero. The three semaphores-e, r, and w-and the two delay counters-dr and dw-are then used as shown in Figure 4.12. The comments in the code indicate how the coarse-grained atomic stateme~irshorn Figure 4.1 1 get implemented. The code labeled SIGNAL i s used lo exit each of the atomic statements. I t is an abbreviation f o the ~ following: if (nw == 0 and dr > 0 ) { dr = dr-I; V ( r ) ; # awaken a r e a d e r , or
I elseif (nr = = 0 and n w == 0 and dw > 0) ( dw = d w - 1 ; V(w); # awaken a writer, or 3
else V(e);
# release t h e entry lock
The role of the SIGNAL code is to signal exactly one of the three semaphores. In particular, if here are no active writers and there is a delayed reader, then a reader can be awakened by execuling V ( r ) . If there are no active readers or writers, and there is a delayed writer, then a writer cat1 be awakened by executing v (w). Otherwise-i.e., if there is no delayed process char can safely proceedsignal the entry semaphore by executing v ( e). The three semaphores in Figure 4.12 form a split binary semaphore, because at most one of the semaphores at a titne is 1, and every execution path starts with a P and ends with a v. Hence, the statements between every P and v execute with mutual exclusion. The synchronization invariant RW is true i tiitially and just be-fore each v operation. so it is true whenever one of the semaphores is 1. Moreover, each guard B is guaranteed to be true whenever the statement it guards is executed. This is because either the process checked B and Found it to be true, or the process delayed on a sernapl~orethat is signaled only when B is true. Finally, this code transformation does not i~~troduce deadlock since a delay
,.
,'J
.
174
Chapter 4
Semaphores int nr = 0 , nw = 0 ; sem e = 1, r = 0, w = 0;
int dr = 0, dw
=
0;
## R W :
(nr = = 0 or nw == 0 ) and nw <= 1
# controls entry to critical sections # used to delay readers # used to delay writers # at all times 0 <= (e+r+w) <= 1 # number of delayed readers # number of delayed writers
process Reader [i = 1 to MI t while (true) { # (await (nw == 0 ) nr = nr+l;) P(e); if (nw > 0 ) ( dr = dr+l; V(e); P ( r ) ; ) nr = nr+l; SIGNAL; # see text for details
read Llle database; # (nr = nr-1;) P(e); nr = nr-1;
SIGIVAL; 1 1 process Writer[ j = 1 to N] ( while (true) { # (await (nr = = 0 and nw == 0 ) nw = nw+l;) P(e); if (nr > 0 or nw > 0 ) [ dw = dw+l; V(e); P ( w ) i ) nw = nw+l;
SIGNAL; write the database; # (nw = nw-1;) P(e);
nw = nw-1;
SIGNAL; 1 1
Figure 4.12
Outline of readers and writers with passing the baton.
4.4 Readers and Writers
175
semaphore is signaled only if sorne process is wailing on, o r is about to be waitirlg on, the scmapl~ore. (A process c o ~ ~ have l d incremented a delay councer and execuccd V ( e ) bur might not yet have executed the P opexation on the delay semaphore.) This progra~nming technjque is called p~~.ssin.g t11.ehalon becat~seof the way 111 which semaphores are signaled. W l ~ e ~ a iprocess is execi~tjngwithin a crirical section, thi~rkof i r as holdjng a baton chat signifies permission to execute. When chat process reaches a SIGNAL code i'r'ragnlent, it p a s m the baton lo one other process. If some process is waiting for a condilion that is now true, the balon is passed ro one such process, which in LLIr1i execi~tegits critical section and passes the baton to another process. When no process i s wai ti tlg for a conclition that is true, 111e baton is passed to the next process that tries to enler thc crilical section t that execules P (e). for the first time-i.e., the ~ e x process In Figure 4.12-and in general-many of the instances of the SlGNA L code can be simplified or eliminated. For example, in reader processes, n r is positive and nw is zero before execu tion of the firs1 instance of S i G N A L i . e . , the inslance at the end of the enwy protocol in readers. Hence, that signal fragment can be simplitied LO if (dr > 0) t dr = dr-1; V ( r ) ;
else
1
V(e);
Before the second instance of SIGNAL in readers, both nw and dr are zero. In writer processes, nr is zero and nw is positive before the SIGNAL code at the end of the writer entry protocol. Finally, both nr and nw are zero before the final instance of SIGNAL in writer processes. Using these facts to simplify the signal protocols, we get the final passing-the-baton solution shown in Figure 4.13. In that figure, when a writer finishes, if there is more than one delayed reader and one is awakened, the others are awakened in cascading fashion. In particular, the first reader increments nr, then awakens the second delayed reader, which increments nr and awakens the third, and so on. The baton keeps getting passed from one delayed reader to another until all are awakened, namely, until dr is zero. Also in Figure 4.13, the last if statement in writers first checks for delayed readers, then checks for delayed writers. The order of these checks could safely be switched, because if both kinds of processes are delayed, either could be signaled when a writer finishes its exit protocol.
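The same structure carries over directly to a real threads library. The following is a minimal sketch, not taken from the text, of how the Figure 4.13 solution can be written in C with POSIX semaphores and Pthreads; the constants M and N, and the printf calls that stand in for reading and writing the database, are illustrative assumptions.

    /* Sketch of the baton-passing readers/writers solution in C.
       e plays the role of the entry semaphore; r and w are the
       delay semaphores, as in the pseudocode above. */
    #include <pthread.h>
    #include <semaphore.h>
    #include <stdio.h>

    #define M 3                      /* number of readers (assumed) */
    #define N 2                      /* number of writers (assumed) */

    int nr = 0, nw = 0;              /* active readers and writers */
    int dr = 0, dw = 0;              /* delayed readers and writers */
    sem_t e, r, w;                   /* e = 1; r = w = 0 */

    void *reader(void *arg) {
        sem_wait(&e);                               /* P(e) */
        if (nw > 0) { dr++; sem_post(&e); sem_wait(&r); }
        nr++;
        if (dr > 0) { dr--; sem_post(&r); }         /* cascade wakeup */
        else sem_post(&e);
        printf("reader %ld reads the database\n", (long)arg);
        sem_wait(&e);                               /* P(e) */
        nr--;
        if (nr == 0 && dw > 0) { dw--; sem_post(&w); }
        else sem_post(&e);
        return NULL;
    }

    void *writer(void *arg) {
        sem_wait(&e);                               /* P(e) */
        if (nr > 0 || nw > 0) { dw++; sem_post(&e); sem_wait(&w); }
        nw++;
        sem_post(&e);                               /* no one else can proceed */
        printf("writer %ld writes the database\n", (long)arg);
        sem_wait(&e);                               /* P(e) */
        nw--;
        if (dr > 0)      { dr--; sem_post(&r); }    /* readers' preference */
        else if (dw > 0) { dw--; sem_post(&w); }
        else sem_post(&e);
        return NULL;
    }

    int main(void) {
        pthread_t tid[M + N];
        sem_init(&e, 0, 1);
        sem_init(&r, 0, 0);
        sem_init(&w, 0, 0);
        for (long i = 0; i < M; i++)
            pthread_create(&tid[i], NULL, reader, (void *)i);
        for (long j = 0; j < N; j++)
            pthread_create(&tid[M + j], NULL, writer, (void *)j);
        for (int k = 0; k < M + N; k++)
            pthread_join(tid[k], NULL);
        return 0;
    }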
Alternative Scheduling Policies

The readers/writers solution in Figure 4.13 is certainly longer than the one in Figure 4.9. However, it is based on repeated application of a simple principle:
int nr = 0, nw = 0;
sem e = 1;              # controls entry to critical sections
sem r = 0;              # used to delay readers
sem w = 0;              # used to delay writers
                        # at all times 0 <= (e+r+w) <= 1
int dr = 0;             # number of delayed readers
int dw = 0;             # number of delayed writers
## RW: (nr == 0 or nw == 0) and nw <= 1

process Reader[i = 1 to M] {
  while (true) {
    # (await (nw == 0) nr = nr+1;)
    P(e);
    if (nw > 0) { dr = dr+1; V(e); P(r); }
    nr = nr+1;
    if (dr > 0) { dr = dr-1; V(r); }
    else V(e);
    read the database;
    # (nr = nr-1;)
    P(e);
    nr = nr-1;
    if (nr == 0 and dw > 0) { dw = dw-1; V(w); }
    else V(e);
  }
}

process Writer[j = 1 to N] {
  while (true) {
    # (await (nr == 0 and nw == 0) nw = nw+1;)
    P(e);
    if (nr > 0 or nw > 0) { dw = dw+1; V(e); P(w); }
    nw = nw+1;
    V(e);
    write the database;
    # (nw = nw-1;)
    P(e);
    nw = nw-1;
    if (dr > 0) { dr = dr-1; V(r); }
    elseif (dw > 0) { dw = dw-1; V(w); }
    else V(e);
  }
}

Figure 4.13    A readers/writers solution using passing the baton.
always pass the mutual exclusion baton to one process at a time. Like the solution in Figure 4.9, the one in Figure 4.13 also gives readers preference over writers. But because we can control how the baton is passed, we can readily modify the solution in Figure 4.13 to schedule processes in other ways. For example, to give writers preference, it is necessary to ensure that new readers are delayed if a writer is waiting, and that a delayed reader is awakened only if no writer is currently waiting. We can meet the first requirement by strengthening the delay condition in the first if statement in readers:

    if (nw > 0 or dw > 0) { dr = dr+1; V(e); P(r); }
We can meet the second requirement by switching the order of the first two arms of the if statement in writers:

    if (dw > 0) { dw = dw-1; V(w); }
    elseif (dr > 0) { dr = dr-1; V(r); }
    else V(e);
This awakens a reader only when no writers are known to be waiting; that reader can in turn awaken another reader, and so on. (A new writer could arrive, but until it gets past the entry semaphore, no other process can know that it has arrived.)

Neither of the above changes alters the structure of the solution. This is a virtue of the passing-the-baton technique: guards can be manipulated to alter the order in which processes are awakened without affecting the basic correctness of the solution.

We can also alter the solution in Figure 4.13 to ensure fair access to the database, assuming the semaphore operations are themselves fair. For example, we could force readers and writers to alternate turns when both are waiting. To implement this, we need to

    delay a new reader when a writer is waiting;
    delay a new writer when a reader is waiting;
    awaken one waiting writer (if any) when a reader finishes;
    awaken all waiting readers (if any) when a writer finishes;
      otherwise awaken one waiting writer (if any).
We can delay new readers as shown above; we can delay new writers in a similar way. The program in Figure 4.13 already meets the last two requirements. Again the structure of the solution is unchanged.
This technique of passing the baton can also be used to provide finer-grained control over the order in which processes use resources. The next section illustrates this. The only thing we cannot control is the order in which processes delayed on the entry semaphore are awakened; this depends on the underlying implementation of semaphores.
4.5 Resource Allocation and Scheduling

Resource allocation is the problem of deciding when a process can be given access to a resource. In concurrent programs, a resource is anything that a process might be delayed waiting to acquire. This includes entry to a critical section, access to a database, a slot in a bounded buffer, a region of memory, use of a printer, and so on. We have already examined several specific resource allocation problems. In most, the simplest possible allocation policy was employed: if some process is waiting and the resource is available, allocate it. For example, the solution to the critical section problem in Section 4.2 ensured that some waiting process was given permission to enter; it did not attempt to control which process was given permission if there was a choice. Similarly, the solution to the bounded buffer problem in Section 4.2 made no attempt to control which producer or which consumer next got access to the buffer. The only more complex allocation policy considered so far was in the readers/writers problem. However, our concern there was giving preference to classes of processes, not to individual processes.

This section shows how to implement general resource allocation policies and, in particular, shows how to control explicitly which process gets a resource when more than one is waiting. First we describe the general solution pattern. Then we implement one specific allocation policy, shortest job next. The solution employs the technique of passing the baton. It also introduces the concept of private semaphores, which provide the basis for solving other resource allocation problems.
4.5.1 Problem Definition and General Solution Pattern

In any resource allocation problem, processes compete for use of units of a shared resource. A process requests one or more units by executing the request operation, which is often implemented by a procedure. Parameters to request indicate how many units are required, identify any special characteristics such as the size of a memory block, and give the identity of the requesting process. Each unit of the shared resource is either free or in use. A request can be
satisfied when all the required units are free. Hence request delays until enough units are free, then takes the requested number of units. After using allocated resources, a process returns them to the free pool by executing the release operation. Parameters to release indicate the identities of the units being returned. A process can return resources in different amounts and different orders than it acquires them. Ignoring the representation of resource units, the request and release operations have the following general outline:

    request(parameters):  (await (request can be satisfied) take units;)
    release(parameters):  (return units;)

The operations need to be atomic since both need to access the representation of resource units. As long as this representation uses variables different from others in the program, the operations will appear to be atomic with respect to other actions and hence can execute concurrently with other actions.

This general solution pattern can be implemented using the passing-the-baton technique introduced in Section 4.4. In particular, request has the form of a general await statement, so it is implemented by the program fragment

    request(parameters):
      P(e);
      if (request cannot be satisfied) DELAY;
      take units;
      SIGNAL;

Similarly, release has the form of a simple atomic action, so it can be implemented by the program fragment

    release(parameters):
      P(e);
      return units;
      SIGNAL;

As before, e is a semaphore that controls entry to the critical sections, and SIGNAL is a code fragment that either awakens a delayed process, if some pending request can be satisfied, or unlocks the entry semaphore by executing V(e). The DELAY code in request is a program fragment like that at the start of the entry protocols for readers and writers (see Figures 4.12 and 4.13). In particular, it records that there is a new request that is about to be delayed, unlocks the entry semaphore by executing V(e), then blocks the requesting process on a delay semaphore.
The exact details of how SIGNAL is implemented for a specific resource-allocation problem depend on what the different delay conditions are and how they are represented. In any event, the DELAY code needs to save the parameters describing a delayed request so that they can be examined in the SIGNAL code. Also, there needs to be one condition semaphore for each different delay condition. The next section develops a solution to one specific resource allocation problem and then describes how to solve any such problem. Several additional allocation problems are given in the exercises.
4.5.2 Shortest-Job-Next Allocation

Shortest job next (SJN) is an allocation policy that occurs in many guises and is used for many different kinds of resources. For now, assume the shared resource has a single unit (the general case of multiple units is considered at the end of this section). Then the SJN policy is defined as follows.
(4.3) Shortest-Job-Next (SJN) Allocation. Several processes compete for use of a single shared resource. A process requests use of the resource by executing request(time, id), where time is an integer that specifies how long the process will use the resource and id is an integer that identifies the requesting process. When a process executes request, if the resource is free it is immediately allocated to the process; if not, the process delays. After using the resource, a process makes it free by executing release. When the resource is freed, it is allocated to the delayed process (if any) that has the minimum value for time. If two or more processes have the same value for time, the resource is allocated to the one that has waited the longest.

For example, the SJN policy can be used for processor allocation (using time as execution time), for spooling files to a printer (using time as printing time), or for remote file transfer (ftp) service (using time as the estimated file-transfer time).
The SJN policy is attractive because it minimizes average job completion time. However, it is inherently unfair: a process can be delayed forever if there is a continual stream of requests specifying shorter usage times. (Such unfairness is extremely unlikely in practice unless a resource is totally overloaded.) If unfairness is of concern, the SJN policy can be modified slightly so that a process that has been delayed a long time is given preference. This technique is called aging.
If a process makes a request and the resource is free, the request can be satisfied immediately since there are no other pending requests. Thus, the SJN aspect of the allocation policy comes into play only if there is more than one pending request. Since there is a single resource, it is sufficient to use a single variable to record whether the resource is available. Let free be a Boolean variable that is true when the resource is available and false when it is in use. To implement the SJN policy, pending requests need to be remembered and ordered. Let pairs be a set of records (time, id), ordered by the values of the time fields. If two records have the same value for time, let them occur in pairs in the order in which they were inserted. With this specification, the following predicate is to be a global invariant:

    SJN:  pairs is an ordered set  ∧  free ⇒ (pairs == ∅)
In words, pairs is ordered, and if the resource is free, pairs is the empty set. Initially, free is true and pairs is empty, so predicate SJN is true.

Ignoring the SJN policy for the moment, a request can be satisfied exactly when the resource is available. This results in the coarse-grained solution:

    bool free = true;      # shared variable

    request(time, id):  (await (free) free = false;)
    release():          (free = true;)
With the SJN policy, however, a process executing request needs to delay until the resource is free and the process's request is the next one to be honored according to the SJN policy. From the second conjunct of SJN, if free is true at the time a process executes request, then set pairs is empty. Hence the above delay condition is sufficient to determine whether a request can be satisfied immediately. The time parameter comes into play only if a request must be delayed, i.e., if free is false. Based on these observations, we can implement request as

    request(time, id):
      P(e);
      if (!free) DELAY;
      free = false;
      SIGNAL;

And we can implement release as

    release():
      P(e);
      free = true;
      SIGNAL;
In request, we assume that P operations on the entry semaphore e complete in the order in which they are attempted, i.e., that P(e) is FCFS. If this is not the case,
requests will not necessarily be serviced in SJN order.

The remaining concern is to implement the SJN aspect of the allocation policy. This involves using set pairs and semaphores to implement DELAY and SIGNAL. When a request cannot be satisfied, it needs to be saved so it can be examined later when the resource is released. Thus, in DELAY a process needs to insert its parameters in pairs, release control of the critical section by executing V(e), then delay on a semaphore until the request can be satisfied. When the resource is freed, if set pairs is not empty, the resource needs to be allocated to exactly one process in accordance with the SJN policy. In short, if there is a delayed process that can now proceed, it needs to be signaled by executing a V operation on a delay semaphore.

In earlier examples, there were just a few different delay conditions, so just a few condition semaphores were needed; for example, there were only two delay conditions in the readers/writers solution at the end of the previous section. Here, however, each process has a different delay condition, depending on its position in set pairs: the first process in pairs needs to be awakened before the second, and so on. Thus each process needs to wait on a different delay semaphore. Assume there are n processes that use the resource. Let b[n] be an array of semaphores, each element of which is initially 0. Also assume the values of process ids are unique and are in the range from 0 to n-1. Then process id delays on semaphore b[id].

Augmenting the request and release operations above with uses of pairs and b as specified, we get the solution to the SJN allocation problem shown in Figure 4.14. In Figure 4.14, the insert operation in request is assumed to place the pair in the proper place in pairs in order to maintain the first conjunct in SJN. Hence, SJN is indeed invariant outside request and release; i.e., SJN is true just after each P(e) and just before each V(e). The if statement in the signal code in release awakens exactly one process if there is a pending request, and hence pairs is not empty. The "baton" is passed to that process, which sets free to false. This ensures that the second conjunct in SJN is true when pairs is not empty. Since there is only a single resource, no further requests could be satisfied, so the signal code in request is simply V(e).

The elements of the array of semaphores b in Figure 4.14 are examples of what are called private semaphores.
(4.4) Private Semaphore. Semaphore s is called a private semaphore if exactly one process executes P operations on s.
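To make the description above concrete, here is a minimal sketch in C with POSIX semaphores of SJN request and release operations for a single-unit resource. This is a sketch of the approach just described, not the book's Figure 4.14: the fixed bound N, the linear scan used to find the waiting process with minimum time, and the arbitrary tie-breaking are simplifying assumptions (the text's version keeps pairs ordered and breaks ties by arrival order).

    /* Sketch of SJN allocation: entry semaphore e plus one private
       semaphore b[i] per process, ids assumed to be 0..N-1. */
    #include <semaphore.h>
    #include <stdbool.h>

    #define N 10                       /* number of competing processes (assumed) */

    static sem_t e;                    /* entry semaphore, initialized to 1 */
    static sem_t b[N];                 /* private semaphores, initialized to 0 */
    static bool free_unit = true;      /* is the resource available? */

    typedef struct { int time; bool waiting; } Request;
    static Request pending[N];         /* pending[i] describes process i */

    void init_sjn(void) {
        sem_init(&e, 0, 1);
        for (int i = 0; i < N; i++) {
            sem_init(&b[i], 0, 0);
            pending[i].waiting = false;
        }
    }

    /* id of the waiting process with minimum time, or -1 if none */
    static int next_waiter(void) {
        int best = -1;
        for (int i = 0; i < N; i++)
            if (pending[i].waiting &&
                (best == -1 || pending[i].time < pending[best].time))
                best = i;
        return best;
    }

    void request(int time, int id) {
        sem_wait(&e);                          /* P(e) */
        if (!free_unit) {                      /* DELAY */
            pending[id].time = time;
            pending[id].waiting = true;
            sem_post(&e);                      /* V(e) */
            sem_wait(&b[id]);                  /* wait on private semaphore */
            pending[id].waiting = false;       /* baton has been passed to us */
        }
        free_unit = false;
        sem_post(&e);                          /* SIGNAL: nothing else can proceed */
    }

    void release(void) {
        sem_wait(&e);                          /* P(e) */
        free_unit = true;
        int id = next_waiter();                /* SIGNAL */
        if (id >= 0)
            sem_post(&b[id]);                  /* pass the baton to that process */
        else
            sem_post(&e);                      /* release the entry lock */
    }

Note that release does not execute V(e) when it awakens a waiter; the awakened process holds the baton and releases the entry lock itself at the end of request, exactly as in the pseudocode pattern above.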
Replace Boolean variable free by an integer avail that records the number of available units. In request, test whether amount <= avail. If so, allocate amount units; if not, record how many units are required before delaying.
In release, increase avail by amount, then determine whether the oldest delayed process that has the minimum value for time can have its request satisfied. If so, awaken it; if not, execute V(e). The other modification is that it might now be possible to satisfy more than one pending request when units are released. For example, there could be two delayed processes that together require no more units than were released. In this case, the first one that is awakened needs to signal the second after taking the units it requires. In short, the signaling protocol at the end of request needs to be the same as the one at the end of release.
4.6 Case Study: Pthreads

Recall that a thread is a lightweight process, namely a process that has its own program counter and execution stack but none of the "heavyweight" context (such as page tables) associated with an application process. Some operating systems have provided mechanisms that allow programmers to write multithreaded applications. However, the mechanisms differed, so applications were not portable to different operating systems, or even to variants of the same operating system. To rectify this situation, a large group of people in the mid-1990s defined a standard set of C library routines for multithreaded programming. The group was working under the auspices of an organization called POSIX (portable operating systems interface), so the library is called Pthreads, for POSIX threads. The library is now widely available on various flavors of the UNIX operating system, as well as some others.

The Pthreads library contains dozens of functions for thread management and synchronization. Here we describe a basic set that is sufficient to fork and later join with new threads and to synchronize their execution using semaphores. (Section 5.5 describes functions for locks and condition variables.) We also present a simple, yet complete, example of a producer/consumer application. It can serve as a basic template for other applications that use the Pthreads library.
4.6.1 Thread Creation

Using Pthreads with a C program involves four steps. First, include the standard header for the Pthreads library:

    #include <pthread.h>

Second, declare variables for one thread attributes descriptor and one or more thread descriptors, as in

    pthread_attr_t tattr;    /* thread attributes */
    pthread_t tid;           /* thread descriptor */
Third, initialize the attributes by executing

    pthread_attr_init(&tattr);
    pthread_attr_setscope(&tattr, PTHREAD_SCOPE_SYSTEM);

Finally, create the threads, as described below.

The initial attributes of a thread are set before the thread is created; many can later be altered by means of thread-management functions. Thread attributes include the size of the thread's stack, its scheduling priority, and its scheduling scope (local or global). The default attribute values are often sufficient, with the exception of scheduling scope. A programmer usually wants a thread to be scheduled globally rather than locally, meaning that it competes with all threads for processor time rather than just with other threads created by the same parent thread (and hence same parent process). The call of pthread_attr_setscope illustrated above accomplishes this.

A new thread is created by calling the pthread_create function, as in

    pthread_create(&tid, &tattr, start_func, arg);
The first argument is the address of a thread descriptor that is filled in if creation is successful. The second is the address of a previously initialized thread attributes descriptor. The new thread begins execution by calling start_func with a single argument, arg. If thread creation is successful, pthread_create returns zero; a nonzero return value indicates an error.

A thread terminates its own execution by calling

    pthread_exit(value);
The Pthreads programs in the text have been tested using the Solaris implementation. You may need to use different settings for some of the attributes on other systems. For example, on an IRIX implementation the scheduling scope should be PTHREAD_SCOPE_PROCESS, which is the default.
The value is a single return value (or NULL). The exit routine is called implicitly if the thread returns from the function that it started executing.

A parent thread can wait for a child to terminate by executing

    pthread_join(tid, value_ptr);

where tid is the child's descriptor and value_ptr is the address of a location for the return value. The return value is filled in when the child thread exits.
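As a minimal illustration of the creation and join calls just described (this small program is not from the text; the function name hello is illustrative):

    #include <pthread.h>
    #include <stdio.h>

    void *hello(void *arg) {
        printf("hello from the new thread\n");
        return NULL;                      /* same effect as pthread_exit(NULL) */
    }

    int main(void) {
        pthread_attr_t tattr;
        pthread_t tid;
        pthread_attr_init(&tattr);
        pthread_attr_setscope(&tattr, PTHREAD_SCOPE_SYSTEM);
        pthread_create(&tid, &tattr, hello, NULL);
        pthread_join(tid, NULL);          /* wait for the child to terminate */
        return 0;
    }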
4.6.2 Semaphores

Threads communicate with each other using variables declared global to the functions executed by the threads. Threads can synchronize with each other using busy waiting, locks, semaphores, or condition variables. We describe semaphores here; locks and condition variables are described in Section 5.5. The header file semaphore.h contains definitions and operation prototypes for semaphores. Semaphore descriptors are declared global to the threads that will use them, as in

    sem_t mutex;

A descriptor is initialized by calling the sem_init function. For example, the following initializes mutex to 1:

    sem_init(&mutex, SHARED, 1);
If SHARED is nonzero, the semaphore can be shared between processes; otherwise it can be shared only between threads in the same process. The Pthreads equivalent of the P operation is sem_wait, and the equivalent of the V operation is sem_post. Thus, one way to protect a critical section of code is to use semaphore mutex as follows:

    sem_wait(&mutex);     /* P(mutex); */
    critical section;
    sem_post(&mutex);     /* V(mutex); */
The Pthreads library also contains functions to conditionally wait on a semaphore, retrieve its current value, and destroy it.
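In the POSIX interface these operations are sem_trywait, sem_getvalue, and sem_destroy. The following brief illustration is not from the text; it assumes a semaphore s that has already been initialized with sem_init.

    #include <semaphore.h>
    #include <stdio.h>
    #include <errno.h>

    void other_ops(sem_t *s) {
        int value;

        if (sem_trywait(s) == 0) {      /* conditional P: succeeds only if s > 0 */
            /* ... use the resource ... */
            sem_post(s);
        } else if (errno == EAGAIN) {
            /* semaphore was zero; do something else instead of blocking */
        }

        sem_getvalue(s, &value);        /* retrieve the current value */
        printf("current value: %d\n", value);

        sem_destroy(s);                 /* destroy when no longer needed */
    }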
4.6.3 Example: A Simple Producer and Consumer

Figure 4.15 contains a complete example of a simple producer/consumer program, which is similar to the program shown earlier in Figure 4.3.
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <stdlib.h>
#define SHARED 1

void *Producer(void *);        /* the two threads */
void *Consumer(void *);

sem_t empty, full;             /* global semaphores */
int data;                      /* shared buffer */
int numIters;

/* main() -- read command line and create threads */
int main(int argc, char *argv[]) {
  pthread_t pid, cid;                  /* thread descriptors */
  pthread_attr_t attr;                 /* thread attributes */
  pthread_attr_init(&attr);
  pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
  sem_init(&empty, SHARED, 1);         /* sem empty = 1 */
  sem_init(&full, SHARED, 0);          /* sem full = 0  */
  numIters = atoi(argv[1]);
  pthread_create(&pid, &attr, Producer, NULL);
  pthread_create(&cid, &attr, Consumer, NULL);
  pthread_join(pid, NULL);
  pthread_join(cid, NULL);
}

/* deposit 1, ..., numIters into the data buffer */
void *Producer(void *arg) {
  int produced;
  for (produced = 1; produced <= numIters; produced++) {
    sem_wait(&empty);
    data = produced;
    sem_post(&full);
  }
}

/* fetch numIters items from the buffer and sum them */
void *Consumer(void *arg) {
  int total = 0, consumed;
  for (consumed = 1; consumed <= numIters; consumed++) {
    sem_wait(&full);
    total = total + data;
    sem_post(&empty);
  }
  printf("the total is %d\n", total);
}

Figure 4.15    Simple producer/consumer using Pthreads.
The Producer and Consumer functions are executed as independent threads. They share access to a single buffer, data. The Producer deposits a sequence of integers from 1 to numIters into the buffer. The Consumer fetches these values and adds them. Two semaphores, empty and full, are used to ensure that the producer and consumer alternate access to the buffer. The main function initializes the descriptors and semaphores, creates the two threads, and then waits for them to terminate. The threads implicitly call pthread_exit when they complete. Neither thread in this program is passed an argument (hence the NULL pointer value in pthread_create). Section 5.5 contains an example in which threads are passed arguments.
Historical Notes
In the mid-1960s, Edsger Dijkstra and five colleagues developed one of the first multiprogrammed operating systems at the Technological University of Eindhoven in the Netherlands. (The designers humbly named it the "THE" multiprogramming system, after the Dutch initials of the institution!) The system has an elegant structure, which consists of a kernel and layers of virtual machines implemented by processes [Dijkstra 1968a]. It also introduced semaphores, which Dijkstra invented in order to have a useful tool for implementing mutual exclusion and for signaling the occurrence of events such as interrupts. Dijkstra also invented the term private semaphore.

Because Dijkstra is Dutch, P and V stand for Dutch words. In particular, P is the first letter of the Dutch word passeren, which means "to pass"; V is the first letter of vrijgeven, which means "to release." (Note the analogy to railroad semaphores.) Dijkstra and his group later observed that P might better stand for prolagen, formed from the Dutch words proberen ("to try") and verlagen ("to decrease"), and that V might better stand for verhogen ("to increase"). At about the same time, Dijkstra [1968b] wrote an important paper on cooperating sequential processes. His paper showed how to use semaphores to solve a variety of synchronization problems and introduced the problems of the dining philosophers and the sleeping barber (see Section 5.2).

In his seminal paper on monitors (discussed in Chapter 5), Tony Hoare [1974] introduced the concept of a split binary semaphore and showed how to use it to implement monitors. However, Dijkstra was the one who later named the technique and illustrated its general utility. In particular, Dijkstra [1979] showed how to use split binary semaphores to solve the readers/writers problem; Dijkstra [1980] showed how to implement general semaphores using only split binary semaphores. The author of this book developed the technique of passing the baton [Andrews 1989]. It was inspired by Dijkstra's papers on split binary semaphores.
In fact, passing the baton is basically an optimization of Dijkstra's algorithms [1979, 1980].

The solution to the dining philosophers problem in Figure 4.7 is deterministic, i.e., every process takes a predictable set of actions. Lehmann and Rabin [1981] show that any deterministic solution has to use asymmetry or an outside agent if it is to be deadlock-free and starvation-free. They also present an interesting probabilistic algorithm that is perfectly symmetric (see Exercise 4.19). The basic idea is that philosophers use coin flips to determine the order in which to try to pick up forks, which introduces asymmetry, and the philosopher who has most recently used a fork defers to a neighbor if they both want to use the same fork.

Courtois, Heymans, and Parnas [1971] introduced the readers/writers problem and presented two solutions using semaphores. The first is the readers' preference solution developed in Section 4.4 (see Figure 4.10). The second solution gives writers preference; it is much more complex than their readers' preference solution and uses five semaphores and two counters (see Exercise 4.20). Their writers' preference solution is also quite difficult to understand. In contrast, as shown at the end of Section 4.4 and in Andrews [1989], by using the technique of passing the baton we can readily modify a readers' preference solution to give writers preference or to get a fair solution.

Scheduling properties of algorithms often depend on the semaphore operations being strongly fair, namely, that a process delayed at a P operation eventually proceeds if enough V operations are executed. The kernel implementation in Chapter 6 provides strongly fair semaphores since blocked lists are maintained in FIFO order. However, if blocked processes were queued in some other order, e.g., by their execution priority, then the P operation might be only weakly fair. Morris [1979] shows how to implement a starvation-free solution to the critical section problem using weakly fair binary semaphores. Martin and Burch [1985] present a somewhat simpler solution to the same problem. Udding [1986] solves the same problem in a systematic way that makes clearer why his solution is correct. All three papers use split binary semaphores and a technique quite similar to passing the baton.

Many people have proposed variations on semaphores. For example, Patil [1971] proposed a PMultiple primitive, which waits until a set of semaphores are all nonnegative and then decrements them (see Exercise 4.28). Reed and Kanodia [1979] present mechanisms called eventcounts and sequencers, which can be used to construct semaphores but can also be used directly to solve additional synchronization problems (see Exercise 4.38). More recently, Faulk and Parnas [1988] have examined the kinds of synchronization that arise in hard-real-time systems, which have critical timing deadlines. They argue that in real-time systems, the P operation on semaphores should be replaced by two more
primitive operations: pass, which waits until the semaphore is nonnegative; and down, which decrements it.
Semaphores are often used for synchronization within operating systems. Many operating systems also provide system calls that make them available to application programmers. As noted in Section 4.6, the Pthreads library was developed as a portable standard for threads and low-level synchronization mechanisms, including semaphores. (However, Pthreads does not include routines for barrier synchronization.) Several books describe threads programming in general and Pthreads programming in particular; Lewis and Berg [1998] is one example. The Website for this book (see the Preface) contains links to information on Pthreads.
References

Andrews, G. R. 1989. A method for solving synchronization problems. Science of Computer Prog. 13, 4 (December): 1-21.

Courtois, P. J., F. Heymans, and D. L. Parnas. 1971. Concurrent control with "readers" and "writers." Comm. ACM 14, 10 (October): 667-68.

Dijkstra, E. W. 1968a. The structure of the "THE" multiprogramming system. Comm. ACM 11, 5 (May): 341-46.

Dijkstra, E. W. 1968b. Cooperating sequential processes. In F. Genuys, ed., Programming Languages. New York: Academic Press, pp. 43-112.

Dijkstra, E. W. 1979. A tutorial on the split binary semaphore. EWD 703, Nuenen, Netherlands, March.

Dijkstra, E. W. 1980. The superfluity of the general semaphore. EWD 734, Nuenen, Netherlands, April.

Faulk, S. R., and D. L. Parnas. 1988. On synchronization in hard-real-time systems. Comm. ACM 31, 3 (March): 274-87.

Herman, J. S. 1989. A comparison of synchronization mechanisms for concurrent programming. Master's thesis, CSE-89-26, University of California at Davis.

Hoare, C. A. R. 1974. Monitors: An operating system structuring concept. Comm. ACM 17, 10 (October): 549-57.

Lehmann, D., and M. O. Rabin. 1981. A symmetric and fully distributed solution to the dining philosophers problem. Proc. Eighth ACM Symp. on Principles of Programming Languages, January, pp. 133-38.
Lewis, B., and D. Berg. 1998. Multithreaded Programming with Pthreads. Mountain View, CA: Sun Microsystems Press.

Martin, A. J., and J. R. Burch. 1985. Fair mutual exclusion with unfair P and V operations. Information Processing Letters 21, 2 (August): 97-100.

Morris, J. M. 1979. A starvation-free solution to the mutual exclusion problem. Information Processing Letters 8, 2 (February): 76-80.

Parnas, D. L. 1975. On a solution to the cigarette smoker's problem (without conditional statements). Comm. ACM 18, 3 (March): 181-83.

Patil, S. S. 1971. Limitations and capabilities of Dijkstra's semaphore primitives for coordination among processes. MIT Project MAC Memo 57, February.

Reed, D. P., and R. K. Kanodia. 1979. Synchronization with eventcounts and sequencers. Comm. ACM 22, 2 (February): 115-23.

Udding, J. T. 1986. Absence of individual starvation using weak semaphores. Information Processing Letters 23, 3 (October): 159-62.
Exercises

4.1 Develop a simulation of general semaphores using only binary semaphores. Specify a global invariant, then develop a coarse-grained solution, and finally a fine-grained solution. (Hint: Use the technique of passing the baton.)

4.2 The semaphore operations are defined in Section 4.1 using await statements. Consequently, inference rules for P and V can be derived by applying the Await Statement Rule given in Section 2.6.

(a) Develop inference rules for P and V.

(b) Using your rules, prove that the critical section program in Figure 4.1 is correct. In particular, use the method of exclusion of configurations described in Section 2.8 to prove that two processes cannot be in their critical sections at the same time.

4.3 Recall that Fetch-and-Add, FA(var, increment), is an atomic function that returns the old value of var and adds increment to it. Using FA, develop a simulation of the P and V operations on general semaphores. Assume that memory reads and writes are atomic but that FA is the only more powerful atomic operation.

4.4 A precedence graph is a directed, acyclic graph. Nodes represent tasks, and arcs indicate the order in which tasks are to be accomplished. In particular, a task can execute as soon as all its predecessors have been completed. Assume that the tasks are processes and that each process has the following outline:
    process T {
      wait for predecessors, if any;
      body of the task;
      signal successors, if any;
    }

(a) Using semaphores, show how to synchronize five processes whose permissible execution order is specified by the following precedence graph:

Minimize the number of semaphores that you use, and do not impose constraints not specified in the graph. For example, T2 and T3 can execute concurrently after T1 completes.

(b) Describe how to synchronize processes, given an arbitrary precedence graph. In particular, devise a general method for assigning semaphores to edges or processes and for using them. Do not try to use the absolute minimum number of semaphores, since determining that is an NP-hard problem for an arbitrary precedence graph!
4.5 Suppose a machine has atomic increment and decrement instructions, INC(var) and DEC(var). These respectively add 1 to or subtract 1 from var. Assume that memory reads and writes are atomic but that INC and DEC are the only more powerful atomic operations.

(a) Is it possible to simulate the P and V operations on a general semaphore s? If so, give a simulation and explain how it works. If not, explain carefully why it is not possible to simulate P and V.

(b) Suppose INC and DEC also return the sign bit of the final value of var. In particular, if the final value of var is negative, they return 1; otherwise, they return 0. Is it now possible to simulate the P and V operations on a general semaphore s? If so, give a simulation and explain how it works. If not, explain carefully why it is still not possible to simulate P and V.
4.6 The UNIX kernel provides two primitives similar to the following:

    sleep():   block the executing process
    wakeup():  awaken all blocked processes

Each of these is an atomic operation. A call of sleep always blocks the calling
process. A call of wakeup awakens every process that is blocked at the time wakeup is called.
Develop an implementation of these primitives using semaphores for synchronization. (Hint: Use the method of passing the baton.)

4.7 Consider the sleep and wakeup primitives defined in the previous exercise. Process P1 is to execute statements S1 and S2; process P2 is to execute statements S3 and S4. Statement S4 must be executed after statement S1. A colleague gives you the following program:

    process P1 {
      S1; wakeup(); S2;
    }
    process P2 {
      S3; sleep(); S4;
    }

Is the solution correct? If so, explain why. If not, explain how it can fail, then describe how to change the primitives so that it is correct.

4.8 Give all possible final values of variable x in the following program. Explain how you got your answer.

    int x = 0;  sem s1 = 1, s2 = 0;
    co P(s2); P(s1); x = x*2; V(s1);
    // P(s1); x = x*x; V(s1);
    // P(s1); x = x+3; V(s2); V(s1);
    oc
4.9 Consider the combining-tree barrier in Figures 3.13 and 3.14.

(a) Using semaphores for synchronization, give the actions of the leaf nodes, interior nodes, and root node. Make sure the barrier can be reused by the same set of processes.

(b) If there are n processes, what is the total execution time for one complete barrier synchronization as a function of n? Assume each semaphore operation takes one time unit. Illustrate your answer by showing the structure of the combining tree and giving the execution time for a few values of n.

4.10 Consider the butterfly and dissemination barriers in Figures 3.15 and 3.16.

(a) Using semaphores for synchronization, give complete details for a butterfly barrier for eight processes. Show the code that each process would execute. The barrier should be reusable.

(b) Repeat (a) for a dissemination barrier for eight processes.
(c) Compare your answers to (a) and (b). How many variables are required for each kind of barrier? If each semaphore operation takes one unit of time, what is the total time required for barrier synchronization in each algorithm?

(d) Repeat (a), (b), and (c) for a seven-process barrier.

(e) Repeat (a), (b), and (c) for a 12-process barrier.

4.11 It is possible to implement a reusable n-process barrier with two semaphores and one counter. These are declared as follows:

    int count = 0;
    sem arrive = 1, go = 0;

Develop a solution. (Hint: Use the idea of passing the baton.)

4.12 It is possible to construct a simple O(n) execution time barrier for n worker processes using an array of n semaphores. Show how to do so. The workers should synchronize with each other; do not use any other processes. Be sure to show the initial values for the semaphores.
4.13 Consider the following proposal for implementing await statements using semaphores:

    sem e = 1, d = 0;       # entry and delay semaphores
    int nd = 0;             # delay counter

    # implementation of (await (B) S;)
    P(e);
    while (!B) { nd = nd+1; V(e); P(d); P(e); }
    S;
    while (nd > 0) { nd = nd-1; V(d); }
    V(e);
Does this code ensure that the await statement is executed atomically? Does it avoid deadlock? Does it guarantee that B is true before S is executed? For each of these questions, either give a convincing argument why the answer is yes, or give an execution sequence that illustrates why the answer is no.

4.14 Develop a concurrent implementation of a bounded buffer with multiple producers and multiple consumers. In particular, modify the solution in Figure 4.4 so that multiple deposits and multiple fetches can execute at the same time. Each deposit must place a message into a different empty slot. Fetches must retrieve messages from different full slots.
195
4.15 Another way to solve the bounded buffer problem is as follnws. Lcl count be an integer between 0 and n. Then deposit and fetch can be prog~.an~med as follows: deposit : ( await (count < n) buf [rear] = data; rear = ( r e a r + l ) % n; c o u n t = count+l; ) fetch: ( await (count > 0 ) result = buf [front ] ; front = (front+l) % n; count = count-1; )
Implement these a w a i t staterilenls uJng sem;lphores. passing the baton.)
(Hint: Use a varialion 01
4.16 Atomic Broadcasl. Ass~lineonc producer process and n consumer processes share a buffer. The producer del~osilsnlessages into the bufkr. consumers felch them. Every message deposited by the producer bas to he fctcbed by all n conmessage into the buffer. sumers before the producer can deposit a~~otlier
(a) Develop a solution f01- this proble~nusing se~naphoresI'or synchronization. I
(b) Now assume llie buffer has b slots. The producer can deposit messages only it110 empty slols and every message has to be received by all n consumers before the slot can be reused. Furthermore, each consumer i s co receive clie messages in the order they were deposited. However, different cons11mers can receive messages at different tirnes. For example, one consumer could receive up to b Inore messages than anotl~erif the second consumer is slow. Exteod your answer to (a) lo solve this more general problem.
I I
4.17 Solve the dining phjlosophers problein by focusing 011 LIE state 01 the philosophers rather than the forks. In particular, l e ~eating [ 5 ] be a Boolean array; eating [i I is true if Philosopher [i] is eating, and is false otherwise.
1
I
(a) Specify a global invariant, then develop a cowse-grained solution, and finally develop a line-grajtled soli~tionthar uses semaphores for synchronization. Your solution should be deadlock-free. bul an jrldividual philosopher might slarve. (Hint: Use 111e technique of passing the baton.) (b) Modify your answer to (a) Lo avoid slarvaljon. In particular, i f a philosopher
wants to eat, eventually Ile gets to.
4.18 Solve the dining pli i losophers problem using a ccntral ized coordinator process. In particular, when a philosopher wants to eat, he informs the coordirralvr and then waits for permission (i.e., the philosopher waits lo be given both forks).
I> ,
,,
. ., ri , I , ,
:< ,
196
Chapter 4
Semaphores
Use semaphores for syuchronizatio~~.Your solution should avoid deadlock and starvalinn.
4.13 In the dining philosophers problem, assume that a philosopher flips a perfect coin to deter~ninewhich fork to pick up first. (a) Develop a fully symmetric solution lo this problem. In particular, every philosopher sllould execute the same algorithm. Use busy waiting, semaphores, or both for synchronization. Your solution should be de;~dlock-free,but an individual philosopher might starve (with low probabiliiy). (Hint: Have a philosopher pi11 his first fork back down and flip the coin again if the second fork is unavailable.)
Extend your answer to (a) to guarantee absence of starvation. Clearly explain your soluiion. (Ni~zt: Use exka variables lo preclude one philosopher From eating again if a neighbor cvants to eat.) (b)
4.20 Consider the following writers' preference solution to the readers/writers problern [Courtois et al. 19711: i n t nr = 0, nw = 0; sem ml = 1, m2 = 1, m3 = 1; # mutex semaphores sem read = 1, write = 1; # read/write semaphores
Reader processes:
Writer processes:
P(m3); P(read) ;
~(m2); nw = nw+li if (nw == 1) P (read); v(m2); P(write) ; write the database; V(write) ; P(m2);
P(rn1);
nr = nr+l; if (nr == 1) P(write); V(m1) ;
V (read); V(m3 ;
read the database; P(m1); nr = nr-1; if (nr == 0 ) V(write);
nw
= nw-1; if (nw == 0 ) V(read) ; V(m2 1 ;
v(ml)i The code executed by reader processes is similar to that in Figure 4.10, but the code for writer pr0cesse.s is much more complex.
Explain the role of each semaphore. Develop assertions that indicate what is true at critical pojnts. 111 particular. show that the solution ensures rhal writers have exclusive access to the database and a writer excludes readers. Give a convincing argument rhat the solution gives wrilers preference over readers.
Exercises
197
4.21 Consider the following solution to he readers/writers problem. I t employs the same counlers and semaphores as in Figure 4.13, but uses them difi'erently. int sem sem int
nr = 0, nw = 0; e = 1; r = 0, w = 0; d r = 0, dw = 0;
# # # #
numbers of readers and writers mutual exclusion semaphore delay semaphores delay counters
procees Reader[i = 1 to MI ( while ( t r u e ) ( P(e); if (nw == 0 ) ( nr = n r ~ l ;V(r); ) else dr = dr-tl; V(e); P(r); # wait f o r permission t o read read the database; P(e);
nr = nr-1; i f (nr == 0 a n d d w > 0) (
dw = dw-1; nw = nw+l; V(w); }
v(e); 1
process Writer l j = 1 to N1 l while (true) { P ( e ); if (nr == 0 and nw == 0) { nw = nw+l; V(w); ) else dw = dw+l; V(e);
P(w);
write the database; P(e); nw = nw-1; i f (dw > 0 ) { dw = dw-1; nw = nw+l; V(w); )
else while (dr > 0 )
{
dr = dr-1; nr = nr+l; Vtr); 1
V(e) ;
1 1
(a) Carefully explain how thjs solution works. What is the role of each seu~aphore?Show that the solution ensures tha~writers have exclusive access to [lie database and a writer excludes readers.
198
Chapter 4
Semaphores
(b) What kind of preference does the above solution have? Readers preference? Writers psct-erence'? A ltemaiing preference'?
(c) Colnpare Lhis solution Lo the one in Figure 4.13. How many P and v operations are execurecl by each process in each solution jn the best case? I n the worst casc? Which program do you find easier to understand, and why? 4.22 Consider the re:~dersl\vritersprogram in Figure 4.13.
(a) Alter the program to give writers pref'erencc over readers. The text describes the basic idea: you are to fill in the details. (h) Compare your answer to (a) with the prowan1 in the previous exercise. How many semaphore operalions are executed by readers and by writers jn each solulion i n the hesl cnsc? In the worst case?
4.23 Modily the readerblwrilers solutio~lin Figure 4.13 so that the exit proiocol in writer processes awakens all waiting readers, if there are any. (Hinl: Yo11 will also need lo modify the entry protocol in Reader processes.) 4.24 Ahsllrne there are n reader processes in the readerslwrircrs problem. Let semaphore with initial value n. Suppose tbc reader's protocol is
rw
be a
P(rw) ; read the database; V(rw) ;
Develop a protocol for writer processes that ensures the required exclusion, is deadlock-free, and avoids starvation (assuming the P <)perationis strongly fair). 4.25 The Wuter Molecule Problem. Suppose hydrogen aid oxygen alorns are bouncing around in space q ~ n to g group together inlo water molecules. This requires chat two hydrogen atoms and otle oxygen aton1 synchronize with each other. Let the hydrogen (H) and oxygen (0) atoms be simulated by processes (threads). Each H atom calls a pl.ocedut-e Hready when il wants lo combine into a water molecule. Each 0 atom calls another procedure Oready when it wants to combj ne.
Yolir job is to write Lhe two procedures, using semaphores for synchl.onizalion. An H acorn IIRS co delay in Hready until another H atom has also called Hready urzd one 0 atom has called Oready. Then one of the processes (say the 0 atom) should call a procedure makewater. After makewater returns, all three processes should relurn from their calls of Hready and Oready. Your solution should not use busy waiting and it must avojd deadlock artd stal-valion. This is a Lricky problem, so be careful. (Hint: Consider starlillg with a global invariant and using passing the balon.)
Exercises
199
4.26 Suppose the P and v operations on selnaphores are replaced by the following: PChunk(s, amt) : VChunk(s, arnt):
(await ( a >= artit) s = (s = a + ant;)
6
- amt;)
The value of amt is a positive integer. These primitives generalize P md v by allowing arnt Lo be other than 1. (a) Use PChunk and VChunk to constn~cta siinple reader's preference soliltion to Lhc readersfwriters pr-oblern. First give a global invarianl, the11 develop a coarse-grained solution, and finally develop a fine-grained solution.
(b) Identify other problems that could benefit from the use of these two pritnitives and explain why. (Hint:Consider other problems i n rhe text and in these exercises.)
Ciya~rtteSrnoke~rProbLern [Paril 197 1; Parnas 3 975 1. Suppose there are three smoker processes and one agenl process. Each smoker continuously makes a cigarette and smokes it. Making a cigarelre requires lhree ingredients: tobacco, paper, and a match. One smoker process has tobacco, the second paper, and the third matches. Each has an infinite supply of these ingredients. The agent places a random two ingredients on the Lable. The sm0ke.r who has the third ingreclierlt picks up the other rwo, makes a cigarette, then smokes ir. The agenr waits for thc smoker to tinish. The cycle cl~erlrepeats. Develop n solutjon to this problem using sernaphores for synchronization. You may also need to use other variables. 4.28 Suppose file P and v operations on semaphores u e replaced by the i-allowing: ~Multiple(s1, ..., 8N): ( await (sl > 0 and and sN > 0) sl = s l - 1 ; ...; SN = sN-1; ) VMultiple (sl, . . , sN) : ( sl = sl+l; . . .; SN = sN+l; )
...
.
The arguments are one or more semilpho~.es.These pri~nitivesgeneralize P and v by allowing a process to sjmultaneously do a P or v operation on multiple sernaphores.
(a) Use PMultiple and VMultiple to construct a silrlple solut.ion lo the cigarerle sinokers probIem defined in the previous exel-cise. (b) Consider other problems in Chapter 4 and these exe~rises.Which ones could benefit from the use of these two priinicives and why?
200
Chapter 4
Semaphores
4.29 Consider a function exchange (value1 that i s called by Lwo processes to exchange the values crf tlleir arguments. The first process to call exchange has to delay. M'hen a second process calls exchange. the two values are swapped nnd returned to the processes. Tile same scenario is repealed for the third and I'oulth calls lo exchange, tlie I'lflh and sixlh calls, and so on. Develop code to ilnple~nent the body of exchange. U$e semaphores for synchronization. Be sure to declare and initialize every variable and semaphore you need, and to place shared variables global to 1I~eprocedure. 4.30 The Urzisc-r B~rlhroom.Suppose there is one bathroom in your department. It can he used by both men and women. but not at rhe same time.
(a) Develop a solu~ionto this problem. First specify a global invarianl, then develop a solution using semaphores ,for synchronizaciot~.Allow any number of men or women to be in rhe bacliroom at tlie same time. Your solution should ensure the required exclusion and a\toj.ddeadlock, but it need not be fair. (b) Modily your ;uiswer to (a) so that at most tour pcople are in the bathroom at
Lhe same time.
( c ) Moclify your answer to (a) to ensure fairness. You might want to solve the problem diffc~.eurly.(Hint: Use the teclinic~ueol' passing the baton.) 4.3 1 The One-Lane Bridge. Cars coming from the north and the south arrivc at a onelane bridge. C;us beading in the same direction can cross the b~idgeat the same time. bur cars heading i n opposite directions cannot. (a) Develop a solution to this problern. firs^ specif)! a global invariant, Lhen develop a solurion using seruapliores for synchronizntiorl. Do not worry about fai~ness.
(b) Modify your answer to (b) rn erlsuie that any car that is waiting to cross the bridge eventually gets to clo so. You may wanr to solve h e problem differer~tly. (Hir-11: Use the techoique of passing the baton.) Searcl~/lnsel.r/Del u . Three kinds of processes share access to a singly linked list searchers, inseslers, and delete~-s.Searchers merely examine lhe list; hence they cao execute concurrcrlrly with each olhel-. Ioserters add new items to the end of llie list; insertions must be mutually exclusive to preclude two inserrers from inserting new items al about the same time. However. one insert can proceed in parallel with any numbel- of searches. Fitlally, deleters remove items from anywhere in the list. At jy)ost one dele-ter process can access (lie list at a time, and dclctiou must also be tnutually exclusive with searchcs and insertions. (;I)
'I'lii~problem is an example of selective mutual exclusion. Develop a
Exercises
201
solurion using semaphores that is similar in style to the readersfwriters solulion in Figure 4.10. (b) This js also an example of a condition syt~cl~roniz,atio~l problem. Derive a solulion that is similar in style co the readersJwrirers solu~ioni n Figure 4.13. First specify the sytlcluo~lizationproperty as a global invariant. Use three counters: ns. the number of aclive searchers; n i , the number of active inserters: and nd,
the number of active deleters. Then develop a coal-se-grained solution. Finally. develop a he-grained so\~tionusing the technique of passing the baton.
4.33 Considec lhe following memory allocation problem. Suppose there are two operations: request (amount)and release (amount),where amount is an integer. When a process calls request, it delays u~zrilat least amount free pages of memory are available, then takes amount pages. A process reriirns amount pages Lo the free pool by calling release. (Pages may be released i n different quantities than [hey are acquired.) (a) Develop inlplelnentations of request and release that use the shortestjob-next (SJN) a1locaLion pol icy. In particular, srnallcr requests take precedence should have 11ie style of the program in Figure over larger ones. Your soliltio~~ 4.14. (b) Develop implementations of request and release that use a first-come, firs(-served (FCFS) allocation policy. This rneans (ha1 a pending requcsl might have to delay, even if there is enough rnelnory available.
I
4.34 Suppose n processes TJ [l:nl share two prinlers. Before i~singa pri~llc'r,u [ i I calls request (printer). This operation waits until one of the printers is available, (hen returns the identify of a free printer. After using that printer, u [ i 1 returns il by calling release (printer). Both request and release arc atoinic operafons.
(a) Develop irr~plemcntationsof request and release using semaphores for synchronization. (b) Assume each process has a priority stored in global array priority [ 1 :n]. Modify request and release so thal a printer is allocated Lo the highest priority waiting process. You may asslime that each process has a unique priol-ity. 4.35 The Hungry Birds. Given are n baby birds and one parent bird. The baby birds eat oul of a corninon dish that iilitially contains F portions of food. Each baby
repeatedly eats one portion of food at a time, sleeps for a while, and then comes back to eat. When the dish becomes empty, the baby bird who empties the dish awakens the parent bird. The parent refills the dish with F portions, then waits for the &sh to become empty again. This pattern repeats forever.
202
@)
Chapter 4
Semaphores
Represent (he birds ;is processes and develop code that simulates their actions. Use se~naphoresfor synchronizatjon.
4.36 The Bear and the Honeyljees. Given are n honeybees and a hungry bear. They shwe n pot of honey. Tbe pot i s initially empty; its capacity i s H portions of Iioney. The bear sleeps until the pot is full, the11 eats all the honey and goes back Lo sleep. Each bee repeatedly gathers one portion of honey and puis it in the pot; rhc bee who fills the pot awakens the bear. Represent the bear and honeybees as processes and develop code that simulates their actions. Use semaphores for synchronization. 4.37 The Roller Coaster Problem [Herman 19891. Suppose here are n passenger processes and one car process. The passengers I-epealedly wait lo take rides in the car, which can hold c passengers, c < n. However, the car can go around the tracks ouly when i t is full.
(a) Develop code for the actions of the passenger and car processes. Use scrnapliores for synchroniza~jon. (b) Generalize your answer to (a) lo employ m car processes, m > 1. Sjnce there i s o~llyone tmck. cars cannot pass each other; i.e., they must finish going around the track in the order in which they started. Again, a car can go wound the tracks
only when it is full.
4.38 A n eventcount is used to record the number of times at) event has occurred. It is represented by an integer initialized to zero and is manipulated by three primihves: advance(ec): (ec = e c c l ; ) read(ec) : (return(ec1; ) wait(ec, value) : (await ( e c >= value);)
A sequencer dispenses unique values. It is also represented by an inleger initialized to zero and is manipulated by the following ato~nicfunction:
I
~
I
ticket(seq):
(temp = seq; seq = seqtl; return(terng);)
A sequencer is often used in the second argument to the wait primitive. Using these primitives, develop a solution to the hounded buffer problern. Assume there is one producer and one consumel- and that the co~nmunicalion buffer contains n slots (see Figure 4.4).
(b) Extend your answer to (a) to permit multiple producers and consumers.
(c) Using these primitives, develop implementations of the P and V operations on a general semaphore.
Monitors
Semaphores are a fundamental synchronization mechanism. As shown in Chapter 4, they make it easy to program mutual exclusion.
When a concurrent program uses monitors for communication and synchronization, it contains two kinds of modules: active processes and passive monitors. Assuming all shared variables are within monitors, two processes interact only by calling procedures in the same monitor. The resulting modularization has two important benefits. First, a process that calls a monitor procedure can ignore how the procedure is implemented; all that matters are the visible effects of calling the procedure. Second, the programmer of a monitor can ignore how or where the monitor's procedures are used and can change the way in which the monitor is implemented, so long as the visible procedures and their effects are not changed. These benefits make it possible to design each process and monitor relatively independently. This makes a concurrent program easier to develop and also easier to understand.

This chapter describes monitors in detail and illustrates their use with numerous examples, some that we have seen before and many that are new. Section 5.1 defines the syntax and semantics of monitors. Section 5.2 describes a variety of useful programming techniques, using a sequence of examples: bounded buffers, readers/writers, shortest-job-next scheduling, an interval timer, and a classic problem called the sleeping barber. In Section 5.3 we take a different tack, focusing more on the structure of solutions to concurrent programming problems. We use another interesting problem, scheduling access to a moving-head disk, as the motivating example and show several ways to solve the problem.

Because of their utility and efficiency, monitors have been employed in several concurrent programming languages, most recently (and notably) in Java, which we describe in Section 5.4. The underlying synchronization mechanisms of monitors, implicit exclusion and condition variables for signaling, are also employed in the Unix operating system. Finally, condition variables are supported by several concurrent programming libraries. Section 5.5 describes the routines provided by the POSIX threads (Pthreads) library.
5.1 Syntax and Semantics

A monitor is used to group together the representation and implementation of a shared resource (class). It has an interface and a body. The interface specifies the operations (methods) provided by the resource. The body contains variables that represent the state of the resource and procedures that implement the operations in the interface. Monitors are declared and created in different ways in different languages. For simplicity, we will assume here that each monitor is a static object and that the interface and body are declared together as follows:
monitor mname {
    declarations of permanent variables
    initialization statements
    procedures
}
The procedures implement the visible operations. The permanent variables are shared by all the procedures within the body of the monitor; they are called permanent variables because they exist and retain their values as long as the monitor exists. The procedures may also, as usual, have their own local variables; each procedure call gets its own copy of these.

A monitor has three properties that are a consequence of its being an instance of an abstract data type. First, only the procedure names are visible outside the monitor; they provide the only gates through the "wall" defined by a monitor declaration. Thus, to alter the resource state represented by the permanent variables, a process must call one of the monitor procedures. Calls of monitor procedures have the form

    call mname.opname(arguments)
where mname is the name of a monitor, opname is one of the operations (procedures) of the monitor, and arguments are a set of arguments to that operation. If opname is unique in the scope of the calling process, then the "mname." part may be omitted from the call statement.

The second property of a monitor is that statements within the monitor (initialization statements and statements within procedures) may not access variables declared outside the monitor. The third property is that permanent variables are initialized before any procedure is called. This is implemented by executing the initialization statements when the monitor is created, and hence before any procedure is called.

One of the attractive attributes of a monitor, or any abstract data type, is that it can be developed in relative isolation. However, this means that the programmer of a monitor cannot know a priori the order in which the monitor's procedures might be called. Whenever execution order is indeterminate, it is useful to define a predicate that is true independent of the execution order. In particular, a monitor invariant is a predicate that specifies the "reasonable" states of the permanent variables when no process is accessing them. The initialization code in a monitor must establish the invariant; each procedure must maintain it. (A monitor invariant is like a global invariant, but just for the variables within a single monitor.) We include a monitor invariant, in comments starting with ##, in each example in the chapter.
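As a small illustration of this syntax, here is a minimal sketch of our own (the monitor name, its operations, and its invariant are hypothetical, not an example from the text). It declares one permanent variable, records the monitor invariant in a ## comment, and exports two procedures:

    monitor Counter {
        int count = 0;        # permanent variable
        ## count >= 0         # monitor invariant

        procedure increment() {
            count = count + 1;
        }

        procedure value(int &result) {
            result = count;
        }
    }

A process would invoke an operation by executing, for example, call Counter.increment(), or simply Counter.increment() when the monitor name can be omitted.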
What distinguishes a monitor from a data abstraction mechanism in a sequential programming language is that a monitor is shared by concurrently executing processes. Thus processes executing in monitors may require mutual exclusion (to avoid interference) and may require condition synchronization (to delay until the monitor state is conducive to continued execution). We now consider how processes synchronize within monitors.
5.1.1 Mutual Exclusion

Synchronization is easiest to understand, and hence to program, if mutual exclusion and condition synchronization are provided in different ways. It is best if mutual exclusion is implicit, because this automatically precludes interference. It also makes programs easier to read, because there are no explicit critical section entry and exit protocols. In contrast, condition synchronization must be programmed explicitly, because different programs require different synchronization conditions. Although it is often easiest to synchronize by means of Boolean conditions as in await statements, lower-level mechanisms can be implemented much more efficiently. They also provide the programmer with finer control over execution order, which aids in solving allocation and scheduling problems. Based on these considerations, mutual exclusion in monitors is provided implicitly and condition synchronization is programmed using what are called condition variables.

A monitor procedure is called by an external process. A procedure is active if some process is executing a statement in the procedure. At most one instance of one monitor procedure may be active at a time. For example, two calls of different procedures may not be active at the same time, nor can two calls of the same procedure. Monitor procedures by definition execute with mutual exclusion. In particular, it is up to the implementation of a language, library, or operating system to provide mutual exclusion; it is not up to the programmer who uses monitors. In practice, languages and libraries implement mutual exclusion by using locks or semaphores; single-processor operating systems implement mutual exclusion by inhibiting external interrupts; and multiprocessor operating systems implement mutual exclusion by using locks (between processors) and by inhibiting interrupts (on a processor). Chapter 6 discusses implementation issues and techniques in detail.
5.1.2 Condition Variables

A condition variable is used to delay a process that cannot safely continue executing until the monitor's state satisfies some Boolean condition. It is also used to awaken a delayed process when the condition becomes true. The declaration of a condition variable has the form

    cond cv;
Hence, cond is a new data type. An array of condition variables is declared in the usual way by appending range information to the variable's name. Condition variables may be declared, and hence used, only within monitors.

The value of a condition variable cv is a queue of delayed processes. Initially, this queue is empty. The value of cv is not, however, directly visible to the programmer. Instead it is accessed indirectly by several special operations, as described below.

A process can query the state of a condition variable by calling

    empty(cv);
This function returns true if cv's queue is empty; otherwise it returns false.

A process blocks on a condition variable by executing

    wait(cv);
Execution of wait causes the executing process to delay at the rear of cv's queue. So that some other process can eventually enter the monitor to awaken the delayed process, execution of wait also causes the process to relinquish exclusive access to the monitor.

Processes that are blocked on condition variables get awakened by means of signal statements. Execution of

    signal(cv);
examines cv's delay queue. If it is empty, signal has no effect. However, if there is some delayed process, then signal awakens the process at the front of the queue. Thus, wait and signal provide a FIFO signaling discipline: processes are delayed in the order they call wait, and they are awakened in the order that signal is called. We will later see how to add scheduling priorities to the delay queue, but the default is that the queue is FIFO.
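To make these operations concrete, the following is a minimal sketch of our own (the monitor and its names are hypothetical, not from the text) of a monitor that allocates a single resource. A process that finds the resource busy delays on a condition variable, and a releasing process signals that variable:

    monitor Resource {
        bool busy = false;    # true while the resource is allocated
        cond free;            # signaled when the resource is released

        procedure acquire() {
            while (busy)      # recheck the condition after awakening (see Section 5.1.3)
                wait(free);   # delay at the rear of free's queue
            busy = true;
        }

        procedure release() {
            busy = false;
            if (!empty(free)) # test is optional: signaling an empty queue has no effect
                signal(free);
        }
    }

Because mutual exclusion between acquire and release is implicit, the permanent variable busy cannot be accessed by two processes at the same time.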
5.1.3 Signaling Disciplines

When a process executes signal, it is executing within a monitor, and hence it has control of the lock implicitly associated with the monitor. This leads to a dilemma. If signal awakens another process, there are now two processes that could execute: the one that executed signal and the one that was just awakened. At most one process can execute next (even on a multiprocessor), because at most one can have exclusive access to the monitor. Thus, there are two possibilities:

Signal and Continue: The signaler continues and the signaled process executes at some later time.

Signal and Wait: The signaler waits until some later time and the signaled process executes now.
Signal and Continue (SC) is nonpreemptive: The process executing signal retains exclusive control of the monitor, and the awakened process executes at some later time when it can have exclusive access to the monitor. In essence, a signal is merely a hint that the awakened process might continue, so it goes back onto the queue of processes waiting for the monitor lock. On the other hand, Signal and Wait (SW) is preemptive: The process executing signal passes control of the monitor lock to the process awakened by the signal, and hence the awakened process preempts the signaler. In this case, the signaler goes back onto the queue of processes waiting for the monitor lock. (A variation is to put the signaler at the front of the queue of processes waiting for the monitor lock; this is called Signal and Urgent Wait.)

The state diagram in Figure 5.1 illustrates how monitor synchronization works. When a process calls a monitor procedure, the caller goes onto the entry queue if another process is executing in the monitor; otherwise the caller passes through the entry queue and immediately starts executing in the monitor. When the monitor becomes free (due to a return or wait), one process can move from the entry queue to executing in the monitor. When a process executes wait(cv) it moves from executing in the monitor to the queue associated with the condition variable. When a process executes signal(cv), with Signal and Continue (the arc labeled SC) the process at the front of the condition variable queue moves to the entry queue; with Signal and Wait (the two arcs labeled SW), the process executing in the monitor moves to the entry queue and the process at the front of the condition variable queue moves to executing in the monitor.

Figure 5.2 contains a monitor that implements a semaphore. It illustrates all the components of a monitor and will further clarify the differences between the SC and SW signaling disciplines. (Although one would most likely not ever use monitors to implement semaphores, the example illustrates that monitors can do
Figure 5.1    State diagram for synchronization in monitors.
so. In Chapter 6 we will show how to implement monitors using semaphores. Monitors and semaphores are thus duals in the sense that each can implement the other and hence can be used to solve the same synchronization problems. However, monitors are a higher-level mechanism than semaphores for the reasons described at the start of the chapter.)

In Figure 5.2 integer s represents the value of the semaphore. When a process calls the Psem operation, it delays until s is positive, then decrements the value of s. In this case, the delay is programmed by using a while loop that causes the process to wait on condition variable pos if s is 0. The Vsem operation increments the value of s, then signals pos. If there is at least one delayed process, the oldest one is awakened.
monitor Semaphore {
    int s = 0;        ## s >= 0
    cond pos;         # signaled when s > 0

    procedure Psem() {
        while (s == 0)
            wait(pos);
        s = s - 1;
    }

    procedure Vsem() {
        s = s + 1;
        signal(pos);
    }
}

Figure 5.2    Monitor implementation of a semaphore.
The code in Figure 5.2 works correctly for both Signal and Continue (SC) and Signal and Wait (SW), where by correctly we mean that the semaphore invariant s >= 0 is preserved. The difference is just the order in which processes execute. When a process calls Psem, it waits if s is 0; after awakening, a process decrements s. When a process calls Vsem, it first increments s, then awakens one delayed process if there is one. With the SW discipline, the awakened process executes now and decrements s. With the SC discipline, the awakened process executes some time after the signaler. The awakened process needs to recheck the value of s to ensure that it is still positive; this is because there could be some other process waiting on the entry queue, and that process could execute Psem first and decrement s. Hence, the code in Figure 5.2 implements a FIFO semaphore for the SW discipline but not for the SC discipline.

Figure 5.2 illustrates another difference between the SC and SW signaling disciplines. In particular, with the SW discipline the while loop in the Psem operation can be replaced by a simple if statement:

    if (s == 0) wait(pos);
Again this is because the signaled process executes immediately, so it is guaranteed that s is positive when that process decrements s.

The monitor in Figure 5.2 can be changed so that it (1) works correctly for both Signal and Continue and Signal and Wait, (2) does not use a while loop, and (3) implements a FIFO semaphore. To understand how, reconsider the program in Figure 5.2. When a process first calls the Psem operation, it needs to delay if s is 0. When a process calls the Vsem operation, it wants to awaken a delayed process, if there is one. The difference between SC and SW signaling is that if the signaler continues, the value of s has already been incremented and could be seen to be positive by some process other than the one that is awakened. The way to avoid this problem is to have a process executing Vsem make a decision: If there is a delayed process, then signal pos but do not increment s; otherwise, increment s. A process executing Psem then takes corresponding actions: if it has to wait, then it does not later decrement s, because the signaler will not have incremented s.

Figure 5.3 gives a monitor that uses the above approach. We call this technique passing the condition, because in essence the signaler implicitly passes the condition that s is positive to the process that it awakens. By not making the condition visible, no process other than the one awakened by signal can see that the condition is true and fail to wait.

The technique of passing the condition can be used whenever procedures that use wait and signal contain complementary actions. In Figure 5.3, the
monitor FIFOsemaphore {
    int s = 0;        ## s >= 0
    cond pos;         # signaled when s > 0

    procedure Psem() {
        if (s == 0)
            wait(pos);
        else
            s = s - 1;
    }

    procedure Vsem() {
        if (empty(pos))
            s = s + 1;
        else
            signal(pos);
    }
}

Figure 5.3    FIFO semaphore using passing the condition.
complementary actions are decrementing s in procedure Psem and incrementing s in procedure Vsem. In Sections 5.2 and 5.3 we will see additional examples that use this same technique to solve scheduling problems.

Figures 5.2 and 5.3 illustrate that condition variables are similar to the P and V operations on semaphores. The wait operation, like P, delays a process, and the signal operation, like V, awakens a process. However, there are two important differences. First, wait always delays a process until a later signal is executed, whereas a P operation causes a process to delay only if the semaphore's value is currently 0. Second, signal has no effect if no process is delayed on the condition variable, whereas a V operation either awakens a delayed process or increments the value of the semaphore; in short, the fact that signal has been executed is not remembered. These differences cause condition synchronization to be programmed differently than with semaphores.

In the remainder of this chapter, we will assume that monitors use the Signal and Continue discipline. Although Signal and Wait was the first discipline proposed for monitors, SC is the signaling discipline used within the Unix operating system, the Java programming language, and the Pthreads library. The reasons for preferring Signal and Continue are that: (1) it is compatible with priority-based process scheduling, and (2) it has simpler formal semantics. (The Historical Notes section discusses these issues.)
Broadcast signal is the final operation on condition variables. It is used if more than one delayed process could possibly proceed or if the signaler does not know which delayed processes might be able to proceed (because they themselves need to recheck delay conditions). This operation has the form

    signal_all(cv);
Execution of signal_all awakens all processes delayed on cv. For the Signal and Continue discipline, its effect is the same as executing

    while (!empty(cv)) signal(cv);

Each awakened process resumes execution in the monitor at some time in the future, subject to the usual mutual exclusion constraint. Like signal, signal_all has no effect if no process is delayed on cv. Also, the signaling process continues execution in the monitor.

The signal_all operation is well defined when monitors use the Signal and Continue discipline, because the signaler always continues executing next in the monitor. However, it is not a well-defined operation with the Signal and Wait discipline, because it is not possible to transfer control to more than one other process and to give each mutually exclusive access to the monitor. This is another reason why Unix, Java, Pthreads, and this book use the Signal and Continue discipline.
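As a small illustration of broadcast signal (a sketch of our own; the monitor and its operations are hypothetical, not from the text), consider a monitor in which any number of processes wait for an event that, once it has occurred, lets all of them proceed:

    monitor Gate {
        bool open = false;        # set to true once the gate has been opened
        cond opened;              # broadcast when open becomes true

        procedure pass() {
            while (!open)         # recheck: Signal and Continue semantics
                wait(opened);
        }

        procedure open_gate() {
            open = true;
            signal_all(opened);   # awaken every process delayed on opened
        }
    }

Because open remains true once it is set, every awakened process finds its delay condition satisfied when it rechecks it.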
5.2 Synchronization Techniques

This section develops monitor-based solutions to five problems: (1) bounded buffers, (2) readers/writers, (3) shortest-job-next scheduling, (4) interval timers, and (5) the sleeping barber. Each is an interesting problem in its own right and also illustrates a programming technique that is useful with monitors, as indicated by the subtitles in the section headings.
5.2.1 Bounded Buffers: Basic Condition Synchronization

Consider again the bounded buffer problem introduced in Section 4.2. A producer process and a consumer process communicate by sharing a buffer having n slots. The buffer contains a queue of messages. The producer sends a message to the consumer by depositing the message at the end of the queue. The consumer receives a message by fetching the one at the front of the queue. Synchronization is required so that a message is not deposited if the queue is full and a message is not fetched if the queue is empty.
monitor Bounded_Buffer {
    typeT buf[n];        # an array of some type T
    int front = 0,       # index of first full slot
        rear = 0,        # index of first empty slot
        count = 0;       # number of full slots
    ## rear == (front + count) % n
    cond not_full,       # signaled when count < n
         not_empty;      # signaled when count > 0

    procedure deposit(typeT data) {
        while (count == n)
            wait(not_full);
        buf[rear] = data;
        rear = (rear + 1) % n;
        count++;
        signal(not_empty);
    }

    procedure fetch(typeT &result) {
        while (count == 0)
            wait(not_empty);
        result = buf[front];
        front = (front + 1) % n;
        count--;
        signal(not_full);
    }
}
Figure 5.4    Monitor implementation of a bounded buffer.
Figure 5.4 contains a monitor that implements a bounded buffer. We again represent the message queue using an array buf and two integer variables front and rear; these point to the first full slot and the first empty slot, respectively. Integer variable count keeps track of the number of messages in the buffer. The two operations on the buffer are deposit and fetch, so these are the monitor procedures. Mutual exclusion is implicit, so semaphores are not needed to protect critical sections. Condition synchronization is implemented using two condition variables, as shown.

In Figure 5.4, both wait statements are enclosed in loops. This is always a safe way to ensure that the desired condition is true before the permanent variables are accessed. It is also necessary if there are multiple producers and consumers. (Recall that we are using Signal and Continue.) When a process executes signal, it merely gives a hint that the signaled condition is now true. Because the signaler and possibly other processes may execute in the monitor before the process awakened by signal, the awaited condition may no longer be true when the awakened process resumes execution. For example, a producer could be delayed waiting for an empty slot, then a consumer could fetch a message and awaken the delayed producer. However, before
the producer gets a turn to execute, another producer could enter deposit and fill the empty slot. An analogous situation could occur with consumers. Thus, in general, it is necessary to recheck the delay condition.

The signal statements in deposit and fetch are executed unconditionally, since in both cases the signaled condition is true at the point of the signal. In fact, as long as wait statements are enclosed in loops that recheck the awaited condition, signal statements can be executed at any time, since they merely give hints to delayed processes. A program will execute more efficiently, however, if signal is executed only if it is certain, or at least likely, that some delayed process could proceed.
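For instance, producer and consumer processes that use this monitor might be structured as follows (a sketch of our own, not code from the text; how messages are produced and consumed is left abstract):

    process Producer {
        typeT data;
        while (true) {
            # produce the next message and store it in data
            Bounded_Buffer.deposit(data);
        }
    }

    process Consumer {
        typeT result;
        while (true) {
            Bounded_Buffer.fetch(result);
            # consume the message in result
        }
    }

Because mutual exclusion is implicit and the wait statements are enclosed in loops, several producers and consumers can be declared this way without changing the monitor.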
5.2.2 Readers and Writers: Broadcast Signal

The readers/writers problem was introduced in Section 4.4. Recall that reader processes query a database and writer processes examine and alter it. Readers may access the database concurrently, but writers require exclusive access. Although the database is shared, we cannot encapsulate it by a monitor, because readers could not then access it concurrently, since all code within a monitor executes with mutual exclusion. Instead, we use a monitor merely to arbitrate access to the database. The database itself is global to the readers and writers; for example, it could be stored in a shared memory table or an external file. As we shall see, this same basic structure is often employed in monitor-based programs.

For the readers/writers problem, the arbitration monitor grants permission to access the database. To do so, it requires that processes inform it when they want access and when they have finished. There are two kinds of processes and two actions per process, so the monitor has four procedures: (1) request_read, (2) release_read, (3) request_write, and (4) release_write. These procedures are used in the obvious ways. For example, a reader calls request_read before reading the database and calls release_read after reading the database.

To synchronize access to the database, we need to record how many processes are reading and how many are writing. As before, let nr be the number of readers, and let nw be the number of writers. These are the permanent variables of the monitor; for proper synchronization, they must satisfy the monitor invariant

    RW:  (nr == 0 ∨ nw == 0)  ∧  nw <= 1
Initially, nr and nw are 0. Each variable is incremented in the appropriate request procedure and decremented in the appropriate release procedure.
monitor RW_Controller {      ## (nr == 0 ∨ nw == 0) ∧ nw <= 1
    int nr = 0, nw = 0;
    cond oktoread;           # signaled when nw == 0
    cond oktowrite;          # signaled when nr == 0 and nw == 0

    procedure request_read() {
        while (nw > 0)
            wait(oktoread);
        nr = nr + 1;
    }

    procedure release_read() {
        nr = nr - 1;
        if (nr == 0)
            signal(oktowrite);       # awaken one writer
    }

    procedure request_write() {
        while (nr > 0 || nw > 0)
            wait(oktowrite);
        nw = nw + 1;
    }

    procedure release_write() {
        nw = nw - 1;
        signal(oktowrite);           # awaken one writer and
        signal_all(oktoread);        # all readers
    }
}
Figure 5.5    Readers/writers solution using monitors.
Figure 5.5 contains a monitor that meets this specification. While loops and wait statements are used to ensure that RW is invariant. At the start of request_read, a reader needs to delay until nw is zero; oktoread is the condition variable on which readers delay. Similarly, writers need to delay at the start of request_write until both nr and nw are zero; oktowrite is the condition variable on which they delay. In release_read a writer is signaled when nr is zero. Since writers recheck their delay condition, the solution would still be correct if writers were always signaled. However, the solution would then be less efficient, since a signaled writer would have to go right back to sleep if nr were not zero. On the other hand, at the end of release_write, we know that both nr and nw are zero. Hence any delayed process could proceed.

The solution in Figure 5.5 does not arbitrate between readers and writers. Instead it awakens all delayed processes and lets the underlying process scheduling policy determine which executes first and hence which gets to access the database. If a
writer goes first, the awakened readers go back to sleep; if a reader goes first, the awakened writer goes back to sleep.
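Reader and writer processes then use the monitor in the obvious way. The following sketch is our own (the process counts M and N are assumed, and the database access itself is left abstract):

    process Reader[i = 1 to M] {
        while (true) {
            RW_Controller.request_read();
            # read the database
            RW_Controller.release_read();
        }
    }

    process Writer[j = 1 to N] {
        while (true) {
            RW_Controller.request_write();
            # examine and alter the database
            RW_Controller.release_write();
        }
    }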
5.2.3 Shortest-Job-Next Allocation: Priority Wait

A condition variable is by default a FIFO queue, so when a process executes wait(cv), it delays at the end of the queue. The priority wait statement

    wait(cv, rank);

delays processes in ascending order of rank. It is used to implement non-FIFO scheduling policies.

Here we revisit the shortest-job-next allocation problem, which was introduced in Section 4.5. A shortest-job-next allocator has two operations: request and release. When a process calls request, it delays until the resource is free or is allocated to it. After acquiring and using the resource, a process calls release. The resource is then allocated to the requesting process that will use it the shortest length of time; if there are no pending requests, the resource is freed.

Figure 5.6 gives a monitor that implements an SJN allocator. The permanent variables are a Boolean variable free that indicates whether the resource is
monitor Shortest_Job_Next {
    bool free = true;        ## Invariant SJN: see text
    cond turn;               # signaled when resource available

    procedure request(int time) {
        if (free)
            free = false;
        else
            wait(turn, time);
    }

    procedure release() {
        if (empty(turn))
            free = true;
        else
            signal(turn);
    }
}

Figure 5.6    Shortest-job-next allocation with monitors.
free, and a condition variable turn that is used to delay processes. Together these satisfy the monitor invariant

    SJN:  turn ordered by time  ∧  free ⇒ (turn is empty)
The procedures in Figure 5.6 again use the technique of passing the condition. Priority wait is used to order delayed processes by the amount of time they will use the resource; empty is used to determine if there are delayed processes. When the resource is released, if there are delayed processes, the one with minimal rank is awakened; otherwise, the resource is marked as being free. The resource is not marked as free if a process is signaled, to ensure that no other process can take the resource first.
5.2.4 Interval Timer: Covering Conditions

We now turn to a new problem: the design of an interval timer that makes it possible for processes to sleep for a specified number of time units. Such a facility is often provided by operating systems to enable users to do such things as periodically execute utility commands. We will develop two solutions that illustrate two generally useful techniques. The first solution employs what is called a covering condition; the second uses priority wait to provide a compact, efficient delay mechanism.

A monitor that implements an interval timer is another example of a resource controller. In this case, the resource is a logical clock. The clock has two operations: delay(interval), which delays the calling process for interval ticks of the clock, and tick, which increments the value of the logical clock. Other operations might also be provided, for example, to return the value of the clock or to delay a process until the clock has reached a specific time.

Application processes call delay(interval), where interval is nonnegative. The tick operation is called by a process that is periodically awakened by a hardware timer. This process typically has a high execution priority so that the value of the logical clock remains fairly accurate.

To represent the value of the logical clock, we use integer variable tod (time of day). Initially, tod is 0; it satisfies the simple invariant

    CLOCK:  tod >= 0  ∧  tod increases monotonically by 1
When a process calls delay, it must not return until the clock has "ticked" at least interval times. We do not insist on exact equality, since a delayed process might not execute before the high-priority process that calls tick has done so again.
A process that calls delay first needs to compute the desired wakeup time. This is accomplished in the obvious way by executing

    wake_time = tod + interval;
Here, wake_time is a variable local to the body of delay; hence each process that calls delay computes its own private wake-up time. Next the process needs to wait until tick has been called often enough. This is accomplished by using a while loop that causes the process to wait until tod is at least as large as wake_time. The body of the tick procedure is even simpler: it merely increments tod and then awakens delayed processes.

The remaining step is to implement the synchronization between delayed processes and the process that calls tick. One approach is to employ one condition variable for each different delay condition. Here, each delayed process could be waiting for a different time, so each process would need a private condition variable. Before delaying, a process would record in permanent variables the time it wishes to be awakened. When tick is called, the permanent variables would be consulted, and, if some processes were due to be awakened, their private condition variables would be signaled. Although certainly feasible, and for some problems necessary, this approach is more cumbersome and inefficient than required for the Timer monitor.

A much simpler way to implement the required synchronization is to employ a single condition variable and to use a technique called a covering condition. Such a condition variable is one for which the associated Boolean condition "covers" the actual conditions for which different processes are waiting. When any of the covered conditions could be true, all processes delayed on the covering condition variable are awakened. Each such process rechecks its specific condition and either continues or waits again.

In the Timer monitor, we can employ one covering condition variable, check, with associated condition "tod has increased." Processes wait on check in the body of delay; every waiting process is awakened each time tick is called. In particular, we get the Timer monitor shown in Figure 5.7. In tick a broadcast signal, signal_all, is used to awaken all delayed processes.

Although the solution in Figure 5.7 is compact and simple, it is not very efficient for this problem. Using a covering condition is appropriate only if the expected cost of false alarms (i.e., awakening a process that finds that its delay condition is false, so it immediately goes back to sleep) is lower than the cost of maintaining a record of the conditions of all waiting processes and only awakening a process when its condition is true. Often this is the case (see the exercises at the end of this chapter), but here it is likely that processes delay for relatively long intervals and hence would be needlessly awakened numerous times.
monitor Timer {
    int tod = 0;        ## invariant CLOCK -- see text
    cond check;         # signaled when tod has increased

    procedure delay(int interval) {
        int wake_time;
        wake_time = tod + interval;
        while (wake_time > tod)
            wait(check);
    }

    procedure tick() {
        tod = tod + 1;
        signal_all(check);
    }
}

Figure 5.7    Interval timer with a covering condition.
By using priority wait, we can transform the solution in Figure 5.7 into one that is equally simple, yet highly efficient. In particular, priority wait can be used whenever there is a static order between the conditions for which different processes are waiting. Here, waiting processes can be ordered by their wake-up times. When tick is called, it uses minrank to determine if it is time to awaken the first process delayed on check; if so, that process is signaled.

Incorporating these refinements, we have the second version of the Timer monitor shown in Figure 5.8. A while loop is no longer needed in delay, since tick ensures that a process is awakened only when its delay condition is true. However, the signal in tick has to be embedded in a loop, because there may be more than one process waiting for the same wake-up time.

To summarize, here are three basic ways to implement condition synchronization when the delay conditions depend on variables local to the waiting process. The preferred choice, because it leads to compact, efficient solutions, is to use priority wait, as in Figure 5.8 and earlier in Figure 5.6. This approach can be used whenever there is a static order between delay conditions. The second best choice, because it also leads to compact solutions, is to use a covering condition variable. This approach can be used whenever it is possible for waiting processes to recheck their own conditions; however, it cannot be used when waiting conditions depend on a function of the states of other waiting processes. Use of covering condition variables is appropriate as long as the cost of false alarms is less than the cost of maintaining in permanent variables exact records of waiting conditions.
monitor Timer {
    int tod = 0;        ## invariant CLOCK -- see text
    cond check;         # signaled when minrank(check) <= tod

    procedure delay(int interval) {
        int wake_time;
        wake_time = tod + interval;
        if (wake_time > tod)
            wait(check, wake_time);
    }

    procedure tick() {
        tod = tod + 1;
        while (!empty(check) && minrank(check) <= tod)
            signal(check);
    }
}

Figure 5.8    Interval timer with priority wait.
The third choice is to record in permanent variables the waiting conditions of delayed processes and to employ private condition variables to awaken such processes when appropriate. This approach produces more complex solutions, but it is required if neither of the other choices can be employed or if the second choice is not efficient enough. The exercises contain problems that explore the tradeoffs between these three choices.
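To give the flavor of this third approach, here is a sketch of our own of a Timer variant that uses private condition variables. It is not the book's code; it assumes at most N application processes and that each passes a unique identity id between 0 and N-1 to delay:

    monitor Timer {
        int tod = 0;               ## invariant CLOCK -- see text
        int wake[N];               # wake[i] is the wake-up time of process i, or -1
        cond check[N];             # check[i]: private condition for process i
        for [i = 0 to N-1]         # initialization statements
            wake[i] = -1;

        procedure delay(int id, int interval) {
            wake[id] = tod + interval;
            while (wake[id] > tod)
                wait(check[id]);
            wake[id] = -1;
        }

        procedure tick() {
            tod = tod + 1;
            for [i = 0 to N-1]     # scan the recorded waiting conditions
                if (wake[i] != -1 && wake[i] <= tod)
                    signal(check[i]);
        }
    }

Each process is awakened only when its own condition is true, at the cost of recording every waiting condition and scanning all of them on each tick.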
5.2.5 The Sleeping Barber: Rendezvous

As a final basic example, we consider another classic synchronization problem: the sleeping barber. Like the dining philosophers problem, this one has a colorful definition. Moreover, the sleeping barber is representative of practical problems, such as the disk head scheduler described in the next section. The problem illustrates the important client/server relationship that often exists between processes. It also requires an important type of synchronization called a rendezvous. Finally, the problem serves as an excellent illustration of the need to use a systematic approach when solving synchronization problems. Ad hoc techniques are far too error-prone for solving complex problems such as this one.
Sleeping Barber Problem. An easygoing town contains a small barbershop having two doors and a few chairs. Customers enter through one door and leave through the other. Because the shop is small, at most one
customer or the barber can move around in it at a time. The barber spends his life serving customers. When none are in the shop, the barber sleeps in the barber's chair. When a customer arrives and finds the barber sleeping, the customer awakens the barber, sits in the barber's chair, and sleeps while the barber cuts his hair. If the barber is busy when a customer arrives, the customer goes to sleep in one of the other chairs. After giving a haircut, the barber opens the exit door for the customer and closes it when the customer leaves. If there are waiting customers, the barber then awakens one and waits for the customer to sit in the barber's chair. Otherwise, the barber goes back to sleep until a new customer arrives.
The customers and barber are processes, and the barber's shop is a monitor within which the processes interact, as shown in Figure 5.9. Customers are clients that request a service from the barber, in this case haircuts. The barber is a server who repeatedly provides the service. This type of interaction is an example of a client/server relationship.

To implement these interactions, we can model the barbershop by a monitor that has three procedures: get_haircut, get_next_customer, and finished_cut. Customers call get_haircut; the procedure returns after a customer has received a haircut. The barber repeatedly calls get_next_customer to wait for a customer to
Figure 5.9    The sleeping barber problem.
both parties must arrive before either can proceed. It differs from a two-process barrier, however, since the barber can rendezvous with any customer. Second, the customer needs to wait until the barber has finished giving him a haircut, which is indicated by the barber's opening the exit door. Finally, before closing the exit door, the barber needs to wait until the customer has left the shop. In short, both the barber and customer proceed through a series of synchronized stages, starting with a rendezvous.

The most straightforward way to specify synchronization stages such as these is to employ incrementing counters that record the number of processes that have reached each stage. Customers have two important stages: sitting in the barber's chair and leaving the barber shop. Let cinchair and cleave be counters for these stages. The barber repeatedly goes through three stages: becoming available, giving a haircut, and finishing a haircut. Let bavail, bbusy, and bdone be the counters for these stages. All counters are initially zero. Since the processes pass sequentially through these stages, the counters must satisfy the invariant
    C1:  cinchair >= cleave  ∧  bavail >= bbusy >= bdone
To ensure that the barber and a customer rendezvous before the barber starts cutting the customer's hair, a customer cannot sit in the barber's chair more times than the barber has become available. In addition, the barber cannot become busy more times than customers have sat in the barber's chair. Thus we also require the invariance of
    C2:  cinchair <= bavail  ∧  bbusy <= cinchair
Finally, customers cannot leave the shop more times than the barber has finished giving haircuts, which is expressed by the invariant

    C3:  cleave <= bdone
The monitor invariant for the barbershop is then the conjunction of these three predicates:

    BARBER:  C1 ∧ C2 ∧ C3
Although incrementing counters are useful for recording stages through which processes pass, their values can increase without bound. However, we can avoid this problem by changing variables. We can do this whenever synchronization depends only on the differences between counter values. Because there are three key differences here, let barber, chair, and open be three new variables, as follows:
224
Chapter 5
Monitors
barber == bavail - c i n c h a i r chair == c i n c h a i r - bbusy oDen == bdone - cleave
These variables are initially 0 and each always has value 0 or 1. In particular, the value of barber is 1 when the barber is waiting for a customer to sit in the barber's chair, chair is 1 when the customer has sat in the chair but the barber has not yet become busy, and open is 1 when the exit door has been opened but the customer has not yet left.

The remaining task is to use condition variables to implement the required synchronization between the barber and the customers. There are four different synchronization conditions: customers need to wait for the barber to be available, customers need to wait for the barber to open the door, the barber needs to wait for a customer to arrive, and the barber needs to wait for a customer to leave. Thus, we use four condition variables, one for each of the four conditions. The processes wait for the conditions by using wait statements embedded in loops. Finally, processes execute signal statements at points where conditions are made true. The full solution is given in Figure 5.10.

The problem is much more complex than the ones considered earlier, so the solution is naturally longer and more complex. However, by proceeding systematically when developing the solution, we were able to break the synchronization into small pieces, solve each, then "glue" the solutions together. The monitor in Figure 5.10 is also the first monitor we have seen that has a procedure, get_haircut, containing more than one wait statement. This is because a customer proceeds through two stages: first waiting for the barber to be available, then waiting for the barber to finish giving a haircut.
5.3 Disk Scheduling: Program Structures

The previous examples examined several small problems and presented a variety of useful synchronization techniques. In this section, we examine various ways in which processes and monitors might be organized to solve a single, larger problem. In doing so, we consider issues of "programming in the large," namely, issues that occur when considering different ways to structure a program. This change in emphasis pervades the rest of the book.

As a specific example, we examine the problem of scheduling access to a moving-head disk, which is used to store data files. We will show how the techniques described in the previous section can be applied to develop a solution. As importantly, we will examine three different ways in which the solution can be
monitor Barber_Shop {
    int barber = 0, chair = 0, open = 0;
    cond barber_available;    # signaled when barber > 0
    cond chair_occupied;      # signaled when chair > 0
    cond door_open;           # signaled when open > 0
    cond customer_left;       # signaled when open == 0

    procedure get_haircut() {
        while (barber == 0) wait(barber_available);
        barber = barber - 1;
        chair = chair + 1; signal(chair_occupied);
        while (open == 0) wait(door_open);
        open = open - 1; signal(customer_left);
    }

    procedure get_next_customer() {
        barber = barber + 1; signal(barber_available);
        while (chair == 0) wait(chair_occupied);
        chair = chair - 1;
    }

    procedure finished_cut() {
        open = open + 1; signal(door_open);
        while (open > 0) wait(customer_left);
    }
}

Figure 5.10    Sleeping barber monitor.
structured. The disk-scheduling problem is representative of numerous scheduling problems. Also, each of the solution structures is applicable in numerous other situations.

We begin by summarizing the relevant characteristics of moving-head disks. Figure 5.11 shows the structure of a moving-head disk. The disk contains several platters that are connected to a central spindle and that rotate at constant speed. Data is stored on the surfaces of the platters. Each platter is like a phonograph record, except that the recording tracks form separate, concentric circles rather than being connected in a spiral. The tracks in the same relative position on different platters form a cylinder. Data is accessed by positioning a read/write head over the appropriate track, then waiting for the platter to rotate until the desired data passes by the head. Normally there is one read/write head per platter. These heads are connected to a single arm, which can move in and out so that read/write heads can be placed at any cylinder and hence over any track.
Figure 5.11    A moving-head disk.
The physical address of any piece of data stored on a disk consists of a cylinder, a track number, which specifies the platter, and an offset, which specifies the distance from a fixed clock point that is the same on every track. To access the disk, a program executes a machine-specific input/output instruction. The parameters to the instruction are a physical disk address, a count of the number of bytes to transfer, an indication of the kind of transfer to perform (read or write), and the address of a data buffer.

Disk access time depends on three quantities: (1) seek time to move a read/write head to the appropriate cylinder, (2) rotational delay, and (3) data transmission time. Transmission time depends totally on the number of bytes to be transferred, but the other two quantities depend on the state of the disk. In the best case, a read/write head is already at the requested cylinder, and the requested track area is just beginning to pass under the read/write head. In the worst case, read/write heads have to be moved clear across the disk, and the requested track has to make a full revolution. A characteristic of moving-head disks is that the time required to move the read/write heads from one cylinder to another is an increasing function of the distance between the two cylinders. Also, the time it takes to move a read/write head even one cylinder is much greater than platter rotation time. Thus the most effective way to reduce the average disk access time is to minimize head motion and hence to reduce seek time. (It also helps to reduce rotational delay, but this is harder to accomplish since rotational delays are typically quite short.)

We assume there are several clients that use a disk. For example, in a multiprogrammed operating system, these could be processes executing user commands or system processes implementing virtual memory management. Whenever only one client at a time wants to access the disk, nothing can be
gained by not allowing the disk
Figure 5.12    Disk scheduler as a separate monitor.
intermediary between users of the disk and a process that performs actual disk access; this structure is similar to that of the solution to the sleeping barber problem (Figure 5.10). The third solution uses nested monitors, with the first performing scheduling and the second performing disk access. All three scheduling monitors implement the CSCAN strategy, but they can easily be modified to implement any of the various scheduling strategies. For example, we show how to implement the SST strategy in Chapter 7.
5.3.1 Using a Separate Monitor

One way to organize a solution to the disk-scheduling problem is to have the scheduler be a monitor that is separate from the resource being controlled, in this case a disk. This structure is depicted in Figure 5.12. The solution has three kinds of components: (1) user processes, (2) the scheduler, and (3) the procedures or process that performs disk transfers. The scheduler is implemented by a monitor so that scheduling data is accessed by only one user process at a time. The monitor provides two operations: request and release.

A user process wishing to access cylinder cyl first calls request(cyl); the process returns from request when the scheduler has selected its request. The user process then accesses the disk, for example, by calling a procedure or by communicating with a disk driver process. After accessing the disk, the user calls release so that another request can be selected. The user interface to the scheduler is thus

    Disk_Scheduler.request(cyl)
    access the disk
    Disk_Scheduler.release()
Disk_Scheduler has the dual roles of scheduling requests and ensuring that at most one process at a time uses the disk. Thus, all users must follow the above protocol.
229
We ;l\sutne that disk cylinders are numbered rrom 0 to MAXCYL and that scheduling is lo employ the CSCAN strategy with a search clirection from 0 to M A X C ~ ,As usual. the critical step in deriving a correct solution is to slate precisely the propelties the solulioil is to have. Here, at most one process a1 a time can be allowed to use the disk, dnd pending requests are to bc serviced in CSCAN order. Lct p o s i t ion indicate the current head position-i.e., the cylinder being accessed by the process using the disk. When the disk 1s not being accessed. we will set posit ion to -1. (Any invalid cy iinder number would do, or wt:could use an additional val-iable.) To implement CSCAN scheduling, wc need tn di';tinguisIi betwecn pending requests to be serviced on the current scan across the disk and those l o he serviced on the next scan across the disk. Lel c and N bc disjoint qets that contain these requesls. Both sets are ordered by increasing valuc ol'cyl; reqLicsts For the sarnc cylinder are ordered by set-insertion time. In particuIar. set c cont~ias requests for cylinders greater than or cqual to the current head posilion. and set N contains requesls for cylinders less than or ctlual to the cmfent head porition. This is exprcssed by the followj~~g predicate, which will be our inonitor invariant: DlSK: c and N are ordered sets A all clcmenls of set C are >= position A a11 elements of set N arc i= position A (position = = -1) (C == 0 A N == 0)
*
rnPl?d ' G C ~ h4 f31 BIBLJUTL
A pending request for which c y l is equal to position rrlighl be in either set, hut it is placed in set N when the request is first issucd, as descnhcd in the next paragraph. When a process calls request, it takes one of three actions. If position is -1, t h e disk is free, thus, the process sets position to c y l and proceeds so access the disk. If the disk is not free and if cylinder > p o s i tion,the process inserts c y l in set C; otherwise, it inserts cyl in set N. We use N rather lhan c when c y l equals position to avoid potential unfairness; in this case, the request waits until the next scan across thc disk. After recording cyl in the appropriate set, the proce\s deIays until it is granted access to the disk-i.c., until p o s i t i o n equals c y l . When a process calls release, it updates the perrnanenl varial~lesso as to maintain DISK If c i r: not empty. there are pending requcsls for the current scan. In this case, the releasing process removes the Iirst elemcnt of' c and sets position to its value. Tf c is empty hut N js not, we need to shrt the next scan-i .en, it needs to became the current scan. Thc releasing process accompiishes this by swapping sets c and N (which sets N to the null set), then removit~gthe first
230
Chapter 5
Monitors
element of C and setting position to its value. Finally, if both C and N are empty, the releasing process sets position to -1 to indicate that the disk is free.

The final step in developing a solution is to implement the synchronization between request and release. We have here a situation like that in the interval timer problem: There is a static order between waiting conditions, and thus it is possible to use priority wait to implement ordered sets. In particular, requests in sets C and N are both serviced in ascending order of cyl. We also have a situation like that in the FIFO semaphore: When the disk is released, permission to access the disk is transferred to one specific waiting process. In particular, we need to set position to the value of the pending request that is to be serviced next. Because of these two attributes, we can implement synchronization efficiently by combining aspects of the Timer monitor in Figure 5.8 and the FIFOsemaphore monitor in Figure 5.3.

To represent sets C and N, let scan[2] be an array of condition variables indexed by integers c and n. When a requesting process needs to insert its parameter cyl in set C and then wait for position to be equal to cyl, it simply executes wait(scan[c], cyl). Similarly, a process inserts its request in set N and then delays by executing wait(scan[n], cyl). In addition, we use empty to test whether a set is empty, use minrank to determine its smallest value, and use signal to remove the first element and at the same time awaken the appropriate requesting process. Also, we swap the sets C and N when needed simply by swapping the values of c and n. (This is why scan is an array.)

Incorporating these changes, we have the final solution shown in Figure 5.13. At the end of release, c is the index of the current scan, so it is sufficient to include just one signal statement. If in fact position is -1 at this point, scan[c] will be empty, and thus signal will have no effect.

Scheduling problems such as this are among the most difficult to solve correctly, whatever synchronization mechanism is employed. The key is specifying exactly the order in which processes are to be served. When the service order is static, as it is here, we can use priority wait statements. However, as noted earlier, when the service order is dynamic we have to use either private condition variables to awaken individual processes or covering conditions to let delayed processes schedule themselves.
5.3.2 Using an Intermediary

Implementing Disk_Scheduler, or any resource controller, as a separate monitor is a viable way to structure a solution to a scheduling/allocation problem. Because the scheduler is separate, it can be designed independently of the other components. However, this very separation introduces two potential problems:
monitor Disk_Scheduler {      ## Invariant DISK
    int position = -1, c = 0, n = 1;
    cond scan[2];             # scan[c] signaled when disk released

    procedure request(int cyl) {
        if (position == -1)                          # disk is free, so return
            position = cyl;
        elseif (position != -1 && cyl > position)
            wait(scan[c], cyl);
        else
            wait(scan[n], cyl);
    }

    procedure release() {
        int temp;
        if (!empty(scan[c]))
            position = minrank(scan[c]);
        elseif (empty(scan[c]) && !empty(scan[n])) {
            temp = c; c = n; n = temp;               # swap c and n
            position = minrank(scan[c]);
        }
        else
            position = -1;
        signal(scan[c]);
    }
}

Figure 5.13    Separate disk scheduler monitor.
The presence of the scheduler is visible to the processes that use the disk; if the scheduler is deleted, user processes change.
All user processes must follow the required protocol of requesting the disk, then using it, then releasing it. If any process fails to follow this protocol, scheduling is defeated.

Both problems can be alleviated if the disk-use protocol is embedded in a procedure and user processes do not directly access either the disk or the disk scheduler. However, this introduces another layer of procedures and some attendant
inefficiency. There i s a third problem if thc clisk is accessed by a disk driver procesc rather than by procedures that are called direct1y by user processef. In pu~ticular, aftcr being granted access to the disk, a user process must comlnunicale with the
driver to pass arguments and to receive results (see again Figure 5.12). These communication paths could be implemented by two instances of the bounded buffer monitor of Figure 5.4. But the user interface would then consist of three monitors (the scheduler and two bounded buffers), and the user would have to make a total of four monitor calls every time it uses the device.

Since the disk users and disk driver have a client/server relationship, we could implement the communication interface using a variant of the solution to the sleeping barber problem. But we would still have two monitors: one for scheduling and one for interaction between user processes and the disk driver.

When a disk is controlled by a driver process, the best possible approach is to combine the scheduler and communication interface into a single monitor. Essentially, the scheduler becomes an intermediary between the user processes and the disk driver, as shown in Figure 5.14. The monitor forwards user requests to the driver in the desired order of preference. This approach provides three benefits. First, the disk interface employs only one monitor, and the user must make only one monitor call per disk access. Second, the presence or absence of scheduling is transparent. Third, there is no multistep protocol the user can fail to follow. Thus, this approach overcomes all three difficulties caused by the scheduler being a separate monitor.

In the remainder of this section, we show how to transform the solution to the sleeping barber problem (Figure 5.10) into a disk driver interface that both provides communication between clients and the disk driver and implements CSCAN scheduling. We need to make several changes to the sleeping barber solution. First, we need to rename the processes, monitor, and monitor procedures as described below and shown in Figure 5.15. Second, we need to parameterize the monitor procedures to transfer requests from users (customers) to the disk driver (barber) and to transfer results back; in essence, we need to turn the "barber's chair" and "exit door" into communication buffers. Finally, we need to add scheduling to the user/disk-driver rendezvous so that the driver services the preferred user request. These changes yield a disk interface with the outline shown in Figure 5.15.

To refine this outline into an actual solution, we employ the same basic synchronization as in the sleeping barber solution (Figure 5.10). However, we add
Figure 5.14   Disk scheduler as an intermediary.
    monitor Disk_Interface {
        permanent variables for status, scheduling, and data transfer

        procedure use_disk(int cyl; transfer and result parameters) {
            wait for turn to use driver
            store transfer parameters in permanent variables
            wait for transfer to be completed
            retrieve results from permanent variables
        }

        procedure get_next_request(someType &results) {
            select next request
            wait for transfer parameters to be stored
            set results to transfer parameters
        }

        procedure finished_transfer(someType results) {
            store results in permanent variables
            wait for results to be retrieved by client
        }
    }

Figure 5.15   Outline of disk interface monitor.
scheduling as in the Disk_Scheduler monitor (Figure 5.13) and add parameter passing as in a bounded buffer (Figure 5.4). Essentially, the monitor invariant for Disk_Interface becomes the conjunction of barbershop invariant BARBER, disk scheduler invariant DISK, and bounded buffer invariant BB (simplified to the case of a single-slot buffer).

A user process waits its turn to access the disk by executing the same actions as in the request procedure of the Disk_Scheduler monitor in Figure 5.13. Similarly, the driver process indicates it is available by executing the same actions as in the release procedure of the Disk_Scheduler monitor. Initially, however, we will set position to -2 to indicate that the disk is neither available nor in use until after the driver makes its first call to get_next_request; hence, users need to wait for the first scan to start.

When it becomes a user's turn to access the disk, the user process deposits its transfer arguments in permanent variables, then waits to fetch results. After selecting the next user request, the driver process waits to fetch the user's transfer arguments. The driver then performs the requested disk transfer. When it is done, the driver deposits the results, then waits for them to be fetched. The deposits and fetches are implemented as for a single-slot buffer. These refinements lead to the monitor shown in Figure 5.16.
    monitor Disk_Interface {
        int position = -2, c = 0, n = 1, args = 0, results = 0;
        cond scan[2];
        cond args_stored, results_stored, results_retrieved;
        argType arg_area;  resultType result_area;

        procedure use_disk(int cyl; argType transfer_params;
                           resultType &result_params) {
            if (position == -1)
                position = cyl;
            elseif (position != -1 && cyl > position)
                wait(scan[c], cyl);
            else
                wait(scan[n], cyl);
            arg_area = transfer_params;
            args = args+1;
            signal(args_stored);
            while (results == 0) wait(results_stored);
            result_params = result_area;
            results = results-1;
            signal(results_retrieved);
        }

        procedure get_next_request(argType &transfer_params) {
            int temp;
            if (!empty(scan[c]))
                position = minrank(scan[c]);
            elseif (empty(scan[c]) && !empty(scan[n])) {
                temp = c; c = n; n = temp;    # swap c and n
                position = minrank(scan[c]);
            }
            else
                position = -1;
            signal(scan[c]);
            while (args == 0) wait(args_stored);
            transfer_params = arg_area;
            args = args-1;
        }

        procedure finished_transfer(resultType result_vals) {
            result_area = result_vals;
            results = results+1;
            signal(results_stored);
            while (results > 0) wait(results_retrieved);
        }
    }

Figure 5.16   Disk interface monitor.
Although this user/disk-driver interface is quite efficient, it can be made even more efficient by two relatively simple changes. First, the disk driver can begin servicing the next user request sooner if finished_transfer is modified so that the driver does not wait for results from the previous transfer to be retrieved. One must be careful, however, to ensure that the results area is not overwritten in the event the driver completes another transfer before the results from the previous one have been retrieved. The second change is to combine the two procedures called by the disk driver. This eliminates one monitor call per disk access. Implementing this change requires modifying the initialization of results. We leave incorporating both of these changes to the reader.
5.3.3 Using a Nested Monitor
When the disk scheduler is a separate monitor, user processes have to follow the protocol of requesting the disk, accessing it, then releasing it. The disk itself is controlled by a process or by a monitor. When the disk scheduler is an intermediary, the user interface is simplified (a user has to make only a single request), but the monitor is much more complex, as can be seen by comparing the monitors in Figures 5.13 and 5.16. Moreover, the solution in Figure 5.16 assumes that the disk is controlled by a driver process.

A third approach is to combine the two styles by using two monitors: one for scheduling, and one for access to the disk. This structure is shown in Figure 5.17. However, in order to utilize this structure, it is imperative that the calls from the scheduling monitor release exclusion in that monitor. Below we explore this issue of nested monitor calls, and then outline a third solution to the disk scheduling problem.

The permanent variables of a monitor can be accessed by at most one process at a time, because processes execute with mutual exclusion within monitor procedures. However, what happens if a process executing a procedure in one monitor calls a procedure in another and hence temporarily leaves the first monitor? If monitor exclusion is retained when such a nested call is made, the call is termed a closed call. The alternative is to release monitor exclusion when it
Figure 5.17   Disk access using nested monitors. (The figure shows user processes calling a disk access monitor, which in turn calls the read and write operations of a disk transfer monitor.)
nested call is made and to reacquire exclusion when the call returns; this kind of nested call is termed an open call.

Permanent monitor variables are clearly protected from concurrent access on a closed call, since no other process can enter the monitor while the nested call is being executed. Permanent variables are also protected from concurrent access on an open call, as long as such variables are not passed by reference as arguments on the call. However, since an open call releases exclusion, the monitor invariant must be true before the call. Thus open calls have slightly more complex semantics than closed calls. On the other hand, a closed call is more prone to deadlock. In particular, if a process is delayed at a wait statement after making a nested call, it cannot be awakened by another process that has to make the same set of nested calls.

The disk-scheduling problem provides a concrete example of these issues. As noted, we can restructure a solution to the problem as shown in Figure 5.17. In essence, the Disk_Scheduler of Figure 5.13 is replaced by two monitors. User processes make one call to the doIO operation of the Disk_Access monitor. That monitor schedules access as in Figure 5.13. When it is a process's turn to access the disk, it makes a second call to the read or write operation of a second monitor, Disk_Transfer. This second call happens from within the Disk_Access monitor. In particular, the Disk_Access monitor has the following structure:

    monitor Disk_Access {
        permanent variables as in Disk_Scheduler;

        procedure doIO(int cyl; transfer and result arguments) {
            actions of Disk_Scheduler.request;
            call Disk_Transfer.read or Disk_Transfer.write;
            actions of Disk_Scheduler.release;
        }
    }
The calls to Disk_Transfer are nested calls. In order for disk scheduling to take place, they must be open calls. Otherwise, at most one process at a time could ever be within doIO, making the request and release actions superfluous. Open calls can be used here since only local variables (parameters to doIO) are passed as arguments to Disk_Transfer, and the disk scheduler invariant, DISK, will be true before read or write is called.

Independent of the semantics of nested calls, there is the issue of mutual exclusion within a monitor. By having monitor procedures execute one at a time, permanent variables cannot be accessed concurrently. However, this is not always necessary to avoid interference. If a procedure reads but does not alter
permanent variables, different calls of the procedure could execute concurrently. Or, if a procedure merely returns the value of some permanent variable and this value can be read atomically (e.g., it is a simple variable), then the procedure can be allowed to execute concurrently with any monitor procedure. By the time the calling process examines the returned value, it might not be the same as the permanent variable's current value, but this is always the case in a concurrent program. As a concrete example, we could add a procedure read_clock to the Timer monitor in Figure 5.7 or 5.8. Whether or not read_clock executes with mutual exclusion, a process that calls read_clock could only know that the return value is no greater than the current value of tod.

Even in situations where different monitor procedures alter permanent variables, they can sometimes safely execute in parallel. For example, we have seen in previous chapters that a producer and a consumer can concurrently access different slots of a bounded buffer (e.g., see Figure 4.5). If monitor procedures must execute with mutual exclusion, it is awkward to program such a buffer. Either each buffer slot has to be encapsulated by a separate monitor, or the buffer has to be global to processes, which then synchronize using monitors that implement semaphores. Fortunately, such situations are rare.

We have raised two new issues in this section. First, when using monitors the programmer needs to know whether nested calls (ones from within one monitor to a second) are open or closed. Second, the programmer needs to know whether there is a way to relax the implicit exclusion of a monitor in cases where it is safe to have some procedures execute in parallel. In the next section we explore these issues further in the context of the Java programming language.
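To preview how this plays out in Java, the topic of the next section, the sketch below shows one way a Timer-like class might leave its read-only accessor unsynchronized. This is only an illustration: the names SimpleTimer, tick, and readClock are assumptions, not code from the text, and tod is declared volatile so that the unsynchronized read is well defined in Java. The caller still learns only a value no greater than the current time.

    class SimpleTimer {
        private volatile int tod = 0;    // time of day; volatile is an added assumption

        public synchronized void tick() {    // executes with mutual exclusion
            tod++;
        }

        public int readClock() {    // deliberately not synchronized
            return tod;    // an atomic read of a simple variable
        }
    }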
5"4 Case Study: Java Java is the latest craze lo hlt compuling. It was ~nolivate~l by the dccire to provide a way to write portable programs for the World Wide Wcb. Like no st inno-
~iitionsin computer science, Java borrows hc;ivily from its pl-edecessori;. in thir: case interpreted languages, object-oriented languages, and concurrent languages. Java recombines old fcaturcs and addl; several new oncs to produce an interesting combination. St also provide< extensive libraries for- graphics. distribuled programming, and other applications. Java is an cxample of an object-oricnted language. Allholigh the details of object-oriented languages are beyolid the scope of this book, thc essential idea is that a program consisls of a collection of interacting ol7jectc. Each object is an instance of a class A class has two kinds of members: data fields and methods. Data fields represent the state 01' instances of lhe class (objects) or of the class
itself. The methods are procedures that are used to manipulate data fields, most commonly in individual objects.

For our purposes, the most interesting aspects of Java are that it supports threads, monitors as defined in this chapter, and distributed programming as covered in Part 3. This section describes Java's mechanisms for processes (threads) and monitors (synchronized methods), and then illustrates the use of these features to solve variations on the readers/writers problem. (Sections 7.9 and 8.5 describe Java's mechanisms for message passing and remote method invocation.) The Historical Notes give pointers to more information on Java in general and concurrent programming in Java in particular.
5.4.1 The Threads Class

A Java thread is a lightweight process; it has its own stack and execution context, but it also has direct access to all variables in its scope. Threads are programmed by extending the Thread class or by implementing a Runnable interface. Both the Thread class and the Runnable interface are part of the standard Java libraries, in particular the java.lang package. A thread is created by creating an instance of the Thread class:

    Thread foo = new Thread();

This constructs a new thread foo. To actually start the thread, one executes

    foo.start();
The start operation is one of the methods defined by the Thread class; there are numerous others such as stop and sleep. The above example will not in fact do anything, because by default the body of a thread is empty. More precisely, the start method in Thread invokes the run method in Thread, and the default definition of run does nothing. Thus to create a useful thread, one needs to define a new class that extends the Thread class (or that uses the Runnable interface) and that provides a new definition of the run method. For example, consider the following class definition:

    class Simple extends Thread {
        public void run() {
            System.out.println("this is a thread");
        }
    }
We can create an instance of the class and start the new thread by executing
    Simple s = new Simple();
    s.start();    // calls the run() method in object s
The thread will print a single line of output then terminate. If we do not need to keep a reference to the thread, we can simplify the last two lines to

    new Simple().start();
An alternative way to program the previous example is to use the Runnable interface. First change Simple as follows:
    class Simple implements Runnable {
        public void run() {
            System.out.println("this is a thread");
        }
    }
Then create an instance of the class, pass that instance to the Thread constructor, and start the thread:
    Runnable s = new Simple();
    new Thread(s).start();
The advantage of using the Runnable interface is that then class Simple could also extend some system- or user-defined class. This is not allowed with the first approach, because Java does not allow a class to extend more than one other class.

To summarize, here are four steps in creating and starting threads in Java: (1) define a new class that extends the Java Thread class or that implements the Runnable interface; (2) define a run method in the new class; it will contain the body of the thread; (3) create an instance of the new class using the new statement; and (4) start the thread using the start method.
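As a quick check of these four steps, the following minimal sketch should compile with javac and run with java. The class names ExampleThread and TestMain, and the idea of passing an id through the constructor, are illustrative assumptions rather than code from the text.

    // steps 1 and 2: extend Thread and define run()
    class ExampleThread extends Thread {
        private int id;    // data passed to the thread via its constructor
        public ExampleThread(int id) { this.id = id; }
        public void run() {
            System.out.println("hello from thread " + id);
        }
    }

    class TestMain {
        public static void main(String[] args) {
            // steps 3 and 4: create instances and start them
            for (int i = 0; i < 3; i++)
                new ExampleThread(i).start();
        }
    }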
5.4.2 Synchronized Methods

Threads in Java execute concurrently, at least conceptually. Hence they could simultaneously access shared variables. Consider the following class:

    class Interfere {
        private int data = 0;
        public void update() {
            data++;
        }
    }
This class contains a private field data that is accessible only within the class, and it exports a public method update that, when called, increments the value of data. If update is called simultaneously by two threads, obviously they will interfere with each other.

Java supports mutual exclusion by means of the synchronized keyword, which can be used with an entire method or with a sequence of statements. For example, to make the update method atomic, one could reprogram the above code as

    class Interfere {
        private int data = 0;
        public synchronized void update() {
            data++;
        }
    }
This is a simple example of how a monitor is programmed in Java: the monitor's permanent variables are private data in a class, and the monitor procedures are implemented using synchronized methods. There is exactly one lock per object in Java: when a synchronized method is invoked, it waits to obtain that lock, then executes the body of the method, then releases the lock.

An alternative way to program the above example is to use a synchronized statement within a method, as in

    class Interfere {
        private int data = 0;
        public void update() {
            synchronized (this) {    // lock this object
                data++;
            }
        }
    }
The keyword this refers to the object on which the update method was invoked, and hence to the lock on that object. A synchronized statement is thus like an await statement, whereas a synchronized method is like a monitor procedure.

Java supports condition synchronization by means of wait and notify. These are very similar to the monitor wait and signal statements as used earlier in this chapter. However, wait and notify in Java are in fact methods of the class Object, which is the parent of all classes in Java. Both wait and notify must be executed within synchronized portions of code, and hence when an object is locked.
The wait method releases the lock on an object and delays the executing thread. There is a single delay queue per object. It is usually a FIFO queue but not necessarily. Java does not provide condition variables, but one can think of there being exactly one (implicitly declared) condition variable per synchronized object. The notify method awakens the thread at the front of the delay queue, if there is one. The thread that invokes notify continues to hold the object's lock, so the awakened thread will execute at some future time when it can reacquire the lock. Hence, notify has Signal and Continue semantics. Java also supports a broadcast signal by means of the notifyAll method, which is analogous to signal_all. Since there is only one (implicit) condition variable per object, the wait, notify, and notifyAll methods have empty parameter lists.

If a synchronized method (or statement) in one object contains a call to a method in another object, the lock on the first object is retained while the call is executed. Thus, nested calls from synchronized methods are closed calls in Java. This precludes using the structure shown in Figure 5.17 for solving the disk scheduling problem using nested monitors. It can also lead to deadlock if a synchronized method in one object calls a synchronized method in another object and vice versa.
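Before turning to the readers/writers examples, here is a plausible sketch of a one-slot buffer programmed as a Java monitor; the class name Slot is an assumption. The while loops are needed because notify and notifyAll have Signal and Continue semantics, so an awakened thread must recheck its condition; the InterruptedException associated with wait is propagated here through the throws clause rather than caught.

    class Slot {    // a one-slot buffer as a Java monitor
        private int contents;
        private boolean full = false;

        public synchronized void deposit(int item)
                throws InterruptedException {
            while (full)        // recheck: Signal and Continue semantics
                wait();
            contents = item;
            full = true;
            notifyAll();        // awaken threads waiting to fetch
        }

        public synchronized int fetch() throws InterruptedException {
            while (!full)
                wait();
            full = false;
            notifyAll();        // awaken threads waiting to deposit
            return contents;
        }
    }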
5.4.3 Parallel Readers/Writers

This and the next two subsections present a series of examples that illustrate the concurrency and synchronization aspects of Java, as well as Java's use of classes, declarations, and statements. All three programs are self-contained: they can be compiled by the Java compiler javac and then executed by the Java interpreter java. (Consult your local installation for details on how to use Java; see also the Web site for this book for source code for the programs.)

First consider a parallel version of readers and writers, namely, a program in which readers and writers can access a database in parallel. Although this program can lead to interference, it serves to illustrate the structure of Java programs and the use of threads. The starting point for the program is a class that encapsulates a database. Here, we will use a very simple database, namely, a single integer. The class exports two operations (methods), read and write. It is defined as follows.

    // basic read or write; no exclusion
    class RWbasic {
        protected int data = 0;    // the "database"

        public void read() {
            System.out.println("read: " + data);
        }
        public void write() {
            data++;
            System.out.println("wrote: " + data);
        }
    }
The members of the class are one field, data, and two methods, read and write. The data field is declared as protected, which means that it is accessible only within the class or within subclasses that inherit the class (or within other classes declared in the same package). The read and write methods are declared as public, which means they are accessible anywhere the class is accessible. Each method, when invoked, prints a single line of output that indicates the current value of data; the write method also increments the value of data.
The next parts of our example are two classes, Reader and Writer, that contain the code for reader and writer processes. Each of these classes extends the Thread class. Each also contains an initialization method with the same name as the class; this method is invoked when new instances of the class are created. Finally, each class contains a run method that gives the code for a thread. The Reader class is declared as follows:
    class Reader extends Thread {
        int rounds;
        RWbasic RW;    // a reference to an RWbasic object

        public Reader(int rounds, RWbasic RW) {
            this.rounds = rounds;
            this.RW = RW;
        }
        public void run() {
            for (int i = 0; i < rounds; i++) {
                RW.read();
            }
        }
    }
The Writer class is declared similarly:

    class Writer extends Thread {
        int rounds;
        RWbasic RW;

        public Writer(int rounds, RWbasic RW) {
            this.rounds = rounds;
            this.RW = RW;
        }
        public void run() {
            for (int i = 0; i < rounds; i++) {
                RW.write();
            }
        }
    }
When an instance of either class is created, the new object is passed two parameters: the number of rounds to execute and an instance of the RWbasic class. The initialization methods store the parameters in permanent variables rounds and RW. Within the initialization methods, these variables are prefixed by the keyword this to differentiate between the permanent variable and the parameter of the same name.

The above three classes (RWbasic, Reader, and Writer) are the building blocks for a readers/writers program in which readers and writers can access the same instance of RWbasic in parallel. To complete the program, we need a main class that creates one instance of each of the above classes and then starts the Reader and Writer threads:

    class Main {
        static RWbasic RW = new RWbasic();
        public static void main(String[] args) {
            int rounds = Integer.parseInt(args[0], 10);
            new Reader(rounds, RW).start();
            new Writer(rounds, RW).start();
        }
    }
The program starts execution in the main method, which has a parameter args that contains the command-line arguments. In this case, there is a single argument for the number of rounds each thread should execute. The output of the program is a sequence of lines that list the data values that are read and written. A total of 2*rounds lines are printed, because there are two threads and each executes a loop that runs for rounds iterations.
5.4.4 Exclusive Readers/Writers

The above program allows each thread to access the data field concurrently. We can modify the code as follows to provide mutually exclusive access to
data. First, define a new class RWexclusive that extends RWbasic to use synchronized read and write methods:

    // mutually exclusive read and write methods
    class RWexclusive extends RWbasic {
        public synchronized void read() {
            System.out.println("read: " + data);
        }
        public synchronized void write() {
            data++;
            System.out.println("wrote: " + data);
        }
    }
Because RWexclusive extends RWbasic, it inherits the data field. However, the read and write methods have been redefined so that they will now execute with mutual exclusion.

Second, modify the Reader and Writer classes so that their local RW variables are instances of RWexclusive rather than RWbasic. For example, the Reader becomes

    class Reader extends Thread {
        int rounds;
        RWexclusive RW;

        public Reader(int rounds, RWexclusive RW) {
            this.rounds = rounds;
            this.RW = RW;
        }
        public void run() {
            for (int i = 0; i < rounds; i++) {
                RW.read();
            }
        }
    }
The Writer class is changed similarly. Finally, modify the Main class so that it creates instances of RWexclusive rather than RWbasic. In particular, change the first line of Main above to

    static RWexclusive RW = new RWexclusive();
With these changes, the threads in the new program access variable data in RWbasic indirectly by calling the synchronized methods in RWexclusive.
The first readers/writers example allows concurrent access to data, and the second makes access to data mutually exclusive. The true readers/writers problem allows either concurrent reads or a single write. We can implement this change by reprogramming the RWexclusive class as follows:

    // concurrent read or exclusive write
    class ReadersWriters extends RWbasic {
        private int nr = 0;

        private synchronized void startRead() {
            nr++;
        }
        private synchronized void endRead() {
            nr--;
            if (nr == 0) notify();    // awaken waiting Writers
        }
        public void read() {
            startRead();
            System.out.println("read: " + data);
            endRead();
        }
        public synchronized void write() {
            while (nr > 0)    // delay if any active Readers
                try { wait(); }
                catch (InterruptedException ex) { return; }
            data++;
            System.out.println("wrote: " + data);
            notify();    // awaken another waiting Writer
        }
    }
We also need to change the Reader, Writer, and Main classes to use this new class rather than RWexclusive. However, nothing else has to change. (This is one of the virtues of object-oriented programming languages.)

The ReadersWriters class adds two local (private) methods: startRead and endRead. These are called by the read method before and after it accesses the database. The startRead method increments a private variable, nr, that counts the number of active reader threads. The endRead method decrements nr; if it is now zero, notify is called to awaken one waiting writer, if any. The startRead, endRead, and write methods are synchronized, so at most one can execute at a time. Hence, a writer thread cannot execute while either startRead or endRead is active. However, the read method is not
synchronized, which means it can be called concurrently by multiple threads. If a writer thread calls write while a reader thread is actually reading, then the value of nr will be positive, so the writer needs to wait. A writer is awakened when nr is zero; after accessing data, a writer uses notify to awaken another writer if one is waiting. Because notify has Signal and Continue semantics, that writer might not execute before another reader increments nr, so the writer rechecks the value of nr.

In the write method above, the call of wait is within what is called a try statement. This is Java's exception handling mechanism, which is how Java lets the programmer deal with abnormal situations. Because a waiting thread might be stopped or might have terminated abnormally, Java requires that every use of wait be within a try statement that catches the InterruptedException. The code above has the write method simply return if an exception occurs while a thread is waiting.

A virtue of the above readers/writers solution relative to the one given earlier in Figure 5.5 is that the interface for writer threads is a single procedure, write, rather than two procedures, request_write() and release_write(). However, both solutions give readers preference over writers. We leave it to the reader to figure out how to modify the Java solution to give writers preference or to make it fair (see the exercises at the end of this chapter).
5.5 Case Study: Pthreads

The Pthreads library was introduced in Section 4.6. We saw there how to create threads and how to synchronize their execution using semaphores. The library also supports locks and condition variables.
5.5.1 Locks and Condition Variables
Locks in Pthreads are called mutex locks, or simply mutexes, because they are used to implement mutual exclusion. The basic code pattern for declaring and initializing a mutex is similar to that for a thread. First, declare global variables for a mutex descriptor and a mutex attributes descriptor. Second, initialize the descriptors. Finally, use the mutex primitives.
If a mutex lock is going to be used by threads in the same (heavyweight) process but not by threads in different processes, the first two steps can be simplified to

    pthread_mutex_t mutex;
    ...
    pthread_mutex_init(&mutex, NULL);

This initializes mutex with the default attributes. A critical section of code then uses mutex as follows:

    pthread_mutex_lock(&mutex);
    critical section;
    pthread_mutex_unlock(&mutex);
cond;
#
pthread-cond-initI&cond,
NULL);
The main operations on condition variables are wait, signal, and broadcast (which is identical to signal-all). These must be executed while holding a ~ n u ~ e x lock, In partjcular. a monjtor proceclurc is simulated using Pthreads by locking a rnutcx at the start of the procedure and unlocking the mutex at the end. The parameters to pthread-condwait are a condition vmiviable and a mutex lock. A thread thac wants to wait must first hold the lock. For example, suppose a thread har already executed
and the11later executes
This causes thc thread to release mutex and wail on cond. When the process refumes execution after a signal or broadcast. the thread will dgain own mutex and i~ will be locked. When another thread executes
awakcns one thrcad (if one is blocked), but the signaler continues exccutioll and continues to hold onto mutex.
jt
I
248
Chapter 5
Monitors
5.5.2 Example: Summing the Elements of a Matrix Figure 5.1 X contaiils a coinplete program that u x e s locks and cvndition variables. The progr;un uses numWorkers threads to sum the elements of a xh;ired matrix with size rowh and columns. Although the program is not particularly useful, ils structure and camponenls arc ~jrpicalaf those found in parallel iterative cvm-
pi1 talions.
.
#include ~ p t h r e a dh > #include <stdio.hs #define SHARED I #define MAXSIZE 2 0 0 0 #define MAXWORKERS 4
/*
/*
pthread-mutex-t barrier; gthread-cond-t go; int numWorkers; int n w r r i v e d = 0;
m a x i m matrix size */ maximum nunber of workers * /
/* /* /* /*
lock for the barrier * / condition variable * / number of worker threads * / number who have arrived * /
/ * a reusable counter barrier * / void Barrier() { pthread-mutex-lock{&barrier}; nucArrived++; if (numArrived c num'Workers) pthread-cond-wait(&ga. &barrier); else ( num~rrived= 0; / * last worker awakens others pthread -condbroadcast(&go);
*/
1 void "Worker (void * 1 ; int size, stripsize; / * size == stripSize*numWorkers * / int sums[MAXWOBRERS]; / * sums camputed by each worker * / int matrix [MAXSIZE] [MAXSIZE];
/ * read command line, initialize, and create threads * / int main(int argc, char * a r g v [ l ) { int i, j; pthread-attqt attr; pthread-t workerid [MAXWORKERSI ;
/ * set global thread attributes * / pthread-attr-init(&aW; pthread-attr-setscope(&attr, PTBREAI3_SCOPE_SYSTEM);
5.5 Case Study: Pthreads
/ * i n i t i a l i z e mutex and condition variable * / pthread-mutex-init(&barrierr NULL); pthread-cond-init(&go, NULL); I * read command line * / size = a t o i (argv [I]) ; numworkers = atoi(argvE21); stsipsize = size/numWorkers;
/ * i n i t i a l i z e the matrix * / = 0; i < size; i++) f o r (j = 0; j i size; j++) matrix[i] [j] = I;
f o r (i
/ * create the workers, then exit main thread * / f o r (i = 0; i < numWorkers; i++) pthread-create (hworkeridIil , &attr, Worker, (void * ) i ) ; pthread-exit (NULL) ; 1
/* Each worker sums the values in one strip. After a barrier, worker(0) prints the t o t a l * / void *WorkerIvoid *arg) I int myid = ( i n t ) arg; i n t total, i, j, f i r s t , last;
/ * determine first and last rows of m y s t r i p * / f i r s t = myid*stripSize; last = first + stripsize - 1; / * s u m values in my s t r i p * / t o t a l = 0; f o r (i = f i r s t ; i c = last: i + + ) for (j = 0; j i s i z e ; j++) total += rnatrixCi1 [j1 ; sums [myid] = total; Barrier l ) : if (myid == 0 ) 1 / * worker 0 computes the total * J total = 0; f o r (i = 0; i < numworkers; it+) total += sums l i l ; p r i n t f { ' ' t h e total is '%d\nH, total ) ; 1
1
Figure 5.1 8
Parallel matrix summation using Pthreads.
249
250
Chapter 5
Monitors
The integer variables and B a r r i e r function a i the top of he figure implement a reusahlc cormter barrier. They axe not encapsulated within a monitor declarrttion. but they behave as if they were. The body of Barrier starts by locking mutex and ends by unlocking mutex. Within the function, the first workers co asrive at the barrier wail on conclitjnn variable go (and release the lock). The Iasl worker to arrive at lhe barrier reinitializes numarrived then uses the broadcast primitive ;m awaken all the other workers. Those workers wiU resume execution one at a time. holding rnutex; each unlocks mutex then returns from Barrier. T l ~ cmain fttnction initializes the shared variables and then creates the ~hreads. The last iirgumcnl of pthread-create is used to pas7 each worker h e a d 21 trnique ~dentity.'The last aclion of main is to call pthread-exit. This causec he main t lllead to lerm i nntc while the other lhreads continue executing. Il' instcad the tnain thread were to reliu-n-because rhe call to pthread-exit werc deleted-then thc enlire prograin would terminate. Each worker thread compulex the suin of all values in its strip of the matrix and ctorcs the result in it5 clement or global an+rly sums. Wc use an array for 6UmS rather tharl a single variable in order 10 avoid clitical sections After all woi-kers have I-cachedthe banier sy n c h ~onizalicln point, worker 0 colnputes and prints the f i n d total. I
Historical Notes The concept of dam encapsulation originated with the class construct of Simi~la-67.Edsgcr Dijkstm [I9711 is generally credited with being thc first to advocate uxing dala encapsulation to control access to shared variables in a concurrent program. I-Ie called such a unit a "sccretary" but d ~ dnot propose any syntactic mechanism lor prog~':~rnmingsecretaries. Pcr Brinch Hansen [ I 9721 advocates the same idea, a specific language propowl called a shared class js found in Brinch Iiansen 1 19731. Monitors were named and popularized by Hoare [I9741 in an excellent and influential papcr that contains numerous intercsling examples. including a houndcil buffcr, ~ntewaltimer, and disk heItd schcduler (using rhe elevator algori thm). ConcIit ion ~ynchronizalionin Tloare's proposal cmploys the Signal and Urgent Wail (SU) signaling discipline. The reader might find it instruclive to compare Hoai-c's soltrlion~,which use SU signaling, with those In this chapter, which use SC signaling. Hoare [I9741 also introduced the concept of a split binary semaphore and showed 11ow to use it to irnpleme~~t monitors. Concur~entPascal [Rrinch Hansen 19751 was the lirst concurrent programming language to includc monitors. Its three structuring components are proceshes. monitors, and classes. Classes are like moniton, except they cannot be
Historical Notes
251
shared by processes and hence have no need for mutual exclunon or condit~on 5ynchronirat ion. Concurrent Pascal has heen used to write several operating sy stems [Rrinch Hansen 19771. Tn Concurrent Pascal, 110 devices and thc like are treated ar; special monitors that are implemented by the language's run-time system, which thus hides the notion of an interrupt. Scveral additiona1 Languages havc also included monitors. Modula was dcvcloped by Nicklatrs Wirth-the desigiler ol Pascal-as a systems language for progra~nmingdedicated cinnputer systems, including process control applications [Wirth 19771. (The original Madula is quite different than i t s successors, Modula-2 and Modula-3.) Mesa was developed at Xerox PARC [Mitchell el al. 19791, Lampson and Redell El9801 give an excellent description of experiences with procecws and monitors in Meca; they also discuss the tcchniquc of using a covering rondifion to awaken delayed processes, which wc described in Section 5.2. Pascal Plus [Welsh & Rustard 19791 intsoduced thc minrank primitive. which the designers called the PRIORITY function. Two books, Welsh and McKeag [1980] and Buslard el al. [ I 9x81. give several large examples of Tystems programs wrjtten i n Pascal Plus. Per Bnnch Ilansen (19931 has written an inleresling personal history of the developrnenl of [he monitor concept and its realization in Concurrenl Pascal. Gehani and McGcttrick 119381 conlains ~+ep~inrs or many of the most important papers on concurrent programming, including three cited abovc-Hoare [ 19741, Rrinch Hansen 119751, and Lail~pson&r Redell [1980]-as well as oO~ersthat examine thc usc of monitors. Ric Holl and his colleagues at the University of Torontn have designed a series or monitor-based languages. The first, CSPlk [ H o l ~et al. 19781, is a superset of S P k , which is in turn a struc~uredsubset af PLII. Concurrent Euclid I Holc 19831 extends Eucltd; it has becn used to implement a UNTX-compalible nucleus called Tunis. Holt [I9831 contains an excellent overview of concurrenl programming ac well as very readable descriptions of Concurrent Euclid, UNIX. and operatiug syslein and kernel design. Holt's most recent language is Turing Plus, an extension of the Tiiring Ianguage [Holt & Cordy 19881. All of CSP/k, Concurrent Euclid, and Turing Plus employ the Signal and Wait (SW) discipline. Turing Plus alw supports the Signal and Continue (SC) discipline and requires ils use within device monitors so that itlterr~~pl handlers are not preernpled. Emerald is a different kind of language than the above [Raj et al. l99J]. In particuIar. Emcrald is an object-oriented distributed programming language. I i is not based on municors but rather includes them as a synchronizn~ionmechanism. As in other object-orienled languages, an object has a represenlation and is manipulated by invoking operations. However, objects m Emerald can execute concurrently, and within an object invocations can execure ~ o n c urent1 l y. When rnutual excIusion and condition synchronization are required, variables and the
1
248
Chapter 5
Monitors
5.5.2 Example: Summing the Elements of a Matrix Figure 5.18 contains a complete program that uses locks and condition variables. Thc program uses numWorkess threads to sum the element5 of a sharcd matrix w ~ ( hs i z e rows and columns. Although l l ~ cprogram i s not particularly iiseftll, ilr structure and components are typical of those found in pdrallel ileralive cornputat~ons.
#include cpthread-hs # i n c l u d e cstdio.hs #define SHARED 1 #define MAXSIZE 2000 #define MAXWORKERS 4
/ * m a x i m u m matrix
size
*/
/ * maximum number e f workers * /
pthread-mutex-t barrier; pthread-cond-t go; int numWorkers; int numArrived = 0;
/ * lock
for the barrier
*/
/ * condition v a r i a b l e * / /* number o f worker threads * / / * number who have arrived * /
/* a reusable counter barrier * / void Barrier ( ) { pthread-uex-lock(&barrier); numArrived++; if {numArrived c numWarkers) pthread-cond-wait (&go, & b a r r i e r ):
else { n m r r i v e d = 0; / * l a s t worker awakens others * / pthread-condbroadcast(&gol;
1 pthread-mutex-unlack(&barrier);
1 v o i d *Worker (void * )
;
i n t size, stripsize; / * size == strFpSize*numWorkers * / int sums[MAXWORKERS]; / * sums computed by each worker * / i n t matrix [MAXSIZE] [MAXSIZE];
/*
read command l i n e , initialize, and create threads * / int main ( i n t argc, char 'argv [ I ) { int i, j;
pthread-attr-t attr; pthread-t workerid[WWORKERSI ;
I* set global thread attributes * / pthread-attr-init (&attrl ; pthread-attr-setscope(&attr. PTHREAD-SCOPE-SYSTEM];
5.5 Case Study: Pthreads / * initialize mutex and condition variable * / pthread-mutex-init(&barrier, NULL); pthread-cond-init (&go, NULL) ;
/ * read command line * / s i z e = atoi (argv[ll) ; numWorkers = atoi(asgv[2]); stripsize
=
size/numWorkers;
/ * initialize
t h e matrix
*/
for ( i = 0 ; i < size; i++) f o r (j = 0; j < size; j++) r n a t r i x [ i l [j] = 1;
/ * create the workers, t h e n exit main thread * / for (i = 0 ; i < numworkers; i + + ) pthread-create(&workerid[il, sattr, Worker, ( v o i d * ) i); pthread-exit (NULL ) ; 1
/ * Each worker s u m s the values in one strip. A f t e r a b a r r i e r , wOrkex(0) prints the total * / void *Worker(void *arg) { int myid = (int) arg; i n t t o t a l , i, j, first, last;
/ * determine first and l a s t rows of my strip * / first = myid*stripSize; last = first + s t r i p s i z e - 1;
/ * sum values in my strip * / total = 0; f o r (i = f i ~ s t ;i <= last; i++) f o r ( j = 0; j i size; j + + ) total e = r n a t r i x l i ] [j]; sums [myid] = total; Barrier ( 1 ; i f (myid == 0 ) I / * worker 0 computes the t o t a l * / total = 0; for (i = 0; i 4 numWorkers; i++) t o t a l += surns [ i] ; printf("the total is "&\nu, total); 1 1
Figure 5.18
Parallel matrix summation using Pthreads.
249
250
Chapter 5
I
Monitors
The integer variables and Barrier function at the top of the figure implement a reusable counter barrier. They are not encapsulated within a monitor declaration, but they behave as if they were. The body of Barrier starts by locking mutex and ends by unIocking mutex. within the function, the first workers to ai~iveat the barrier wait on cotldition variable go (and release the lock). The last worker to arrive at the barrier reinirializes numArrived then uses h e brvadcast primitive to awaken all the other workers. Those workers will resume execution one ar a lime, holding mutex: each unlocks mutex then returns from Barrier. The main fui~clioninitializes the shared varial~le.; and then creates the threads. The last argument of pthread-create is used to pass each worker thread a uniquc ~dentily.The last action of main is to call pthread-exit. This causes the main rh~eadl o lerminate while the other threads continue executing. If instead thc mail1 thread were to return-because the call to pthread-exit were deleled-then the entire prograin would terminate. Each worker thread computes the rum of all values in ils strip of the matlix ;tnd stores the result in ils element of global array s u m s . We use an array for sums rather than a single vdri;lble in order lo avoid critical cections. Al'ter a11 workers havc reached the harrier synchronization point, worker 0 computes and print5 thc iinal 10tdl.
Historical Notes The conccpl of data encapsulation originated wilh the class conslnict of Simuln-67. Cdsger Dijkstra 1 1 97 11 1s generally credited with beins the first to advocate using data cr~capsuiationto control access to shared v;~riablesi n a concurrent program. He callcd cuch a unlt a "secretary" but did mt propose any synlactic mecliunisrn for programming secrelaries. Pcr Brinch Hansen 119721 advocates the sanic idca. a specific language proposal called a shared class is found in Rrinch T-lanscn [1973]. Monftors here nitmed and popularj~edby Hoare [ I 974 1 in an excellent and influential paper lhat contains numerous interesting examples, including a bounded hul'kl; itlterval timer, and disk head scheduler (using (he elevator algorithm). Condition synchronization in Hoare's proposal employs the Signal and Urgenl Wait (SU) signaling discipline. The reader mighl find it instmctive to compare Hoare's solutions. which use SU signaling, with those in this chapter, which use SC signaling. Hoxe [I9741 also introduced the concepl or a split bindry seinaphorc and showed how to use it to implement monitors. Concurrent Pascal [Brinch Hanserl 19751 was the first concurrenl programming Innguagc to incl ucle monitors. Its three structuring components are pro-
ccsscs, monitors, and classes. Classes are like monitors, except they calm01 be
I
I
Hisforical Notes
251
shared by processes and hence havc no need for mutual exclusion or condition synchronization. Concurrent Pascal has been used to wrire xeveral operating systems [Brinch H a n ~ e n19771. Jn Concurrent Pascal, If0 devices and the like are treated as special rnontlors that are implemented by the languagc's run-time xystern, whrch thus hides the notion ol' an interrupt. Several additional languages have also included monitors. Modula was develoyed by Nicklaus Wirth-the designer uf Pascal-as a syslems languirge for programming dedicated cornputcr systems, including process control appI icatioils [Wirth 19771. (The original Modula is quite different than its successors, Modula-2 and Modula-3.) Mesa was developed at Xerox PARC [Mitchell el al. 19791, Lampson and RedeIl [I9801 give an excellent description of experiences with processes and monitors in Mesa; they also discuss the technique of using :I covering colldidon to awaken delayed processes, which we described in Section 5.2. Pascal Plus rWelsh & Bustard 197?] introdliclced the minrank primjtive, which the designers called the PRIORTTP function. Two books, Welr;h and McKe~g[19801 and Bustard et al. [19X8], give several large examples of systems programs written in Pascal Plus. Pcr Brinch Hansen [I993 1 has written an interesting personal history of the devcloprnent of the monitor concept and its realization in Concurrent Pascal. Gehani and McCeltrick [I9881 contains reprints 01many of the most important papers on concurrent programming. illeluding three cited above-Hoare [ I 974 1, Briilch Hanscn [I 9751, and Lampson & Redcll [I 9801-as well as others that examine the use of monitors. Ric Holt and his colleagues al the University of Toronto have designed a sel i e of ~ monitor-based languages. The first, CSP/k [HoIt et al. 19781, is a supetset 01 SPk. which is in turn a structurcd subqet of PL/I. Concurrent Euclid [Hnlt 19831 extends Euclid; it has been used tu implement a UNIX-ctlmpat~blel~ucleua called Tunis. Molt 11983 1 contains an excellent overview of concurrcnl programming as well ax very rcadable descriptions of Concurrent Eucljd. UNIX, and operating system and kernel design. Holt's most recent language is Turing Plus. an extension of thc Turing language [Holt & Cordy 19881. All of CSP/k, Cancurrent Euclid, ancl Turing Plus empIoy the Signal and Wait ( S W) discipline. Turing Plus also supports the Signal and Conlinue {SC) discipline and require!, it5 usc wilhin d e ~ i c monitors e so that intcri-upt handlers are not preempted. Emerald is a different kind of language than the above [Raj et al. 19911, In particular, Emerald is an object-oricntecl distribuled programming Ianpuage. It is no1 based on t~~onitors but rather includes them as a synchronization mechanism. As in othcr object-oriented Tanguages, an object hac a representation and is manipulated by invoking operations. However, objectc in Emerald can execute concurrently, and within an object invocations can execute concunenlly. Whcn mutual cxclusicm and condition synchronization are required, ~miahlesancl the
252
Chapter 5
Monitors
operations that access them can be defined within a monitor. Objects in Emerald can also be mobile-i.e.. they can move during program execulion. Thus, Emerald was in a way a precursor of Java. Two modern and widely used languages-Java and Ada 95-also provide mechanisms that can be used to program monitors. We gave an overview of Java's mechanisms in Section 5.4. General informalion on Java can he found in the many books on the language; a wealth of ~nformalion is also online at www. javasoft .corn. TWObooks 5pecifically cover concurrenl programming in Java. Doug Lea [I9971 descrtbes design principles and plfterns. Stephen Hartley [I9981 cover5 many ol' the same topics as this text (semaphores, monitors, etc.) and includcs clo~enr;of example programs. A new book by Jeff Mapee and Jeff Krarncr 1 19991 provides more general coveragc of concurrency, lncluding rnodeling concurrent behavior, and uses Java to illustrate the concepts. Thc first version of the Ada programming language, Ada 83, supported concurrent programming by means of tasks (processes) and rcndezvous (see Section 8-2). The lalest vcrsion of the language, Ada 95, added protected types. which are similar to moni~ors, Ada's cancumnt programming inechanisms are described in Section K 6. In particular, Figure 8.17 shows how to iinplcment a counler barrier in Ada using a protected type. A general Web resource for Ada infor~nalion,including hooks and online tutorials, is located at adahorne .corn. D~jkstra119681 introduced the ?leeping barbcr problem in hir important paper on using semaphores. Teory and Pinkedon 119721 and Geist and Danicl [ I 9871 discuss and analyze a variety of ditrerent disk scheduling ;ilgorithms. Section 5.3 uses the circular scan algorithm. Later we will show how to implemcnt ~hortestseek time (Section 7.3). Hoare [ I 9741 illustrates the elevator algorithm. Lister 11 4771 firs( raised the problem 01 what to do about nested tnonitor calls This icd to a flurry of follow-up papers and letters in Operating Systems Review, the quarterly newsletter of the ACM Special Interc5t Group on Operating Systems (S [GOPS). These papcrs discussed several possibilities, including thc following: prohibit nested calls; uqe open calls; use closed call< and release cxclusiun only in the last monitor: use closed calls and release all locks w11e11 a process blocks, then r e a c q ~ ~ ithem r e beforc the process continues; and finally, let the programmer specily whether a 5pecific call is open or closed. In rcaction to Lisler's papcr, David Parnas [ 19781 argued that the fundamental issuc is data integrity, which does not necessarily require mutual exclusion. I n particulal; concurrent execulion of monitor procedures i~ fine as long as processes in a monitor d o not interfere with each other. A t about the same time, Andrews and McGraw [I9771 defined a monitor-likc construct lhal peimits the progr;uillmer to specify which procedures can executc in parallel. Anoilleiapproach is to use path ex~>re,s,sion~ [Car-upbell & Kolstad 19801, a high-level
References
253
mechanism that allows one to specify the order in which procedures execule and obviates the need for conhtion variables (see Exercise 5.26). Tf~eMesa Ianguage provides mechan~srnsthat give the programmer control over the granularity of exclusion. More recently, as shown in Section 5.4, Java has taken this approach of allowing the programmer to dectdc which methods are synchronized and which can execute concurrently.
References Andrews, G. R., and J. R. McGraw. 1977. Language fcaliires for procecs interaction. Smr A CM Conf kre-encr on Lu?qziage Tleszg~zfi)r Rriiublr Softwxrc2. SIGPLAN Nrtiees 1 2 , 3 (March): 114-27. Brinch Hansen, P. 1972. Structured multiprogramming. (July), 574-78.
Cupnpn.
ACM 15, 7
Brinch Hansen, P. 1 973. Opemting Systam Principles. Englewood Cliffs, NJ: Prentice-HaI1. Brinch Hanscn>P. 1975. Thc pragram~ninglanguage Concurrent Pascal. IEEE Truns. on S@ware EEIZSK SE-I. 2 (June): 1 99-206.
Brinch Hansen, P, 1977. The A r r h i t e c f ~ roj~ Concurrent Prqyr~ims.Englewood CliIfs, NJ: Prenticc-l-Iall. Rrinch 1-Iansen, P. 1993. Monitors and Concurrent Pascal: A penanal history. Hislory of Programming Languages Conrereme (ElOPL-IT), ACM SigpEan N # ~ ~ c E28. , I . 3 (March): 1-70. Bustard, D., J . Elder. and J . Welsh. 1988. Conc~irre.enzPrclgmm Sfatr'lures. New York: Prentice-Hall 1nler11;ltiolial.
Cainphell, R H., and R. R - Kolstad. 1980. An overview of Path I'ascal's dcsign and P;th Pascal user manual. SZGPlAA/ ~J~otices 15,9 (Seplember): 13-24. I('
Dijkstra, E. W. 1968. Cooperating sequential processei;. Ttl F, Gellllyh, ed., Progmm~~ring language,^. New York: Acadcmic Press, pp 43-1 12.
Dijkstra, E. W. 1 97 1. Hierarchical or-dcring of sequential processes. Artr! Informarica 1, 115-38. Gehani, N. H., and A. D. McGettrick. 1938 Conr~rrrcnrPro~mmming.Read-
ing, MA: Addison-Wesley.
Geist, R., and S. Daniel. 1987. A continuum of disk xcheduling algorithm<. ACM Tran,r. on Comp~kterSyst~rns5 , 1 (February): 77-92.
,
254
Chapter 5
Monitors
Habcmmann, A. PI!. 1972. Synchronization of communicating processes. Cnrnm.
ACM IS. 3 (March): 171-76. Hartley, S . J. 1998. Concurrent Programming: The Java Programming Langilage. New York: Oxford Unive~sity Press.
I-loai-c. C.A.R. 1974. Monitors: An opcrnting syslein slrucluring concept. Covnm. ACM 17, 1 0 (Oclober): 549-57. Holt, R. C. 1983. Conrurr(wr Euclid, The UNiX System, and Tunzs. Reading, MA: Addimn-Wesley. Holl, R C. and 1. R Cordy. 2988 The Turing programming language. Cumin. ACM 3 1, 12 (December): 1410-23. FToll, R. C.. G. S. Graham. E. D. Lazowska, and M. A. Scott. 1978. Strrdctured Con(-urr~ntPmgruniming with Opemting Systrm Applic'afiolzs. Reading, MA: Addison-Wesley. Lampson, B. W., and D. D. Redell. 1980. Experience with processes and moni-
tors in Mesa. Comm. ACM 23,2 (F~hruary)~ 105-13. Led, D. L. 1997. Colzcurre-enrProgmmmmg in Java: Desrgn Pri~iriplesugld Puftern-ns, Reading, MA: Addison-We~ley.
Lister, A. 1977. The problem ol' nested moniror calls. Operating S y s t e m Review 11, 3 (July): 5-7. Magee, J., and J. Kramer. 1909. Conrurrenry: star^ h4od~lsand Java Programs. New York: W~ley. Mitchell, I. G.,W. Maybury, and R. Sweet. 1979. Mesa lclnguage manual, version 3.0. Xerox PaIu Alto Research Center Report CSL-79-3, April.
Parnas, D. L. 1978. The noil-problcm u l ncslcd monitor c;~lls. Op~ratingSysfern3 Review 12, 1 (January): 12-1 4. Raj. R. K., E. Tempero, H. M. Levy, A . P. Black, N. C. Hutchinson, and E. Jul, 1991 . Emerald: A general purpose programming language. S i f ~ w r e Practice and E~-p~rianue 21, 1 (Januaiy): 91-118. Teorey. T. J.. and T. B Pinkerton. 1972. A comparative analycis of disk scheduling policies. Co~nnz.ACM 15. 3 (March): 177-84. Welsh. J.. and L). W. Bustaldd. 1979. Pascal-Plus-anolher language for modular multiprogramming. SnSnyure-Praotic~ Experrencr 9, 947-57 Welsh. T , and M McKeag. 1 9 R O Srrut-rure-ed Sysrem Pro,orurnming. New York: Prentice-Hal1 International.
N. 1977. Modula: A language for modular rnullipl-ogramming. So&wurc-Pmcti~e and Expori~nre7 , 3-35.
Wirth.
Exercises
255
Exercises 5.1 Suppose that the empty primitive is not available, but that you want lo be able to tell whether any proccss is waiting on a condition vaxiable queue. Develop a way to simulate empty. In particular, show the code yo11 would need to add before and after each wait Icv) and signal ( c v ) statement. (Do not worry about signal-all.) Your solution should work for both the Signal and Continue and Signal and Wait discipIines.
5.2 Consider the shortest-job-next monitor in Figure 5.6. Is the monitor correct if the Signal and Wait discipline is used? If so, explain why. If not, change the monitor so it is correct.

5.3 Consider the following proposed solution to the shortest-job-next allocation problem in Section 5.2:

    monitor SJN {
      bool free = true;
      cond turn;

      procedure request(int time) {
        if (!free)
          wait(turn, time);
        free = false;
      }

      procedure release() {
        free = true;
        signal(turn);
      }
    }
Does this solution work correctly for the Signal and Continue discipline? Does it work correctly for Signal and Wait? Clearly explain your answers.

5.4 The following problems deal with the readers/writers monitor in Figure 5.5. Assume Signal and Continue semantics for all four problems.
(a) Suppose there is no signal_all primitive. Modify the solution so that it uses only signal.
(b) Modify the solution to give writers preference instead of readers.

(c) Modify the solution so that readers and writers alternate if both are trying to access the database.

(d) Modify the solution so that readers and writers are given permission to access the database in FCFS order. Allow readers concurrent access when that does not violate the FCFS order of granting permission.
5.5 Consider the following definition of semaphores [Habermann 1972]. Let na be the number of times a process has attempted a P operation, let np be the number of completed P operations, and let nv be the number of completed V operations. The semaphore invariant for this representation is

    np == min(na, nv)
This invariant specifies that a process delayed in a P operation should be awakened and allowed to continue as soon as enough V operations have been executed.
(a) Develop a monitor that implements semaphores using this representation and invariant. Use the Signal and Continue discipline.

(b) Develop a monitor that implements semaphores using this representation and invariant. Use the preemptive Signal and Wait discipline.
5.6 Consider the dining philosophers problem defined in Section 4.3.

(a) Develop a monitor to implement the required synchronization. The monitor should have two operations: getforks(id) and relforks(id), where id is the identity of the calling philosopher. First specify a monitor invariant, then develop the body of the monitor. Your solution need not be fair. Use the Signal and Continue discipline.

(b) Modify your answer to (a) so that it is fair; that is, so that a philosopher who wants to eat eventually gets to.
5.7 The One-Lane Bridge. Cars coming from the north and the south arrive at a one-lane bridge. Cars heading in the same direction can cross the bridge at the same time, but cars heading in opposite directions cannot.
(a) Develop a solution to this problem. Model the cars as processes, and use a monitor for synchronization. First specify the monitor invariant, then develop the body of the monitor. Do not worry about fairness, and do not give preference to any one kind of car. Use the Signal and Continue discipline.

(b) Modify your answer to (a) to ensure fairness. (Hint: Have cars take turns.)

5.8 The Savings Account Problem. A savings account is shared by several people (processes). Each person may deposit or withdraw funds from the account. The current balance in the account is the sum of all deposits to date minus the sum of all withdrawals to date. The balance must never become negative. A deposit never has to delay (except for mutual exclusion), but a withdrawal has to wait until there are sufficient funds.

(a) Develop a monitor to solve this problem. The monitor should have two procedures: deposit(amount) and withdraw(amount). First specify a monitor invariant.
Assume the arguments to deposit and withdraw are positive. Use the Signal and Continue discipline.
(b) Modify your answer to (a) so that withdrawals are serviced FCFS. For example, suppose the current balance is $200, and one customer is waiting to withdraw $300. If another customer arrives, he must wait, even if he wants to withdraw at most $200. Assume there is a magic function amount(cv) that returns the value of the amount parameter of the first process delayed on cv.

(c) Suppose a magic amount function does not exist. Modify your answer to (b) to simulate it in your solution.
5.9 The Water Molecule Problem. Suppose hydrogen and oxygen atoms are bouncing around in space trying to group together into water molecules. This requires that two hydrogen atoms and one oxygen atom synchronize with each other. Let the hydrogen (H) and oxygen (O) atoms be simulated by processes (threads). Each H atom calls a procedure Hready when it wants to combine into a water molecule. Each O atom calls another procedure Oready when it wants to combine.
Your job is to write a monitor that implements Hready and Oready. An H atom has to delay in Hready until another H atom has also called Hready and one O atom has called Oready. Then one of the processes (say the O atom) should call a procedure makeWater. After makeWater returns, all three processes should return from their calls of Hready and Oready. Your solution must avoid deadlock and starvation. This is a tricky problem, so be careful.

5.10 Atomic Broadcast. Assume one producer process and n consumer processes share a bounded buffer having b slots. The producer deposits messages in the buffer; consumers fetch them. Every message deposited by the producer is to be received by all n consumers. Furthermore, each consumer is to receive the messages in the order they were deposited. However, consumers can receive messages at different times. For example, one consumer could receive up to b more messages than another if the second consumer is slow. Develop a monitor that implements this kind of communication. Use the Signal and Continue discipline.
5.11 Develop a monitor that allows pairs of processes to exchange values. The monitor has one operation: exchange(int *value). After two processes have called exchange, the monitor swaps the values of the arguments and returns them to the processes. The monitor should be reusable in the sense that it exchanges the parameters of the first pair of callers, then the second pair of callers, and so on. You may use either the Signal and Continue or Signal and Wait discipline, but state which one you are using.
5.12 The Dining Savages. A tribe of savages eats communal dinners from a large pot that can hold M servings of stewed missionary. When a savage wants to eat, he helps himself from the pot, unless it is empty. If the pot is empty, the savage wakes up the cook and then waits until the cook has refilled the pot. The behavior of the savages and cook is specified by the following processes:

    process Savage[1:n] {
      while (true) {
        get serving from pot;
        eat;
      }
    }

    process Cook {
      while (true) {
        sleep;
        put M servings in pot;
      }
    }
Develop code for the actions of the savages and cook. Use a monitor for synchronization. Your solution should avoid deadlock and awaken the cook only when the pot is empty. Use the Signal and Continue discipline.
5.13 Search/Insert/Delete. Three kinds of processes share access to a singly linked list: searchers, inserters, and deleters. Searchers merely examine the list; hence they can execute concurrently with each other. Inserters add new items to the end of the list; insertions must be mutually exclusive to preclude inserting two items at about the same time. However, one insert can proceed in parallel with any number of searches. Finally, deleters remove items from anywhere in the list. At most one deleter process can access the list at a time, and deletion must also be mutually exclusive with searches and insertions.
Develop a monitor to implement this kind of synchronization. First specify a monitor invariant. Use the Signal and Continue discipline.
5.14 Memory Allocation. Suppose there are two operations: request(amount) and release(amount), where amount is a positive integer. When a process calls request, it delays until at least amount free pages of memory are available. A process returns amount pages to the free pool by calling release. Pages may be released in different quantities than they are acquired.
(a) Develop a monitor that implements request and release. First specify a global invariant. Do not worry about the order in which requests are serviced. Use the Signal and Continue discipline. (Hint: Use a covering condition.)

(b) Modify your answer to (a) to use the shortest-job-next (SJN) allocation policy. In particular, smaller requests take precedence over larger ones.
(c) Modify your answer to (a) to use a first-come, first-served (FCFS) allocation policy. This means that a pending request might have to delay, even if there is enough memory available.
(d) Suppose request and release acquire and return contiguous pages of memory; i.e., if a process requests two pages, it delays until two adjacent pages are available. Develop a monitor that implements these versions of request and release. First choose a representation for the status of memory pages and specify a monitor invariant.
5.15 Suppose n processes P[1:n] share two printers. Before using a printer, P[i] calls request(printer). This operation returns the identity of a free printer. After using that printer, P[i] returns it by calling release(printer).

(a) Develop a monitor that implements request and release. First specify a monitor invariant. Use the Signal and Continue discipline.

(b) Assume each process has a priority that it passes to the monitor as an additional argument to request. Modify request and release so that a printer is allocated to the highest-priority waiting process. If two processes have the same priority, their requests should be granted in FCFS order.
5.16 Suppose a computer center has two printers, A and B, that are similar but not identical. Three kinds of processes use the printers: those that must use A, those that must use B, and those that can use either A or B. Develop a monitor to allocate the printers, and show the code that the processes execute to request and release a printer. Your solution should be fair, assuming that printers are eventually released. Use the Signal and Continue discipline.

5.17 The Roller Coaster Problem. Suppose there are n passenger processes and one car process. The passengers repeatedly wait to take rides in the car, which can hold c passengers, c < n. However, the car can go around the tracks only when it is full.
(a) Develop code for the actions of the passenger and car processes, and develop a monitor to synchronize them. The monitor should have three operations: takeRide, which is called by passengers, and load and unload, which are called by the car process. Specify an invariant for your monitor. Use the Signal and Continue discipline.

(b) Generalize your answer to (a) to employ m car processes, m > 1. Since there is only one track, cars cannot pass each other; i.e., they must finish going around the track in the order in which they started. Again, a car can go around the tracks only when it is full.

5.18 File Buffer Allocation. Many operating systems, such as UNIX, maintain a cache
of file access buffers. Each buffer is the size of a disk block. When a user process wants to read a disk block, the file system first looks in the cache. If the block is there, the file system returns the data to the user. Otherwise, the file
system selects the least recently used buffer, reads the disk block into it, then returns the data to the user. Similarly, if a user process wants to write a disk block that is in the cache, the file system simply updates the block. Otherwise, the file system selects the least recently used buffer and writes into that one. The file system keeps track of which buffers contain new data, i.e., which have been modified, and writes them to disk before letting them be used for a different disk block. (This is called a write-back cache policy.)
Develop a monitor to implement a buffer cache having the above specifications. First define the procedures you need and their parameters, then develop the body of the monitor. Use the Signal and Continue discipline. Explain any additional mechanisms you need, for example, a clock and a disk-access process.
5.19 The following problems deal with the sleeping barber monitor in Figure 5.10.

(a) Some of the while loops can be replaced by if statements. Determine which ones, and modify the monitor appropriately. Assume the Signal and Continue discipline.

(b) Is the monitor, as given, correct for the Signal and Wait discipline? If so, give a convincing argument. If not, modify the monitor so that it is correct.

(c) Is the monitor, as given, correct for the Signal and Urgent Wait discipline? If so, give a convincing argument. If not, modify the monitor so that it is correct.
5.20 In the sleeping barber problem (Section 5.2), suppose there are several barbers rather than just one. Develop a monitor to synchronize the actions of the customers and barbers. First specify a monitor invariant. The monitor should have the same procedures as in Figure 5.10. Be careful to ensure that finished_cut awakens the same customer that a barber rendezvoused with in get_next_customer. Use the Signal and Continue discipline.
5.21 The following problems deal with the separate disk scheduler monitor in Figure 5.13.

(a) Modify the monitor to employ the elevator algorithm. First specify a monitor invariant, then develop a solution.
(b) Modify the monitor to employ the shortest-seek-time algorithm. First specify a monitor invariant, then develop a solution.

5.22 The following problems deal with the disk interface monitor in Figure 5.16.
(a) Give the details of the monitor invariant.
(b) Modify finished_transfer so that the disk driver process does not wait for a user process to fetch its results. However, be careful that the disk driver does not overwrite the results area.
(c) Combine the procedures get_next_request and finished_transfer that are called by the disk driver. Be careful about the initialization of the monitor's variables.
5.23 Figure 5.17 illustrates the use of a Disk_Access monitor, and the text outlines its implementation. Develop a complete implementation of Disk_Access. Use the SCAN (elevator) disk-scheduling strategy. First specify an appropriate monitor invariant, and then develop the body of procedure doIO. Do not worry about implementing the Disk_Transfer monitor; just show the calls to it at appropriate points from within Disk_Access (and assume they are open calls).

5.24 The main difference between the various monitor signaling disciplines is the order in which processes execute. However, it is possible to simulate the semantics of one monitor signaling discipline using a different signaling discipline. In particular, it is possible to transform a monitor that uses one signaling discipline into a monitor that has the same behavior but uses a different signaling discipline. This requires changing the code and adding extra variables.

(a) Show how to simulate Signal and Continue using Signal and Wait. Illustrate your simulation using one of the monitors earlier in the chapter.

(b) Show how to simulate Signal and Wait using Signal and Continue. Develop an example that illustrates your simulation.
(c) Show how to simulate Signal and Wait using Signal and Urgent Wait, and vice versa. Develop an example that illustrates your simulation.

5.25 Suppose input and output on a terminal are supported by two procedures:

    getline(string &str, int &count)
    putline(string str)
An application process calls getline to receive the next line of input; it calls putline to send a line to the display. A call of getline returns when there is another line of input; result argument str is set to the contents of the line, and count is set to its actual length. A line contains at most MAXLINE characters; it is terminated by a NEWLINE character, which is not part of the line itself.
Assume both input and output are buffered; i.e., up to n input lines can be stored waiting to be retrieved by getline and up to n output lines can be stored waiting to be printed. Also assume that input lines are echoed to the display; i.e., each complete input line is also sent to the display. Finally, assume that input lines are "cooked"; i.e., backspace and line-kill characters are processed by getline and do not get returned to the application process.
lines are "cooked"--i.e., backspace and linc-kill characters are processed by getline and do not get ~eturnedto the application procehs. Devclop a monitor thal implements getline and putline. Assu~nethere are two ilcvice driver psocesseh. One reads characters from Ihe keyboard; the other mriles lines to the dir;pIay. Your monitor will nccd to have additjonal procedures th;d these processes call. 5.20 Pall1 expressions are a high-levcl rx~echanismror speci [jr ing synchrontzation helween procedures in a module [Campbell & Kolstad 19XOI. They car1 bc uscd lo specily which procedures execurc wilh mulual exclusion and which can execute in parallel, and they obviate thc nccd ibr condilion variables and cxplicil w a i t and signal statetnents. The syntax of an open path expression is dcfined by the following BNF grammar:
    path_declaration ::= "path" list "end"
    list             ::= sequence { "," sequence }
    sequence         ::= item { ";" item }
    item             ::= bound ":" "(" list ")"  |  "[" list "]"  |  identifier
Braces denote zero or more occurrences of the enclosed items, and quotes enclose literal items (terminals). In the choices for item, bound is a positive integer, and identifier is the name of a procedure.
The comma operator in a list imposes no synchronization constraints. The semicolon operator in a sequence imposes the constraint that one execution of the first item must complete before each execution of the second item, which must complete before each execution of the third item, and so on. The bound operator limits the number of elements of the enclosed list that can be active at a time. The bracket operator [ ... ] allows any number of elements of the enclosed list to be active at once. For example, the following specifies that any number of calls of procedures a or b can proceed in parallel, but that a and b are mutually exclusive with respect to each other:

    path 1: ([a], [b]) end

(a) Give a path expression to express the synchronization for a bounded buffer with n slots (Figure 5.4). The operations on the buffer are deposit and fetch. They are to execute with mutual exclusion.
(b) Give a path expression to express the synchronization for a bounded buffer with n slots. Allow maximal parallelism; i.e., instances of deposit and fetch can execute in parallel, as long as they are accessing different slots.
(c) Give a path expression to express the synchronization for the dining philosophers problem (Section 4.3). Explain your answer.

(d) Give a path expression to express the synchronization for the readers/writers problem (Figure 5.5). Explain your answer.
(e) Give a path expression to express the synchronization for the sleeping barber problem (Figure 5.10). Explain your answer.
(f) Show how to implement path expressions using semaphores. In particular, suppose you are given a path expression that specifies the synchronization for a set of procedures. Show what code you would insert at the start and end of each procedure to enforce the specified synchronization. Start with some simple examples, and then generalize your solution.
5.27 The following problems refer to the Java programs in Section 5.4. The source for the programs can be downloaded from the website for this book (see the Preface).
(a) Write a simple program that has two classes. One defines a thread and has a run method that prints a line; the second is the main class that creates a thread. In the main class, create and start the thread as described in the text. Then try calling the run method directly rather than indirectly via start. (Namely, use s.run() if s is the thread.) What happens? Why?

(b) Develop more realistic simulations of the readers/writers programs. Use multiple readers and writers, and modify them so that you can observe that they synchronize correctly. Perhaps modify the database to make it somewhat more realistic, or at least to have read and write take longer. Also have each thread sleep for a small random amount of time before (or after) every access to the database. Java provides several methods for sleeping and for generating random numbers that you can use to construct your simulations. Write a brief report summarizing what you observe.
(c) Modify the ReadersWriters class to give writers preference. Repeat your simulations from part (b), and summarize what you observe.

(d) Modify the ReadersWriters class to make it fair. Repeat your simulations from part (b), and summarize what you observe.

5.28 Develop a Java program to simulate the dining philosophers problem (Section 4.3). Your program should have 5 philosopher threads and a class that implements a monitor to synchronize the philosophers. The monitor should have two methods: getforks(id) and relforks(id), where id is an integer between 1 and 5. Have the philosophers eat and sleep for random amounts of time. Add print statements to your program to generate a trace of the activity of the program. Write a brief report summarizing what you observe.
Chapter 6: Implementations
The previous chapters defined mechanisms for concurrent programming with shared variables and showed how to use them. At a minimum, modern multiprocessors provide machine instructions that the systems programmer can use to implement locks and barriers, as described in Chapter 3. Some multiprocessors provide hardware support for processes, context switching, locks, barriers, and sometimes even the spinning aspect of semaphores. However, in general concurrent programming mechanisms are implemented in software.

This chapter describes software implementations of processes, semaphores, and monitors. The basis for the implementations is what is called a kernel: a small set of data structures and subroutines that are at the core of any concurrent program. (A kernel is sometimes called a nucleus; both terms indicate that the software is common to every software module.) The role of a kernel is to provide a virtual processor to each process so that the process has the illusion that it is executing on its own processor. The data structures represent the states of processes, semaphores, and condition variables. The subroutines implement operations on the data structures; each is called a primitive operation because it is implemented so that it executes atomically.

Section 6.1 describes a kernel that implements processes on a single processor. Section 6.2 extends that kernel to one for a multiprocessor. Sections 6.3 and 6.4 add kernel primitives to support semaphores and monitors, respectively. Finally, Section 6.5 describes how to implement monitors using semaphores. Our focus is on implementing processes and synchronization, so we do not cover many additional issues, such as dynamic storage allocation and priority scheduling, that arise in practice.
6.1 A Single-Processor Kernel

We have used the co statement and process declarations in previous chapters to specify concurrent activity. Processes are merely special cases of co statements, so we focus on how to implement co statements. Consider the following program fragment:
    S0;
    co P1: S1;
    // ...
    // Pn: Sn;
    oc
    Sn+1;
The Pi are process names. The Si are statement lists and optional declarations of variables local to process Pi. Three mechanisms are needed to implement the above program fragment: one to create processes and start them executing, one to stop (and destroy) a process, and one to determine when the co statement has completed. A primitive is a routine that is implemented by a kernel in such a way that it executes as an atomic action.

Processes are created and destroyed by means of two kernel primitives: fork and quit. When one process invokes fork, another process is created and made eligible for execution. The arguments to fork give the address of the first instruction to be executed by the new process and any other data needed to specify its initial state, for example, its parameters. The new process is called a child; the process that executes fork is called its parent. When a process invokes quit, it ceases to exist. The quit primitive has no arguments.

A third kernel primitive, join, is used to wait for processes to complete execution, and hence to determine that a co statement has completed. In particular, when a parent process executes join, it waits for a child process that it previously forked to execute quit. The argument to the join primitive is the name of a child process. (Alternatively, join could have no arguments, in which case join waits for any child to terminate and perhaps returns the identity of the child.)

These three kernel primitives, fork, join, and quit, can be used as shown here to implement the above program fragment. Each child process Pi executes the following code:

    Si; quit();
The main process executes code that first executes S0, then forks each of the n child processes, then calls join once for each child, and finally continues with Sn+1.
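As a rough analogue of this translation (an illustrative sketch, not the kernel code), the same structure can be written with Java threads, where Thread.start and Thread.join stand in for the fork and join primitives; the runnables s[i] are placeholders for the statement lists Si:

    // Sketch: run S1..Sn concurrently, then continue with Sn+1.
    // Thread.start plays the role of fork; Thread.join plays the role of join.
    class CoStatement {
        static void runCo(Runnable[] s) throws InterruptedException {
            Thread[] child = new Thread[s.length];
            for (int i = 0; i < s.length; i++) {
                child[i] = new Thread(s[i]);   // "fork" child process Pi
                child[i].start();              // make it eligible for execution
            }
            for (Thread t : child) {
                t.join();                      // wait for each child to "quit"
            }
            // the caller then continues with Sn+1
        }
    }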
When a processor is interrupted, it enters the kernel and inhibits further interrupts on that processor. This makes execution in the kernel indivisible on that processor, but it does not prevent other processors from simultaneously executing in the kernel. To preclude interference between the processors, we could make the entire kernel a critical section. However, this is a poor choice for two reasons. First, it unnecessarily precludes some safe concurrent execution, because only access to shared data structures such as the free and ready lists is critical. Second, making the entire kernel into a critical section results in unnecessarily long critical sections. This decreases performance because it delays processors trying to enter the kernel, and it increases memory contention for the variables that implement the kernel critical section protocol. The following principle elucidates a much better choice.

(6.1) Multiprocessor Locking Principle. Make critical sections short by individually protecting each critical data structure. Use separate critical sections, with separate variables for the entry and exit protocols, for each critical data structure.
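To illustrate the principle (a Java sketch with invented names, not the book's kernel code), each shared list gets its own small spin lock, and each critical section covers only one list. The lock spins on a plain read before attempting the atomic update, which is the Test-and-Test-and-Set pattern that also appears in the Idle process later in this section:

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.concurrent.atomic.AtomicBoolean;

    // A tiny test-and-test-and-set spin lock; one instance per critical data structure.
    final class SpinLock {
        private final AtomicBoolean held = new AtomicBoolean(false);
        void lock() {
            while (true) {
                while (held.get()) { }                        // spin on a plain read first
                if (held.compareAndSet(false, true)) return;  // then try the atomic test-and-set
            }
        }
        void unlock() { held.set(false); }
    }

    // Separate locks for separate structures, so each critical section stays short.
    final class KernelLists {
        final Deque<Integer> freeList  = new ArrayDeque<>();
        final Deque<Integer> readyList = new ArrayDeque<>();
        final SpinLock freeLock  = new SpinLock();
        final SpinLock readyLock = new SpinLock();

        void makeReady(int descriptor) {
            readyLock.lock();                  // lock only the ready list
            try { readyList.addLast(descriptor); }
            finally { readyLock.unlock(); }
        }
    }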
In our kernel, the critical data are the free, ready, and waiting lists. To protect access to these, we can use any of the critical section protocols given in Chapter 3. On a particular multiprocessor, the choice of which to use will be affected by which special instructions are available. For example, if there is a Fetch-and-Add instruction, we can use the simple and fair ticket algorithm in Figure 3.9.

Because we assume that traps are handled by the processor on which the trap occurs and that each processor has its own interval timer, the trap and timer interrupt handlers are essentially the same as those found in the single-processor kernel. There are only two differences: executing now needs to be an array, with one entry per processor, and Timer_Handler needs to lock and unlock the ready list. The code for the three kernel primitives is also essentially the same. Again, the differences are that executing is an array and that the free, ready, and waiting lists need to be accessed in critical sections.

The greatest changes are to the code for dispatcher. Before, we had one processor and assumed it always had a process to execute. Now there could be fewer processes than processors, and hence some processors could be idle. When a new process is forked (or awakened after an I/O interrupt), it needs to be assigned to an idle processor, if there is one. This functionality can be provided in one of three ways:
- Have each processor, when idle, execute a special process that periodically examines the ready list until it finds a ready process.
- Have a processor executing fork search for an idle processor and assign the new process to it.

- Use a separate dispatcher process that executes on its own processor.
The first approach is the most efficient, in part because idle processors have nothing else to do until they find some process to execute. When the dispatcher finds that the ready list is empty, it sets executing[i] to point to the descriptor of an idle process and then loads the state of that process. The code for the idle process is shown in Figure 6.3. In essence, Idle is a self-dispatcher. It first spins until the ready list is not empty, then it removes a process descriptor and begins executing that process. To avoid memory contention, Idle should not continuously examine the ready list or continuously lock and unlock it. Thus we use a Test-and-Test-and-Set protocol similar in structure to the one in Figure 3.4. Since the ready list might be empty again before Idle acquires the lock for the ready list, it needs to retest whether the list is empty.

Our remaining concern is to ensure fairness. Again, we will employ timers to ensure that processes executing outside the kernel are forced to relinquish processors. We assume each processor has its own timer, which it uses as in the single-processor kernel. However, timers alone are not sufficient, since processes can now be delayed within the kernel waiting to acquire access to the shared kernel data structures.
    process Idle {
      while (executing[i] == the Idle process) {
        while (ready list empty)
          Delay;
        lock ready list;
        if (ready list not empty) {
          remove descriptor from front of ready list;
          set executing[i] to point to it;
        }
        unlock ready list;
      }
      start the interval timer on processor i;
      load state of executing[i];    # with interrupts enabled
    }

Figure 6.3  Code for the idle process.
Thus we need to use a fair solution to the critical section problem, such as the Tie-Breaker, Ticket, or Bakery algorithms given in Chapter 3. If instead we use the Test-and-Set protocol, there is the possibility that processes might starve. This is not very likely, however, since the critical sections in the kernel are very short.

Figure 6.4 outlines a multiprocessor kernel that incorporates all these assumptions and decisions. Variable i is the index of the processor executing the routines, and lock and unlock are critical section entry and exit protocols. Again, we have ignored possible exceptions and have not included code for I/O interrupt handlers, memory management, and so on.

The multiprocessor kernel in Figure 6.4 employs a single ready list that is assumed to be a FIFO queue. If processes have different priorities, the ready list needs to be a priority queue. However, this will cause a processor to take longer when accessing the ready list, at least when doing an insertion, because it has to search for the appropriate place in the queue and then insert the new process at that location. Thus the ready list might become a bottleneck. If there are a fixed number of priority levels, an efficient solution is to have one queue per priority level and one lock per queue. With this representation, inserting a process on the ready list requires only inserting it at the end of the appropriate queue. If the number of priority levels is dynamic, however, the most commonly used scheme is to employ a single ready list.

The kernel in Figure 6.4 also assumes that a process can execute on any processor. In particular, the dispatcher always selects the first ready process. On some multiprocessors, processes such as device drivers or file servers might have to execute on a specific processor because a peripheral device is attached only to that processor. In this case, each such processor should have its own ready list and perhaps its own dispatcher. (The situation gets more complicated if a special processor can also execute regular processes, since it then needs to be able to schedule them, too.)

    processType processDescriptor[maxProcs];
    int executing[maxProcs];     # one entry per processor
    declarations of free, ready, and waiting lists and their locks;

    SVC_Handler: {      # entered with interrupts inhibited on processor i
      save state of executing[i];
      determine which primitive was invoked, then call it;
    }

    Timer_Handler: {    # entered with interrupts inhibited on processor i
      lock ready list; insert executing[i] at end; unlock ready list;
      executing[i] = 0;
      dispatcher();
    }

    procedure fork(initial process state) {
      lock free list; remove a descriptor; unlock free list;
      initialize the descriptor;
      lock ready list; insert descriptor at end; unlock ready list;
      dispatcher();
    }

    procedure quit() {
      lock free list; insert executing[i] at end; unlock free list;
      record that executing[i] has quit; executing[i] = 0;
      if (parent process is waiting) {
        lock waiting list; remove parent from that list; unlock waiting list;
        lock ready list; put parent on ready list; unlock ready list;
      }
      dispatcher();
    }

    procedure join(name of child process) {
      if (child has already quit) return;
      executing[i] = 0;
      lock waiting list; put executing[i] on that list; unlock waiting list;
      dispatcher();
    }

    procedure dispatcher() {
      if (executing[i] == 0) {
        lock ready list;
        if (ready list not empty) {
          remove descriptor from ready list;
          set executing[i] to point to it;
        }
        else    # ready list is empty
          set executing[i] to point to Idle process;
        unlock ready list;
      }
      if (executing[i] is not the Idle process)
        start timer on processor i;
      load state of executing[i];    # with interrupts enabled
    }

Figure 6.4  Outline of a kernel for a shared-memory multiprocessor.
Even if a process can execute on any processor, it may be very inefficient to schedule it on an arbitrary processor. On a nonuniform memory access machine, for example, processors can access local memory more rapidly than remote memory. Hence, a process should ideally execute on the processor whose local memory contains its code. This suggests having a separate ready list per processor.
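One way to picture per-processor ready lists (an illustrative Java sketch with invented names; a real kernel would also lock each list separately, as discussed above) is a dispatcher that prefers its own processor's queue and takes work from another queue only when the local one is empty:

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Deque;
    import java.util.List;

    // Sketch: one ready queue per processor, so dispatched work tends to stay
    // on the processor whose local memory already holds its state.
    final class PerProcessorReadyLists {
        private final List<Deque<Integer>> ready = new ArrayList<>();

        PerProcessorReadyLists(int processors) {
            for (int p = 0; p < processors; p++) ready.add(new ArrayDeque<>());
        }

        synchronized void makeReady(int processor, int descriptor) {
            ready.get(processor).addLast(descriptor);
        }

        synchronized Integer dispatch(int processor) {
            Deque<Integer> local = ready.get(processor);
            if (!local.isEmpty()) return local.pollFirst();  // prefer local work
            for (Deque<Integer> q : ready)                   // otherwise take remote work
                if (!q.isEmpty()) return q.pollFirst();
            return null;                                     // nothing ready: run Idle
        }
    }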
6.3 Implementing Semaphores in a Kernel

Since the semaphore operations are special cases of await statements, we can implement them using busy waiting and the techniques of Chapter 3. However, the only reason one might want to do so is to be able to write parallel programs using semaphores rather than lower-level spin locks and flags. Consequently, we will just show how to add semaphores to the kernel described in the previous sections. This involves augmenting the kernel with semaphore descriptors and three additional primitives: createSem, Psem, and Vsem. (A library such as Pthreads is implemented in a similar way; however, a library runs on top of an operating system, so it is entered by means of normal procedure calls and contains software signal handlers rather than hardware interrupt handlers.)

A semaphore descriptor contains the value of one semaphore; it is initialized by invoking createSem. The Psem and Vsem primitives implement the P and V operations. We assume here that all semaphores are general semaphores. We first show how to add semaphores to the single-processor kernel of Section 6.1 and then show how to change the resulting kernel to support multiple processors, as in Section 6.2.
Recall that in a single-processor kernel, one process at a time is executing, and all others are ready to execute or are waiting for their children to quit. As before, the index of the descriptor of the executing process is stored in variable executing, and the descriptors for all ready processes are stored on the ready list. When semaphores are added to the kernel, there is a fourth possible process state: blocked on a semaphore. In particular, a process is blocked if it is waiting to complete a P operation. To keep track of blocked processes, each semaphore descriptor contains a linked list of the descriptors of processes blocked on that semaphore. On a single processor, exactly one process is executing, and its descriptor is on no list; all other process descriptors are on the ready list, the waiting list, or a blocked list of some semaphore.

For each semaphore declaration in a concurrent program, one call to the createSem primitive is generated; the semaphore's initial value is passed as an argument. The createSem primitive finds an empty semaphore descriptor, initializes the semaphore's value and blocked list, and returns a "name" for the descriptor. This name is typically either the descriptor's address or an index into a table that contains the address.

After a semaphore is created, it is used by invoking the Psem and Vsem primitives, which are the kernel routines for the P and V programming primitives. Both have a single argument that is the name of a semaphore descriptor. The Psem primitive checks the value in the descriptor. If the value is positive, it is decremented; otherwise, the descriptor of the executing process is inserted on the semaphore's blocked list. Similarly, Vsem checks the semaphore descriptor's blocked list. If it is empty, the semaphore's value is incremented; otherwise one process descriptor is removed from the blocked list and inserted on the ready list. It is common for each blocked list to be implemented as a FIFO queue, since this ensures that the semaphore operations are fair.

Figure 6.5 gives outlines of these primitives. They are added to the routines in the single-processor kernel given earlier in Figure 6.2. Again, the dispatcher procedure is called at the end of each primitive; its actions are the same as before.

For simplicity, the implementation of the semaphore primitives in Figure 6.5 does not reuse semaphore descriptors. This would be sufficient if all semaphores are created once, but in general this is not the case. Thus it is usually necessary to reuse semaphore descriptors as well as process descriptors. One approach is for the kernel to provide an additional destroySem primitive; this would then be invoked by a process when it no longer needs a semaphore. An alternative is to record in the descriptor of each process the names of all semaphores that the process created, and when the process invokes quit, to destroy the semaphores it created.
    procedure createSem(initial value, int *name) {
      get an empty semaphore descriptor;
      initialize the descriptor;
      set name to the name (index) of the descriptor;
      dispatcher();
    }

    procedure Psem(name) {
      find semaphore descriptor of name;
      if (value > 0)
        value = value - 1;
      else {
        insert descriptor of executing at end of blocked list;
        executing = 0;    # indicate executing is blocked
      }
      dispatcher();
    }

    procedure Vsem(name) {
      find semaphore descriptor of name;
      if (blocked list empty)
        value = value + 1;
      else {
        remove process descriptor from front of blocked list;
        insert the descriptor at end of ready list;
      }
      dispatcher();
    }

Figure 6.5  Semaphore primitives for a single-processor kernel.
With either approach, it is imperative that a semaphore not be used after it has been destroyed. Detecting misuse requires that each descriptor have a unique name that is validated each time Psem or Vsem is called. This can be implemented by letting the name of a descriptor be a combination of an index, which is used to locate the descriptor, and a unique sequence number.

We can extend the single-processor implementation of the semaphore primitives in Figure 6.5 into one for a multiprocessor in the same way as described in Section 6.2 and shown in Figure 6.4. Again, the critical requirement is to lock shared data structures, but only for as long as absolutely required. Hence, there should be a separate lock for each semaphore descriptor. A semaphore descriptor is locked in Psem and Vsem just before it is accessed; the lock is released as soon as the descriptor is no longer needed.
Locks are again acquired and released by means of a busy-waiting solution to the critical section problem.

The issues that were discussed at the end of Section 6.2 also arise in a multiprocessor kernel that implements semaphores. In particular, a process might need to execute on a specific processor, it may be important to execute a process on the same processor that it last executed on, or it may be important to co-schedule processes on the same processor. To support this functionality, or to avoid contention for a shared ready list, each processor might well have its own ready list. In this case, when a process is awakened in the Vsem primitive, the process needs to be put on the appropriate ready list. Thus either the Vsem primitive needs to lock a possibly remote ready list, or it needs to inform another processor and let that processor put the unblocked process on its ready list. The first approach requires remote locking; the second requires using something like interprocessor interrupts to send a message from one processor to another.
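For comparison, here is a user-level Java sketch of the same Psem/Vsem logic shown in Figure 6.5: decrement when the value is positive, otherwise block on an explicit FIFO list, and on each V either increment the value or wake the oldest waiter. It only illustrates the logic, not the kernel code; the class and method names are invented:

    import java.util.ArrayDeque;
    import java.util.Deque;

    // A fair counting semaphore with an explicit FIFO "blocked list",
    // mirroring the structure of the Psem and Vsem primitives.
    final class FairSemaphore {
        private int value;
        private final Deque<Thread> blocked = new ArrayDeque<>();

        FairSemaphore(int initial) { value = initial; }

        synchronized void P() throws InterruptedException {
            Thread me = Thread.currentThread();
            blocked.addLast(me);                       // join the blocked list
            try {
                while (value == 0 || blocked.peekFirst() != me)
                    wait();                            // not our turn yet
            } catch (InterruptedException e) {
                blocked.remove(me);                    // give up our place
                notifyAll();
                throw e;
            }
            blocked.removeFirst();                     // our turn: take a permit
            value = value - 1;
            if (value > 0)
                notifyAll();                           // let the next waiter re-check
        }

        synchronized void V() {
            value = value + 1;
            notifyAll();                               // the oldest waiter will proceed
        }
    }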
6.4 Implementing Monitors in a Kernel

Monitors can also be readily implemented in a kernel; this section shows how to do so. They can also be simulated using semaphores; we show how to do that in the next section. The locks and condition variables in a library such as Pthreads or a language such as Java are also implemented in a manner similar to the kernel described here.

We assume the monitor semantics defined in Chapter 5. In particular, procedures execute with mutual exclusion, and condition synchronization uses the Signal and Continue discipline. We also assume that a monitor's permanent variables are stored in memory accessible to all processes that call the monitor's procedures. Code that implements the procedures can be stored in shared memory, or copies of the code can be stored in local memory on each processor that executes processes that use the monitor. Finally, we assume the permanent variables are initialized before the procedures are called. This can be accomplished by, for example, allocating and initializing permanent variables before creating any processes that will access them. (Alternatively, initialization code could be executed on the first call of a monitor procedure. However, this is less efficient, since every call would have to check to see if it were the first.)
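For orientation, this is the kind of user-level monitor whose machinery the primitives below implement. In Java (an illustrative example, not part of the kernel), synchronized methods provide the entry and exit protocol, and wait/notifyAll behave like wait and signal_all under the Signal and Continue discipline:

    // A small Java monitor: a bounded buffer with synchronized methods for
    // mutual exclusion and wait/notifyAll for condition synchronization.
    final class BoundedBuffer {
        private final int[] buf;
        private int front = 0, rear = 0, count = 0;

        BoundedBuffer(int slots) { buf = new int[slots]; }

        synchronized void deposit(int item) throws InterruptedException {
            while (count == buf.length) wait();   // wait for an empty slot
            buf[rear] = item;
            rear = (rear + 1) % buf.length;
            count++;
            notifyAll();                          // wake any waiting consumers
        }

        synchronized int fetch() throws InterruptedException {
            while (count == 0) wait();            // wait for a deposited item
            int item = buf[front];
            front = (front + 1) % buf.length;
            count--;
            notifyAll();                          // wake any waiting producers
            return item;
        }
    }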
Each monitor descriptor mName contains a lock mLock and an entry queue of descriptors of processes waiting to enter (or reenter) the monitor. The lock is used to ensure mutual exclusion. When the lock is set, exactly one process is executing in the monitor; otherwise, no process is executing in the monitor. The descriptor for a condition variable contains the head of a queue of descriptors of processes waiting on that condition variable. Thus every process descriptor, except perhaps those of executing processes, is linked to either the ready list, a monitor entry queue, or a condition variable queue. Condition variable descriptors are commonly stored adjacent to the descriptor of the monitor in which the condition variables are declared. This is done to avoid excessive fragmentation of kernel storage and to allow the run-time identity of a condition variable simply to be an offset from the start of the appropriate monitor descriptor.

The monitor entry primitive enter(mName) finds the descriptor for monitor mName, then either sets the monitor lock and allows the executing process to proceed or blocks the process on the monitor entry queue. To enable the descriptor to be found quickly, the run-time identity of mName is typically the address of the monitor descriptor. The monitor exit primitive exit(mName) either moves one process from the entry queue to the ready list or clears the monitor lock.

The wait(cv) statement is implemented by invoking the kernel primitive wait(mName, cName), and the signal(cv) statement is implemented by invoking the kernel primitive signal(mName, cName). In both primitives, mName is the "name" of the monitor within which the primitive is invoked (this could be either the index or the address of the monitor's descriptor), and cName is the index or address of the descriptor of the appropriate condition variable. Execution of wait delays the executing process on the specified condition variable queue and then either awakens some process on the monitor entry queue or clears the monitor lock. Execution of signal checks the condition variable queue. If it is empty, the primitive simply returns; otherwise, the descriptor at the front of the condition variable queue is moved to the end of the monitor entry queue.

Figure 6.6 gives code outlines for these primitives. As in previous kernels, the primitives are entered as a result of a supervisor call, executing points to the descriptor of the executing process, and executing is set to 0 when the executing process blocks. Since a process that calls wait exits the monitor, the wait primitive simply calls the exit primitive after blocking the executing process. The last action of the enter, exit, and signal primitives is to call the dispatcher.

It is straightforward to implement the other operations on condition variables. For example, implementing empty(cv) merely involves testing whether cv's delay queue is empty. In fact, if the delay queue is directly accessible to processes, it is not necessary to use a kernel primitive to implement empty. This is because the executing process already has the monitor locked, so the contents of the condition queue cannot be changed by another process.
    procedure enter(int mName) {
      find descriptor for monitor mName;
      if (mLock == 1) {
        insert descriptor of executing at end of entry queue;
        executing = 0;
      }
      else
        mLock = 1;    # acquire exclusive access to mName
      dispatcher();
    }

    procedure exit(int mName) {
      find descriptor for monitor mName;
      if (entry queue not empty)
        move process from front of entry queue to rear of ready list;
      else
        mLock = 0;    # clear the lock
      dispatcher();
    }

    procedure wait(int mName; int cName) {
      find descriptor for condition variable cName;
      insert descriptor of executing at end of delay queue of cName;
      executing = 0;
      exit(mName);
    }

    procedure signal(int mName; int cName) {
      find descriptor for monitor mName;
      find descriptor for condition variable cName;
      if (delay queue not empty)
        move process from front of delay queue to rear of entry queue;
      dispatcher();
    }

Figure 6.6  Monitor kernel primitives.
By implementing empty without locking, we would avoid the overhead of a supervisor call and return.

We can also make the implementation of signal more efficient than is shown in Figure 6.6. In particular, we could modify signal so that it always moves a descriptor from the front of the appropriate delay queue to the end of the appropriate entry queue. Then, signal in a program would be translated into code that tests the delay queue and invokes signal in the kernel only if the delay queue is not empty.
By making these changes, the overhead of kernel entry and exit is avoided when signal has no effect. Independent of how signal is implemented, signal_all is implemented by a kernel primitive that moves all descriptors from the specified delay queue to the end of the entry queue.

Priority wait is implemented analogously to nonpriority wait. The only difference is that the descriptor of the executing process needs to be inserted at the appropriate place on the delay queue. To keep that queue ordered, the rank of each waiting process needs to be recorded; the logical place to store the rank is in process descriptors. This also makes implementation of minrank trivial. In fact, minrank, like empty, can be implemented without entering the kernel, as long as the minimum rank can be read directly by the executing process.

This kernel can be extended to one for a multiprocessor using the techniques described in Section 6.2. Again, the key requirement is to protect kernel data structures from being accessed simultaneously by processes executing on different processors. And again, one has to worry about avoiding memory contention, exploiting caches, co-scheduling processes, and balancing processor load.
On a single processor, monitors can in some cases be implemented even more efficiently without using a kernel. If there are no nested monitor calls, or if nested calls are all open calls, and all monitor procedures are short and guaranteed to terminate, it is both possible and reasonable to implement mutual exclusion by inhibiting interrupts. This is done as follows. On entry to a monitor, the executing process inhibits all interrupts. When it returns from a monitor procedure, it enables interrupts. If the process has to wait within a monitor, it blocks on a condition variable queue, and the fact that the process was executing with interrupts inhibited is recorded. (This is often encoded in a processor status register that is saved when a process blocks.) When a waiting process is awakened as a result of signal, it is moved from the condition variable queue to the ready list; the signaling process continues to execute. Finally, whenever a ready process is dispatched, it resumes execution with interrupts inhibited or enabled, depending on whether interrupts were inhibited or enabled at the time the process blocked. (Newly created processes begin execution with interrupts enabled.)

This implementation does away with kernel primitives (they become either in-line code or regular subroutines), and it does away with monitor descriptors. Since interrupts are inhibited while a process is executing in a monitor, it cannot be forced to relinquish the processor. Thus the process has exclusive access to the monitor until it waits or returns. Assuming monitor procedures terminate, eventually the process will wait or return. If the process waits, when it is awakened and resumes execution, interrupts are again inhibited. Consequently, the process again has exclusive control of the monitor in which it waited. Nested monitor calls cannot be allowed, however.
If they were, another process might start executing in the monitor from which the nested call was made while a process is waiting in a second monitor.

Monitors are implemented in essentially this way in the UNIX operating system. In fact, on entry to a monitor procedure in UNIX, interrupts are inhibited only from those devices that could cause some other process to call the same monitor before the interrupted process waits or returns from the monitor. In general, however, a kernel implementation is needed, since not all monitor-based programs meet the requirements for this specialized implementation. In particular, only in a "trusted" program such as an operating system is it likely that all monitor procedures can be guaranteed to terminate. Also, the specialized implementation only works on a single processor. On a multiprocessor, locks of some form are still required to ensure mutual exclusion of processes executing on different processors.
6.5 Implementing Monitors Using Semaphores

The previous section showed how to implement monitors using a kernel. Here we show how to implement them using semaphores. The reasons for doing so are that (1) a software library might support semaphores but not monitors, or (2) a language might provide semaphores but not monitors. In any event, the solution illustrates another example of the use of semaphores.

Again we assume the monitor semantics defined in Section 5.1. Hence, our concerns are implementing mutual exclusion of monitor procedures and condition synchronization between monitor procedures. In particular, we need to develop (1) entry code that is executed after a process calls a monitor procedure but before the process begins executing the procedure body, (2) exit code that is executed just before a process returns from a procedure body, and (3) code that implements wait, signal, and the other operations on condition variables. For simplicity, we assume there is only one condition variable; the code we develop below can readily be generalized to having more condition variables by using arrays of delay queues and counters.

To implement monitor exclusion, we use one entry semaphore for each monitor. Let e be the semaphore associated with monitor M. Since e is to be used for mutual exclusion, its initial value is 1, and its value is always 0 or 1. The purpose of the entry protocol of each procedure in M is to acquire exclusive access to M; as usual with semaphores, this is implemented by P(e). Similarly, the exit protocol of each procedure releases exclusive access, so it is implemented by V(e).
Execution of wait(cv) releases monitor exclusion and delays the executing process on condition variable cv. The process resumes execution in the monitor after it has been signaled and after it can regain exclusive control of the monitor. Because a condition variable is essentially a queue of delayed processes, we can assume there is a queue data type that implements a FIFO queue of processes, and we can use a variable delayQ of this type. We can also use an integer counter, nc, to count the number of delayed processes. Initially delayQ is empty and nc is zero, because there are no waiting processes.

When a process executes wait(cv), it increments nc, then adds its descriptor to delayQ. The process next releases the monitor lock by executing V(e) and then blocks itself on a private semaphore, one that only it waits on. After the process is awakened, it waits to reenter the monitor by executing P(e). When a process executes signal(cv), it first checks nc to see if there are any waiting processes. If not, the signal has no effect. If there are waiting processes, the signaler decrements nc, removes the oldest waiting process from the front of delayQ, then signals its private semaphore.¹

Figure 6.7 contains the code for implementing monitor synchronization using semaphores. Mutual exclusion is ensured since semaphore e is initially 1 and every execution path executes alternate P(e) and V(e) operations.
Also, nc is positive when at least one process is delayed or is about to delay.

Given the above representation for condition variables, it is straightforward to implement the empty primitive. In particular, empty returns true if nc is zero, and it returns false if nc is positive. It is also straightforward to implement signal_all: remove each waiting process from delayQ and signal its private semaphore, then set nc to zero. To implement priority wait, it is sufficient to make delayQ a priority queue ordered by delay ranks. This requires adding a field for the delay rank to each queue element. With this addition, it is trivial to implement minrank: just return the delay rank of the process at the front of delayQ.
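This scheme, summarized in Figure 6.7 below, can also be written out in Java using java.util.concurrent.Semaphore (a sketch with invented class and method names, not the book's code; one private semaphore is created per wait call rather than kept in an array indexed by process identity):

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.concurrent.Semaphore;

    // One monitor with a single condition variable, built from semaphores:
    // entry semaphore e, waiter count nc, and a FIFO delayQ of private semaphores.
    final class SemMonitor {
        private final Semaphore e = new Semaphore(1);     // monitor entry lock
        private int nc = 0;                               // number of delayed processes
        private final Deque<Semaphore> delayQ = new ArrayDeque<>();

        void enter() throws InterruptedException { e.acquire(); }   // monitor entry: P(e)
        void exit()  { e.release(); }                                // monitor exit:  V(e)

        // wait(cv): record the waiter, leave the monitor, block on the private
        // semaphore, then reacquire the monitor before returning.
        void await() throws InterruptedException {
            Semaphore priv = new Semaphore(0);
            nc = nc + 1;
            delayQ.addLast(priv);
            e.release();          // V(e)
            priv.acquire();       // P(private[myid])
            e.acquire();          // P(e)
        }

        // signal(cv): if any process is delayed, wake the oldest one (Signal and Continue).
        void signal() {
            if (nc > 0) {
                nc = nc - 1;
                delayQ.removeFirst().release();   // V(private[otherid])
            }
        }
    }

A monitor procedure body is bracketed by enter() and exit(), and await() and signal() are called only while the entry semaphore is held, exactly as e, nc, and delayQ are used in Figure 6.7. Because priv is a semaphore, a release that happens after the waiter executes e.release() but before it reaches priv.acquire() is remembered rather than lost.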
¹ One might think that it would be sufficient to use a semaphore for the delay queue, because a semaphore is essentially a queue of blocked processes. This would obviate the need for an explicit delay queue and an array of private semaphores. However, there is a subtle problem: Although P and V operations are atomic, sequences of them are not. In particular, a process could be preempted in wait after executing V(e) and before blocking. Exercise 6.12 explores this problem.

    shared variables:
      sem e = 1;                 # one copy per monitor
      int nc = 0;                # one copy per condition variable cv
      queue delayQ;
      sem private[N];            # one entry per process

    monitor entry:   P(e);

    wait(cv):        nc = nc+1; insert myid on delayQ;
                     V(e); P(private[myid]); P(e);

    signal(cv):      if (nc > 0) { nc = nc-1;
                       remove otherid from delayQ;
                       V(private[otherid]); }

    monitor exit:    V(e);

Figure 6.7  Implementing monitors using semaphores.

Historical Notes

The fork, join, and quit primitives were introduced by Dennis and Van Horn [1966]. Variants of fork, join, and quit are provided by most operating systems; for example, UNIX [Ritchie & Thompson 1974] provides similar
system calls named fork, wait, and exit. Similar primitives have also been included in several programming languages, such as PL/I, Mesa, and Modula-3.
Many operating systems texts describe implementations of single-processor kernels. Bic and Shaw [1988] and Holt [1983] contain particularly good descriptions. Those books also describe the other functions an operating system must support, such as a file system and memory management, and how they relate to the kernel. Thompson [1978] describes the implementation of the UNIX kernel; Holt [1983] describes a UNIX-compatible system called Tunis. Unfortunately, operating systems texts do not describe multiprocessor kernels in any detail. However, an excellent report on experience with some of the early multiprocessors developed at Carnegie-Mellon University appears in a survey paper by Jones and Schwarz [1980]; the Multiprocessor Locking Principle (6.1) comes from that paper. A few multiprocessor operating systems are discussed in Hwang [1993] and Almasi and Gottlieb [1994]. Tucker and Gupta [1989] describe process control and scheduling issues for shared-memory multiprocessors with uniform memory access time. Scott et al. [1990] discuss kernel issues for nonuniform memory access (NUMA) multiprocessors, including the use of multiple ready lists. In general, the proceedings of the following three conferences are excellent sources for much of the best recent work on language and software issues related to multiprocessors: Architectural Support for Programming Languages and Operating Systems (ASPLOS), Symposium on
Operating Systems Principles (SOSP), and Principles and Practice of Parallel Programming (PPoPP).
Several papers and books describe kernel implementations of monitors. Wirth [1977] describes the Modula kernel. Holt et al. [1978] describe both single and multiple processor kernels for CSP/k. Holt [1983] describes kernels for Concurrent Euclid. Joseph et al. [1984] present the design and implementation of a complete operating system for a shared-memory multiprocessor; they use monitors within the operating system and show how to implement them using a kernel.
Thompson [1978] and Holt [1983] describe the implementation of UNIX. The UNIX kernel implements mutual exclusion by performing context switches only when user processes block or exit the kernel and by inhibiting external interrupts at critical points. The UNIX equivalent of a condition variable is called an event, an arbitrary integer that is typically the address of a descriptor such as a process or file descriptor. Within the kernel, a process blocks by executing sleep(e). It is awakened when another process executes wakeup(e). The wakeup primitive has Signal and Continue semantics. It is also a broadcast primitive; namely, wakeup(e) awakens all processes blocked on event e. UNIX has no equivalent of the signal primitive to awaken just one process. Thus, if more than one process could be waiting for an event, each has to check if the condition it was waiting for is still true and go back to sleep if it is not.
Hoare's classic paper on monitors [1974] describes how to implement them using semaphores. However, he assumed the Signal and Urgent Wait discipline rather than Signal and Continue or Signal and Wait.
References

Almasi, G. S., and A. Gottlieb. 1994. Highly Parallel Computing, 2nd ed. Menlo Park, CA: Benjamin/Cummings.
Bic, L., and A. C. Shaw. 1988. The Logical Design of Operating Systems, 2nd ed. Englewood Cliffs, NJ: Prentice-Hall.

Dennis, J. B., and E. C. Van Horn. 1966. Programming semantics for multiprogrammed computations. Comm. ACM 9, 3 (March): 143-55.
Hoare, C. A. R. 1974. Monitors: An operating system structuring concept. Comm. ACM 17, 10 (October): 549-57.

Holt, R. C. 1983. Concurrent Euclid, The UNIX System, and Tunis. Reading, MA: Addison-Wesley.
Holt, R. C., G. S. Graham, E. D. Lazowska, and M. A. Scott. 1978. Structured Concurrent Programming with Operating System Applications. Reading, MA: Addison-Wesley.
Hwang, K. 1993. Advanced Computer Architecture: Parallelism, Scalability, Programmability. New York: McGraw-Hill.

Jones, A. K., and P. Schwarz. 1980. Experience using multiprocessor systems: a status report. ACM Computing Surveys 12, 2 (June): 121-65.

Joseph, M., V. R. Prasad, and N. Natarajan. 1984. A Multiprocessor Operating System. New York: Prentice-Hall International.
Ritchie, D. M., and K. Thompson. 1974. The UNIX time-sharing system. Comm. ACM 17, 7 (July): 365-75.

Scott, M. L., T. J. LeBlanc, and B. D. Marsh. 1990. Multi-model parallel programming in Psyche. Proc. Second ACM Symp. on Principles & Practice of Parallel Prog., March, pp. 70-78.

Thompson, K. 1978. UNIX implementation. The Bell System Technical Journal 57, 6, part 2 (July-August): 1931-46.
Tucker, A., and A. Gupta. 1989. Process control and scheduling issues for multiprogrammed shared-memory multiprocessors. Proc. Twelfth ACM Symp. on Operating Systems Principles, December, pp. 159-66.
Wirth, N. 1977. Design and implementation of Modula. Software: Practice and Experience 7: 67-84.
Exercises

6.1 In the multiprocessor kernel described in Section 6.2, a processor executes the Idle process when it finds the ready list empty (Figure 6.3). On some machines, there is a bit in the processor status word that, if set, causes a processor to do nothing until it is interrupted. Such machines also provide interprocessor interrupts; i.e., one processor can interrupt a second, which causes the second processor to enter a kernel CPU interrupt handler.
Modify the multiprocessor kernel in Figure 6.4 so that a processor sets its idle bit if it finds the ready list empty. Hence another processor will need to awaken it when there is a process for it to execute.

6.2 Suppose process dispatching is handled by one master processor. In particular, all the master does is execute a dispatcher process. Other processors execute
regular processes and kernel routines.
Design the dispatcher process, and modify the multiprocessor kernel in Figure 6.4 as appropriate. Define any data structures you need. Remember that an idle processor must not block inside the kernel since that prevents other processors from entering the kernel.

6.3 In a multiprocessor system, assume each processor has its own ready list and that it executes only those processes on its ready list. As discussed in the text, this raises the issue of load balancing since a new process has to be assigned to some processor. There are numerous load-balancing schemes, e.g., assign to a random processor, assign to a "neighbor," or keep ready lists roughly equal in length. Pick some scheme, justify your choice, and modify the multiprocessor kernel.
6.4 Modify the multiprocessor kernel in Figure 6.4 so that a process is generally executed by the same processor that executed it last. However, your solution should avoid starving processes; i.e., every process should periodically get a chance to execute. Also consider what to do about idle processors. Define any data structures you need. Explain the rationale for your solution.
6.5 Add an additional primitive, multifork(initialstate[*]), to the multiprocessor kernel (Figure 6.4). The argument is an array of initial process states. Execution of multifork creates one process for each argument and specifies that all of the newly created processes are to be co-scheduled on the same processor.
6.6 The semaphore primitives in Figure 6.5 and the monitor primitives in Figure 6.6 are assumed to be part of the single-processor kernel in Figure 6.2. Modify the primitives.
6.7 An application process calls getc to receive the next input character; it calls putc to send a character to the display. Assume both input and output are
buffered; i.e., up to n input characters can be stored waiting to be retrieved by getc, and up to n output characters can be stored waiting to be printed.
(a) Develop implementations of getc and putc, assuming that they are procedures that execute outside of the kernel. These procedures should use semaphores for synchronization. Define any additional processes you need, and show the actions the kernel should take when an input or output interrupt occurs. Use a startread primitive to initiate reading from the keyboard, and use startwrite to initiate writing to the display.

(b) Develop implementations of getc and putc, assuming that they are kernel primitives. Again, specify the actions the kernel should take when an input or output interrupt occurs, use startread to initiate reading from the keyboard, and use startwrite to initiate writing to the display.

(c) Analyze and compare the efficiency of your answers to (a) and (b). Consider factors such as the number of statements that get executed in each case, the number of context switches, and the length of time the kernel is locked.
(d) Extend your answer to (a) to support echoing of input characters. In particular, each character that is input from the keyboard should automatically be written to the display.
(e) Extend your answer to (b) to support echoing of input characters.

6.8 Suppose input and output on a terminal are supported by two procedures:

    int getline(string *str)
    putline(string *str)
An application process calls getline to read input lines; getline delays until there is another line of input, stores the line in str, and returns the number of characters in the line. An application process calls putline to print the line in str on the display. A line contains at most MAXLINE characters, and it is terminated by a newline character, which is not part of the line itself.
Assume both input and output are buffered; i.e., up to n input lines can be stored waiting to be retrieved by getline, and up to n output lines can be stored waiting to be printed. Also assume that input lines are echoed to the display; i.e., each complete input line is also sent to the display. Finally, assume that input lines are "cooked"; i.e., backspace and line-kill characters are processed by getline and do not get returned to the application process.

(a) Develop implementations of getline and putline, assuming that they are procedures that execute outside of the kernel. These procedures should use semaphores for synchronization. Define any additional processes you need, and
show the actions the kernel should take when an input or output interrupt occurs. Use a startread primitive to initiate reading from the keyboard, and use startwrite to initiate writing to the display.

(b) Develop implementations of getline and putline, assuming that they are kernel primitives. Again, specify the actions the kernel should take when an input or output interrupt occurs, use startread to initiate reading from the keyboard, and use startwrite to initiate writing to the display.
6.9 Some machines provide instructions that implement the P and V operations on semaphores. These instructions manipulate the value of the semaphore, then trap into the kernel if a process needs to be blocked or awakened. Design implementations for these machine instructions, and give the kernel code for the associated interrupt handlers. (Hint: Let the value of the semaphore become negative; if a semaphore is negative, its absolute value indicates the number of blocked processes.)
6.10 Figure 6.6 gives kernel primitives that implement monitor entry, monitor exit, wait, and signal for the Signal and Continue discipline.
(a) Modify the primitives to use the Signal and Wait discipline.

(b) Modify the primitives to use the Signal and Urgent Wait discipline.
6.11 Figure 6.7 shows how to use semaphores to implement monitor entry, monitor exit, wait, and signal for the Signal and Continue discipline.

(a) Modify the implementation for the Signal and Wait discipline.

(b) Modify the implementation for the Signal and Urgent Wait discipline.
6.12 Figure 6.7 shows how to use semaphores to implement monitor entry, monitor exit, wait, and signal for the Signal and Continue discipline. One would think that we could simplify the implementation by using a semaphore c for the delay queue, as follows:

    shared variables:  sem e = 1, c = 0;
                       int nc = 0;

    monitor entry:     P(e);

    wait(cv):          nc = nc+1; V(e); P(c); P(e);

    signal(cv):        if (nc > 0) { nc = nc-1; V(c); }

    monitor exit:      V(e);
Unfortunately, this simpler implementation is incorrect. The problem occurs if a process is preempted in wait(cv) after V(e) and before P(c). Explain clearly what can go wrong and construct an example to illustrate the problem. (Hint: It is possible for one process to miss a signal and for another to incorrectly see two signals; this can lead to a deadlock that should not occur.)
Part 2: Distributed Programming
The synchronization constructs we have examined so far are based on reading and writing shared variables. Consequently, they are most commonly used in concurrent programs that execute on hardware in which processors share memory. Distributed-memory architectures are now common. These include distributed-memory multicomputers as well as networks of machines, as described in Section 1.2. In addition, hybrid combinations of shared-memory and network architectures are sometimes employed, such as a network containing workstations and multiprocessors. In a distributed architecture, processors have their own private memory and they interact using a communication network rather than a shared memory. Hence, processes cannot communicate directly by sharing variables. Instead, they have to exchange messages with each other.
To write programs for a distributed-memory architecture, it is first necessary to define the interfaces with the communication network. These could simply be read and write operations analogous to read and write operations on shared variables. However, this would mean that programs would have to employ busy-waiting synchronization. A better approach is to define special network operations that include synchronization, much as semaphore operations are special operations on
distributed-memory architecture. A distributed program can, however, be executed on a shared-memory multiprocessor, just as any concurrent program can be executed on a single, multiplexed processor. In this case, channels are implemented using shared memory instead of a communication network.
In a distributed program, channels are typically the only objects processes share. Thus each variable is local to one process, its caretaker. This implies that variables are never subject to concurrent access, and therefore that no special mechanism for mutual exclusion is required. This also implies that processes must communicate in order to interact. Hence our main concern in Part 2 is synchronizing interprocess communication. How this is done depends on the pattern of process interaction: producers and consumers, clients and servers, or interacting peers (see Section 1.3).
Several different mechanisms for distributed programming have been proposed. These vary in the way channels are named and used and in the way communication is synchronized. For example, channels can provide one-way or two-way information flow, and communication can be asynchronous (nonblocking) or synchronous (blocking). In Part 2, we describe the four combinations: asynchronous message passing, synchronous message passing, remote procedure call (RPC), and rendezvous. All four mechanisms are equivalent in the sense that a program written using one set of primitives can be rewritten using any of the others. However, as we shall see, message passing is best for programming producers and consumers and interacting peers, whereas RPC and rendezvous are best for programming clients and servers.
The figure below illustrates how the four distributed programming mechanisms are related to each other as well as to the mechanisms using shared variables described in Part 1. In particular, semaphores are an outgrowth of busy waiting; monitors combine implicit exclusion with the explicit signaling of semaphores; message passing extends semaphores with data; and RPC and rendezvous combine the procedural interface of monitors with implicit message passing.
[Figure: relationships among the mechanisms. Busy waiting leads to semaphores and monitors; semaphores lead to message passing; monitors and message passing together lead to RPC and rendezvous.]
Chapter 7 examines message passing, in which communication channels provide a one-way path from a sending to a receiving process. Channels are FIFO queues of pending messages. They are accessed by means of two primitives, send and receive. To initiate a communication, a process sends a message to a channel; another process acquires the message by receiving from the channel. Sending a message can be asynchronous (nonblocking) or synchronous (blocking); receiving a message is invariably blocking, as that makes programming easier and more efficient. In Chapter 7, we first define asynchronous message passing primitives and then present a number of examples that
implement RPC and rendezvous. When writing a program for a shared-memory machine, we usually employ the techniques described in Part 1 of this book. When writing a program for a distributed-memory machine, we usually employ message passing, RPC, or rendezvous. However, it is also possible to mix and match: shared variables can be used on distributed-memory machines, and message passing can be used on shared-memory machines. Section 10.4 describes how to implement a distributed shared memory (DSM), which provides a shared-memory programming model on a distributed-memory machine.
Chapter 7: Message Passing
As noted, message passing can be asynchronous or synchronous. Asynchronous message passing, the more commonly used mechanism, will be the focus of this chapter. With asynchronous message passing, channels are like semaphores that carry data, and the send and receive primitives are like the V and P operations, respectively. In fact, if a channel contains only null messages (messages without any data), then send and receive are exactly like V and P, with the number of queued "messages" being the value of the semaphore.
Section 7.1 defines the asynchronous message-passing primitives, and Section 7.2 shows how to use them to program filters. Section 7.3 considers a variety of client/server applications, including resource allocation, disk scheduling, and file servers. These applications show the duality between monitors and message passing; they also illustrate how to program what is called conversational continuity, in which a client continues to interact with a server (as in a Web application). Section 7.4 gives an example of interacting peers and illustrates three common communication patterns: centralized, symmetric, and ring. Section 7.5 describes synchronous message passing and the tradeoffs between it and asynchronous message passing. The last four sections of the chapter give case studies of three languages and a subroutine library: (1) CSP (Communicating Sequential Processes), which introduces synchronous message passing and what is called guarded communication; (2) Linda, which provides a unique combination of a shared, associative memory (tuple space) and six message-like primitives; (3) MPI, a commonly used library that provides a variety of process-to-process and global communication primitives; and (4) Java's network package, which shows how to use sockets to program a simple file server.
7.1 Asynchronous Message Passing

Many different notations have been proposed for asynchronous message passing. Here we employ one that is representative and also simple. With asynchronous message passing, a channel is a queue of messages that have been sent but not yet received. A channel declaration has the form

    chan ch(type1 id1, ..., typen idn);

Identifier ch is the channel's name. The typei and idi are the types and names of the data fields in messages transmitted via the channel. The types are required, but the field names are optional; we will use them when it is helpful to document what each field represents. As examples, the following declares two channels:

    chan input(char);
    chan disk_access(int cylinder, int block, int count, char* buffer);

The first channel, input, is used to transmit single-character messages. The second channel, disk_access, contains messages having four fields, with the field names indicating their roles. In many examples we will employ arrays of channels, as in

    chan result[n](int);
The indices range from 0 to n-1 unless we declare another range.
A process sends a message to channel ch by executing

    send ch(expr1, ..., exprn);

The expri are expressions whose types must be the same as those of the corresponding fields in the declaration of ch. The effect of executing send is to evaluate the expressions, then to append a message containing these values to the end of the queue associated with channel ch. Because this queue is unbounded (at least conceptually), execution of send never causes delay; hence send is a nonblocking primitive.
A process receives a message from channel ch by executing

    receive ch(var1, ..., varn);

The vari are variables whose types must be the same as those of the corresponding fields in the declaration of ch. The effect of executing receive is to delay the receiver until there is at least one message on the channel's queue. Then the message at the front of the queue is removed, and its fields are assigned to the
vari. Thus, receive is a blocking primitive since it might cause delay. The receive primitive has blocking semantics so the receiving process does not have to use busy waiting to poll the channel if it has nothing else to do until a message arrives.
We assume that access to the contents of each channel is atomic and that message delivery is reliable and error-free. Thus, every message that is sent to a channel is eventually delivered, and messages are not corrupted. Because each channel is also a first-in/first-out queue, messages will be received in the order in which they were appended to a channel. Hence, if a process sends a message to a channel and later sends a second message to the same channel, the two messages will be received in the order in which they were sent.
As a simple example, Figure 7.1 contains a filter process that receives a stream of characters from channel input, assembles the characters into lines, and sends the resulting lines to channel output. The carriage-return character CR indicates the end of a line; a line contains at most MAXLINE characters; a special value EOL is appended to the output to indicate the end of the line. (CR, MAXLINE, and EOL are symbolic constants.)

    chan input(char), output(char [MAXLINE]);

    process Char_to_Line {
      char line[MAXLINE]; int i = 0;
      while (true) {
        receive input(line[i]);
        while (line[i] != CR and i < MAXLINE) {
          # line[0:i-1] contains the last i input characters
          i = i+1;
          receive input(line[i]);
        }
        line[i] = EOL;
        send output(line);
        i = 0;
      }
    }

    Figure 7.1   Filter process to assemble lines of characters.

Channels will be declared global to processes, as in Figure 7.1, since they are shared by processes. Any process may send to or receive from any channel. When channels are used in this way they are sometimes called mailboxes. However, in many examples we will consider, each channel will have exactly one receiver, although it may have many senders. In this case, a channel is often called an input port since it provides a window (porthole) into the receiving process. If a channel has just one sender and one receiver, it is often called a link since it provides a direct path from the sending to the receiving process.
Usually a process will want to delay when it executes receive, but not always. For example, the process might have other useful work to do if a message is not yet available for receipt. Or, a process such as a scheduler may need to examine all queued messages in order to select the best one to service next (e.g., the disk scheduler in Section 7.3). To determine whether a channel's queue is currently empty, a process can call the Boolean-valued function
    empty(ch)
This function is true if channel ch contains no messages; otherwise, it is false. Unlike what happens with the corresponding primitive on monitor condition variables, if a process calls empty and gets back true, there may in fact be queued messages by the time the process continues execution. Moreover, if a process calls empty and gets back false, there may not be any queued messages when the process tries to receive one. (This second situation cannot happen if the process is the only one to receive from the channel.) Although empty is a useful primitive, one needs to be careful when using it.
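For readers who want to experiment with these ideas in a real language, the following is a minimal sketch in Go (not from the text). Go's buffered channels behave much like the chan/send/receive primitives described above, with one difference worth noting: a Go channel has a fixed capacity rather than a conceptually unbounded queue, so a send can block when the buffer is full. The channel name and values below are illustrative.

    package main

    import "fmt"

    func main() {
        input := make(chan byte, 64) // analogue of: chan input(char)

        // a sender appends messages to the channel's queue
        for _, c := range []byte("hi\n") {
            input <- c
        }

        // analogue of empty(input): len reports the number of queued messages,
        // and, as in the text, the answer may be stale by the time it is used
        fmt.Println("queued messages:", len(input))

        // receive blocks until a message is available, then removes the oldest
        c := <-input
        fmt.Printf("received %q\n", c)
    }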
7.2 Filters: A Sorting Network

A filter is a process that receives messages from one or more input channels and sends messages to one or more output channels. The output of a filter is a function of its input and its initial state. Hence, an appropriate specification for a filter is a predicate that relates the values of messages sent on output channels to the values of messages received on input channels. The actions the filter takes in response to receiving input must ensure this relation every time the filter sends output.
To illustrate how filters are developed and programmed, consider the problem of sorting a list of n numbers into ascending order. The most direct way to solve the problem is to write a single filter process, Sort, that receives the input from one channel, employs one of the standard sorting algorithms, then writes the result to another channel. Let input be the input channel, and let output be the output channel. Assume that the n values to be sorted are sent to input by some unspecified process. Then the goal of the sorting process is to ensure that the values sent to output are ordered and are a permutation of the values
received from input. Let sent[i] indicate the ith value sent to output. Thus the goal can be specified by the following predicate:

    SORT:  (∀ i: 1 <= i < n: sent[i] <= sent[i+1])  ∧
           values sent to output are a permutation of values from input

An outline of the Sort process is

    process Sort {
      receive all numbers from channel input;
      sort the numbers;
      send the sorted numbers to channel output;
    }
Since receive is a blocking primitive, a practical concern is for Sort to determine when it has received all the numbers. One solution is for Sort to know the value of n in advance. A more general solution is for n to be the first input value and for the numbers themselves to be the next n input values. An even more general solution is to end the input stream with a sentinel value, which is a special value that indicates that all numbers have been received. The latter solution is the most general since the process producing the input does not itself need to know in advance how many values it will produce.
If processes are "heavyweight" objects, as they are in most operating systems, then the approach used above in Sort would be the most efficient way to solve the sorting problem. However, a different approach, which is amenable to direct implementation in hardware, is to employ a network of small processes that execute in parallel and interact to solve the problem. (A hybrid approach would be to employ a network of medium-sized processes.) There are many kinds of sorting networks, just as there are many different internal sorting algorithms. Here we present a merge network.
The idea behind a merge network is repeatedly, and in parallel, to merge two sorted lists into a longer sorted list. The network is constructed out of Merge filters. Each Merge filter receives values from two ordered input streams in1 and in2 and produces one ordered output stream out. Assume that the ends of the input streams are marked by a sentinel EOS as discussed above. Also assume that Merge appends EOS to the end of the output stream. If there are n input values, not counting the sentinels, then the following should be true when Merge terminates:
(b i:
A
sent[n+l]
==
Eos
A
1 <= i < n: sent [i] < = sent [i+l]) A values sent to out are a permutation of valucs frorn i n 1 and i n 2
The first line of MERGE says all input has been consumed and EOS has been appended to the end of out; the second line says the output is ordered; the last two lines say that the output is a permutation of the input.
One way to implement Merge is to receive all input values, merge them, then send the merged list to out. However, this requires storing all input values. Since the input streams are ordered, a better way to implement Merge is repeatedly to compare the next two values received from in1 and in2 and to send the smaller to out. Let v1 and v2 be these values. This leads to the filter process shown in Figure 7.2.
To form a sorting network, we can employ a collection of Merge processes and arrays of input and output channels. Assuming that the number of input values n is a power of 2, the processes and channels are connected so that the
    chan in1(int), in2(int), out(int);

    process Merge {
      int v1, v2;
      receive in1(v1);   # get first two input values
      receive in2(v2);
      # send smaller value to output channel and repeat
      while (v1 != EOS and v2 != EOS) {
        if (v1 <= v2)
          { send out(v1); receive in1(v1); }
        else   # (v2 < v1)
          { send out(v2); receive in2(v2); }
      }
      # consume the rest of the non-empty input channel
      if (v1 == EOS)
        while (v2 != EOS) { send out(v2); receive in2(v2); }
      else   # (v2 == EOS)
        while (v1 != EOS) { send out(v1); receive in1(v1); }
      # append a sentinel to the output channel
      send out(EOS);
    }

    Figure 7.2   A filter process that merges two input streams.
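As a side illustration, here is a hedged Go sketch of the same Merge filter (not from the text). Closing a channel plays the role of the EOS sentinel; receiving from a closed channel reports ok == false. Together with the wiring sketch at the end of this section, it forms one complete program.

    // Part of an illustrative "package main"; see the companion sketch below.
    func Merge(in1, in2 <-chan int, out chan<- int) {
        v1, ok1 := <-in1 // get first two input values
        v2, ok2 := <-in2
        // repeatedly send the smaller value to the output channel
        for ok1 && ok2 {
            if v1 <= v2 {
                out <- v1
                v1, ok1 = <-in1
            } else {
                out <- v2
                v2, ok2 = <-in2
            }
        }
        // consume the rest of the non-empty input channel
        for ok1 {
            out <- v1
            v1, ok1 = <-in1
        }
        for ok2 {
            out <- v2
            v2, ok2 = <-in2
        }
        close(out) // the analogue of appending EOS
    }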
resulting communication pattern forms a tree, as depicted in Figure 7.3. Information in the sorting network flows from left to right. Each node at the left is given two input values, which it merges to form a stream of two sorted values. The next nodes form streams of four sorted values, and so on. The rightmost node produces the final sorted stream. The sorting network contains n-1 processes; the width of the network is log2 n.
To realize the sorting network in Figure 7.3, the input and output channels need to be shared. In particular, the output channel used by one instance of Merge needs to be the same as one of the input channels used by the next instance of Merge in the graph. This can be programmed in one of two ways. The first approach is to use static naming: declare all channels to be a global array, and have each instance of Merge receive from two elements of the array and send to one other element. This requires embedding the tree in an array so that the channels accessed by Merge[i] are a function of i. The second approach is to use dynamic naming: declare all channels to be global as above, parameterize the processes, and give each process three channels when it is created. This makes the programming of the Merge processes easier since each is textually identical. However, it requires having a main process that dynamically creates the channels and then passes them as parameters to the various Merge processes.
A key attribute of filters like Merge is that we can interconnect them in different ways. All that is required is that the output produced by one filter meet the input assumptions of another filter. An important consequence of this attribute is that as long as the externally observable input and output behaviors are the same, we can replace one filter process, or a network of filters, by a different process or network. For example, we can replace the single Sort process described
[Figure 7.3   A sorting network of Merge processes: pairs of input values enter at the left, and successively longer sorted streams flow rightward to a single sorted output stream.]
earlier by a network of Merge processes plus a process (or network) that distributes the input values to the merge network.
Networks of filters can be used to solve a variety of parallel programming problems. For example, Section 7.6 examines prime number generation, and Section 9.3 examines distributed matrix multiplication. The exercises describe additional applications.
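To make the dynamic-naming idea concrete, here is a hedged Go sketch (not from the text) that wires three instances of the Merge function sketched above into the four-input tree of Figure 7.3; a main process creates the channels and passes them to each stage. The helper name feed is an assumption introduced for the example.

    package main

    import "fmt"

    // feed produces a closed channel carrying the given (already sorted) values.
    func feed(values ...int) <-chan int {
        ch := make(chan int, len(values))
        for _, v := range values {
            ch <- v
        }
        close(ch)
        return ch
    }

    func main() {
        left := make(chan int)
        right := make(chan int)
        sorted := make(chan int)

        go Merge(feed(9), feed(2), left)  // first column of the tree
        go Merge(feed(7), feed(4), right)
        go Merge(left, right, sorted)     // final stage

        for v := range sorted {
            fmt.Println(v) // prints 2, 4, 7, 9
        }
    }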
7.3 Clients and Servers

Recall that a server is a process that repeatedly handles requests from client processes. This section shows how to use asynchronous message passing to program servers and their clients. The first examples show how to turn monitors into servers and how to implement resource managers. These examples also point out the duality between monitors and message passing: each can directly simulate the other. We then show how to implement a self-scheduling disk driver and a file server. The self-scheduling disk driver illustrates a third way to structure a solution to the disk-scheduling problem introduced in Section 5.3. The file server illustrates an important programming technique called conversational continuity. Both examples also illustrate program structures that are directly supported by message passing, in the sense that they yield solutions that are more compact than is possible using any of the synchronization mechanisms based on shared variables.
7.3.1 Active Monitors

A monitor is a resource manager. It encapsulates permanent variables that record the state of the resource, and it provides a set of procedures that are called to access the resource. Moreover, the procedures execute with mutual exclusion and use condition variables for condition synchronization. Here we show how to simulate these attributes using server processes and message passing. In short, we show how to program monitors as active processes rather than as passive collections of procedures.
Assume for now that a monitor has just one operation op and that it does not employ condition variables. The structure of the monitor is thus

    monitor Mname {
      declarations of permanent variables;
      initialization code;

      procedure op(formals) {
        body of op;
      }
    }
I
p~+iv;~te r e s u l ~channcl. If the rcsult charinels are declared a$ a global arr-;ry. a client thus needs to pasx the index of' its private eIe111cnrof thc result army lo the server as par1 of the request nicssage. (Soune inessagc-pinsing primitives ;1l1ow a rcceivi~igprocess lo determine llle identity ol' [he scnder.) Figure 7.4 gives outlines for the server ;lnd its clients. Monitor invariant hccomes thc loop invariant of the s e r v e r pmcess. Ji is 1 1 - L I ~aftcr the iniiialization code i s cxeculed atid heihre and alies cnch request is servicccl. In pai'ticular, Mi is true at all points nl which server cornn~~micatcs wilh client processes. hs shown in Figure 7.4, ;L client iinmediatcly waits TOI.:I reply ar'ter scnding a request. Howev-cr, the client caulcl execute nthcr iiclions before wailing i f it has other ulol+lcit could productively do. This is not oficn the case. but it is possiblc since a call is si mulaled by two distinct stalements: send and then receive. I he prograin in Figure 7.4 employs static naming since chat~nelsare globaI to the prucesscs and hence can be rererenced directly. Consequcnlly. each process musl be coded carefully so that it i~scsthe correct channels. For exainplc, c l i e n t [ i ]must nor use the rcply cllanncl of some olher Client [ j 1 . Alternalively. we could employ dynarnic naming by having e a c l client create a privatc reply channel. which it then passes to server as Ihe first ficld of request in place or (he intcgcr index. This would c~isurethat clients could not access each other's rcply channcls. It wn~tldalxt~permil [he number of clicnls to val-y dynamically. (ln Figure 7.4. Ihere IS a fixed number n of clienth.) In genera1, a monitor has multiple proccdun-es. To cxtend Figure 7.4 to handle lhis casc, a client needs to indicate which operation i t is calling, This is done by including an additional argument i1.1 request messages; tlie type of this +
7
, ,
,
,
,.
304
Chapter 7
Message Passing chan request (int client I D , types of illpul \~;llues); chan r e p l y [n](types oi'le~ults);
process Server I int clientID; decla~ationxtrf' other pcrnliir~entvariables; initialization cocle; while ( t r u e ) ( # # loop invariant MI receive request (clientI D , illpul values) 1 code from body of operation op; send r e p l y [clientro](resulth) : 1 process ClientCi = 0 to n - l ] { send request ( i , value arguments ) ; # " c a l l I 1 op receive reply [i] (resull arguments) ; # w a i t f o r reply 1 Figure 7.4
Clients and server with one operation
argument ir; ,un enumerarion type with one enumeration literal for each kind or operation. Since difl'erenl operations w ~ l lno doubt have different value and resl~ll rormalr, we also need to distjnguish between them. This can be pragminmcd using a vari;int record or a unlon type. (Allernatlvcly, we could rn~lke the illgumenl parts of request i d I-eplymcsrages strings of byies, and Ict clients and the herver cncode and decode these.) Figute 7.5 givcs outlines for clients and a server with mulliple opcratrons. The if ctatemcnl in the server is Iikc a casc statement, with one branch for each different kind of operation. The body or each operation retrieves argumenb lroin args and places ]exult values in results. After the i f statement tel-min;ites. Server sends these results to the appropriate client. So far wc have assurnecl that Mname does not employ condi~ion\?ariablcs. Hence. Server never needs to delay while servicing a request since he body of cacli oper;~tionwill bc a cequence of sequenliai statements. We now show how to handle the gcncral case of a rnonito~that has multiple operations and that uses condition synchronization. (The c l i e ~ ~do l s not change since lhey still just hicall" ;In ~1x1-ation by <ending a request and later receiving a reply: the fact that a request inighi not be serviced immediady is transparent.) To ,see ]low lo translate a monitoi- \vith coildition variables into a serwr process, we begin by coilsidering a specific example and then dexribe how to generalize the example. In particular, consider the problem of managing a multiple
7.3 Clients and Servers
305
. . ., op,) ; . .., arg,) ; union(ses,, ..., res,);
    type op_kind = enum(op1, ..., opn);
    type arg_type = union(arg1, ..., argn);
    type result_type = union(res1, ..., resn);
    chan request(int clientID, op_kind, arg_type);
    chan reply[n](result_type);

    process Server {
      int clientID; op_kind kind; arg_type args;
      result_type results;
      declarations of other variables;
      initialization code;
      while (true) {   ## loop invariant MI
        receive request(clientID, kind, args);
        if (kind == op1)
          { body of op1; }
        ...
        else if (kind == opn)
          { body of opn; }
        send reply[clientID](results);
      }
    }

    process Client[i = 0 to n-1] {
      arg_type myargs; result_type myresults;
      place value arguments in myargs;
      send request(i, opj, myargs);    # "call" opj
      receive reply[i](myresults);     # wait for reply
    }
    Figure 7.5   Clients and server with multiple operations.
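The request/reply pattern of Figure 7.5 maps naturally onto Go channels. The following hedged sketch (not from the text) shows the shape of such a server; the operation kinds, the tiny piece of state, and all names are illustrative assumptions rather than anything defined by the book.

    package main

    import "fmt"

    type opKind int

    const (
        opGet opKind = iota
        opPut
    )

    type request struct {
        clientID int
        kind     opKind
        arg      int
    }

    func main() {
        const n = 3
        req := make(chan request)
        reply := make([]chan int, n) // one private reply channel per client
        for i := range reply {
            reply[i] = make(chan int)
        }

        // the server: one loop iteration per "call"
        go func() {
            var state int
            for r := range req {
                switch r.kind {
                case opPut:
                    state = r.arg
                case opGet:
                    // nothing to update
                }
                reply[r.clientID] <- state // send the result back
            }
        }()

        // a client "calls" an operation with a send followed by a receive
        req <- request{clientID: 0, kind: opPut, arg: 42}
        <-reply[0]
        req <- request{clientID: 0, kind: opGet}
        fmt.Println("got:", <-reply[0])
    }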
unit resource, such as memory or file blocks. Clients acquire units of the resource, use them, and later release them back to the manager. For simplicity, clients acquire and release units one at a time. Figure 7.6 gives a monitor implementation of this resource allocator. We use the method of passing the condition, as described in Section 5.1, since that program structure is most readily translated into a server process. The free units are stored in a set, which is accessed by insert and remove operations.
The resource allocation monitor has two operations, so the equivalent server process will have the general structure shown in Figure 7.5. One key difference is that when no units are available, the server process cannot wait when servicing a request. It must save the request and defer sending a reply. Later, when a unit
    monitor Resource_Allocator {
      int avail = MAXUNITS;
      set units = initial values;
      cond free;   # signaled when a process wants a unit

      procedure acquire(int &id) {
        if (avail == 0)
          wait(free);
        else
          avail = avail-1;
        remove(units, id);
      }

      procedure release(int id) {
        insert(units, id);
        if (empty(free))
          avail = avail+1;
        else
          signal(free);
      }
    }
    Figure 7.6   Resource allocation monitor.
is r d e a x d , the servcr needs to honor one saved reqr~cst,if thew i:, onc. by \ending Lhc rclcased unil to thc requester. Figure 7.7 g i x s an nutline for i11c rcsource alloca~ionserves ancl its clientr. The scrvcr now hac; ncstcd i f r;tatemcnlc. The outer ones have hranches lilr cach kind o f operation. and Ihe inncr ones co~rcspondto the i f statements in Lhe monitor procedures. After sending a reqilesl message. a client naits to receive a unit. However, t~l'Lcrscnding a releasc mcsrage, Ihe clicnt does nut wall for the mexsapc to be processed since there is no need to. Thtq cxampIe illustrates how l o s~mulntea qpecific iuonitor hy 3 servcr proces\. We c ~ l nuxe thc 5amc basic pattern tn simulate any monitc~rt h a ~is 131-0 gra~nmctluqing the technique of pahsing thc condition. Honcver, many of tl~c ~nonitorsi n Chapter 5 had wait statenlcnts embedded 111 loops, or they had u~icondilionallyexecu~edsignal r;taLements. To \imulate such w a i t stnlemcnls. the server would need lo have [he p e n d ~ n gI-equcstas in Figure 7.7 :tnd woulcl also nccd to record what actioni should bc taken whcn the requexi cttil bc ser\,iced. To srmiil;~te'In uncondit~on;~l signal Statenlent. the scsver needs 10 clicck the clueuc of pending rcqucsts. If i t i s cmpty, the server doe:, nothing: if
7.3 Clients and Servers type op-kind = enum(ACQUIRE, RELEASE); chan request(int clientID, op-kind kind, int unitid); chan reply in1 ( i n t unitID) ;
process Allocator { i n t avail = MAXUNITS; s e t u n i t s = initial values; queue pending; # initially empty i n t clientID, unitID; op-kind k i n d ;
declarations of other local variables; while ( t r u e ] { receive reguest(clientID, kind, unitID); if (kind = = ACQUIRE) { if (avail r 0 ) { # honor request now avail--; remove(units, unitID); send reply [clientID](unitID); 1 else # remember request insert (pending, client ID) : else # kind = = RELEASE if empty(pending1 { # return unitID to units avail++; insert(units, unitid); ) else { # allocate unitlD to a waiting c l i e n t removeIpending, c l i e n t I D ) ; send reply LclientID] (unitIn); }
1 1
1 process Clientti = 0 to n-I] { int unitID; send request {i, ACQUIRE, 0 ) # "callu request receive reply [i](unitID); # use resource unitID, then release it send request { i , RELEASE, unit I D ) ;
Figure 7.7
Resource allocator and clients.
307
30%
Chapter 7
Message Pass~ng
tbfonitor- Based P m g ?-unix Permanent variables Proceti~~re idenliliers Procedure call Monitor entry Procedure return wait statement s i g n a l statement Procedure bodies Table 7.1
Local server variables request channel and operation kinds send request(); receive reply receive r e q u e s t 0 send r e p l y ( )
Save pending request Relrievc and process pending request Arins of caqe statement on operation k~ncl
Duality between monitors and message passing.
ihere is a pencling request, lhe server removes onc from the clueue arid processes it qfier processing the operation containing the signal. The exact details dcpcnd on the monitor being simulated: several exercises explvrc specifc examples. 'Thc ~esource-~llocatormollitor in Figure 7.6 and the Allocator yerver in Figure 7.7 poinr out the cl~~alih; betwcen tnonimrs and message passing. I n particular, there is a direct currespondencc between the various n~ecl~anisms uscd in moni~or-basedprograms and those used in message passing programs. Table 7.1 1ih~sLhe correspondences. Since the bndics of rnonilni. procerlures have direct duals in (lie arms of the servci- case statemenl. the relative perrormance of monitol--based versus rncssagcbased programs depends only cm Lhe 1-elativeefficie~lcyo-I" thc implemenlalion of the dil'lcrcnt mech;inisms. On shared-memory machines, procedure calls and actions on condition variables are more efficient than messagc-passing primitives. For this reason, most ()pel-sling systeins for sucl-1. rnacliines are based on a monitor-styIe in~pIcmentation. On the other hand. most distrihuled systems are hasecl an message passing since thal is both efficient and Ihc appropriate abslraction. I[ is also possible to combine aspects of both styles and irnplementalionx, as w e wilI see in Chapter X when we discuss remote procedure call ilnd rendezvous. I n frlc(, this makes (he duality between monitors and message passing even strollsel:
7.3.2 A Self-Scheduling Disk Server I n Section 5.3 we considerecl thc problem oC scheduling access to a moving head d i ~ k .lr-1 h a t scction we constdcrcd two main solut~onr;tlucltrres. In the first
7.3 Clients and Servers
309
(Figure 5.12), the disk rcheduler war a monitor separale from the disk. Thu:, clients first called the scheduler to request access, then used the disk. and finally called the scheduler to release access. In the second ~tructure(Figure 5.13).the scheduler was an intermediary between clients and a disk server prncess. Thux clients had lo call only a single monitor operation. (We also considercd a lhird structure using nested monitor<: when reprogrammed to use message passing, that sstructure is es~entiallyequivalelit to using an ~ntermediary.) We can readily mimic thesc slructures in d mewage-based program by implementing each monitor as a server process using the techniques l'rorn the previous section. However, with message passing an even simpler s1ructu1-c is possible. Tn particular, we can combine the intermedial? and disk dnver of Figure 5.14 into a single. self-scheduling server process. (We cannot do this wit11 monitors since we have co use a monitor tcl implement the comnlunication path between clients and the disk driver.) Figure 7.8 iIlustrates these structures for solving ~ h cdisk-scheduling problem. In all cases, we assume the disk is cnntmlled by a scrver pracess that performs all disk accesses. The principal differences between the three srructures are the client interface-as described above-and the numher of nlessages that musl be exchanged per disk access-as described below. When thc scheduler is qeparate from the disk scrver, five mehsages must he exchangecl per dlsk access: two to request scheduling and gel a reply. rwo to reque5t di\k access and get a reply, and one tcj relcase ihe disk. A clienl I S
Separate scheduler
Scheduler as intermediary
Self-schedulrng d ~ s kdriver
Figure 7.8
Disk-scheduling structures with message passing.
3 10
Chapier 7
Message Passing
i n\olvcd in all five cotntnunicariot~s.Whcil the scheduler i u an intermediary, four kinds ol' tnessage:, have to be cxchanged: The cl~enthas to send a rcquest and wait r o receivc a reply, and he disk driver has to ask thc scheduler for the next request and get the reply. (The dnvcr process cat1 returll the rcsults 01' onc dick acccss recl~~esl when il asks for the ncxt one.) As can be secn iri Figure 7.8, a seli'schcduling dirk driver ha.s the inost attr;ictive structule. In particular, only two rneccages nced to be cxchanged. Thc remaiilder of this. section shows how to ~~rogi+:irn the self-schetIul~ngdril~erprocess, I f the disk driver process d~clno scheduling-i.e.. if disk access \verc first come, first wrveci-then it woultl llavc the structure of s e ~ v c rServer in 1:igut.e 7.4. In c7rtle1- to do schcduling, the driver must examine all pending recpcs\s. w'n:\ck mean5 '\\ TAUS\ .i&c a\\ nchsages that whre waiting on thc request channcl. I1 doeh thi< by executirlg a loop that telminates cthcn the request ch;innel is empty aizd there iu a1 least one saveit request. The drivcr then h e clienl who sctlt the rcquexl. The tlriver can use any or the dislc-scheduling pohcies described in Section 5.3. Figure 7.9 nutlincs a disk driver that employs the shorteqt-seek-time ISST) schcduling policy. The drivel- stoic'; pencling requc5ts in one oi' IWO o i ; l l c ~ ~ d clueueh. l e f t or. right, depending o n whether lhc recluesl i s t o the left or right ol' tlie currcnt porition ol' the disk head. Requests In left are ordered by clocr-easing cyl indur value: those in right are ordcred by increasing cylinder valuc. Viiriable headpos il~dicii~es the currcnt head position; nsaved is a counl ol rhc number oI'sa\led Iequests. Thc illvariant for the ollter loop of the driver i h
SS'T: left ix an ordered queue Irom largest to smullcst c y l A all values of c y l in left are 4 = headpos A right ir: an ordered queue rrom smnllcst lo largest c y l A '111 \slues ol' cyl in right are >= headpos I\ (nsaved == 0) bolh left and right arc emply l n Figurc 7.9, (he empty priinilive is uscd in thc condi~ionof he inner while loop to cletcrrnine whether there are more messages queued on the request channel. This is an example of tbc programming technique called polling. In ihis case, t h e disk driver process rcpcatedly polls the request channel to d e t ~ m ~ i nil'elhcre are pending requesls. If there are, he driver receives anolhcr one, so it has more requests lo cl~ousefrom. I f not. the driver services [he best savecl rccluesl. Polli~lgi s idso useful in other situations and is oftcn
cmployed within hardware-for bus.
example, lo arhitratc access lo a communication
7.3 Clients and Servers
311
chan request ( i n t clientID, int c y l , types ol'othcr argurncnts) chan reply [n] (typcs of lesults) ;
;
process Diek-Driver { queue left, right; # ordered queues o f eaved requests i n t clientID, c y l , headpos = 1, nsaved = 0; variables to hold other argument? In a request; while ( t r u e ) I ## loop invariant SST while (!empty(request) or nsaved == 0) { # wait f o r f i r s t request o x receive another one receive reguest(cLientID, c y l , . . . ) ; if ( c y l c = headpos) insert (left, clientID, c y l , . ) ; else i n s e r t (right, client In, c y l , ) ;
..
.. .
nsavea+*; 1 # select best saved request from left or r i g h t if (size(1eft) = = 0 ) remove(right, clientID. cyl, a r g s ) : else if (sizetright) == 0 ) rernove(left, clientID, cyl, args) ; else I-emuverequesl closest to headpos from left or right ; headpos = cyl; nsaved--;
access the disk; send reply[clientrDl ( r e s u l t s ) ; 1 1 Figure 7.9
Self-scheduling disk driver.
7.3.3 File Servers: Conversational Continuity

As a final example of client/server interaction, we present one way to implement
lile scrbels. whlch are processes that provide acccss to extciAnalfiles stored on d ~ a k .To access a disk filc. a client Iirht opens [he file. IT the file Lan bc openedthc file exists and the client has permission to access it-then the client make4 ;I series uf read and write requests. Eventually the client closes the file. Suppose that up to n files may be open at once. Further suppose thal access
to each open file is provided by a sepalate filc server process; hence, there are up
312
Chapter 7
Message Passing
to n active file servers. When a client wants to open a file, it first needs to acquire ;I hee tile server. If all lile servers are idcnltcal, any frcc one will do. We could allocale lile serverx It:, clienls by using a separatc allocator proccss. However, since 311 are identical and cornmunicalinn channels are sl~a~*ed. therc is a much simplcr approach. Tn particular, let open he 3 global channel. To aequirc a iile scrver, a client sends a request to open. Each file Yerver, when idle, trics to receive from open. A specific open request from a cIienl will Ihus be received by onc of the idlc file servers. That server sends a sepIy to the client, Lhen pmcccds lo w a i t for accexs requests. A client sends these lo a different channel. access [i],whcrc i is tllc indcx of the file server that replied to the clienl's request lo open a lile. Thus, access is an away of n channcls. Evcntually the client closes the file, nl which time lhe iile server becomes idle and again waits for an open request. Figure 7.10 gives outlines for the lile servers and their clients. A client scnds READ and WRITE requests to the hame server channel. This is necessary since tlie file server cannot In general know the order in which these recluesls w ~ l l bc lnadc and hcncc cannot usc diffcrcnt channel.; for each. For the qame reason, whcn a clicnt wants to closc a channel, it sends a CLOSE request to the m n e access channel. The interaction between it clienl and a serve]- I n Figure 7. I0 1 5 iln example 01' roniy>rc~ltionolr?onriniritv. A client starts ;I conversation wilh a file server when thclt server rcccives the client's open request. The cIient then ctlntinues lo couvcrsc with rhc same servcr. Thir; i 7 p r o ~ r a r n ~ ~by~ ehaving d the file server first reccive from open, tl~cnrcpcatcdly rcceivc from its element of access. The program i n Figurc 7.10 illustrates one possible way to implement file servers. I( assumes lhat open is a shared channel from which any 61c wryer can receive a message. l f cach channcl can havc only one receiver, then a separate file allocator process would he needed Thal procer? would reccivc o p c rcquerl\ ~~ and allocate a free server to a client: file servers wt)uld bus need lo LeIl he allocator when thcy arc frcc. The solution in Figurc 7.10 also employs a fixed nulnher n of file servers. In a language that supports dynamic process and chai~nelcreation. a bcttcrapproach u)ould be Lo creatc filc scn1er5 and access channels dynarriicaliy as needed. Tliis i ~ better , since at any point in time there would only be ac: marly serves? as are actually being uccd; mol-c importantly, there would not he a fixed Lippel- bound on ~ h cnumber of filc scrvers. At tlic other extreme, there couId urnply he one lile serve]-per disk. In tl~iscase, however. cithcr thc filc scrvci or thc cl~entinterface will be much more complex than shown in Figure 7.10. This IS bccausc either I he file server has to keep track of the informatiun a~;stlcialedwith
7.3 Clients and Servers
313
type kind = enum(READ, WRITE, CLOSE);
chan open(string fname; int clientID);
chan access[n](int kind; types of other arguments);
chan open_reply[m](int serverID);          # server id or error
chan access_reply[m](types of results);    # data, error, ...

process File_Server[i = 0 to n-1] {
  string fname; int clientID;
  kind k; variables for other arguments;
  bool more = false;
  variables for local buffer, cache, etc.;
  while (true) {
    receive open(fname, clientID);
    open file fname; if successful then:
      send open_reply[clientID](i);
      more = true;
      while (more) {
        receive access[i](k, other arguments);
        if (k == READ)
          process read request;
        else if (k == WRITE)
          process write request;
        else {                 # k == CLOSE
          close the file; more = false;
        }
      }
  }
}

process Client[j = 0 to m-1] {
  int serverID; declarations of other variables;
  send open("foo", j);                # open file "foo"
  receive open_reply[j](serverID);    # get back server id
  # use file then close it by executing the following
  ...
  send access[serverID](access arguments);
  receive access_reply[j](results);
  ...
}
Figure 7.10
File servers and clients.
This is because either the file server has to keep track of the information associated with all clients who have files open or clients have to pass file state information with every request.

Yet another approach, which is used in the Sun Network File System (NFS), is to implement file access solely by means of remote procedures. Then "opening" a file consists of acquiring a descriptor (called a file handle in NFS) and a set of file attributes. These are subsequently passed on each call to a file access procedure. Unlike the File_Server processes in Figure 7.10, the access procedures in NFS are themselves stateless: all information needed to access a file is passed as arguments on each call to a file access procedure. This increases the
cost of passing arguments but greatly simplifies the handling of both client and server crashes. In particular, if a file server crashes, the client simply resends the request until a response is received. If a client crashes, the server need do nothing since it has no state information.
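For readers who want to experiment with this pattern outside the book's notation, the following Java sketch mimics the conversational continuity of Figure 7.10 using blocking queues; it is illustrative only (the class, thread, and queue names are assumptions, not the text's program). A shared queue plays the role of the open channel and one queue per server plays the role of access[i].

  import java.util.*;
  import java.util.concurrent.*;

  public class FileServerDemo {
      public static void main(String[] args) {
          final int n = 2, m = 2;                                  // servers, clients
          BlockingQueue<Integer> open = new LinkedBlockingQueue<>();      // open(clientID)
          List<BlockingQueue<String>> access = new ArrayList<>();         // access[i]
          List<BlockingQueue<Integer>> openReply = new ArrayList<>();     // open_reply[j]
          for (int i = 0; i < n; i++) access.add(new LinkedBlockingQueue<>());
          for (int j = 0; j < m; j++) openReply.add(new LinkedBlockingQueue<>());

          for (int i = 0; i < n; i++) {
              final int me = i;
              new Thread(() -> {                                   // file server me
                  try {
                      while (true) {
                          int client = open.take();                // wait for an open request
                          openReply.get(client).put(me);           // reply with my index
                          String req;                              // converse with this client
                          while (!(req = access.get(me).take()).equals("CLOSE"))
                              System.out.println("server " + me + ": " + req);
                      }
                  } catch (InterruptedException e) { }
              }).start();
          }

          for (int j = 0; j < m; j++) {
              final int me = j;
              new Thread(() -> {                                   // client me
                  try {
                      open.put(me);                                // acquire a free server
                      int s = openReply.get(me).take();
                      access.get(s).put("READ  (client " + me + ")");
                      access.get(s).put("WRITE (client " + me + ")");
                      access.get(s).put("CLOSE");
                  } catch (InterruptedException e) { }
              }).start();
          }
      }
  }

The server threads loop forever, as in the figure; in a real system they would be shut down explicitly.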
7.4 Interacting Peers: Exchanging Values

The previous examples in this chapter showed how message passing is used to program filters, clients, and servers. Here we examine a simple example of interacting peers. Our main purposes are to illustrate three useful communication patterns (centralized, symmetric, and ring) and to describe the tradeoffs between them. Chapters 9 and 11 give numerous, larger examples of the use of these communication patterns. They occur frequently in distributed parallel computations and in decentralized distributed systems.

Suppose there are n processes and that each process has a local integer value v. The goal is for every process to learn the smallest and largest of the n local values. One way to solve this problem is to have one process, say P[0], gather all n values, compute the minimum and maximum of them, and then send the results to the other processes. Figure 7.11 presents a centralized solution using this approach. It employs a total of 2(n-1) messages. (If there is a broadcast primitive, which transmits copies of a message to every other process, that could be used by P[0] to send the results; in this case, the total number of messages would be n.)

In Figure 7.11, one process does all the "work" of computing the smallest and largest values; the other processes merely send their values then wait to get back the global result. A second way to solve the problem is to use a symmetric approach in which each process executes the same algorithm. In particular, each process first sends its local value to all the others; then each process, in parallel, computes for itself the minimum and maximum of the set of n values. Figure 7.12 gives the solution.
chan values(int), results[n](int smallest, int largest);

process P[0] {                 # coordinator process
  int v;                       # assume v has been initialized
  int new, smallest = v, largest = v;    # initial state
  # gather values and save the smallest and largest
  for [i = 1 to n-1] {
    receive values(new);
    if (new < smallest) smallest = new;
    if (new > largest) largest = new;
  }
  # send the results to the other processes
  for [i = 1 to n-1]
    send results[i](smallest, largest);
}

process P[i = 1 to n-1] {
  int v;                       # assume v has been initialized
  int smallest, largest;
  send values(v);
  receive results[i](smallest, largest);
}
Figure 7.11
Exchanging values: Centralized solution.
chan values[n](int);

process P[i = 0 to n-1] {
  int v;                       # assume v has been initialized
  int new, smallest = v, largest = v;    # initial state
  # send my value to the other processes
  for [j = 0 to n-1 st j != i]
    send values[j](v);
  # gather values and save the smallest and largest
  for [j = 1 to n-1] {
    receive values[i](new);
    if (new < smallest) smallest = new;
    if (new > largest) largest = new;
  }
}
Figure 7.12
Exchanging values: Symmetric solution.
The program in Figure 7.12 is an example of a single-program, multiple-data (SPMD) solution; in particular, each process executes the same program, but works on different data. However, it employs a total of n(n-1) messages. (Again, if there is a broadcast primitive, that could be used, resulting in a total of n distinct messages.)

A third way to solve this problem is to organize the processes into a logical ring, in which each process P[i] receives messages from its predecessor and sends messages to its successor. In particular, P[0] sends to P[1], which sends to P[2], and so on, with P[n-1] sending messages to P[0]. Each process executes two stages. In the first it receives two values, determines the minimum and maximum of those two values plus its own value, and sends the result to its successor. In the second stage, each process receives the global minimum and maximum and passes them on to its successor. One process, say P[0], acts as the initiator of the computation. Figure 7.13 presents the solution. It is almost symmetric (only P[0] is slightly different), and it employs only 2(n-1) messages.

Figure 7.14 illustrates the communication structures of these programs for the case of six processes. The processes are nodes in a graph, and the edges are pairs of communication channels. As can be seen, the centralized solution has a star-shaped graph, with the coordinator process in the center. The symmetric solution has the structure of a complete graph in which each node is connected to every other node. The ring solution naturally has the structure of a circular ring, or a pipeline that is closed back on itself.

The symmetric solution is the shortest, and it is easy to program because every process does exactly the same thing. On the other hand, it uses the largest number of messages (unless broadcast is available). These are sent and received at about the same time, so they could be transmitted in parallel if the underlying communications network supports concurrent transmissions. In general, however, the larger the number of messages sent by a program, the slower the program will run. Stated differently, communication overhead greatly diminishes performance improvements (speedup) that might be gained from parallel execution. (These topics are discussed in detail in the introduction to Part 3.)

The centralized and ring solutions both employ a linear number of messages, but they have different communication patterns that lead to different performance characteristics. In the centralized solution, the messages sent to the coordinator are all sent at about the same time; hence only the first receive statement executed by the coordinator is likely to delay for very long. Similarly, the results are sent one after the other from the coordinator to the other processes, so the other processes should be able to awaken rapidly.
chan values[n](int smallest, int largest);

process P[0] {                 # initiates the exchanges
  int v;                       # assume v has been initialized
  int smallest = v, largest = v;    # initial state
  # send v to next process, P[1]
  send values[1](smallest, largest);
  # get global smallest and largest from P[n-1] and
  # pass them on to P[1]
  receive values[0](smallest, largest);
  send values[1](smallest, largest);
}

process P[i = 1 to n-1] {
  int v;                       # assume v has been initialized
  int smallest, largest;
  # receive smallest and largest so far, then update
  # them by comparing their values to v
  receive values[i](smallest, largest);
  if (v < smallest) smallest = v;
  if (v > largest) largest = v;
  # send the result to the next process, then wait
  # to get the global result
  send values[(i+1) mod n](smallest, largest);
  receive values[i](smallest, largest);
  if (i < n-1)
    send values[i+1](smallest, largest);
}
Figure 7.13
Exchanging values using a circular ring.
Figure 7.14
Communication structures of the three programs: (a) centralized solution, (b) symmetric solution, (c) ring solution.
In the ring solution, all processes are both producers and consumers, and the last process (P[n-1]) has to wait until every other process has, one at a time, received a message, done a small amount of computation, and sent its result to the next process in the pipeline. Messages have to circulate around the pipeline two full times before every process has learned the global result. In short, the solution is inherently linear, with no possibility of overlapping message transmissions or of getting rid of delays in receive statements. Hence, the ring-based solution will perform poorly for this simple problem. On the other hand, this communication structure is very effective if received messages can quickly be forwarded, and if each process then has to do a reasonably large amount of computation before receiving the next message. We will see examples where this is the case in Chapters 9 and 11.
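As a point of comparison with the pseudocode in Figures 7.11 through 7.13, the symmetric exchange can also be sketched with an ordinary threads library. The following Java fragment is illustrative only (the value array, queue names, and thread structure are assumptions, not the book's code); each peer owns one incoming queue, sends its value to every other peer, and then computes the global minimum and maximum for itself.

  import java.util.*;
  import java.util.concurrent.*;

  public class SymmetricExchange {
      public static void main(String[] args) {
          final int n = 6;
          final int[] v = {17, 3, 42, 8, 25, 11};                  // local values
          List<BlockingQueue<Integer>> values = new ArrayList<>(); // chan values[n](int)
          for (int i = 0; i < n; i++) values.add(new LinkedBlockingQueue<>());

          for (int i = 0; i < n; i++) {
              final int me = i;
              new Thread(() -> {
                  try {
                      for (int j = 0; j < n; j++)                  // send my value to the others
                          if (j != me) values.get(j).put(v[me]);
                      int smallest = v[me], largest = v[me];
                      for (int j = 1; j < n; j++) {                // gather the other n-1 values
                          int x = values.get(me).take();
                          smallest = Math.min(smallest, x);
                          largest  = Math.max(largest, x);
                      }
                      System.out.println("peer " + me + ": min " + smallest + ", max " + largest);
                  } catch (InterruptedException e) { }
              }).start();
          }
      }
  }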
7.5 Synchronous Message Passing

The send primitive is nonblocking, so a process that sends a message can proceed asynchronously with respect to the process that will eventually receive the message. With synchronous message passing, sending a message causes the sender to block until the message is received. We will distinguish a blocking send by using a different primitive: synch_send. The arguments of the new primitive are the same (the name of a channel and the message to be sent) but the semantics are different.

The advantage of synchronous message passing is that there is a bound on the size of communication channels, and hence on buffer space. This is because a process can have at most one message at a time queued up on any channel. Not until that message is received can the sending process continue and send another message. In fact, an implementation of synchronous message passing can leave a message in the address space of the sending process until the receiver is ready for it; then the implementation can copy the message directly into the receiver's address space. With this kind of implementation, the content of a channel is merely a queue of addresses of messages that are waiting to be sent.

However, synchronous message passing has two disadvantages relative to asynchronous message passing. First, concurrency is reduced. When two processes communicate, at least one of them will have to block, depending on whether the sender or receiver tries to communicate first. Consider the following producer/consumer example:

chan values(int);
process Producer {
  int data[n];
  for [i = 0 to n-1] {
    do some computation;
    synch_send values(data[i]);
  }
}

process Consumer {
  int results[n];
  for [i = 0 to n-1] {
    receive values(results[i]);
    do some computation;
  }
}
Suppose the computational parts are such that Producer is at times faster than Consumer and at times slower; hence, the processes will arrive at communication statements at different times. This will cause each pair of send/receive statements to delay, and the total execution time will be increased by the sum of all the delay times. In contrast, with asynchronous message passing there might be no delay time: when the Producer is faster, its messages will get queued on the channel, and when Consumer is faster, there might be enough messages queued up so that receive never blocks. To achieve the same effects with synchronous message passing requires interposing a buffer process between Producer and Consumer. (We show how to do so in the next section.)

Concurrency is also reduced in some client/server interactions. When a client is releasing a resource, as in Figure 7.7, there is usually no reason it needs to delay until the server has received the release message. However, with synchronous message passing, it has to delay. A second example occurs when a client process wants to write to a graphics display, a file, or some other output device managed by a server process. Often the client wants to continue immediately after issuing a write request; it does not care whether the write is acted upon now or in the near future, as long as it is done eventually. With synchronous message passing, however, the client has to wait until a write request is actually received by the server.

The second disadvantage of synchronous message passing is that programs that use it are more prone to deadlock. In particular, the programmer has to be careful that all send and receive statements match up, in the sense that whenever a process could be at a send (receive) statement, eventually some other process will be at a receive (send) statement that uses the same channel. This is typically the case in producer/consumer interactions, because a producer sends to and the consumer receives from the channel that connects them. It is also typically the case in client/server interactions. A client "calls" a server by executing a send
statement followed by a receive statement; the server executes a receive statement to get a request followed by a send statement to give a reply. One has to be careful when programming interacting peers, however. Consider the following attempt, for example, in which two processes exchange values:

chan in1(int), in2(int);

process P1 {
  int value1 = 1, value2;
  synch_send in2(value1);
  receive in1(value2);
}

process P2 {
  int value1, value2 = 2;
  synch_send in1(value2);
  receive in2(value1);
}
This program will deadlock because both processes will block at their synch_send statements. One or the other of the processes, but not both, has to be changed so that it executes receive first. In contrast, with asynchronous message passing, it is perfectly fine to use the symmetric approach above. Moreover, the symmetric approach is much easier to scale to more than two processes.

To summarize, send and synch_send are often interchangeable. In fact, all but one example in earlier sections of this chapter will work correctly if send is replaced by synch_send. The one exception is the program in Figure 7.12, which uses the symmetric approach to exchanging values. Thus the main difference between asynchronous and synchronous message passing is the tradeoff between having possibly more concurrency and having bounded communication buffers. Since memory is plentiful and asynchronous send is less prone to deadlock, most programmers prefer it.
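The blocking behavior described here can be reproduced with Java's SynchronousQueue, whose put operation waits for a matching take, roughly like synch_send. The sketch below is illustrative only (thread and queue names are assumptions, not the book's program); it encodes the symmetric exchange above and therefore deadlocks. Swapping the put and take in either thread removes the deadlock.

  import java.util.concurrent.SynchronousQueue;

  public class SynchDeadlockDemo {
      public static void main(String[] args) {
          // A SynchronousQueue has no capacity: put() blocks until another
          // thread take()s, which approximates synch_send/receive.
          SynchronousQueue<Integer> in1 = new SynchronousQueue<>();
          SynchronousQueue<Integer> in2 = new SynchronousQueue<>();

          new Thread(() -> {
              try {
                  in2.put(1);                 // synch_send in2(value1): blocks
                  int v2 = in1.take();        // receive in1(value2)
                  System.out.println("P1 got " + v2);
              } catch (InterruptedException e) { }
          }).start();

          new Thread(() -> {
              try {
                  in1.put(2);                 // synch_send in1(value2): blocks
                  int v1 = in2.take();        // receive in2(value1)
                  System.out.println("P2 got " + v1);
              } catch (InterruptedException e) { }
          }).start();
          // Both threads block in put(), so neither message is ever printed.
      }
  }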
7.6 Case Study: CSP

Communicating Sequential Processes (CSP) is one of the most influential developments in the history of concurrent programming. CSP was first described in a 1978 paper by Tony Hoare. That paper introduced both synchronous message passing and what has come to be called guarded communication. CSP directly influenced the design of the Occam programming language and also influenced
several other languages, including Ada and SR (which are described in Chapter 8). Moreover, CSP spawned a plethora of research and papers on formal models and the semantics of message passing.

This section describes Communicating Sequential Processes. The examples include producers and consumers, clients and servers, and an interesting program for generating prime numbers using a pipeline. We use a syntax that is similar to the one used in Hoare's paper and in other programs in this text. At the end of the section we briefly describe Occam and the latest version of CSP, which is now a formal language for modeling and analyzing the behavior of concurrent communicating systems.
7.6.1 Communication Statements

Suppose process A wishes to communicate the value of expression e to process B. This is accomplished in CSP by the following program fragments:

  process A {
    ...
    B!e;
    ...
  }

  process B {
    ...
    A?x;
    ...
  }
B!e is called an output statement. It names a destination process B and specifies an expression e whose value is to be sent to that process. A?x is called an input statement. It names a source process A and specifies the variable x into which an input message from the source is to be stored. These are called output and input statements rather than send and receive statements, because they are used for external communication as well as interprocess communication. (The output operator ! is pronounced "shriek" or "bang"; the input operator ? is pronounced "query.")

Assuming the types of e and x are the same, the two statements above are said to match. An input or output statement delays the executing process until another process reaches a matching statement. The general forms of the two statements are

  Destination!port(e1, ..., en);
  Source?port(x1, ..., xn);
Destination and Source name a single process, as in the first example, or an element of an array of processes. The CSP paper employed this kind of direct naming of communication channels because it is simple. However, it is easier to
construct modular programs if channels have names and are global to processes (as in Section 7.1). The Occam language described later uses this approach. The source can also name any element of an array of processes, which is indicated as Source[*]. The port names a communication channel in the destination process. The expressions ei in an output statement are sent to the named port of the destination process. An input statement receives a message on the designated port from the source process and assigns the values to local variables xi. Ports are used to distinguish between different kinds of messages that a process might receive. If there is only one kind of message, the port is omitted. Also, if there is only one ei or xi, the parentheses may be omitted. Both abbreviations were employed in the first example.

As a simple example, the following filter process repeatedly inputs a character from process West, then outputs that character to process East:

process Copy {
  char c;
  do true -> West?c;    # input a character from West
             East!c;    # output the character to East
  od
}
The input statement waits until West is ready to send a character; the output statement waits until East is ready to take it. The do and if statements in CSP use Dijkstra's guarded commands notation. A guarded command has the form B -> S, where B is a Boolean expression (the guard) and S is a statement list (the command). The above do statement loops forever. We will explain other uses of guarded commands as they arise.

As a second example, the following server process uses Euclid's algorithm to compute the greatest common divisor of two positive integers x and y:

process GCD {
  int id, x, y;
  do true -> Client[*]?args(id, x, y);    # input a request
    # repeat the following until x == y
    do x > y -> x = x - y;
    [] x < y -> y = y - x;
    od
    Client[id]!result(x);    # return the result
  od
}
GCD waits to receive input on its args port from any one of an array of client processes. The client that succeeds in matching with the input statement also sends its identity. GCD then computes the answer and sends it back to the client's result port. In the inner do loop, if x > y then y is subtracted from x; if x < y then x is subtracted from y. The loop terminates when neither of these conditions is true, namely, when x == y. A client Client[i] communicates with GCD by executing

  GCD!args(i, v1, v2);  GCD?result(r);
Here the port names are not actually needed; however, they help indicate the role of each channel.
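The same request/reply structure can be approximated without CSP, for example with request and reply queues in Java. The sketch below is only an illustration (the Request class and queue names are assumptions, not the text's program); a server thread repeatedly takes a request carrying the caller's identity, computes the greatest common divisor by repeated subtraction, and replies on that caller's own queue.

  import java.util.*;
  import java.util.concurrent.*;

  public class GcdServer {
      static class Request {
          final int id, x, y;
          Request(int id, int x, int y) { this.id = id; this.x = x; this.y = y; }
      }

      public static void main(String[] args) {
          BlockingQueue<Request> argsPort = new LinkedBlockingQueue<>();
          List<BlockingQueue<Integer>> resultPort = new ArrayList<>();
          for (int j = 0; j < 2; j++) resultPort.add(new LinkedBlockingQueue<>());

          new Thread(() -> {                            // the GCD server
              try {
                  while (true) {
                      Request r = argsPort.take();
                      int x = r.x, y = r.y;
                      while (x != y) {                  // Euclid's algorithm by subtraction
                          if (x > y) x = x - y; else y = y - x;
                      }
                      resultPort.get(r.id).put(x);      // reply on the caller's result "port"
                  }
              } catch (InterruptedException e) { }
          }).start();

          for (int j = 0; j < 2; j++) {                 // two clients
              final int me = j;
              new Thread(() -> {
                  try {
                      argsPort.put(new Request(me, 36 + 6 * me, 24));
                      System.out.println("client " + me + ": gcd = " + resultPort.get(me).take());
                  } catch (InterruptedException e) { }
              }).start();
          }
      }
  }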
7.6.2 Guarded Communication

Input and output statements enable processes to communicate. By themselves, however, they are somewhat limited, because both are blocking statements. Often a process wishes to communicate with more than one other process, perhaps over different ports, yet does not know the order in which the other processes might wish to communicate with it. For example, consider extending the Copy process above so that up to, say, 10 characters are buffered. If more than 1 but fewer than 10 characters are buffered, Copy can either input another character from West or output another character to East. However, Copy cannot know which of West or East will next reach a matching statement.
Here B i s a Roolean expression, c is a cotnmur~icatioristatemcnl. and s \i a stalement list. The Roolean expression can bc omitrcd. in which case i l has [he implicit value true. 'Ibgetlier, B ancl c comprise whnl is called the guni-ri. A guard s/acr.eetJr if a is lruc ancl cxeculing c would no[ cause a delay-~.e., home other process is waiting 31 a matching communicalicm statement. A guatd.fails if B is Talse. A guard h1ork.c if B i~ true hut c cannot yet be executed.
324
Chapter 7
Message Passing
Guarded communication statemenls appear within i f and do slaleinents. For exatnple, consider if B,; [ I B,; fi
C,
- > S,;
C,
- > S2;
This statement has two arms-each of which i s a guarded communication statement. You can cxecute it as follows. Fir% evaluate the Boolean expressions in the guards. lr both guards fail, then the i f statement terminates with no effect; if at lcast one guard succeeds, then choose one of then1 (nondeterminislically); if bolh guards block, Lhen wait until one of the guards succeeds. Second, aCter choosing a guard that succeeds, execute the communication statement c in the cllosen guard. Third, execute the corresponding statement s. At this point the i f statement terminates. A do statement irj executed in a sitnilar way. The difference is that the above election process is repeated until all guards fail. The examples below will (hopefully) make all t h i s clear. As a simple tllustration of guarded communication, we can reprogram the previous ver~ionof the copy process as follows: process Copy char c; do West?c
F
Eaat!c; od
1
Here we have ~ntlvedthe input statement inlo lhc guard of the do stale~iien(. Since the guard does not contain a Boolean expression. it will never (ail. and hcnce the do statement will loop forever. As a second example, me can extend the above process 50 rhat i t buffers up to two characters ~nsteadnf ,ius1 one. Initially, copy has lo i11p11tone character fi-om West. But once il holds one ch;ifi~ctel; it could either output that chatacter to East or inpi11ailother character frotn west. It has no way of knowing which choice to make. so it can use gu'lrded comtnunication as follows: process Copy { char cl, c2; Weat?cl; da W e s t ? c 2 - > East!cl; cl = c2; [ I East!cl - > West?cl; od 1
7-6 Case Study: CSP
325
At the slart of evely iteration of do, one character is buffered in cl. The first arm of do waits for a second character from West, output< the first character to East, then assigns c2 to cl. thus getting the process back lo the starting stt~tcfor thc loop. The second arm of do outputs the first character to E a s t , thcn inputs another character from West. If both guards of do succeed-i.e., if West is ready to send another character and E a s t ix ready to accept one-either one can be chosen. Sincc neither guard ever fails, the do loop never lerrninates. We can readily generalize the above prosram to a bounded buffer by implementing a queuc inside the Copy process. For example, the following buffers up
to aa characters: process Copy { char bufferll0l; int front = 0, rear = 0, count = 0; do count c 10; West?buffer[reas] - > c o u n t = count+l; rear = ( r e a r + l ) mod 10; [ j count > 0; East!buffer[front] - r count = count-1; front = (front+l) mod 10; od
3
This version of copy again e~nploysa do slaterncnl with two arms. l ~ u now l the guards conlain Boolean expressions as bell as ccornmut~lcationsraternents. The guard in he firs1 arm succeed'; if Lhere is I+oomIn the bufrer and West i s r ~ a d yto oulput a character; the guard jn the second arm succccds if thc buffer contains a character and East i5 ready to input it. Again the do loop never tcrrnrnatcs, in thic case bccause at least onc of the B~trleanconditions is always true. Another example wtll il1u';tratc lhe usc c ~ frnulliple ports in a hervcr process. In pa~ticulal-.consider the resource alloc;~torprogrammed in Figurc 7.7 using asynchronous message passing. The Allocator: procexs can he programmecl in CSP as lollows: process Allocator ( int avail = MAXUNITS; s e t units = initial values; int index, unitid; do avail > 0 ; Client[*l?acquire(index) - > avail--; remove(units, unitid}; client [index] !reply( u n i t f d ) ; [ ] Client [ * 3 ?release [index, unitid) - > avail++; insert (units,u n i t i d ) : od
1
326
Chapter 7
Message Passing
T h i h prucesx i x rn~lrhinore concise than the A l l o c a t o r in Figure 7.7. In particular, we do not need to merge acquire and release messages on Lhe same chani~cl;instead we use diffcrcnt ports and a da stalcmerlt with one ann i'or each port. Moreover, we can delay receiving an a c w i r e message vntil there are available ~tnils:this does away with ille need to save pendins requests. However. these dilferences are due lo Ihe use of guarded communic;ition; they are not due to thc diffei.enccs belween asynchronous and synchronous inessage pascing. In fact, il i ~ possible ; to uL;e guarded communication with asynchrotlous nlescagc p;l\xing, as we shall see in Section K 3. AS a final example, suppose wc have two processes that wan( ro exchange thc vnliica of two local variablcs-thc problem hvc cxainined al the end of Section 7.5. Thcrc wc had to usc an asymmclric solu~ionLo avoid de;ldluck. By using guartlccl conlmunicalion stalemenls, we can program a xyrnlnetric exchange a<
lollows: process P 1 I int valuel = 1, value2; i f P2!valuel - > PZ?valueZ; 1 3 P2?value2 - > PZ!valuel;
Ei 1 process P2 I i n t valuel, value2 = 2; i f Pl!value2 - > Pl?valwel; [ I Pl?valuel - > Pl!valueZ;
fi
1
This einploys the nondetermi nil;tic choice that occurs when there is more than nrie pair of rrlatching guards. Thc cornrnnnication stateinenl in lhe fil'st guard in P I matches the communication statcrnent in the fecond guard in ~ 2 t~nd , the olher pair ol' comtnunicalion slatemenls also match. Hence. either pair can be ckascn. The soludon is symmetric, hut i( is still no1 ne21rly as simple as a sym metric sol ~ ~ t i c tusing n asynchronous message passing.
7.6.3 Example: The Sieve of Eratosthenes Onc of lhc cxamples in Hoare's 1978 paper on CSP was w interesting parallel program [or generaling primes. We develop Ihe xolution here 130th as a complete exainple 01' il CSP prognun and a< another example of the use of a pipelinc of procesws to paraIlelize ;I secluential algorithm.
7.6 Case Study: CSP
327
The xieve of Eratosthenes-named aftcr the Greek mathematician who dcveloped ~t-is a classic algorithm for deter~niningwhich numbers in a given range are prime. Suppose we want to generate thc primec; betwccn 2 and n. First, writc down a list with all the numbers:
Starting wtth the first uncrossed-out number in the list. 2, go through the lisl and cross out multiples of that number. If n is odd. this yieIds the list:
At thrc pomt, crossed-out numbers are not prime; tlie other nurnbcrs are st111candrdates for being prime. Now mtne to the next trncroised-out n ~ ~ ~ n ihn ethc r li~l. 3, and repeal the abovc process by cross~ngoul rnultlplcs of 3 . TF we continue this process un tll cvcry numbei has been eon5idercd. the trn~rocwd-outnurnherc in he final list will be all (he prirncs between 2 and n. In eccence the sieve catchex the prlmcs and lets lnultiples of (he prirncs fall Ihmugh. Now coilsider how we might pal-allelize th15 algm irhm. Olic poh\ihility is lu as5ign a d~ffttlentprocess lo each nur~lberm tlie list and lience to have each i n parallel cross orrt multiples ol its number. Hornever, this approach has rwo problems First, sincc processes can commun~c;iteonly by euchanglng mcss;iges, wc would have to givc euch procecs a privale copy of all the ~ ~ u m b e r and c , we would Irake 10 u s e anorhcx proces.: to c o i ~ ~ b i nthe e resultr. Sccond. wc woulrI u s e way Loo Inany pl-occqses. We only nced as many procccses as there are prime\, but wc do not know in advance urhrch numberh are primc! Wc can ollercorne both problems by parallel~z~ng the sicve ol Eratosthcnes u h i n ~I? pipeline of' filter processes. Each filter receives a strcan-r ol numhu s from its preilecehxor dnd 5ends a stream of nutnher\ to it? Fuccesxor Thc first number ~h&it ;I filter reccrvcs 1s (lie next lasgcsl prime; it P R S ~ C Son t o its ';ucccssi>r all n u m her.; that arc tlol multiples uf 11ic lir\t. Tlic aclunl program is sllown In F~gurc7.15. The f i r q l process. sieve 11 1. \ends all the odd nu~nhcrfto s i e v e [ a ] . h v e ~ yother process re~eivesa stsc;un of nurl~bclsl'rorn its pre cce%or The fir-st nulnber p that placei\ Sieve [i] receives IS the it11 prime. Each sieve [i] subsec~uentlypasscs 011 ,111 othcr numher5 it rcceives (hilt are not multiples nf itc pri~nep. Thc tntsl number L or Srcvi. proccsscs niusl be Iargc cnough to guarantcc lhal ;LII pnlncs up lo n are gcnoraled. FOI example, thcrc are 2 5 prlmcs less ban 100; the percentage ilecreaser; for increasing valucs ol'n Proccss sieve113 terminate< when it runs out of odd nu~nbersto \end to sieve [ Z I . Every other proccs5 eventually bloclca w;iltir-lg Ibr more ~npulfrom
P
328
Chapter 7
Message Passing process Sieve [l] { int p = 2 ; for [i = 3 to n by 23 Sieve [ 2 ] ! i; # pass odd numbers to S i e v e [ a ] 1 process S i e v e [i = 2 to L3 ( int p, next; Sieve [i-11?p; da Sieweci-l]?next - > if (nexk m o d p ) ! = 0 - > Sieve[i+ll !next;
# p is a p r i m e # receive next candidate # if it might be prime, # pass it on
fi
od
3 Figure 7.15
Sieve of Eratosthenes in CSP.
Its pl-t3deces?or. Whcn thc program stops, thc values ut p i n thc procesr;es are thc primes. Wc could casily modify the progl-am to terminatc nonnally and to prinl the prirncs in ascending order by using u sentinel to nark the end of thc list of nurnhcrs. When a process sces (he sentinel (a zcro for example), it prints its valuc oCp, sends the sentincl to i t s successor. and then quits.
7.6.4 Occam and Modern CSP The Communicaling Secluerltial Processes notatinn described above has been widely used in papers but CSP ilrelf has ncvcr been implcmcnted. However, scvcral languages based on CSP have bccn implemented and used. Occam is [he beyt known nf 11ie.ce. It extcnds the CSP notation to use global channels rather than dilcct naming, provides procedures and Iunctionx, ancl allows nested parallelism. Hoarc and others went on to develop a forrnal languitge that is used to ciesc ti be and reason i~houtconcnrrcnl communicating syslcins. The formal language waq influenced ;by ideac in 111c original CSP notation-and it too is named CSP-but 11 is not a language that is used to write applicalion programs. Rather i t i s uscd to model their behavior. Eelow we rurn tnarize the attrihuter of Occarn and modcrn CSP and give a few examples to illustrate h e flavor of cach. The HistoricaI Notes itt the end ol'
this chapter dcscribc source:, of detailcd information, including a I~bral->I of J a k a pac kagcs tkdt h u ~ ~ cthe ~ r Occ.ctnl/CSP t programming n~odel.
Occam contains a vcry small number of mechanisms (hence she narnc. which
comes from Occam's razor) and hils a unique syntax. The fi~stkersion of Occarr~ was dcsiglled in the mid-1980s; Occam 2 appeared in 1987; a proposal is now underway for an Occam 3. AIthough Occam is a language in it? own right, ii was developed in concert with the trunsputer, an early muIticornputer, ancl wac; essenlially rhe mach~nelanguagc of the Lransputer. The basic units of an Occam program are dcdaratiuns and three pritnilivc "processes": arsignment, input, and output. An assignrnenl process I S cimply an assignment statement. The inprll and output procexscs are himilar to the input and output commands of CSP. However, channcls have names and are global to processes; moreover, each channel must havc exactly one sendcr and one receiver. Pnniitive processer; are combined into conventional processes using what Occam calls constructors. These i n ~ l u d execluentinl construclors, a parallel constructor similar to the co stalemelit, xnd a guarded cnmmunic;ition ~tatement.In Occam's distinctive syntax, each primitive proccss, constructor, and dcclnration occupies a line by itsell': declaratiuns have a trailing colon; and the l;ingi~i~gc irnpoxcs all indzntatjun c u n v e ~ ~ l i u r ~ .
An Occam program contains a static 1111rnber01' processes and stlicic cornriiunication pathh. Procedures and fuiictions provide the only form of modulasiza~ion.They are esqentialIy puarncteri~edprocesses and call share only chdn11eIs and constants. Occam does not support recursion or any form ol' dynamic creation or naming. This makes m;my algorithms harder- to program. However, jr ensures that an Occam compiler can determine cxaclly how many processes a prograin conlains and how they co~ninunicalewith each other. This, and the fact that drfferent constructors cannot sharc variables. makes it possible for a compiler lo assign processes and data to processors on a distributed-mcrnoiy rnachlne such as a transpuler. In mosi languilge<,the default is to execute statement$ ccquentially; the pmprammel- has lo say explicitly to exccutc slalements concurrently. Occaill takes a dXfe1-cnt approdch: There is no dcfaul t ! In~tead.Occnnl contains twcr hasic constructor<: SEQ for sequential execution and PAR for parallel execution. For example, the following program increments x then y:
330
Chapter 7
Message Passing
Since the lwo slaternentc acccqs different variables, they can, of course. be executed ct)ncul+renlly. This is cxprcsscd by changing SEQ to PAR in the ahove program. Occam conliiins a variety of other constructors, $uch as IF. CASE. ~ I L E and , ALT, which is used ror guarded communication. Occam a l ~ ucontains an interesting mechanism called a replicalor. 11 is similar 10 a yuantitier and is uced in sitnilar ways. Processes arc cr-catcd by using PAR constructors. They coinmunicale using channels, which arc acccsscd by the primitive input ? and output ! processes. The Collowing is a siinplc sequential prclgram that echoes key board charr~ctersl o ii display : WHILE TRUE BYTE ch : SEQ
keyboard ? ch screen ? ch
Here keyboard and screen are channels that arc assu~nedto hc connected to peripliel-;il clevices. (Occam provtdcs mechanism$ for bindrng 1/0 channels ro devices.) The above program uses n single characlcr buffcr, ch. It can he turned into a concurrent prograin t h a ~uses double buKering by employtng two proccsses, onc to rcad from the keyboard and one l o write to the screen. The processes communicate using an additional channel cornm; each ha$ a local character ch. CBAN OF BYTE comm : PAR WHILE TRUE -- keyboard input process BYTE ch : SEQ
keyboard ? ch comm ! ch WHILE TRUE
--
BYTE ch : SEQ c o r n ? cb
display ! ch
screen output process
7.6 Case Study: CSP
331
Thi.; program clearly indicates Occam's unique syntax. The rcquiremenl [hat every itern be nn a separate line leads lo long programs, but thc required ~ndentation avoids thc nced for closing keywords. The ALT constructor supports guarded communicsition. A guard consists of an Input procehx, or a Boolean expre.;cion and an input process, 01- a Bnolcail expression and a SKIP. As a simple example, the frollowing deiincs a proccss that can buffer up to two characters of data: PROC Copy (CHAN OF BYTE West, Ask, East) BYTE cI, ~ 2 dummy , :
SEQ West ? cl WHILE TRUE ALT
West ? c2
--
West has
--
E a s t wants a byte
a byte
SEQ
East ! cl
c1 := cz ask
? dummy
SEQ
East ! cl West ? cl
We have declared copy as a procedure to illustrate that ;)spec1 of Occam. A copy process wouId he crcaled by calling the procedure in a PAR conr;tructor and passing it three channels Occam does not allow output commands in guards of ALT conslructors. This tnakcs processcs such a5 the above harder tn program, because {he proccsr; thal inputs Siom channel E a s t first has to use channel ~ s tok say i t wa11tS il b y ~ e . However, this restriction grealIy simplifies the irnplemcnration, which was erpecially important on the transputer. Because Occam aIso does no1 aIlow null messages (signals), a value has to be sent across the ~ s channel k above cven t h n u ~ h that value is not used
Modern CSP A5 noted, CSP has evolved from n progra~nmil~g notation into ;1 fr>rinal language for- nodel ling concurrent systems in which processes interact by mean? of' cornrnunication. In addition. CSP is a collection of formal models and tools lhat help one understand and al~alyzethe behavior of systems described using CSP.
332
Chapter 7
Message Passing
The modertl version of CSP mar; firs1 described in 1985 in a book by Tony FIoare. 'The main purpose of t11c book was to present a ll~euryof communicating proccqscs; in doing so. Hoare moclilicd his original notation to one lhat is rnorc abstracl ant1 nmcnable to formal analysis. Thc underly~ngconcept of CSP remained tlic same-processeh that interact by rncans of synchronous meshage passing-but CSP became a calculus lor studying systemq of cominunicaling proceshcs ralhcr than n language hi-writing application programs. The focus or tlic early NOI-k will] t11c new CSP was clevelnplilg scveral scrnantic thearies; operational. denoialional. trace, and algebraic. Now lhere arc scveral autolnaled proof 1ools t h ~ tmake it practical to use CSP and ils aheories I(> tliodcl practical applications (rce the H~storical Notcs). For example. CSP has heen used Io model communication.; protocols, real-tirnc control systems. security protocols, and T;lriIL tolerant systems, among others. In modcr~iCSP, a process is charac~crizcdby the way it communic;ites wilh i t b environmenl-which could he other processes or the externdl envirnnmenl. All interacl~onoccur? as a result of cotnmumcalion cvciits. A collection 01. comrnunicl~ting procesws is modeled by ~pecifyingthe comillunication pattern nf each process. The new CSP notation is f~~actinnal ralher ban jmperative. The f~~ndarnentnl 0peratcw.s are prefixing (sequencing), recursion (repetition). and guarclecl alternatives (nondetenninivlic choice). Thcre i s a mathematical version of CSP tha( is uscd in book? and papers as u~ellas x machine-re;idable verhion tl-rat ic i~xedhy the analy5is tools. The following siinplc cxarnplcs will give a fl;ivor of modern CSP; they are written in machine-rcadable notation. The prefixing opcrator - 5 is used LO specify a scqucntial ordering of communication evcnlr. IT red and green are two commu~licationcvents. then a ti-a{fic signal that tuln?;green thcll red just O F I C can ~ he specifieil as
1
green -> red - > STOP
The lail elcii~cntabove is the simplest CSP process, STOP, which halts without corninunic;~ting. Recursion is used to spcctfy repet~tion. For example, the followr ng speclfie? LI trarhc lighi that repeatedly turns green then red: LIGHT = green -> red - > LIGHT
This says thiiI plocess L I G H T fir5t cnmmunicates green, then C O ~ W ~ U ~ I red, then repeals. The bchavior of this process call be specified in several other witys-for example. by ~lsiilgtwo tn~~tually recursive processes, one that commu~licateagreen and one that coiumunicates red. The key point is lhat behaviortraces of communication evenis-is what matters, not how the process is "programmed."
~ C ~ I ~ S
7.6 Case Study: CSP
333
The two communicntion events abovc are sent to the external environment. More commonly, one wants Irr have prucesse.; cornrnunicate with each othel-. CSP uses channels for thix. as in thc 170110win:, cxarnpIe of a single ch:~racter buffcr process:
The greatest-common-dihiso~.prograin given earlier in Section 7.6coulcl bc programmcd in lhe new CSP as GCD = Input?id.x.y - > GCD(id, x, y )
GCD(id, x, y ) = i f Ex = y ) then Output!id.x - > GCD else if ( x > y ) then GCD(id, x - y , y ) else GCDI id, x, y - x )
This uscc two mutually recursive processes. The first wails for an input evcnt and then invoke\ the sccond proceys. Thc ,second recurses until x = y . then oull,ut\ tire r e s ~ ~and l l invokes ~ h cfirst procecs lo wait for another i npirt cvenl. As n final example, the following specifies the hehaviol- of a h y stcm that bur'fcrs up lo two characters of data: COPY = West?cl:chas
-D
COPY2(cl)
The second proccsh uses the guardcd altcrnalive operator I ] . Tlie g~;ll-din E ~ first nlternativc wails I'or input from cban~lclWest; the guard in thc secu~lclwaits to send oulpul l o channel East. The choice is nor~determii~istic if both communications can occur. Tlie communication bchavior is thus the same 21% that of he ~wtr-charactercopy procesc; givcn earlier in the scctio~lon guarded communication. Modern CSP also provi~iesan elcgani m7ay to spccify an n-elcrnenl brrffelprocess using what is called a linked y a ~ ~ / I operaror e/ to chain together n instances of 3 parameterized teersionof copy1 abovc.
C
334
Chapter 7
Message Passing
7.7 Case Sfudy: Linda L~iid'i ernbodics il distinctix'e ilpproa~hto conlurlent prog~.;uiiining that syntlic5i7e5 and penerali7e5 acpecls of chased v;lrlables and s synchronous mc\cage passing Not itself d language. Linda i s lidher a collcclicln ol six pl~rniti\e\th;it are u5cd to accew what IS c;llIed ruplc' \parp (TSL a sli:~red.awnciat~veIncrnoig. conslcc~ng01 a cnllcctivn of tagged dala record? called tuples. Any sequcnt~al progr-L~~umi~~g language L ~ l nhc augmented with the Linda pnrmit~veito yield C O I ~ C Lrent I ~ programining varidnt or that la~iguagc. Tuple spacc is like a single shared communicat~onchannel, cxcep thilt luples arc unoi-fieled. Thc operation tn dcpoc~t21 tuplc. OUT. I S 11 kc a send m t c metlt; the operalion to extract tuple, IN. rs like a receive statcmenl. ancl thc opelation to examine a Luple, RD. is like :I receive that leavec, Ihe Ines.;age ctol-ed nn the ch;u~neI. A fourth vperatron. EVAL, provides procesf cre;ition. Thc hnal two operation?, rap and RDP, plovide nonblacking input and read~ng. Although TS i s logically ?hared by p r o c e ~ ~ erls ,can he ~rnplcntentedby cllrtriburing p;irts among processors ill a multicomputt.~or network. Thu? TS can be used to store distributed data 511-ucrurec, and direrent proccssec can concurrent1 y acceqs drfierent elements of the data ~ErUCt11f;C. AS a later example will show. this directly supports the hag-of-~ahksparadigm for process interaction. TS can also be made pershtent-retaimng its content? aftex a program terminates-and can thu5 be ~ s e dto implement file7 or database systems. Linda was fir$[ conceived in thc ~ a r l y1980s. (The name has a ctrlarful orig i n , we the Historrcal Notes.) The initial proposal had thrce primitives; the others were added Iater. Seberal languages have been augmented with the Linda primitives. incIuding C and Fortran.
7.7,1 Tuple Space and Process Interaction Tuple space consists of an unordered collection of passive data tuples and active process tuples. Data tuples are tagged recorcls that contain the shared I;tate of a computation. Proce~stuples are routines thaf execute a\ynchronously. They interact by reading. writlng, and generating data tuples. When a process tuplc terminates, it turns into a data luple, as described below. Eacl data ruple in TS has the form ( " t a g t r , value,,
. .. ,
value,
The tag is a string hat is used to distinguish betureen tuples representing dimerent dala structures. The values are zero or more data values-for example, integers, lealr, or arrays.
396
Chapter 8
RPC and Rendezvous import java.ni.*; import j a v a . m i . server. *; public interface RernoteDatabase extends Remote { public int read() throws RemoteException; public void write(int value) throws RemoteException;
1 class Client ( public static void main(String[] args) { try ( / / set the standard RMI security manager System.setSecurityManager(new RMISecurityManager()); / / get remote database object String name = "rmi://paloverde:9999/databa6e1'; RemoteDatabase db = (RemoteDatabase) Naming.lookup(name);
/ / read command-line argument and access database int value, rounds = Integer.garseInt(args[OJ); for (int i = 0; i < rounds; i++) ( value = db.read ( ) ; System.out.println("read: " + value); db-write(value+l); 1
1 catch (Exception e) ( System.err.printl.n(e);
1 1
1 class RemoteDatabaseServer extends UnicastRemoteObject implements RernoteDatabase ( protected int data = 0; / / the "database" public i n t read ( ) throws RemoteException return data; 1
{
public void write(int value) throws RemoteException data = value; System.out.printLn("new value is: " + data); 1
(
8.6 Case Study: Ada
397
/ / constructor required because of throws clause public RemoteDatabaseServerO throws RemoteException { super ( ;
> public static void main(Stringl1 args) ( try { / / create a remote database server object RemoteDatabaseServer s e w e r = new RemoteDatabaseServerO; / / register name and start serving! String name = "rrni~//paloverde:9999~database"; Naming. bind (name, server) ; System.out.println(name + " is running");
1 catch (Exception e ) ( System.err.println(e);
1 1 1 Figure 8.16
.A client
Remote database interface, client, and server.
is started on paloverde
(11'
:I machine on Tl~c same rletivork by
execuling I
java Client rounds
I
The ~reatlel-i s eilcouraged to cxperimenl with this prograln to see how ic behaves. Fol example, wh21t happens i l there i s more than one client? What happens i f :I clicnt is started before the servel-?
8-6 Case Sfudy; Ada Ada was developed i~nderthe sponsorship of the I1.S. L'>epa~.tmentof Defense to be the hlandard lang~~age for progra~il~ni ng defense applications sa~ngingfroln real-time e~nbecldedsyste~nsto lergc information systen~s. The concurrency features of Ada are ;un imporran1 par[ of the language-and are critical for its intended uses. Ada also contains n rich sct of rnech;uiisms fur secluenlial programming.
.
398
Chapter 8
RPC and Rendezvous
A ~ ; ~.eculted I from an extensive internaliunal design co~npcritioni n (lie latc 1970s nnd was liru standai-dircd in 1983. A d a 83 introducecl the rendezvous n~echanis~n I'or inlerpsoces$ cornmu nicalion; indeed, the term ~-c.t~dezvn~l.s was choscn i 11part because thc lcadel- (SF the dcsign teain is French. A second ver.sion 01Atla was ?;canda.rdizetlin 1995. Ada 95 is upward compatible-so Ada 83 programs rernain valid-but also adds scveral f'eati~res. The two rnosr interesting new I'ei~turesfor concul.rent programming are protected lypes, which are slrnilar to ~nonirors.and a requeue SLatemenl to give the programmeJ. more control over synchronizotjon and scheduling. In this section we first summarize the main concurrency mechanisms of Ada: tasks, rendezvc,us, and protected types. Then we show ho~vto progl.arn a barrier as a protected type and how to prograrn ihe dining philosophers as task5 that inleracr using rendezvous. The examples also illustrate several of [lie sequential pmgramming features of Ada.
8.6.1 Tasks An Ada program is cornprtsccl or .subprograms, packages, and [asks. A subpl-ogralrl is a procedure oc. function, a package is a set of declarations, a n d a [ask is an irldependent process. Each cornponenr has a specil-icarion part acid a body. T h e specification declares visible objects; the body contains local declal-ations and st;~lements. Subprogl-an~sa.nd packages can ;dso be generic-it.. they can be para~neterized by data types. The basic I'orm of n task specific21lionis task Name is
entry declarations; end; Ent~ydeclarations are similar to op declarations in modules. They detioe operations serviced by dle task and have the for~n entry
denti if ier (formals) ;
P;uametess are passed by copy-in (the def;lulr), copy-out, or copy-inlcopy-o~il. Atla supporrs arrays of entries, which are called entry Ii~nilies. The basic form of a task bocly is task body Name is
local declnraliong; begin
slalemenls; end Name;
8.6 Case Study: Ada
399
A task must be declared within a subpsogra~no r package. The si~iiplest concurrent program in Ada is thus a single procedure that contains [ask spccificariot~sand bodies, The declarations in any component are elahomtcd one at ;I ticne, i n Ihe order in which they appear. Elaborating a task cleclara~ioncreate.; an inslance ol' the (ask. AAer all declarations are claboracccl, (lie hecluen~irllslate~nenlsin the subp~.ogrambegin execulioo as an anonymous task. A rask specificationlbody pair clclincs a ,single rask. Ad21 also suppor~s arrays of tasks, but i n a way Lhac is dii'ferent thil~lill ~ ~ oos~ tl i ec~o' n c u i ~ e nprol gramming languages. In parl.icula~,.lhe programmel- first declares a task type and then declares an anay o f instances o f (hat type. The programmer can also use (ask types ill con.juncl.ion with pointer&-tvliich Acl;~calls access typcs-lo create tasks dynamically.
8.6.2 Rendezvous In Ada 83 rende7vous w a s the primary c o m m ~ ~ n i c a l i on~ech;\~\ism n :~nclrhc solc syncliro~iization~ n e c l i a ~ ~ i (Tasks s ~ ~ l . 011 tlic sai~icm21cliine coulcl also rzacl and write shared va~iables.) Everything else had to be progl-:\ni~ned using rcnclezvous. Ada 95 also supports procectecl ly pes TOI- bynchroni7ed acccss to shared ot?jecrs. as described in t l ~ cnest secriol-l. Suppose task T declal-es entry E. Other tasks in the scopc of the speciljcalion o f T call invnlte E as I'ollows:
call T.E (act~lalsl; As usual, execution o f call delays the caller u ~ i r i entry l E has k r ~ i l i ~ l a t e(01d aborlcd or raised an exceplion). Task T service5 c;~lls o f entry E by means of the accept st;~te~nenl. This has the 1i)nn
accept E ( f o r ~ a l s )do starcmcnl lisl; end; Execution of accept ~ C I S tile Y task S until there is an invocation ot' E. copies i n p ~ arguments ~t into itiput formals, then executes (he skitenlent: lisc. When Llle statement list terminates. oulpul formals are copied lo oul.put argume11l.s. At that point., both h e caller and the execu~ingprcjcess coniinue. An accept statement is thus like an input slarement (Section 9.2) with one guard, no synchroni~ation expression, and no scheduling expwbsion.
1
400
Chapter 8
RPC and Rendezvous
To support ~~ondctcr-miniclic task interaction. Ada p~.ovitlesLlirec Icindh o f select slatemcnts: selecti\~ewail, co~~dirional entry call, and timed entry call.
The selccrive wail st:itement suppoits guarded co~nmunication. The m a t common form of 1Iii.s stiltelllent is select when B, => accept slale~nent;additiorlal starernenls;
or or
... when B, = > accept statement; additional sraternenls:
end select;
Each line i s called an alteunntive, Each B~ is a Boolean expression; the when clauses are oplional. An alternative is said lo be open if B, is true 01' the when clause is omitted. This form of selective wait delays the executing process until the accept statement in some open alternative can be execulcd-i.e., there is a pending in\location of [he entry named in the accept statement. Since each guard B, precedes an accept statement. the guard cannot reference the parameters of an entry call. Also, Ada does not provide scheduling expressions. As we discussed in Sectiorl 8.2 and will see in the exa~nplcsin the next two sections, this complicares solving many synchronization and scheduling problems. A selective wait statement can contain an optional e l s e alternative, which i s aelecred if no other alternative can be. Tn place of an accept stalement, the progralnmer can also use a delay statement or a terminate allernalive. An open alternarivz with a delay slatemen[ is selccled i T the delay ~nte~.\)al has tl.;lnspirecl; this provicles a tilneour mechanism. The terminate alternative i s selzcled if all rhe tabks that rendezvous with this one have lerlninated or are hemsel elves wailing at a terminate altel-native(see the cxnniple i n Figure 8.19). A conditional entry call is used il one task wants to poll anolhcr. Tt has lie form select enlq call; additional statements; else ctarements: end select;
The entry call is selected if il call be executed immedialely; otherwise, the e l s e al renative is selected. A timed entry call is used il' a calling task wants to wait at most a certain inlerval of lime. Its I'orni is similar ro ~liatof a conditional entry call: select enlry call; ;~dditionalstatements; delay statement; additional stalemenls; end s e l e c t ;
or
-
-
8.6 Case Study: Ada
401
[n ~ l i i scnac: the entry call i s selected if i~ can be exccu~.ctlbefoi-e r l ~ cdelay inlcrv;11 expires. Ada 83-ancl hence Ada 95-pl-ovides a Feiv additional mech;i~lisnjs I'nr concurrent prognlmming. Taslts can sllarc variables; howr\le~; clle)~ cannor a s u m e that ~liesevarinb!cs ai-e updated except at spncIi~-oniz;~tion points (e.g., rendezvous slalerne~~tc).The a b o r t slatenienl allows one task LO ter~nirlart
I I
nothe her. l'here is a ~nech:inismfor setling the priority of u task. Finally, t1iei.e are sa-c;~Ileclattributes that enable one Lo tlel.enni~lewl'lellie~.a task is callable or h : ~ ter~ninatetlor lo determine rhe numbel-of pending invocalions o f an enlry. L
8.6.3 Protected Types Ada 95 enhances the concurrent progl-a~nmingmechanisms of Ada 83 in scvcral w21y.s. The two inost significant are prolected types, which support spnclironi7ed access to .sll;lled diilkl, and the r e F e u e sti~teinent.which supports scheduling ;~ndsynchrunizi~tion[ha( depends on t l ~ carguments of calls. A prolected type encapsul~ttessharcd data arid svnchroni-ccs access lo i ~ . Eilc1-1inslance o f a protected type is si~nilal-LO a n~onilol.. The ~ p c c i l i c ; ~ t i opart ~i o f a pl-olected type has the l i ) r ~ n
Y
D
I
d L-
,h 1C
1
protected type Name is function. procedure, or ent1.y declaralions; private variable declara~ions; end Name;
I
,II
nis re 1). he
T11e body has rhe form protected body Name is Ciunclion! proceclnre. or entry bodies: end
Name;
I
se
ii n
Prntectetl fi~nctions111-ovide rcad-only acces5 to the private va~.ii~bles: hence, such a Furlction can be called simullaneously by mulliple laslts. P1,oteclecl proccdu~-cs pvtav\E\e exc\uii~ereadlw\.itt access to the p.c'\\fi\(c vaY'\a\o\es. P~atecLeAc\-\ttr'\esaye like protected procedures except h a t they also have a when c\iluse that spcciiies a Boo\ean synch~.onizaLioncondition. A\ most one task al a lime can be cxccnling a protected procedure or entry. A call o f a pro~ected enlry delays until the syncl11-onizationco~lditionis true urr~lthe citller can have cxclusivc access condition callnor. however, deycnd to the private variables; the s)lt)ch~.o~lizalion on rhe parameters of a call. 1
402
Chapter 8
RPC and Rendezvous
C;II Is of prorected procedures and enlrics are sel-viccd i n FJFO order, subject Lo synchronization conclitions on entries. The requeue state~nerllcan bc used wirllin rhe body ol' ;i 111-oleclcdprocedure or entry to defer completion o f tlic call that is being serviced. (It can also be used with.in the body o-f an accept slatement.) This slalell?ei?r has the form requeue Opname; where Opname i s the 11a111eof a11entry or prolectecl procedure [hat either lias n o patanictcrs or has thc same parameters as the opctarior~ being serviced. The ef"l'ec( of requeue i s to place the call being serviced on Lhc clueue for Opname. juhl 21s iTLhc calling tad< had directly called Opname. A s an example of the use OF a protected type ant1 requeue, Figure 8.17 conr:iins Ada code Tor ail N - ~ s lcounter i barrier. We assume that N i s a global constant. An instance of the barrier is declared and used as followh:
protected type Barrier i s procedure Arrive; private entry Go; - - used to delay early arrivals count : Integer := 0; - - number who have arrived time-to-leave : Boolean := False; end B a r r i e r ;
protected body Barrier is procedure Arrive is begin
count
: = count+l;
i f count
c
N then
-- wait for others to arrive else count := count-1; time-to-leave : - True; end if; requeue Go;
end ; entry Go when time-to-leave is begin count : = count-1; if count = 0 then time-to-leave := False; end i f ;
end ; end Barrier;
Figure 8.17
Barrier synchronization in Ada.
    B : Barrier;    -- declaration of a barrier
    ...
    B.Arrive;       -- or "call B.Arrive;"
The first N-1 tasks to arrive at the barrier increment count and then delay by being requeued on private entry Go. The last task to arrive at the barrier sets time_to_leave to True; this allows the requeued calls that are waiting on Go to awaken, one at a time. Each task decrements count before returning from the barrier; the last one resets time_to_leave so the barrier can be used again. (The semantics of protected types ensure that every task delayed on Go will execute before any further call of Arrive is serviced.) The reader might find it instructive to compare this barrier to the one in Figure 5.18, which is programmed using the Pthreads library.
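For readers who want a comparison in a shared-memory setting, the following is a minimal sketch of the same reusable counter-barrier idea written as a Java monitor. This sketch is our own and is not part of the Ada program: because a Java monitor does not give waiting threads priority over new callers the way protected entries do, it uses a generation counter in place of the time_to_leave flag and requeue.

    // A reusable barrier for n threads, sketched as a Java monitor.
    // Early arrivals wait (playing the role of being requeued on entry Go);
    // the last arrival starts a new "generation," which releases the waiters
    // and resets the barrier for its next use.
    class Barrier {
        private final int n;        // number of threads that must arrive
        private int count = 0;      // number who have arrived so far
        private long generation = 0;

        Barrier(int n) { this.n = n; }

        synchronized void arrive() throws InterruptedException {
            long myGeneration = generation;
            count++;
            if (count == n) {       // last arrival: open the barrier
                count = 0;
                generation++;
                notifyAll();
            } else {
                while (myGeneration == generation)  // wait for the others
                    wait();
            }
        }
    }

A worker thread simply calls arrive() at the end of each iteration, just as an Ada task calls B.Arrive above.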
8.6.4 Example: The Dining Philosophers

This section presents a complete Ada program for the dining philosophers problem (Section 4.3). The program illustrates the use of tasks and rendezvous as well as several general features of Ada. For convenience in the program, we assume the existence of two functions, left(i) and right(i), that return the indices of the left and right neighbor of philosopher i.

Figure 8.18 contains the main procedure Dining_Philosophers. Before the procedure are with and use clauses. The with clause says this procedure depends on the objects in package Ada.Text_IO. The use clause makes the names of the objects exported by that package directly visible (i.e., so they do not have to be qualified by the package name). The main procedure first declares ID to be an integer between 1 and 5. The specification of the Waiter task declares two entries, Pickup and Putdown, that are called by a philosopher when it wants to pick up or put down its forks. The body of Waiter is separately compiled; it is given in Figure 8.19.

Philosopher has been specified as a task type so that we can declare an array DP of 5 such tasks. The instances of the five philosophers are created when the declaration of array DP is elaborated. Each philosopher first waits to accept a call of its initialization entry init, then it executes rounds iterations. Variable rounds is declared global to the bodies of the philosophers so each can read it. The body of the main procedure at the end of Figure 8.18 initializes rounds by reading an input value. Then it passes each philosopher its index by calling DP(j).init(j).

Figure 8.19 contains the body of the Waiter task. It is more complicated than the waiter process in Figure 8.6, because the when conditions in the Ada
    with Ada.Text_IO; use Ada.Text_IO;
    procedure Dining_Philosophers is
      subtype ID is Integer range 1..5;

      task Waiter is                       -- Waiter spec
        entry Pickup(i : in ID);
        entry Putdown(i : in ID);
      end;
      task body Waiter is separate;

      task type Philosopher is             -- Philosopher spec
        entry init(who : ID);
      end;

      DP : array(ID) of Philosopher;       -- the philosophers
      rounds : Integer;                    -- number of rounds

      task body Philosopher is             -- Philosopher body
        myid : ID;
      begin
        accept init(who : ID) do
          myid := who;
        end;
        for j in 1..rounds loop
          -- "think"
          Waiter.Pickup(myid);             -- pick forks up
          -- "eat"
          Waiter.Putdown(myid);            -- put forks down
        end loop;
      end Philosopher;

    begin   -- read in rounds, then start the philosophers
      Get(rounds);
      for j in ID loop
        DP(j).init(j);
      end loop;
    end Dining_Philosophers;
Figure 8.18   Dining philosophers in Ada: Main program.
select statement cannot reference entry parameters. The waiter repeatedly accepts calls of Pickup and Putdown. When Pickup is accepted, the waiter checks to see if either of the calling philosopher's neighbors is eating. If not, then philosopher i can eat now. However, if a neighbor is eating, the call of Pickup has to be requeued so that the philosopher does not get reawakened too
    separate (Dining_Philosophers)
    task body Waiter is
      entry Wait(ID);                      -- used to requeue philosophers
      eating : array(ID) of Boolean;       -- who is eating
      want   : array(ID) of Boolean;       -- who wants to eat
      go     : array(ID) of Boolean;       -- who can go now
    begin
      for j in ID loop                     -- initialize the arrays
        eating(j) := False;  want(j) := False;
      end loop;
      loop   -- basic server loop
        select
          accept Pickup(i : in ID) do      -- DP(i) needs forks
            if not(eating(left(i)) or eating(right(i))) then
              eating(i) := True;
            else
              want(i) := True;
              requeue Wait(i);
            end if;
          end;
        or
          accept Putdown(i : in ID) do     -- DP(i) is done
            eating(i) := False;
          end;
          -- check neighbors to see if they can eat now
          if want(left(i)) and not eating(left(left(i))) then
            accept Wait(left(i));
            eating(left(i)) := True;  want(left(i)) := False;
          end if;
          if want(right(i)) and not eating(right(right(i))) then
            accept Wait(right(i));
            eating(right(i)) := True;  want(right(i)) := False;
          end if;
        or
          terminate;                       -- quit when philosophers have quit
        end select;
      end loop;
    end Waiter;
Figure 8.19   Dining philosophers in Ada: Waiter task.
early. A local array of five entries, Wait(ID), is used to delay philosophers who have to wait; each philosopher is requeued on a distinct element of the array.

After eating, a philosopher calls Putdown. When the waiter accepts this call, it checks whether either neighbor wants to eat and whether that neighbor may do so. If so, the waiter accepts a delayed call of Wait to awaken a philosopher whose call of Pickup had been requeued. The accept statement that services Putdown could surround the entire select alternative; that is, the end of the accept could come after the two if statements. We have placed it earlier, however, because there is no need to delay the philosopher who called Putdown.
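For comparison only (this is our own sketch, not part of the book's Ada program), the same centralized waiter logic can be expressed as a Java monitor. The wait/notifyAll loop plays the role of requeueing on entry Wait, and no want array is needed because awakened philosophers simply recheck the condition; the wrap-around definitions of left and right are one possible choice.

    // A centralized waiter for five philosophers, as a Java monitor.
    class Waiter {
        private final boolean[] eating = new boolean[5];

        private int left(int i)  { return (i + 4) % 5; }  // assumed neighbor functions
        private int right(int i) { return (i + 1) % 5; }

        synchronized void pickup(int i) throws InterruptedException {
            while (eating[left(i)] || eating[right(i)])   // wait until no neighbor is eating
                wait();
            eating[i] = true;
        }

        synchronized void putdown(int i) {
            eating[i] = false;
            notifyAll();          // let waiting neighbors recheck whether they can eat
        }
    }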
8.7 Case Study: SR

Synchronizing Resources (SR) was developed in the 1980s. The first version of the language introduced the rendezvous mechanisms described in Section 8.2. The language evolved to provide the multiple primitives described in Section 8.3. SR supports both shared-variable and distributed programming, and it can be used to implement directly almost all the programs in this book. SR programs can execute on shared-memory multiprocessors and networks of workstations, as well as on single-processor machines.

Although SR contains a variety of mechanisms, they are based on a small number of orthogonal concepts. Also, the sequential and concurrent mechanisms are integrated so that similar things are done in similar ways. Sections 8.2 and 8.3 introduced and illustrated many aspects of SR without actually saying so. This section summarizes additional aspects: program structure, dynamic creation and placement, and additional statements. As an example, we present a program that simulates the execution of processes entering and exiting critical sections.
8.7.1 Resources and Globals

An SR program is comprised of resources and globals. A resource declaration specifies a pattern for a module and has a structure quite similar to that of a module:

    resource name                       # specification part
        import clauses
        operation and type declarations
    body name(formals)                  # body
        variable and other local declarations
        initialization code
        procedures and processes
        finalization code
    end name

A resource contains import clauses if it makes use of the declarations exported from other resources or from globals. Declarations and initialization code in the body can be intermixed; this supports dynamic arrays and permits the programmer to control the order in which variables are initialized and processes are created. If there is no specification part, it can be omitted. The specification and
body can also be compiled separately.

Resource instances are created dynamically by means of the create statement. For example, executing

    rcap := create name(actuals)

passes the actuals (by value) to a new instance of resource name and then executes that resource's initialization code. When the initialization code terminates, a resource capability is returned and assigned to variable rcap. This variable can subsequently be used to invoke operations exported by the resource or to destroy the instance.

Resources can be destroyed dynamically by means of the destroy statement. Execution of destroy stops any activity in the named resource, executes the finalization code (if any), and then frees storage allocated to the resource.

By default, components of an SR program reside in one address space. The create statement can also be used to create additional address spaces, which are called virtual machines:

    vmcap := create vm() on machine
This statement creates the virtual machine on the indicated host machine, then returns a capability for it. Subsequent resource creation statements can use "on vmcap" to place a new resource in that address space. Thus SR, unlike Ada, gives the programmer complete control over how resources are mapped to machines, and this mapping can depend on input to the program.

A global component is used to declare types, variables, operations, and procedures that are shared by resources. It is essentially a single instance of a resource. One copy of a global is stored on every virtual machine that needs it. In particular, when a resource is created, any globals that it imports are created implicitly if they have not yet been created.
An SR program contains one distinguished main resource. Execution of a program begins with implicit creation of one instance of this resource. The initialization code in the main resource is then executed; it often creates instances of other resources.

An SR program terminates when every process has terminated or is blocked, or when a stop statement is executed. At that point, the run-time system executes the finalization code (if any) in the main resource and then the finalization code (if any) in globals. This provides a way for the programmer to regain control to print results or timing information.

As a simple example, the following SR program writes two lines of output:

    resource silly()
      write("Hello world.")
      final
        write("Goodbye world.")
      end
    end
The resource is created automatically. It writes a line, then terminates. Now the finalization code is executed; it writes a second line. The effect is the same as if final and the first end were deleted from the program.
8.7.2 Communication and Synchronization

The distinguishing attribute of SR is its variety of communication and synchronization mechanisms. Processes in the same resource can share variables, as can resources in the same address space (through the use of globals). Processes can also communicate and synchronize using all the primitives described in Section 8.3: semaphores, asynchronous message passing, RPC, and rendezvous. Thus SR can be used to implement concurrent programs for shared-memory multiprocessors as well as for distributed systems.

Operations are declared in op declarations, which have the form given earlier in this chapter. Such declarations can appear in resource specifications, in resource bodies, and even within processes. An operation declared within a process is called a local operation. The declaring process can pass a capability for a local operation to another process, which can then invoke the operation. This supports conversational continuity (Section 7.3).

An operation is invoked using synchronous call or asynchronous send. To specify which operation to invoke, an invocation statement uses an operation capability or a field of a resource capability. Within the resource that declares it, the name of an operation is in fact a capability, so an invocation statement can
use it directly. Resource and operation capabilities can be passed between resources, so communication paths can vary dynamically.

An operation is serviced either by a procedure (proc) or by input statements (in). A new process is created to service each remote call of a proc; calls from within the same address space are optimized so that the caller itself executes the procedure body. All processes in a resource execute concurrently, at least conceptually.

The input statement supports rendezvous. It has the form shown in Section 8.2 and can have both synchronization and scheduling expressions that depend on parameters. The input statement can also contain an optional else clause, which is selected if no other guard succeeds.

SR also contains several mechanisms that are abbreviations for common uses of operations. A process declaration is an abbreviation for an op declaration and a proc to service invocations of the operation. One instance of the process is created by an implicit send when the resource is created. (The programmer can also declare arrays of processes.) A procedure declaration is an abbreviation for an op declaration and a proc to service invocations of the operation.

Two additional abbreviations are the receive statement and semaphores. A receive abbreviates an input statement that services one operation and that merely stores the arguments in local variables. A semaphore declaration (sem) abbreviates the declaration of a parameterless operation. The P statement is a special case of receive, and the V statement is a special case of send.

SR provides a few additional statements that have proven to be useful. The reply statement is a variation of the return statement; it returns values, but the replying process continues execution. The forward statement can be used to pass an invocation on to another operation for servicing.
8.7.3 Example: Critical Section Simulation

Figure 8.20 contains a complete program that employs several of the message-passing mechanisms available in SR. The program also illustrates how to construct a simple simulation, in this case of a solution to the critical section problem.

The global, CS, exports two operations: CSenter and CSexit. The body of CS contains an arbitrator process that implements these operations. It first uses an input statement to wait for an invocation of CSenter:
    global CS
      op CSenter(id: int) {call}    # must be called
      op CSexit()                   # may be invoked by call or send
    body CS
      process arbitrator
        do true ->
          in CSenter(id) by id ->
            write("user", id, "in its CS at", age())
          ni
          receive CSexit()
        od
      end
    end

    resource main()
      import CS
      var numusers, rounds: int
      getarg(1, numusers); getarg(2, rounds)
      process user(i := 1 to numusers)
        fa j := 1 to rounds ->
          call CSenter(i)              # enter critical section
          nap(int(random(100)))        # delay up to 100 msec
          send CSexit()                # exit critical section
          nap(int(random(1000)))       # delay up to 1 second
        af
      end
    end
Figure 8.20   An SR program to simulate critical sections.
    in CSenter(id) by id ->
      write("user", id, "in its CS at", age())
    ni

This is SR's rendezvous mechanism. If there is more than one invocation of CSenter, the one that has the smallest value for parameter id is selected, and a message is then printed. Next the arbitrator uses a receive statement to wait for an invocation of CSexit.

In this program we could have put the arbitrator process and its operations within the main resource. However, by placing them in a global component, they could also be used by other resources in a larger program.
The main resource reads two command-line arguments, then it creates numusers instances of the user process. Each process executes a for-all loop (fa) in which it calls the CSenter operation to get permission to enter its critical section, passing its index i as an argument. We simulate the duration of critical and noncritical sections of code by having each user process "nap" for a random number of milliseconds. After napping, the process invokes the CSexit operation.

The CSenter operation must be invoked by a synchronous call statement because the user process has to wait to get permission. This is enforced by means of the operation restriction {call} in the declaration of the CSenter operation. However, since a user process does not need to delay when leaving its critical section, it invokes the CSexit operation by means of the asynchronous send statement.

The program employs several of SR's predefined functions. The write statement prints a line of output, and getarg reads a command-line argument. The age function in the write statement returns the number of milliseconds the program has been executing. The nap function causes a process to "nap" for the number of milliseconds specified by its argument. The random function returns a pseudo-random real number between 0 and its argument. We also use the int type-conversion function to convert the result from random to an integer, as required by nap.
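For readers without an SR implementation, here is a rough Java analogue of the simulation (our own sketch, with invented class and method names). A synchronized arbiter object stands in for the arbitrator process, and a TreeSet of waiting ids mimics the by id scheduling preference; the two sleeps correspond to the two nap calls.

    import java.util.TreeSet;

    // Java analogue of the SR critical-section simulation: the arbiter grants
    // the critical section to one user at a time, preferring the waiting user
    // with the smallest id (mimicking the "by id" scheduling expression).
    class Arbiter {
        private final TreeSet<Integer> waiting = new TreeSet<>();
        private boolean busy = false;

        synchronized void csEnter(int id) throws InterruptedException {
            waiting.add(id);
            while (busy || waiting.first() != id)
                wait();
            waiting.remove(id);
            busy = true;
            System.out.println("user " + id + " in its CS at " + System.currentTimeMillis());
        }

        synchronized void csExit() {
            busy = false;
            notifyAll();
        }
    }

    class CSDemo {
        public static void main(String[] args) {
            Arbiter cs = new Arbiter();
            int numUsers = 3, rounds = 5;
            for (int i = 1; i <= numUsers; i++) {
                final int id = i;
                new Thread(() -> {
                    try {
                        for (int j = 0; j < rounds; j++) {
                            cs.csEnter(id);
                            Thread.sleep((long) (Math.random() * 100));   // in the CS
                            cs.csExit();
                            Thread.sleep((long) (Math.random() * 1000));  // outside the CS
                        }
                    } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
                }).start();
            }
        }
    }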
Historical Notes

Both remote procedure call (RPC) and rendezvous originated in the late 1970s. Early research on the semantics, use, and implementation of RPC occurred in the operating systems community and continues to be an interest of that group. Bruce Nelson did many of the early experiments at Xerox's Palo Alto Research Center (PARC) and wrote an excellent dissertation on the topic [Nelson 1981]. While pursuing his Ph.D., Nelson also collaborated with Andrew Birrell to produce what is now viewed as a classic on how to implement RPC efficiently in an operating system kernel [Birrell and Nelson 1984]. Down the road from PARC at Stanford University, Alfred Spector [1982] wrote a dissertation on the semantics and implementation of RPC.

Per Brinch Hansen [1978] conceived the basic idea of RPC (although he did not call it that) and designed the first programming language based on the concept. His language is called Distributed Processes (DP). Processes in DP can export procedures. When a procedure is called by another process, it is executed by a new thread of control. A process can also have one "background" thread, which is executed first and may continue to loop. Threads in a process execute
with mutual exclusion. They synchronize using shared variables and the when statement, which is similar to an await statement (Chapter 2).

RPC has been included in several other languages, such as Cedar, Eden, Emerald, and Lynx. Additional languages based on RPC include Argus, Aeolus, and Avalon. The latter three languages combine RPC with what are called atomic transactions. A transaction is a group of operations (procedure calls). It is atomic if it is both indivisible and recoverable. If a transaction commits, all the operations appear to have been executed exactly once each and as an indivisible unit. If a transaction aborts, it has no visible effect. Atomic transactions originated in the database community; they are used to program fault-tolerant distributed applications.

Stamos and Gifford [1990] present an interesting generalization of RPC called remote evaluation (REV). With RPC, a server module provides a fixed set of predefined services. With REV, a client can include a program as an argument in a remote call; when the server receives the call, it executes the program and then returns results. This allows a server to provide an unlimited set of services. Stamos and Gifford's paper describes how REV can simplify the design of many distributed systems and describes the developers' experience with a prototype implementation. Java applets provide a similar kind of functionality, although most commonly on the client side; in particular, an applet is usually returned by a server and executed on the client's machine.

Rendezvous was developed simultaneously and independently in 1978 by Jean-Raymond Abrial of the Ada design team and by this book's author when developing SR. The term rendezvous was coined by the Ada designers, many of whom are French. Concurrent C is another language based on rendezvous [Gehani and Roome 1986, 1989]. It extends C with processes, rendezvous using an accept statement, and guarded communication using a select statement. The select statement is like that in Ada, but the accept statement is more powerful. In particular, Concurrent C borrows two ideas from SR: synchronization expressions can reference parameters, and an accept statement can contain a scheduling expression (by clause). Concurrent C also allows operations to be invoked by call or send, as in SR. Later, Gehani and Roome [1988] developed Concurrent C++, which combines Concurrent C with C++.

Several languages include multiple primitives. SR is the best known (see below for references). StarMod is an extension of Modula that supports asynchronous message passing, RPC, rendezvous, and dynamic process creation. Lynx supports RPC and rendezvous. A novel aspect of Lynx is that it supports dynamic program reconfiguration and protection with what are called links. The extensive survey paper by Bal, Steiner, and Tanenbaum [1989] contains information on and references to all the languages mentioned above.
The anthology by Gehani and McGettrick [1988] contains reprints of key papers on several of the languages (Ada, Argus, Concurrent C, DP, and SR), comparative surveys, and assessments of Ada.

Most distributed operating systems implement file caches on client workstations. The file system outlined in Figure 8.2 is essentially identical to that in Amoeba. Tanenbaum et al. [1990] give an overview of Amoeba and describe experience with the system. Amoeba uses RPC as its basic communication system. Within a module, threads execute concurrently; they synchronize using locks and semaphores.

Section 8.4 described ways to implement replicated files. The technique of weighted voting is examined in detail in Gifford [1979]. The main reason for using replication is to make a file system fault tolerant. We discuss fault tolerance and additional ways to implement replicated files in the Historical Notes at the end of Chapter 9.

Remote method invocation (RMI) was added to Java starting with version 1.1 of the language. Explanations of RMI and examples of its use can be found in Flanagan [1997] and Hartley [1998]. (See the end of the Historical Notes in Chapter 7 for information on these books and their Web sites.) Further information on RMI can also be found at the main Java Web site www.javasoft.com.

In response to the growing development and maintenance costs of software, the U.S. Department of Defense (DoD) began the "common higher order language" program in 1974. The early stages of the program produced a series of requirements documents that culminated in what were called the Steelman specifications. Four industrial/university design teams submitted language proposals in the spring of 1978. Two, code named Red and Green, were selected for the final round and given several months to respond to comments and to refine their proposals. The Red design team was led by Intermetrics, the Green team by Cii Honeywell Bull; both teams were assisted by numerous outside experts. The Green design was selected in the spring of 1979. (Interestingly, the initial Green proposal was based on synchronous message passing similar to that in CSP; the design team changed to rendezvous in the summer and fall of 1978.)

The DoD named the new language Ada, in honor of Augusta Ada Lovelace, daughter of the poet Lord Byron and assistant to Charles Babbage, the inventor of the Analytical Engine. Based on further comments and early experience, the initial version of Ada was refined further and then standardized in 1983. The new language met with both praise and criticism: praise for the major improvement over languages then in use by the DoD, and criticism for the size and complexity of the language. With hindsight, the language no longer seems overly complex. In any event, some of the criticisms of Ada 83 and another decade's experience
using it led to the refinements embodied in Ada 95, which includes new concurrent programming mechanisms as described in Section 8.6.

Several companies market Ada implementations and programming environments for a variety of machines. In addition, there are numerous books that describe the language. Gehani [1983] emphasizes the advanced features of Ada 83; the dining philosophers algorithm in Figures 8.18 and 8.19 was adapted from that book. Burns and Wellings [1995] describe the concurrent programming mechanisms of Ada 95 and show how to use Ada to program real-time and distributed systems. A comprehensive Web source for information on Ada is www.adahome.com.
The basic ideas in SR (resources, operations, input statements, and asynchronous and synchronous invocation) were conceived by the author of this book in 1975; they are described in [Andrews 1981]. A full language was defined in the early 1980s and implemented by several graduate students. Andrews and Ron Olsson designed a new version in the mid-1980s; it added RPC, semaphores, early reply, and several additional mechanisms [Andrews et al. 1988]. Further experience, plus the desire to provide better support for parallel programming, led to the design of version 2.0. Andrews and Olsson [1992] describe SR 2.0, give numerous examples, and provide an overview of the implementation. A book by Stephen Hartley [1995] describes concurrent programming in SR and is intended as a lab book for operating systems or concurrent programming classes. The home page for the SR project and implementation is www.cs.arizona.edu/sr.
The focus of this book is on how to write multithreaded, parallel, and/or distributed programs. A related, but higher-level, topic is how to glue together existing or future application programs so they can work together in a distributed, Web-based environment. Software systems that provide this glue have come to be called middleware. CORBA, ActiveX, and DCOM are three of the best known examples. They and most other middleware systems are based on object-oriented technologies. Common Object Request Broker Architecture (CORBA) is a collection of specifications and tools to solve problems of interoperability in distributed systems. ActiveX is a technology for combining Web applications such as browsers and Java applets with desktop services such as document processors and spreadsheets. Distributed Component Object Model (DCOM) serves as a basis for remote communications, for example, between ActiveX components. A book by Amjad Umar [1997] describes these technologies as well as many others. A useful Web site for CORBA is www.omg.org; one for ActiveX and DCOM is www.activex.org.
References

Andrews, G. R. 1981. Synchronizing resources. ACM Trans. on Prog. Languages and Systems 3, 4 (October): 405-30.

Andrews, G. R., and R. A. Olsson. 1992. Concurrent Programming in SR. Menlo Park, CA: Benjamin/Cummings.

Andrews, G. R., R. A. Olsson, M. Coffin, I. Elshoff, K. Nilsen, T. Purdin, and G. Townsend. 1988. An overview of the SR language and implementation. ACM Trans. on Prog. Languages and Systems 10, 1 (January): 51-86.

Bal, H. E., J. G. Steiner, and A. S. Tanenbaum. 1989. Programming languages for distributed computing systems. ACM Computing Surveys 21, 3 (September): 261-322.

Birrell, A. D., and B. J. Nelson. 1984. Implementing remote procedure calls. ACM Trans. on Computer Systems 2, 1 (February): 39-59.

Brinch Hansen, P. 1978. Distributed processes: A concurrent programming concept. Comm. ACM 21, 11 (November): 934-41.

Burns, A., and A. Wellings. 1995. Concurrency in Ada. Cambridge, England: Cambridge University Press.

Flanagan, D. 1997. Java Examples in a Nutshell: A Tutorial Companion to Java in a Nutshell. Sebastopol, CA: O'Reilly & Associates.

Gehani, N. 1983. Ada: An Advanced Introduction. Englewood Cliffs, NJ: Prentice-Hall.

Gehani, N. H., and A. D. McGettrick. 1988. Concurrent Programming. Reading, MA: Addison-Wesley.

Gehani, N. H., and W. D. Roome. 1986. Concurrent C. Software-Practice and Experience 16, 9 (September): 821-44.

Gehani, N. H., and W. D. Roome. 1988. Concurrent C++: Concurrent programming with class(es). Software-Practice and Experience 18, 12 (December): 1157-77.

Gehani, N. H., and W. D. Roome. 1989. The Concurrent C Programming Language. Summit, NJ: Silicon Press.

Gifford, D. K. 1979. Weighted voting for replicated data. Proc. Seventh Symp. on Operating Systems Principles (December): 150-62.

Hartley, S. J. 1995. Operating Systems Programming: The SR Programming Language. New York: Oxford University Press.

Hartley, S. J. 1998. Concurrent Programming: The Java Programming Language. New York: Oxford University Press.

Nelson, B. J. 1981. Remote procedure call. Doctoral dissertation, CMU-CS-81-119, Carnegie-Mellon University, May.

Spector, A. Z. 1982. Performing remote operations efficiently on a local computer network. Comm. ACM 25, 4 (April): 246-60.

Stamos, J. W., and D. K. Gifford. 1990. Remote evaluation. ACM Trans. on Prog. Languages and Systems 12, 4 (October): 537-65.

Tanenbaum, A. S., R. van Renesse, H. van Staveren, G. J. Sharp, S. J. Mullender, J. Jansen, and G. van Rossum. 1990. Experiences with the Amoeba distributed operating system. Comm. ACM 33, 12 (December): 46-63.

Umar, A. 1997. Object-Oriented Client/Server Internet Environments. Englewood Cliffs, NJ: Prentice-Hall.
Exercises

8.1 Modify the time server module in Figure 8.1 so that the clock process does not get awakened on every tick of the clock. Instead, the clock process should set the hardware timer to go off at the next interesting event. Assume that the time of day is maintained in milliseconds and that the timer can be set to any number of milliseconds. Also assume that processes can read how much time is left before the hardware timer will go off. Finally, assume that the timer can be reset at any time.
8.2 Consider the distributed file system in Figure 8.2.

(a) Develop complete programs for the file cache and file server modules. Develop implementations of the caches, add synchronization code, and so on.

(b) The distributed file system modules are programmed using RPC. Reprogram the file system using the rendezvous primitives defined in Section 8.2. Give a level of detail comparable to that in Figure 8.2.
8.3 Assume modules have the form shown in Section 8.1 and that processes in different modules communicate using RPC. In addition, suppose that processes servicing remote calls execute with mutual exclusion (as in monitors). Condition synchronization is programmed using the statement when B. This delays the executing process until Boolean expression B is true; B can reference any variables in the scope of the statement.

(a) Reprogram the time server module in Figure 8.1 to use these mechanisms.
(b) Reprogram the merge filter module in Figure 8.3 to use these mechanisms.
8.4 The Merge module in Figure 8.3 has three procedures and a local process. Change the implementation to get rid of process M. In particular, let the processes servicing calls to in1 and in2 take on the role of M.
8.5 Rewrite the TimeServer process in Figure 8.7 so that the delay operation specifies an interval as in Figure 8.1 rather than an actual wake-up time. Use only the rendezvous primitives defined in Section 8.2. (Hint: You will need one or more additional operations, and the client will not be able simply to call delay.)
8.6 Consider a self-scheduling disk driver process as in Figure 7.9. Suppose the process exports one operation: request(cylinder, ...). Show how to use rendezvous and an in statement to implement each of the following disk-scheduling algorithms: shortest seek time, circular scan, and elevator. (Hint: Use scheduling expressions.)
8.7 Ada provides rendezvous primitives similar to those defined in Section 8.2 (see Section 8.6 for details). However, in the Ada equivalent of the in statement, synchronization expressions cannot reference formal parameters of operations. Moreover, Ada does not provide scheduling expressions.

Using either the rendezvous primitives of Section 8.2 and this restricted form of in, or the actual Ada primitives (select and accept), reprogram the following algorithms.

(a) The centralized dining philosophers in Figure 8.6

(b) The time server in Figure 8.7

(c) The shortest-job-next allocator in Figure 8.8
8.8 Consider the following specification for a program to find the minimum of a set of integers. Given is an array of processes Min[1:n]. Initially, each process has one integer value. The processes repeatedly interact, with each one trying to give another the minimum of the set of values it has seen. If a process gives away its minimum value, it terminates. Eventually, one process will be left, and it will know the minimum of the original set.

(a) Develop a program to solve this problem using only the RPC primitives defined in Section 8.1.

(b) Develop a program to solve this problem using only the rendezvous primitives defined in Section 8.2.
(c) Develop a program to solve this problem using the multiple primitives defined in Section 8.3. Your program should be as simple as you can make it.

8.9 The readers/writers algorithm in Figure 8.13 gives readers preference.

(a) Change the input statement in Writer to give writers preference.

(b) Change the input statement in Writer so that readers and writers alternate turns when both want to access the database.
8.10 The FileServer module in Figure 8.15 uses call to update remote copies. Suppose the call of remote_write is replaced by asynchronous send. Will the solution still work? If so, explain why. If not, explain what can go wrong.

8.11 Suppose processes communicate using only the RPC mechanisms defined in Section 8.1. Processes within a module synchronize using semaphores. Reprogram each of the following algorithms.

(a) The BoundedBuffer module in Figure 8.5.
(b) The Table module in Figure 8.6.
(c) The SJN_Allocator module in Figure 8.8.

(d) The ReadersWriters module in Figure 8.13.

(e) The FileServer module in Figure 8.15.
8.12 Develop a server process that implements a reusable barrier for n worker processes. The server has one operation, arrive(). A worker calls arrive() when it arrives at the barrier; the call terminates when all n processes have arrived. Use the rendezvous primitives of Section 8.2 to program the server and workers. Assume the existence of a function ?opname, as defined in Section 8.3, that returns the number of pending calls of opname.
8.13 Figure 7.15 presented an algorithm for checking primality using a sieve of filter processes; it was programmed using CSP's synchronous message passing. Figure 7.16 presented a second algorithm using a manager and a bag of tasks; it was programmed using Linda's tuple space.

(a) Reprogram the algorithm in Figure 7.15 using the multiple primitives defined in Section 8.3.

(b) Reprogram the algorithm in Figure 7.16 using the multiple primitives defined in Section 8.3.

(c) Compare the performance of your answers to (a) and (b). How many messages get sent to check the primality of all odd numbers from 3 to n? Count a
send/receive pair as one message. Count a call as two messages, even if no value is returned.

8.14 The Savings Account Problem. A savings account is shared by several people. Each person may deposit or withdraw funds from the account. The current balance in the account is the sum of all deposits to date minus the sum of all withdrawals to date. The balance must never become negative.
Using the multiple primitives notation, develop a server to solve this problem and show the client interface to the server. The server exports two operations: deposit(amount) and withdraw(amount). Assume that amount is positive and that withdraw must delay until there are sufficient funds.
process. Each kind of process leaves the rooni--wil-houI meeting any orher pronu~iibel-of other processes. cebses-once it has mcl thc reqi~i~.ed
(a) Develop a server process lo implement this synchronization. Show c hA 1111er~ fact of rl~eA and B processes Lo [lie server. Usc [he mulljple primitives nolalion defined in Section 8.3. '
(b) Modify your answer to (a) so t11iI1 he first of tbc two B processes [hat meets an A process does not leave the Inom until alkr Lhe A process meets a second B process. 8.16 Supposc a computer center has two printers, A and B, that arc siinila~but not iclenrica). Three kinds of client processes use the printers: those rllat must use A. tltosc rhnt must use B, ;und those that can use eillier A or B. Using Lhe mulliplc primitives notation, develop code 111~1each kind of client cxecutes lo rquesc and release a printcr, and develop a server process to ;~llocatr Lhe prjr~cers. Your solutjon should be [air, assunli~lga clielil using a printer eventually releases it. 8.17 The 1<0/1c1.COLIS~CY Problem. Suppose there are n passenger processes a ~ onc ~ d car process. The passengers repeatedly wail to take rides in the car. which can hold c passengers, c c n. However, the car can go around the tracks only when ic is fiill. (a) De\elop code for the actions of the passenger and car processes. Usc llic multiple primitives notalion.
(b) Generalize your answer to (a) 10 e~nploym car processes, rn > 1. Sincc rllerc they must li~iisli going is only one (rack. cars cannot pass each other-it.,
420
Chapter 8
RPC and Rendezvous
around the track i n the order ~ I Jwhich they started. Again. a car can go around the tracks only when it is full.
8.18 Stable Murviage Proble~n. Let Man and woman each be arrays of n processes. Each man ranks the women from 1 to n, and each woman ranks the men from 3. to n. A pairing is a one-to-one cooespondence of men and women. A pairing is sral~leif, for rwo men rn, and m, and their paired wotnen w, and w,, both of h e following conditions are satisfied: ranks w, I~igherthan w,
01-w,
ranks m, higher lhan m,; ancl
m, ranks w, higher than w,
01. w,
ranks m, higher th:ul m,.
m,
Expressed differently, a pairing is ut~stableif w man and woman would both prefer each other to their ciurenr pairing. A solution to the stable marriage problem is a set of n pairings. all ol'which are stable. (a) Using the multiple prirnjtives notation, write a program to solve the srable marriage problem. (bj The srable roon-unates problem is a generalization of the stable marriage problem. In particular, there are 2 n people. Each person has a ranked lisl of preferences for a roommare. A solurjon to ihe roommates problem is a set of n pairings, all of which are stable in the same sense as lor the marriage problem. Using the multiple pritnitives notation, write n program Lo solve Lhe stable I-oom-
Inales problem.
8.19 The Fileserver module in Figure 8.15 uses one lock per copy of the file. Modify h e program to use weighted voting, as defined at the end of Section 8.4.
8.20 Figure 8.15 shows how to implement replicated files using the multiple primitives notation defined in Section 8.3.

(a) Write a Java program to solve the same problem. Use RMI and synchronized methods. Experiment with your program, placing different file servers on different machines in a network. Write a short report describing your program and experiments and what you have learned.

(b) Write an Ada program to solve the same problem. You will have to do research to figure out how to distribute the file servers onto different machines. Experiment with your program. Write a short report describing your program and experiments and what you have learned.
(c) Write an SR program to solve the same problem. Experiment with your program, placing different file servers on different machines in a network. Write a short report describing your program and experiments and what you have learned.
8.21 Experiment with the Java program for a remote database in Figure 8.16. Try running the program and seeing what happens. Modify the program to have multiple clients. Modify the program to have a more realistic database (at least with operations that take longer). Write a brief report describing what you have tried and what you have learned.

8.22 Figure 8.16 contains a Java program that implements a simple remote database. Rewrite the program in Ada or in SR, then experiment with your program. For example, add multiple clients and make the database more realistic (at least with operations that take longer). Write a brief report describing how you implemented the program in Ada or SR, what experiments you tried, and what you learned.
8.23 Figures 8.18 and 8.19 contain an Ada program that implements a simulation of the dining philosophers problem.

(a) Get the program to run, then experiment with it. For example, have the philosophers sleep for random periods when they are thinking and eating, and try different numbers of rounds. Write a brief report describing what you tried and what you learned.

(b) Rewrite the program in Java or in SR, then experiment with your program. For example, have the philosophers sleep for random periods when they are thinking and eating, and try different numbers of rounds. Write a brief report describing how you implemented the program in Java or SR, what experiments you tried, and what you learned.
8.24 Figure 8.20 contains an SR program that simulates a solution to the critical section problem.

(a) Get the program to run, then experiment with it. For example, modify the two delay intervals or change the scheduling priority. Write a brief report describing what you tried and what you learned.

(b) Rewrite the program in Java or Ada, then experiment with your program. For example, modify the two delay intervals or change the scheduling priority. Write a brief report describing how you implemented the program in Java or Ada, what experiments you tried, and what you learned.
8.25 Exercise 7.26 describes several parallel and distributed programming projects. Pick one of those or something similar, then design and implement a solution using Java, Ada, SR, or a subroutine library that supports RPC or rendezvous. When you have finished, write a report describing your problem and solution and demonstrate how your program works.
Paradigms for Process Interaction
As we have seen several times, there are three basic process-interaction patterns: producer/consumer, client/server, and interacting peers. Chapter 7 showed how to program these patterns using message passing; Chapter 8 showed how to program them using RPC and rendezvous. The three basic patterns can also be combined in numerous ways. This chapter describes several of these larger process-interaction patterns and illustrates their use.

Each pattern is a paradigm (model) for process interaction. In particular, each paradigm has a unique structure that can be used to solve many different problems. The paradigms we describe are as follows:

    manager/workers, which represent a distributed implementation of a bag of tasks;

    heartbeat algorithms, in which processes periodically exchange information using a send then receive interaction;

    pipeline algorithms, in which information flows from one process to another using a receive then send interaction;

    probes (sends) and echoes (receives), which disseminate and gather information in trees and graphs;

    broadcast algorithms, in which processes send messages to all the others, for example to make decentralized decisions;

    token-passing algorithms, which are another approach to decentralized decision making; and

    replicated server processes, which manage multiple instances of resources such as files.
The first three paradigms are commonly used in parallel computations; the other four arise in distributed systems. We illustrate how the paradigms can be used to solve a variety of problems, including sparse matrix multiplication, image processing, distributed matrix multiplication, computing the topology of a network, distributed mutual exclusion, distributed termination detection, and decentralized dining philosophers. We later use the three parallel computing paradigms to solve the scientific computing problems in Chapter 11. Additional applications are described in the exercises, including sorting and the traveling salesman problem.
9.1 Manager/Workers (Distributed Bag of Tasks)

We introduced the bag-of-tasks paradigm in Section 3.6 and showed how to implement it using shared variables for communication and synchronization. Recall the basic idea: Several worker processes share a bag that contains independent tasks. Each worker repeatedly removes a task from the bag, executes it, and possibly generates one or more new tasks that it puts into the bag. The benefits of this approach to implementing a parallel computation are that it is easy to vary the number of workers and it is relatively easy to ensure that each does about the same amount of work.

Here we show how to implement the bag-of-tasks paradigm using message passing rather than shared variables. This is done by employing a manager process to implement the bag, hand out tasks, collect results, and detect termination. The workers get tasks and deposit results by communicating with the manager. Thus the manager is essentially a server process and the workers are clients.

The first example below shows how to multiply sparse matrices (matrices for which most entries are zero). The second example revisits the quadrature problem and uses a combination of static and adaptive integration intervals. In both examples, the total number of tasks is fixed, but the amount of work per task is variable.
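Before looking at the message-passing versions, it may help to see the idea in miniature. The following Java sketch (our own illustration, not the notation used in the figures below) uses a shared queue as the bag: worker threads repeatedly remove tasks until the bag is empty. In the distributed programs that follow, this shared queue is replaced by the manager's getTask and putResult operations.

    import java.util.concurrent.*;

    // Minimal shared-memory analogue of the manager/worker paradigm:
    // the queue is the bag of tasks; each worker repeatedly removes a
    // task, "executes" it, and records a result. This works because the
    // number of tasks is fixed, as in the two examples below.
    class BagOfTasksDemo {
        public static void main(String[] args) throws InterruptedException {
            int numWorkers = 4, numTasks = 20;
            BlockingQueue<Integer> bag = new LinkedBlockingQueue<>();
            BlockingQueue<String> results = new LinkedBlockingQueue<>();
            for (int t = 0; t < numTasks; t++) bag.add(t);   // fill the bag up front

            Thread[] workers = new Thread[numWorkers];
            for (int w = 0; w < numWorkers; w++) {
                workers[w] = new Thread(() -> {
                    Integer task;
                    while ((task = bag.poll()) != null) {        // empty bag means done
                        results.add("task " + task + " done");   // stand-in for real work
                    }
                });
                workers[w].start();
            }
            for (Thread w : workers) w.join();
            System.out.println(results.size() + " tasks completed");
        }
    }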
9.1.1 Sparse Matrix Multiplication

Assume that A and B are n × n matrices. The goal, as before, is to compute the matrix product A × B = C. This requires computing n² inner products. Each inner product is the sum (plus reduction) of the pairwise products of two vectors of length n.

A matrix is said to be dense if most entries are nonzero and sparse if most entries are zero. If A and B are dense matrices, then C will also be dense
425
(utlless diere is considei.able cancellation i n the inner products). On the o(11e1. hand: if either a or B is sparse, tI1e.n c will also be sparse. This i h becauce each zero in A or B w i l l lead to a zero in n vector products and hence will not contribu~ero thc resull of n of rhe innei- products. For example, if solme row of A contains nll zeros, then tlie correspondjng row of c will also contain all zeros, Sparse rnalrices arise i n many contexts, such as nu~nel.icalapproximations to partial differential equa~ionsand large systems of linear equations. A rricliago11a1~natrixis an example of a sparse mauix; it has zeros everywhere except on the rnain diagonal and Lhe two diagonals immediately above and below the main diagonal. WI-(en we know that matrices are sparse, we can save space by storing only the nonzero entries, and we can save tim.e when multiplying matrices by ig11o.1-ing entries that are zero. We will represenr sparse matrix A as follows by storing infor~nationabout the rows of A: int lengthA[n] ; pair *elernentsACnl ;
The value of lengthA[i] is the number of nonzero entries in row i of A. Variable elementsA[i] points to a list of the nonzero entries in row i. Each entry is represented by a pair (record): an integer column index and a double precision data value. Thus if lengthA[i] is 3, then there are three pairs in the list elementsA[i]. These will be ordered by increasing values of the column indexes.
As a concrete example, consider a six by six matrix containing four nonzero elements: one in row 0, column 3; one in row 3, column 1; one in row 3, column 4; and one in row 5, column 0.

We will represent matrix C in the same way as A. However, to facilitate matrix multiplication, we will represent matrix B by columns rather than rows. In particular, lengthB indicates the number of nonzero elements in each column of B, and elementsB contains pairs of row index and data value.

Computing the matrix product of A and B requires as usual examining the n² pairs of rows and columns. With sparse matrices, the most natural size for a
    module Manager
      type pair = (int index, double value);
      op getTask(result int row, len; result pair [*] elems);
      op putResult(int row, len; pair [*] elems);
    body Manager
      int lengthA[n], lengthC[n];
      pair *elementsA[n], *elementsC[n];
      # matrix A is assumed to be initialized
      int nextRow = 0, tasksDone = 0;

      process manager {
        while (nextRow < n or tasksDone < n) {
          # more tasks to do or more results needed
          in getTask(row, len, elems) ->
              row = nextRow; len = lengthA[row];
              copy pairs in *elementsA[row] to elems;
              nextRow++;
          [] putResult(row, len, elems) ->
              lengthC[row] = len;
              copy pairs in elems to *elementsC[row];
              tasksDone++;
          ni
        }
      }
    end Manager

Figure 9.1 (a)   Sparse matrix multiplication: Manager process.
task is one row of the result matrix C, because an entire row is represented by a single vector of (column, value) pairs. Thus there are as many tasks as there are rows of C. (An obvious optimization would be to skip over rows of A that are entirely zeros, namely, those for which lengthA[i] is 0, because the corresponding rows of C would also be all zeros. This is not likely in realistic applications, however.)

Figure 9.1 contains code to implement sparse matrix multiplication using a manager and several worker processes. We assume that matrix A is already initialized in the manager and that each worker contains an initialized copy of matrix B. The processes interact using the rendezvous primitives of Chapter 8, because that leads to the simplest program. To use asynchronous message passing, the manager would be programmed in the style of an active monitor (Section 7.3) and the call in the workers would be replaced by send then receive.
    process worker[w = 1 to numWorkers] {
      int lengthB[n];  pair *elementsB[n];    # assumed to be initialized
      int row, lengthA, lengthC;
      pair *elementsA, *elementsC;
      int r, c, na, nb;                       # used in computing inner products
      double sum;
      while (true) {
        # get a row of A, then compute a row of C
        call getTask(row, lengthA, elementsA);
        lengthC = 0;
        for [i = 0 to n-1]
          INNER_PRODUCT(i);                   # see body of text
        send putResult(row, lengthC, elementsC);
      }
    }
Figure 9.1 (b)   Sparse matrix multiplication: Worker processes.
The manager process in Figure 9.1 (a) implements the bag and accumulates results. It services two operations: getTask and putResult. Integer nextRow identifies the next task, namely, the next row. Integer tasksDone counts the number of results that have been returned. When a task is handed out, nextRow is incremented; when a result is returned, tasksDone is incremented. When both values are equal to n, all tasks have been completed.

The code for the worker processes is shown in Figure 9.1 (b). Each worker repeatedly gets a new task, performs it, then sends the result to the manager. A task consists of computing a row of the result matrix C, so a worker executes a for loop to compute n inner products, one for every column of B. However, the code for computing an inner product of two sparse vectors differs significantly from the loop for computing an inner product of two dense vectors. In particular, the meaning of INNER_PRODUCT(i) in the worker code is
428
Chapter 9
Paradigms for Process Interaction
} }
c = elementsA [na]->index; r = elementsBIi1 [nbl->index; else if { r < c) { nb4-+; r = elementsB [i] [nb] ->index; else t # r > c na++; c = elernentsAtna1 ->index;
1
1 if (sum
!= 0.0) ( # extend row of C elementsC[lengthCl = pair (i, sum) ; lengthC++;
1
The babic idea is lo scan rli.ro~~gh the sparse representarions for the ow of A and column i of B. (The above code assumes that the ]-ow ai,cl colunln each contain at 1c;lst one 11on~el-o element.) 'The inner producl will be nonzero only if there are pairs ol'values that have the surne column index c in A ant1 row i~~clcx r in B. The while loop above finds all tllcse pairs and adds up (heir products. I ( the sum is nonzero when the loop lermilzatcs, hen a new pair is added lo lhc vector ccprcsenting Lhc cow of c. (We assume tllar spacc for the elements 11;~salready been allocated.) After all n inner products have beer) compurcd, a worker- sends the row of c to the manager, then calls rhe manager ro gel another rask.
9.1.2 Adapfive Quadrature Revisifed '171ie quadrature problem was introduced in Section 1.5, whcre we presented static (iterative) and dynamic (~ccu~sive) algorill~msfor approxil~~atilzg the integral of firnclion f ( x ) from a lo b. Sectjun 3.6 sliowed liow to implement adaptive quadral~~re using a shared bag of tasks. I-Iere we use a manager Lo i~ilplernenta djst~.ibutedbag ul' tasks. However, we use a cotnbination of the static and dynamic algorilhms rather than the "pure" adaptive quadrature algorithm used in Figure 3.2 1 . In particular, we divide Llle interval I-'I.OITI a to b inlo a fixed number ol' subintervals and then use the adaptive quadrature algorithm wilhjn each suhinte~.val.This combines Uie simplicity of an iterative algorilh~nwilh the greatei- acculxcy of an adapLivc algorithm. In particular, by usiirg a s~alic11umbe1of tasks, the manager and workers are sinlpler and t11el.e are far fewer interactions between [hem. Figure 9.2 conlains the code for the manager and workers. Because the m a n a p is esbex~tiallya server process, we again use rendezvous for interaction
9.1 ManageriWorkers (Distributed Bag of Tasks)
429
module Manager op getTask(resu1t double left, right); op pu tResul t (double area) ; body Manager process manager { double a , b; # interval to integrate int numIntervale; # number of intervals to use double width = (b-a\/nuTntervals; double x = a, totalArsa = 0.0; int tasksDone = 01 while (tasksDone c nvmIntervals) { in getTask(left, right) st x < b - > left = x; x += width; right = x ; [I p u t ~ e s u l t(area) - > totalArea += area;
tasksDone++; ni
1
pi.i111[he result totalArea;
1 end Manager
double f 0 { ... ) double quad ( . .- 1 {
I
I
il
)
# function to integrate # adaptive quad function
process w o r k e r [ w = 1 to numWorkersJ { double left, r i g h t , area = 0 . 0 ; double f l e f t , f r i g h t , lrarea; w h i l e (true) { call getTask(1ef t, right) ; fleft = f (left); fright = f ( r i g h t ) ; lrarea = (fleft + fright) * (right - left) / 2 ; # calculate area recursively as shown in S e c t i o n 1.5 area = quad(1ef t, right, f l e f t , fright, lrarea) ; send putResult (area);
I
I
.. .
}
1
3
Figure 9.2
Adaptive quadrature using manager/workers paradigm.
430
Chapter 9
Paradigms for Process Interaction
betwecn the manager and workers. Thus the manager has the SiIIlIc stiucture as thi~tshown in Figure 9.1 (a) ancl again cxporls two operations, getTask alld putResult. However, the operations have different parameters, because now a task is defincd by the enclpoinls of an interval, Left rrnd right. and a result consists of the area under f ( x ) over thal interval. We assume the values of a. b, and numIntervals are given, pcrllaps as command-line arguments. From tl~csevalues, the manager co~npu[esthe width of each interval. The managel- then loops, accepting calls of getTask and putResult, until il has received one rcsult for each interval (and hence task). Notice the use of a synchronization expression "st x< b:' i n the arm for g e t T a s k in the input statemenr; this prevenrs getT a s k from handing out another [ask when the bag is empty. The worker processes in Figure 9.2 share 01. have theil- own copies of r l ~ e code for functions f and quad, where quad is the recursive function given in Section 1.5. Each worker repeatedly gets a task rrorn the manager, calculates the ;mguments needed by quad, calls quad to approximate the area under f(left) to f (right),[hen sends the resull to rl~emanager. When the program in Figure 9.2 terminates, every worker will be blocked at ils call of getTask. This is often harmless, as here, but at times il can indicaite deadlock. We leave to the ~eadermodifying the program so that the workers lesininate nor~~~ally. (Hint:Make getTask a fu~unclionthat retiu-ns true 01.false.) Tn this program the amount of work per task varies, depending on how rapidly the Function f varies. Thus, if there are only about as many [asks as workers, [he co~npulatjo~~al load will almost cerrai~lly be unbalanced. On the olher hand, if there are loo many tasks, then there will be unnecessary interaclions belweei~c-11e manager and workers. and hence iilinecessary overhead. The ideal is to have just enough tasks so that, on average, each worker will be responsible for about the sarne roral arnounl or work. A reasonable heuristic is to have from lwo to three times as rnaliy tasks as workers. which here means that numrntervals sllould be two to three times larger than nurnWorkers.
9.2 Heartbeat Algorithms

The bag-of-tasks paradigm is useful for solving problems that have a fixed number of independent tasks or that result from use of the divide-and-conquer paradigm. The heartbeat paradigm is useful for many data parallel iterative applications. In particular, it can be used when the data is divided among workers, each is responsible for updating a particular part, and new data values depend on values held by workers or their immediate neighbors. Applications include
grid computations, which arise in image processing or solving partial differential equations, and cellular automata, which arise in simulations of phenomena such as forest fires or biological growth.

Suppose we have an array of data. Each worker is responsible for a part of the data and has the following code outline:

   process Worker[i = 1 to numWorkers] {
     declarations of local variables;
     initialize local variables;
     while (not done) {
       send values to neighbors;
       receive values from neighbors;
       update local values;
     }
   }
We call this type of process interaction a heartbeat algorithm because the actions of each worker are like the beating of a heart: expand, sending information out; contract, gathering new information; then process the information and repeat.

If the data is a two-dimensional grid, it could be divided into strips or blocks. With strips there would be a vector of workers, and each would have two neighbors (except perhaps for the workers at the ends). With blocks there would be a matrix of workers, and each would have from two to eight neighbors, depending on whether the block is on the corner, edge, or interior of the array of data, and depending on how many neighboring values are needed to update local values. Three-dimensional arrays of data could be divided similarly into planes, rectangular prisms, or cubes.

The send/receive interaction pattern in a heartbeat algorithm produces a "fuzzy" barrier among the workers. Recall that a barrier is a synchronization point that all workers must reach before any can proceed. In an iterative computation, a barrier ensures that every worker finishes one iteration before starting the next. Above, the message exchange ensures that a worker does not begin a new update phase until its neighbors have completed the previous update phase. Workers that are not neighbors can get more than one iteration apart, but neighbors cannot. A true barrier is not required because workers share data only with their neighbors.

Below we develop heartbeat algorithms for two problems: region labeling, an example of image processing; and the Game of Life, an example of cellular automata. Chapter 11 and the exercises describe additional applications.
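As a concrete illustration of the code outline above, here is a minimal Python sketch of a two-worker heartbeat that uses multiprocessing pipes as the channels. The data values, the three-point smoothing update, and the fixed iteration count are illustrative assumptions, not part of the algorithms developed below.

   from multiprocessing import Process, Pipe

   def worker(name, strip, left, right, iters):
       # left and right are (receive_end, send_end) pairs for the neighbors,
       # or None when there is no neighbor on that side
       for _ in range(iters):
           # expand: send boundary values out to the neighbors
           if left:  left[1].send(strip[0])
           if right: right[1].send(strip[-1])
           # contract: gather the neighbors' boundary values
           lval = left[0].recv() if left else strip[0]
           rval = right[0].recv() if right else strip[-1]
           # update local values (a simple three-point smoothing)
           ext = [lval] + strip + [rval]
           strip = [(ext[i] + ext[i+1] + ext[i+2]) / 3.0 for i in range(len(strip))]
       print(name, strip)

   if __name__ == '__main__':
       a_to_b, b_from_a = Pipe()      # channel from worker 1 to worker 2
       b_to_a, a_from_b = Pipe()      # channel from worker 2 to worker 1
       data = [0.0, 0.0, 1.0, 1.0]    # divided into two strips of two values each
       w1 = Process(target=worker, args=('w1', data[:2], None, (a_from_b, a_to_b), 5))
       w2 = Process(target=worker, args=('w2', data[2:], (b_from_a, b_to_a), None, 5))
       w1.start(); w2.start(); w1.join(); w2.join()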
9.2.1 Image Processing: Region Labeling

An image is a representation of a picture; it typically consists of a matrix of numbers. Each element of an image is called a pixel (picture element) and has a value that is its light intensity or color. There are dozens of image-processing operations, and each can benefit from parallelization. Moreover, the same operation is often applied to a stream of images. Image-processing operations range from point operations on individual pixels, such as contrast stretching, to local operations on groups of pixels, such as smoothing or noise reduction, to global operations on all the pixels, such as encoding or decoding. Here we examine a local operation called region labeling.

Let an image be represented by a matrix image[m,n] of integers, and for simplicity assume that each pixel has value 1 (lit) or 0 (unlit). Assume that the neighbors of a pixel are the four pixels above and below and to the left and right of it. (Pixels on the corners of the image have two neighbors; those on the edges have three.) The region labeling problem is to find regions of lit pixels and to assign a unique label to each. Two lit pixels belong to the same region if they are neighbors. For example, consider an image in which lit pixels are displayed as dots and unlit pixels as blanks. Such an image might contain three regions plus a "curve" that does not form a region because its dots are connected only by diagonals, not by horizontal or vertical lines.

The label of each pixel is stored in a second matrix label[m,n]. Initially, each point is given a unique label, such as the linear function m*i + j of its coordinates. The final value of label[i,j] is to be the largest of the initial labels in the region to which the pixel belongs.

The natural way to solve this problem is by means of an iterative algorithm. On each iteration, we examine every pixel and its neighbors. If a pixel and a neighbor are both lit, then we set the label of the pixel to the maximum of its current label and that of its neighbor. We can do this in parallel for every pixel, because the values of labels never decrease. The algorithm terminates if no label changes during an iteration. Typically regions are fairly compact, in which case the algorithm will terminate after about
O(m) iterations. However, in the worst case, O(m*n) iterations will be required, since there could be a region that "snakes" around the image.

For this problem pixels are independent, so we could employ m*n parallel tasks. This would be appropriate on a massively parallel SIMD machine, but each task is too small to be efficient on an MIMD machine. Suppose we have an MIMD machine with P processors, and suppose that m is a multiple of P. A good way to solve the region-labeling problem is to partition the image into P strips or blocks of pixels and to assign one worker process to each strip or block. We will use strips, because that is simpler to program. Using strips also results in fewer messages than using blocks, because each worker has fewer neighbors. (On machines organized as a grid or a cube, it would be efficient to use blocks of points, because the interconnection network would undoubtedly support simultaneous message transmissions.)

Each worker computes the labels for the pixels in its strip. To do so, it needs its own strip of image and label as well as the edges of the strips above and below it. Since regions can span block boundaries, each process needs to interact with its neighbors. In particular, on each iteration a process exchanges the labels of pixels on the edges of its strip with its two neighbors, then it computes new labels.

Figure 9.3 (a) contains an outline of the worker processes. After initializing local variables, each worker exchanges the edges of its portion of image with its neighbors. First it sends an edge to the neighbor above it, then it sends an edge to the neighbor below; it then receives from the worker below and from the worker above. Two arrays of channels, first and second, are used for the exchange. As shown, workers 1 and P are special cases because they have only one neighbor.

At the start of each iteration of the while loop, neighboring workers exchange edges of label, using the same message-passing pattern as before. Then each worker updates the labels of the pixels in its strip. The update code could examine each pixel just once, or it could iterate until no label changes in the local strip. The latter approach will result in fewer exchanges of labels between workers, and hence will improve performance by increasing the ratio of computation to communication.

In this application, a worker cannot determine by itself when to terminate. Even if there is no local change on an iteration, some label in another strip could have changed, and that pixel could belong to a region that spans more than one strip. The computation is done only when no label in the entire image changes. (Actually, it is done one iteration earlier, but there is no way to detect this.) We use a coordinator process to detect termination, as shown in Figure 9.3 (b). (One of the workers could play the role of the coordinator; we have used a separate process to keep the worker code simpler.)
   chan result(bool);                # for results from workers

   process Coordinator {
     bool chg, change = true;
     while (change) {
       change = false;
       # see if there has been a change in any strip
       for [i = 1 to P] {
         receive result(chg);
         change = change or chg;
       }
       # broadcast answer to every worker
       for [i = 1 to P]
         send answer[i](change);
     }
   }
Figure 9.3 (b)  Region labeling: Coordinator process.
Combining the results in a tree of worker processes could reduce the execution time to O(log2 P). Better yet, we could use a global reduction operation, such as the MPI_Allreduce operation in the MPI library. This would definitely simplify the code, but whether it would improve performance depends on how the MPI library is implemented on a given machine.
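To make the labeling rule itself concrete, here is a small sequential Python sketch of the iterative algorithm described above (unique initial labels, repeated neighbor-max updates until no label changes). It is illustrative only and ignores the strip decomposition, the message passing, and the coordinator; the test image is a made-up example.

   def label_regions(image):
       """image is a list of lists of 0/1 values; returns the label matrix."""
       m, n = len(image), len(image[0])
       # give every pixel a unique initial label
       label = [[i * n + j for j in range(n)] for i in range(m)]
       change = True
       while change:
           change = False
           for i in range(m):
               for j in range(n):
                   if not image[i][j]:
                       continue
                   # neighbors are the lit pixels above, below, left, and right
                   for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                       a, b = i + di, j + dj
                       if 0 <= a < m and 0 <= b < n and image[a][b]:
                           if label[a][b] > label[i][j]:
                               label[i][j] = label[a][b]
                               change = True
       return label

   img = [[1, 1, 0, 0],
          [0, 0, 0, 1],
          [0, 1, 0, 1]]
   print(label_regions(img))   # each region ends up labeled by its largest initial label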
9.2.2 Cellular Automata: The Game of Life

Many biological or physical systems can be modeled as a collection of bodies that repeatedly interact and evolve over time. Some systems, especially simple ones, can be modeled using what are called cellular automata. (We examine a more complex system, gravitational interaction, in Chapter 11.) The idea is to divide the biological or physical problem space into a collection of cells. Each cell is a finite state machine. After the cells are initialized, all make one state transition, then all make a second, and so on. Each transition is based upon the current state of the cell as well as the states of its neighbors.

Here we use cellular automata to model what is called the Game of Life. A two-dimensional board of cells is given. Each cell either contains an organism (it's alive) or it is empty (it's dead). For this problem, each interior cell has eight neighbors, located above, below, left, right, and along the four diagonals. Cells in the corners have three neighbors; those on the edges have five.
The Game of Life is played as follows. First, the board is initialized. Second, every cell examines its state and the states of its neighbors, then makes a state transition according to the following rules:
   • A live cell with zero or one live neighbors dies from loneliness.
   • A live cell with two or three live neighbors survives for another generation.
   • A live cell with four or more live neighbors dies due to overpopulation.
   • A dead cell with exactly three live neighbors becomes alive.
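These rules reduce to a simple test on the number of live neighbors. The following Python fragment is a small sketch of that transition function; the function name and the assertions are illustrative.

   def next_state(alive, live_neighbors):
       # a live cell survives with 2 or 3 live neighbors;
       # a dead cell becomes alive with exactly 3 live neighbors
       if alive:
           return live_neighbors in (2, 3)
       return live_neighbors == 3

   assert next_state(True, 1) is False    # dies of loneliness
   assert next_state(True, 2) is True     # survives
   assert next_state(True, 4) is False    # dies of overpopulation
   assert next_state(False, 3) is True    # birth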
This process is repeated for some number of generations (steps).

Figure 9.4 outlines a program for simulating the Game of Life. The processes interact using the heartbeat paradigm. On each iteration a cell sends a message to each of its neighbors and receives a message from each. It then updates its local state according to the above rules. As usual with a heartbeat algorithm, the processes do not execute in lockstep, but neighbors never get an iteration ahead of each other.

For simplicity, we have programmed each cell as a process, although the board could be divided into strips or blocks of cells. We have also ignored the special cases of edge and corner cells. Each process cell[i,j] receives
   chan exchange[1:n,1:n](int row, column, state);

   process cell[i = 1 to n, j = 1 to n] {
     int state;                     # initialize to dead or alive
     declarations of other variables;
     for [k = 1 to numGenerations] {
       # exchange state with 8 neighbors
       for [p = i-1 to i+1, q = j-1 to j+1]
         if (p != i or q != j)
           send exchange[p,q](i, j, state);
       for [p = 1 to 8] {
         receive exchange[i,j](row, column, value);
         save value of neighbor's state;
       }
       update local state using rules in text;
     }
   }
Figure 9.4  The Game of Life.
messages from element exchange[i,j] of the matrix of communication channels, and it sends messages to neighboring elements of exchange. (Recall that our channels are buffered and that send is nonblocking.) The reader might find it instructive to implement this program and to display the states of the cells on a graphical display.
9.3 Pipeline Algorithms

Recall that a filter process receives data from an input port, processes it, and sends results to an output port. A pipeline is a linear collection of filter processes. We have already seen the concept several times, including Unix pipes (Section 1.6), a sorting network (Section 7.2), and as a way to circulate values among processes (Section 7.4). The paradigm is also useful in parallel computing, as we show here.

We always use some number of worker processes to solve a parallel computing problem. Sometimes we can program the workers as filters and connect them together into a parallel computing pipeline. There are three basic structures for such a pipeline, as shown in Figure 9.5: (1) open, (2) closed, and (3) circular. The W1 to Wn are worker processes. In an open pipeline, the input source and output destination are not specified. Such a pipeline can be "dropped down"
Figure 9.5  Pipeline structures for parallel computing: (a) open, (b) closed, (c) circular.
anywhere that it will fit. A closed pipeline is an open pipeline that is connected to a coordinator process, which produces the input needed by the first worker and consumes the results produced by the last worker. The Unix command "grep pattern file | wc" is an example of an open pipeline that can be put in various places; when it is actually executed on a shell command line, it becomes part of a closed pipeline, with the human user being the coordinator. A pipeline is circular if the ends are closed on each other; in this case data circulates among the workers.

In Section 1.8 we introduced two distributed implementations of matrix multiplication a x b = c, where all of a, b, and c are dense n x n matrices. The first solution simply divided the work up among n workers, one per row of a and c, but each process needed to store all of b. The second solution also used n workers, but each needed to store only one column of b. That solution actually employed a circular pipeline, with the columns of b circulating among the workers.

Here we examine two additional distributed implementations of dense matrix multiplication. The first employs a closed pipeline; the second employs a mesh of circular pipelines. Each has interesting properties relative to the prior solutions, and each illustrates a pattern that has other applications.
9.3.1 A Distributed Matrix Multiplication Pipeline

For simplicity, we will again assume that each matrix has size n x n, and we will employ n worker processes. Each worker will produce one row of c. However, initially the workers do not have any part of a or b. Instead, we will connect the workers in a closed pipeline through which they acquire all data items and pass all results. In particular, the coordinator process sends every row of a and every column of b down the pipeline to the first worker; eventually, the coordinator receives every row of c from the last worker.

Figure 9.6 (a) contains the actions of the coordinator process. As indicated, the coordinator sends rows a[0,*] to a[n-1,*] to channel vector[0], which is the input channel for worker 0. Then the coordinator sends columns b[*,0] to b[*,n-1] to worker 0. Finally, the coordinator receives the rows of results (it will get these from worker n-1). However, the results arrive in the order c[n-1,*] to c[0,*], for reasons explained below.

Each worker process has three execution phases. First, it receives rows of a, keeping the first one it receives and passing the others on. This phase distributes the rows of a among the workers, with Worker[i] saving the value of a[i,*]. Second, workers receive columns of b, immediately pass them on to
the next worker, then compute one inner product. This phase is repeated n times by each worker, after which time it will have computed c[i,*]. Third, each worker sends its row of c to the next worker, and then receives and passes on rows of c from previous workers in the pipeline. The last worker sends its row of c and others it receives to the coordinator. These are sent to the coordinator in order c[n-1,*] to c[0,*], because that is the order in which the last worker sees them; this also decreases communication delays and means that the last worker does not need to have local storage for all of c. Figure 9.6 (b) shows the actions of the worker processes. The comments indicate the three phases. The extra details are the bookkeeping to distinguish between the last worker and others in the pipeline.

This solution has several interesting properties. First, messages chase each other down the pipeline: first the rows of a, then the columns of b, and finally the rows of c. There is essentially no delay between the time that a worker receives a message and the time it passes it along. Thus messages keep flowing continuously. When a worker is computing an inner product, it has already passed along the column it is using, and hence another worker can get it, pass it along, and start computing its own inner product.

Second, it takes a total of n message-passing times for the first worker to receive all the rows of a and pass them along. It takes another n-1 message-passing times to fill the pipeline, that is, to get every worker its row of a. However, once the pipeline is full, inner products get computed about as fast as messages can flow. This is because, as observed above, the columns of b
   chan vector[n](double v[n]);     # messages to workers
   chan result(double v[n]);        # rows of c to coordinator

   process Coordinator {
     double a[n,n], b[n,n], c[n,n];
     initialize a and b;
     for [i = 0 to n-1]             # send all rows of a
       send vector[0](a[i,*]);
     for [i = 0 to n-1]             # send all columns of b
       send vector[0](b[*,i]);
     for [i = n-1 to 0]             # receive rows of c
       receive result(c[i,*]);      # in reverse order
   }
Figure 9.6 (a)  Matrix multiplication pipeline: Coordinator process.
   process Worker[w = 0 to n-1] {
     double a[n], b[n], c[n];       # my row or column of each
     double temp[n];                # used to pass vectors on
     double total;                  # used to compute inner product
     # receive rows of a; keep first and pass others on
     receive vector[w](a);
     for [i = w+1 to n-1] {
       receive vector[w](temp);
       send vector[w+1](temp);
     }
     # get columns and compute inner products
     for [j = 0 to n-1] {
       receive vector[w](b);        # get a column of b
       if (w < n-1)                 # if not last worker, pass it on
         send vector[w+1](b);
       total = 0.0;
       for [k = 0 to n-1]           # compute one inner product
         total += a[k] * b[k];
       c[j] = total;                # put total into c
     }
     # send my row of c to next worker or coordinator
     if (w < n-1)
       send vector[w+1](c);
     else
       send result(c);
     # receive and pass on earlier rows of c
     for [i = 0 to w-1] {
       receive vector[w](temp);
       if (w < n-1)
         send vector[w+1](temp);
       else
         send result(temp);
     }
   }
Figure 9.6 (b)  Matrix multiplication pipeline: Worker processes.
immediately follow the rows of a, and they get passed on as soon as they are received. If it takes longer to compute an inner product than to send and receive a message, then the computation time will start to dominate once the pipeline is full. We leave to the reader the interesting challenges of devising analytical performance equations and conducting timing experiments.

Another interesting property of the solution is that it is trivial to vary the number of columns of b. All we need to change are the upper bounds on the loops that deal with columns. In fact, the same code could be used to multiply a by any stream of vectors, producing a stream of vectors as results. For example, a could represent a set of coefficients of linear equations, and the stream of vectors could be different combinations of values for the variables.

The pipeline can also be "shrunk" to use fewer workers. The only change needed is to have each worker store a strip of the rows of a. We can still have the columns of b and rows of c pass through the pipeline in the same way, or we could send fewer, but longer, messages.

Finally, the closed pipeline used in Figure 9.6 can be opened up and the workers can be placed into another pipeline. For example, instead of having a coordinator process that produces the vectors and consumes the results, the vectors could be produced by other processes, such as another matrix multiplication pipeline, and the results can be consumed by yet another process. To make the worker pipeline fully general, however, all vectors would have to be passed along the entire pipeline (even rows of a) so that they would come out the end and hence would be available to some other process.
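To see the three phases in an executable form, here is a minimal Python sketch of the closed pipeline of Figure 9.6, using one thread per worker and a Queue per channel. The row tagging and the small test matrices are illustrative additions for this sketch, not part of the original figure.

   import threading, queue

   n = 4
   vector = [queue.Queue() for _ in range(n)]   # vector[w] is worker w's input channel
   result = queue.Queue()                       # rows of c back to the coordinator

   def worker(w):
       # phase 1: receive rows of a; keep the first, pass the others on
       a = vector[w].get()
       for _ in range(w + 1, n):
           vector[w + 1].put(vector[w].get())
       c = [0.0] * n
       # phase 2: receive each column of b, pass it on, compute one inner product
       for j in range(n):
           b = vector[w].get()
           if w < n - 1:
               vector[w + 1].put(b)
           c[j] = sum(a[k] * b[k] for k in range(n))
       # phase 3: send my row of c on, then forward the earlier rows I receive
       out = vector[w + 1] if w < n - 1 else result
       out.put((w, c))
       for _ in range(w):
           out.put(vector[w].get())

   def coordinator(a, b):
       for i in range(n):                  # send all rows of a to worker 0
           vector[0].put(a[i])
       for j in range(n):                  # send all columns of b to worker 0
           vector[0].put([b[i][j] for i in range(n)])
       c = [None] * n
       for _ in range(n):                  # rows of c arrive in reverse order
           i, row = result.get()
           c[i] = row
       return c

   a = [[1 if i == j else 0 for j in range(n)] for i in range(n)]   # identity matrix
   b = [[i * n + j for j in range(n)] for i in range(n)]
   threads = [threading.Thread(target=worker, args=(w,)) for w in range(n)]
   for t in threads: t.start()
   print(coordinator(a, b))                # with a = identity, the result is b
   for t in threads: t.join()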
9.3.2 Matrix Multiplication by Blocks

The performance of the previous algorithm is determined by the length of the pipeline and the time it takes to send and receive messages. The communication network on some high-performance machines is organized as a two-dimensional mesh or in an arrangement called a hypercube. These kinds of interconnection networks allow messages between different pairs of neighboring processes to be in transit at the same time. Moreover, they reduce the distance between processors relative to a linear arrangement, which reduces message transmission time.

An efficient way to multiply matrices on meshes and hypercubes is to divide the matrices into rectangular blocks and to employ one worker process per block. The workers and data are then laid out on the processors as a two-dimensional grid. Each worker has four neighbors, above and below, and to the left and right of it. The workers on the top and bottom of the grid are considered to be neighbors, as are the workers on the left and right of the grid.
Again the problem is to compute the matrix product of two n x n matrices a and b, storing the result in matrix c. To simplify the code, we will use one worker per matrix element and index the rows and columns from 1 to n. (At the end of the section, we describe how to use blocks of values.) Let Worker[1:n,1:n] be the matrix of worker processes. Matrices a and b are distributed initially so that each Worker[i,j] has the corresponding elements of a and b.

To compute c[i,j], Worker[i,j] needs to multiply every element in row i of a by the corresponding element in column j of b and sum the results. But the order in which the worker performs these multiplications does not matter! The challenge is to find a way to circulate the values among the workers so that each gets every pair of values that it needs.

First consider Worker[1,1]. To compute c[1,1], it needs to get every element of row 1 of a and column 1 of b. Initially it has a[1,1] and b[1,1], so it can multiply them. If we then shift row 1 of a to the left one column and shift column 1 of b up one row, Worker[1,1] will have a[1,2] and b[2,1], which it can multiply and add to c[1,1]. If we repeat this n-2 more times, shifting the elements of a left and the elements of b up, Worker[1,1] will see all the values it needs.

Unfortunately, this multiply-and-shift sequence will work correctly only for workers handling elements on the diagonal. Other workers will see every value they need, but not in the right combinations. However, it is possible to rearrange a and b before we start the multiply-and-shift sequence. In particular, first shift row i of a circularly left i columns and column j of b circularly up j rows. (It is not at all obvious why this particular initial placement works; people came up with it by examining small matrices and then generalizing.) The following display illustrates the result of the initial rearrangement of the values of a and b for a 4 x 4 matrix:

   a[1,2] a[1,3] a[1,4] a[1,1]        b[2,1] b[3,2] b[4,3] b[1,4]
   a[2,3] a[2,4] a[2,1] a[2,2]        b[3,1] b[4,2] b[1,3] b[2,4]
   a[3,4] a[3,1] a[3,2] a[3,3]        b[4,1] b[1,2] b[2,3] b[3,4]
   a[4,1] a[4,2] a[4,3] a[4,4]        b[1,1] b[2,2] b[3,3] b[4,4]

After this initial rearrangement of values, each worker has two values, which it stores in local variables aij and bij. Each worker next initializes variable cij to aij*bij and then executes n-1 shift/compute phases. In each, values in aij are sent left one column; values in bij are sent up one row; and
   chan left[1:n,1:n](double);      # for circulating a left
   chan up[1:n,1:n](double);        # for circulating b up

   process Worker[i = 1 to n, j = 1 to n] {
     double aij, bij, cij;
     int LEFT1, UP1, LEFTI, UPJ;
     initialize above values;
     # shift values in aij circularly left i columns
     send left[i,LEFTI](aij);  receive left[i,j](aij);
     # shift values in bij circularly up j rows
     send up[UPJ,j](bij);  receive up[i,j](bij);
     cij = aij * bij;
     for [k = 1 to n-1] {
       # shift aij left 1, bij up 1, then multiply and add
       send left[i,LEFT1](aij);  receive left[i,j](aij);
       send up[UP1,j](bij);      receive up[i,j](bij);
       cij = cij + aij*bij;
     }
   }
Figure 9.7  Matrix multiplication by blocks.
new values are received, multiplied together, and added to cij. When the workers terminate, the matrix product is stored in the cij in each worker.

Figure 9.7 contains the code for this matrix multiplication algorithm. The workers share n^2 channels for circulating values left and another n^2 for circulating values up. These are used to form 2n intersecting circular pipelines. Workers in the same row are connected in a circular pipeline through which values flow to the left; workers in the same column are connected in a circular pipeline through which values flow up. The constants LEFT1, UP1, LEFTI, and UPJ in each worker are initialized to the appropriate values and used in send statements to index the arrays of channels.

The program in Figure 9.7 is obviously inefficient (unless it were to be implemented directly in hardware). There are way too many processes and messages, and way too little computation per process. However, the algorithm can readily be generalized to use square or rectangular blocks. In particular, each worker can be assigned blocks of a and b. The workers first shift their block of a left i blocks of columns and their block of b up j blocks of rows. Each worker then initializes its block of the result matrix c to the inner products of its
new blocks of a and b. The workers then execute n-1 phases of: shift a left one block, shift b up one block, compute new inner products, and add them to c. We leave the details to the reader (see the exercises at the end of this chapter).

An additional way to improve the efficiency of the code in Figure 9.7 is to execute both sends before either receive when shifting values. In particular, change the code from send/receive/send/receive to send/send/receive/receive. This will decrease the likelihood that receive blocks. It also makes it possible to transmit messages in parallel if that is supported by the hardware interconnection network.
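The initial rearrangement and the shift/compute phases are easy to check sequentially. The following Python sketch simulates them for small square matrices; this sequential simulation and the random test at the end are illustrative and stand in for the per-worker message passing of Figure 9.7.

   import random

   def multiply_by_blocks(a, b):
       """Simulate the shift-and-multiply algorithm for n x n matrices (0-indexed)."""
       n = len(a)
       # initial rearrangement: shift row i of a circularly left i columns,
       # and column j of b circularly up j rows
       aij = [[a[i][(j + i) % n] for j in range(n)] for i in range(n)]
       bij = [[b[(i + j) % n][j] for j in range(n)] for i in range(n)]
       # every worker starts with the product of its two values
       cij = [[aij[i][j] * bij[i][j] for j in range(n)] for i in range(n)]
       for _ in range(n - 1):
           # shift aij left one column and bij up one row, then multiply and add
           aij = [[aij[i][(j + 1) % n] for j in range(n)] for i in range(n)]
           bij = [[bij[(i + 1) % n][j] for j in range(n)] for i in range(n)]
           for i in range(n):
               for j in range(n):
                   cij[i][j] += aij[i][j] * bij[i][j]
       return cij

   n = 4
   a = [[random.randint(0, 9) for _ in range(n)] for _ in range(n)]
   b = [[random.randint(0, 9) for _ in range(n)] for _ in range(n)]
   expect = [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
   assert multiply_by_blocks(a, b) == expect
   print("shift-and-multiply result matches the ordinary product")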
9.4 Probe/Echo Algorithms

Trees and graphs are used in many applications, such as Web searches, databases, games, and expert systems. They are especially important in distributed computing, since the structure of many distributed computations is a graph in which processes are nodes and communication channels are edges.

Depth-first search (DFS) is one of the classic sequential programming paradigms for visiting all the nodes in a tree or graph. In a tree, the DFS strategy for each node is to visit the children of that node and then to return to the parent. This is called depth-first search since each search path reaches down to a leaf before the next path is traversed; for example, the path in the tree from the root to the leftmost leaf is traversed first. In a general graph, which may have cycles, the same approach is used, except we need to mark nodes as they are visited so that edges out of a node are traversed only once.

This section describes the probe/echo paradigm for distributed computations on graphs. A probe is a message sent by one node to its successor; an echo is a subsequent reply. Since processes execute concurrently, probes are sent in parallel to all successors. The probe/echo paradigm is thus the concurrent programming analog of DFS. We first illustrate the probe paradigm by showing how to broadcast information to all nodes in a network. We then add the echo paradigm by developing an algorithm for constructing the topology of a network.
9.4.1 Broadcast in a Network

Assume that we have a network of nodes (processors) connected by bidirectional communication channels. Each node can communicate directly only with its neighbors. Thus the network has the structure of an undirected graph. Suppose one source node s wants to broadcast a message to all other nodes. (More precisely, suppose a process executing on s wants to broadcast a message
to processes executing on all the other nodes.) For example, s might be the site of the network coordinator, who wants to broadcast new status information to all other sites.

If every other node is a neighbor of s, broadcast would be trivial to implement: node s would simply send a message directly to every other node. However, in large networks, each node is likely to have only a small number of neighbors. Node s can send the message to its neighbors, but they would have to forward it to their neighbors, and so on. In short, we need a way to send a probe to all nodes.

Assume that node s has a local copy of the network topology. (We later show how to compute it.) The topology is represented by a symmetric Boolean matrix; entry topology[i,j] is true if nodes i and j are connected and is false otherwise. An efficient way for s to broadcast a message is first to construct a spanning tree of the network, with itself as the root of the tree. A spanning tree of a graph is a tree whose nodes are all those in the graph and whose edges are a subset of those in the graph. Figure 9.8 contains an example, with node s on the left. The solid lines are the edges in a spanning tree; the dashed lines are the other edges in the graph.

Given spanning tree t, node s can broadcast a message m by sending m together with t to all its children in t. Upon receiving the message, every node examines t to determine its children in the spanning tree, then forwards both m and t to all of them. The spanning tree is sent along with m since nodes other than s would not otherwise know what spanning tree to use.

The full algorithm is given in Figure 9.9. Since t is a spanning tree, eventually the message will reach every node; moreover, each node will receive it exactly once, from its parent in t. We use a separate Initiator process on node s to start the broadcast. This makes the Node processes on each node identical.

The broadcast algorithm in Figure 9.9 assumes that the initiator node knows the entire topology, which it uses to compute a spanning tree that guides the
Figure 9.8  A spanning tree of a network of nodes.
   type graph = bool [n,n];
   chan probe[n](graph spanningTree; message m);

   process Node[p = 0 to n-1] {
     graph t;  message m;
     receive probe[p](t, m);
     for [q = 0 to n-1 st q is a child of p in t]
       send probe[q](t, m);
   }

   process Initiator {               # executed on source node S
     graph topology = network topology;
     graph t = spanning tree of topology;
     message m = message to broadcast;
     send probe[S](t, m);
   }
Figure 9.9 Network broadcast using a spanning tree.
broadcast. Suppose instead that every node knows only who its neighbors are. We can still broadcast a message m to all nodes as follows. First, node s sends m to all its neighbors. Upon receiving m from one neighbor, a node forwards m to all its other neighbors. If the links defined by the neighbor sets happen to form a tree rooted at s, the effect of this approach is the same as before. In general, however, the network will contain cycles. Thus, some node might receive m from two or more neighbors. In fact, two neighbors might send the message to each other at about the same time.

It would appear that all we need to do in the general case is ignore multiple copies of m that a node might receive. However, this can lead to message pollution or to deadlock. After receiving m for the first time and sending it along, a node cannot know how many times to wait to receive m from a different neighbor. If the node does not wait at all, extra messages could be left buffered on some of the probe channels. If a node waits some fixed number of times, it might deadlock unless at least that many messages are sent; even so, there might be more.

We can solve the problem of unprocessed messages by using a fully symmetric algorithm. In particular, after a node receives m for the first time, it sends m to all its neighbors, including the one from whom it received m. Then the node receives redundant copies of m from all its other neighbors; these it ignores. The algorithm is given in Figure 9.10.
   chan probe[n](message m);

   process Node[p = 0 to n-1] {
     bool links[n] = neighbors of node p;
     int num = number of neighbors;
     message m;
     receive probe[p](m);
     # send m to all neighbors
     for [q = 0 to n-1 st links[q]]
       send probe[q](m);
     # receive num-1 redundant copies of m
     for [q = 1 to num-1]
       receive probe[p](m);
   }

   process Initiator {               # executed on source node S
     message m = message to broadcast;
     send probe[S](m);
   }

Figure 9.10  Broadcast using neighbor sets.
The broadcast algorithm using a spanning tree causes n-1 messages to be sent, one for each parent/child edge in the spanning tree. The algorithm using neighbor sets causes two messages to be sent over every link in the network, one in each direction. The exact number depends on the topology of the network, but in general the number will be much larger than n-1. For example, if the network topology is a tree rooted at the initiator process, 2(n-1) messages will be sent; for a complete graph in which there is a link between every pair of nodes, n(n-1) messages will be sent. However, the neighbor-set algorithm does not require the initiator node to know the topology or to compute a spanning tree. In essence, a spanning tree is constructed dynamically; it consists of the links along which the first copies of m are sent. Also, the messages are shorter in the neighbor-set algorithm since the spanning tree need not be sent in each message.

Both broadcast algorithms assume that the topology of the network does not change. In particular, neither works correctly if there is a processor or communication link failure while the algorithm is executing. If a node fails, it cannot receive the message being broadcast. If a link fails, it might or might not be possible to reach the nodes connected by the link. The Historical Notes at the end of this chapter describe papers that address the problem of implementing a fault-tolerant broadcast.
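A minimal Python sketch of the neighbor-set broadcast, with one thread per node and a Queue per probe channel, is shown below. The four-node graph is an illustrative assumption; as with the source node in Figure 9.10, node 0 here ends up with one extra buffered copy (its first copy came from the initiator), which is harmless in this sketch.

   import threading, queue

   graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}   # illustrative adjacency lists
   chan = {p: queue.Queue() for p in graph}                # one probe channel per node

   def node(p):
       m = chan[p].get()                     # first copy of the message
       print("node", p, "received:", m)
       for q in graph[p]:                    # forward to every neighbor,
           chan[q].put(m)                    # including the one it came from
       for _ in range(len(graph[p]) - 1):    # absorb the redundant copies
           chan[p].get()

   threads = [threading.Thread(target=node, args=(p,)) for p in graph]
   for t in threads:
       t.start()
   chan[0].put("new status")                 # the initiator probes the source node
   for t in threads:
       t.join()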
9.4.2 Computing the Topology of a Network

The efficient broadcast algorithm in Figure 9.9 required knowing the topology of the network. Here we show how to compute it. Initially, every node knows its local topology, namely, its links to its neighbors. The goal is to gather all the local topologies, because their union is the network topology.

The topology is gathered in two phases. First, each node sends a probe to its neighbors, as happened in Figure 9.10. Later, each node sends an echo containing local topology information back to the node from which it received the first probe. Eventually, the initiating node has gathered all the echoes, and hence has gathered the topology. It could then, for example, compute a spanning tree and broadcast the topology back to the other nodes.

For now assume that the topology of the network is acyclic. Since a network is an undirected graph, this means the structure is a tree. Let node s be the root of this tree, and let s be the initiator node. We can then gather the topology as follows. First s sends a probe to all its children. When these nodes receive a probe, they send it to all their children, and so on. Thus probes propagate through the tree. Eventually they will reach leaf nodes. Since these have no children, they begin the echo phase. In particular, each leaf sends an echo containing its neighbor set to its parent in the tree. After receiving echoes from all of its children, a node combines them and its own neighbor set and echoes this information to its parent. Eventually the root node will receive echoes from all its children. The union of these will contain the entire topology, since the initial probe will reach every node and every echo contains the neighbor set of the echoing node together with those of its descendants.

The full probe/echo algorithm for gathering the network topology in a tree is shown in Figure 9.11. The probe phase is essentially the broadcast algorithm from Figure 9.10, except that probe messages indicate the identity of the sender. The echo phase returns local topology information back up the tree. In this case, the algorithms for the nodes are not fully symmetric, since the instance of Node[p] executing on node s needs to know to send its echo to the Initiator.

To compute the topology of a network that contains cycles, we generalize the above algorithm as follows. After receiving a probe, a node sends the probe to its other neighbors, then the node waits for an echo from those neighbors. However, because of cycles and because nodes execute concurrently, two neighbors might send each other probes at about the same time. Probes other than the first one can be echoed immediately. In particular, if a node receives a subsequent probe while waiting for echoes, it immediately sends an echo containing a null topology (this is sufficient since the local links of the node will be contained in the echo it will send in response to the first probe). Eventually a node will
   type graph = bool [n,n];
   chan probe[n](int sender);
   chan echo[n](graph topology);       # parts of the topology
   chan finalecho(graph topology);     # final topology

   process Node[p = 0 to n-1] {
     bool links[n] = neighbors of node p;
     graph newtop, localtop = ([n*n] false);
     int parent;                       # node from whom probe is received
     localtop[p,0:n-1] = links;        # initially my links
     receive probe[p](parent);
     # send probe to other neighbors, who are p's children
     for [q = 0 to n-1 st (links[q] and q != parent)]
       send probe[q](p);
     # receive echoes and union them into localtop
     for [q = 0 to n-1 st (links[q] and q != parent)] {
       receive echo[p](newtop);
       localtop = localtop or newtop;  # logical or
     }
     if (p == S)
       send finalecho(localtop);       # node S is root
     else
       send echo[parent](localtop);
   }

   process Initiator {
     graph topology;
     send probe[S](S);                 # start probe at local node
     receive finalecho(topology);
   }
Probelecha algorithm for gathering the topology of
a tree
receive an echo in response to every PI-obs. A1 i b i s point, it sends an echo to h e node from which it got the first probe; the echo co~ltainsthe union o,f the nodc's own set o f links logether with all the sets o f links ir received. The general p~.obelechoa l ~ o r i t h n lfor co~nputingthe nerworlc topology is shown io Fi&u~.c9.12. Bccnusc a nodc mighl rcccjve subsequenl probes while waiting for echocs, the two types of messages havc ro bc merged inlo one channel. ( I f Lhey came in on sepal-ale channels, a node would have to use empty and
450
Chapter 9
Paradigms for Process Interaction type graph = bool [ n , n ] ; type kind = (PROBE, ECHO); chan probe-echo[n] (kind k; int sender; graph topology); chan finalecho(graph topology); procees Node [p = 0 to 11-11 bool links [n] = neighbors otnode P; graph newtop, localtop = ( [n*n] false) ; int first, sender; kind k; int need-echo = nuurtberof neighbors - 1; localtop[p,O:n-I] = links; # initially my links
receive probe-echorpl (k, first, newtop); # get probe # send probe on to to all other neighbors for [q = 0 to n-1 st (links[ql and q != first)J send probe-echo [ql (PROBE, p, 0 ) ;
while (need-echo > 0 ) ( # receive echoes or redundant probes from neighbors receive probe-echo [gl (k, sender, newtop) ; if (k == PROBE) send probe-echo [sender](ECHO, p, 0 ); e l s e # k == ECHO { localtog = localtop or newtop; # logical or need-echo = need-echo-1; 1 1 if ( p = = S) send finalecho(loca1top); else send probe-echo[firstJ (ECHO, g , localtog); 1
process Initiator { graph to~ology; # network topology send probe-echo[source](PROBE, source, 0 ) ; receive finalecho(topo1ogy); 1
Figure 9.1 2
Probelecho algorithm for computing the topology of a graph.
9.5 Broadcast Algorithms
451
pollitlg 10decide which k.incl of message to I-eceive. Alrecnalivcly, we cot~lduse reudczvous, with probes and echoes being separate operations.) The correclncss of the algurich~nresults from the following facts. Sjllce the ~xeiwo~.kis cotinec~ed,every node eventually receives a probe. Deadlock is avoided since e\lc1-)/ probe i s echoed-the lirst one just beforc a Node process rerminales, the others while they are wailing lo receive echoes in 1.espo11selo d~ejr own probes. (This avoids leaving nlessages bul'l'ercd on Lhe probe-echo channels.) The lasl echo sen1 by a node conlains its local neiglibor set. Hence, the union of lhe neighbor sets evenlually reaches Node [s],which sends the topology lo Initiator. As with die algoritlin~i n Figure 9.10, (he links along which firs[ probes al'e sent fortn a dynamica.lly coinpuced spannin~tree. The ne~work topology is echoed back LIPlhis spanning tree; the echo iiom each node contains d ~ topology e of the subtree roored at that node.
9.5 Broadcast A lgorithms In tlie pi.evious seclion, we showed how lo broadcast information in a ~lelwork that has the strucrut.e of i\ graph. In most local area networks, processors share a common communication channel such as an Ethernet or 1o.ken ring. In Lhis case, each process01 is directly connected to every other one. In fact, such comm~uijcation networks often support a special network prirnirive called broadcast, which transmits a nlessage frorn olie processor to all othel-s. Whether suppol-ted by communicatiun ha]-dwae or not, message br.oadcas1 provides a useful programming technique. Let T [n] be an 'way of processes, and let ch [n] be an an-ay of channels, one per process. Then a process T [i] broadcasts a message m by executing broadcast ch(m);
Execution of broadcast places one copy ol m on each channel ch[i], incli~ding that oi T [i]. The effect is thus Ihe same as executing co [i = 1 to nl send c h [ i ](m) ;
PI-ocessesreceive both broattcasc and point-lo-point messages using tlie receive primitive. c in the Messages broadcast by the same process arc queued on ~ h channels ortlel- in which lhey are bl-oadcnst. However. broadcast is no1 alornic. In particular, messages b~.oadcastby two procchaes A anti B migh~be received by other
452
Chapter 9
Paradigms for Process Interaction
processes in different orders. (See the Historical Notes for papers that discuss how to impJe~neni atomic broadcast.) We can use broadcast to disse-minate informat ion-for example, to exchange processor state information in local area networks. We can also use it to solve many distributed synchroniznrio~~ problems. This section develops a broadcast algorithm that provides a distributed implemeotalion of semaphores. The basis for distributed scmapl~ores-and many oLher decen tralizcd synchronizalion plar-ocols-is a total ordering of co~n~nunicalion events. We hus begin by showing how to inlplernent logical clocks and how to use them to order events.
9.5.1 Logical Clocks and Event Ordering The actions of processes in a dislributed program can be divided into local aclions, such as reading and writing variables, and communication aclions, such as sending and receiving messages. Local actions have no direct effect on oll~er processes. Howe\ler, communication aclions affect the execution of other processes since they transmit information and are synchronized. Communication ;~crionsare thus the significant events in a distributed program. We i~sethe term eve~?.r below to refer to execution of send, broadcast,and receive slalements. If two p ~ ~ c e s s eAs and B are executing local actions; we have no way of knowjng the relative order in which rhe actions are execuled. However, if A sends (01-bl.oadcasts) a message to B. then Lhe send action i n A must happen before the col-respoildine receive ac~ionin B. [f B subsequently sends a inessage to process c, then the send action in B must happen before the receive action i n C. Moreover. since the receive xcliorl in B happens before the send aclioo i n B. there cvents: the se~ldby A is n total orde~jngbetween the four co~nn~unication happens beforc the receive by B, which happens before the send by B, which pens before the receive by c. H o p ~ ~ ~ nbefore . . s is thus a transitive relation between causally related events. Although there is a total ordering between causally related events, there is only a partial ordering between the entire collection of events in ;I dishibuted program. This is because unrelated sequences OF events-ror example, cornmunications betweel, differenr sets of processes-mighr occur before, after, or concurrently with each other. If there were a sj.ngle cencral clock, we could totally order cornrr~unication events by giving each a unique limestamp. In partici~lar.when a process sends a message, il could read the clock and append the clock value to the message. When a process receives a message, it could read the clock and record rhe time at
9.5 Bioadcasl Algorithms
453
which the receive eveill occul.retl. Assuming the gi-anularity or' [lie clock i s such that it "licks" between any send and tlie corresponding receive. an cvcnt [hat happens before another will thus have an earlier Limestamp. Moreover, if processes have unique identities, then we could induce a [oral ordering-for example, by u s j 1 1 the ~ smallest process identity to break rjes if ur)relaled cvonts in cwo processes happen to have the s a n e limestamp. Unforlunatcly, it is unrealistic to assume [he existence of 21 single, central clock. I n a local area net\vork, for exa~nple,each processor has its own clock. If these were perfeclly synchronized, then we could use tlie local clocks for timcstamps. However, physical clocks are never perfecrly synchronized. Clock synchron ization a lgol-itbtns exist for keeping clocks fairly close to each olliel. (see the I-listorical Notes). but percect sy~~chronizalion is impossible. Thus we need a way 10 sj~uulalcpliysicnl clocks. A 1~)gicnlclock i s a simple inleger counlcr chat i s incremented when events occur. We assume Lhat each process has n logical clock that is initialized to zero. We also assume that every message contains a special field called n limesfun~p. The logical clocks are incremented according to the following rules.
1,ogical Clock Update Rules. Let A be a process and let Ic be a logical clock in the process. A updates the value of lc as follows: ( I ) When A sends or broadcasts a messilge. it sets tlie tirnesh~npof t l ~ c message Io the currenl value ol' Ic and then increments L C by I.. ( 2 ) When A receives a message with timeala~npts, i c sets lc mum of lc and ts+l and then increments lc by 1.
LO the
~naxi-
Sjnce A increases lc after every event, every message sent by A will havc a diffcrenl. increasing tiinesrru~ip. Sjnce a receive event seLs lc to he larger than the in the rcceivcd mcssagc, [lie timesla~npin any lnessage subsequently ti~nesra~np sent hy A will have a larger timestamp. Using logical clocks, we can associate a cloclc value with cacb evclll as follows. For a send event, the clock value is the timestamp i n the ~nessage-i.e., (he lowl value of lc at the sl:irt of [lie send. For a receive event. the clock value ~ bef0r.e it is is the value of Ic ailer it is set 10 the maximum of lc ant1 t s + but incremented by d ~ ereceiving process. The above rules for updatir~glogical clocks ensure that i f event n happens before ever11 b. then the clock value associated with n will be sln;d!er than that associated with b. This i~lducesa pai-rial ordering on the scl of ca~~sally I-elateclevents i n a program. 11' each process has a unique identity, Lhen we can gcl a total ordering between all events by usi~igthe s~nallel-process iclentity as a liebreaker i n case Lwo evenls happen to lia\ie the same rimeslamp.
454
Chapter 9
Paradigms for Process Interaction
9.5.2 Distributed Semaphores Semaphores are normally implemented using shared va-iables. However. we c o ~ ~jrnplernenl ld them i n a message-based program using a scr\lei.process (active rnonirol-i, as shown in Seclion 7.3. We can also implement thcm in a decenlralized way without using a central coordinal.or. Hcre, we show how. A semaphore s is usirally represented by a nonnegative integer. Execu~ing ~ ( s waits ) until s is positive [hen decrements the value; executing v ( s ) jncrerrlcnts the valuc. In other words. at all tirnes the numbcr of compleled P operations ia a1 mosl the number of completed v ope~,alionsplus the inicial value of s. Tllus. to iii1plernen( selnaphores. we need a way to count p and v operalions and b; way to delay P operations. Moreover, the processes thal "share" a se~naphore need LO cooperale so (hat they maintain (lie semaphore in~arial~r s >= 0 even ~houghthe program state is distributed. We can Ineel these requirements by having processes broaclcast tnessirges whe.11lhcy want to exccule P and v operations and by having lhcm esamine the Iqe5cages Lhey receive LO deterrnis~ewhen to proceed. In particular, each process has a local message queue mq and a logical. clock LC, which is updaled according to the Logical Clock Uptlale Rules. To si~nulaleexeculion of a P or v operario~?: a process broadcasrs a message lo all the i~serprocesses, including itself. The message cotirains the sende~.'sidentily, a tag (POP or VOP) indicaling which kind of operation, a.nd a timestamp. The rjniesta~npin every copy of the message i s the cun-ent value o f l c . When a process rcceives a POP or VOP messagc, it stores the message in ils message queue mq. This queue is kept soiled in increasing order of the limesramps in the Inessages; sender identities are used to break ties. Asuume for b e 11-~ornen~ that every process receives all messages that have bccr) broaclcast in the same order and in increasing order of tilnestamps. Then every process would kr~owexactly the order in which POP and VOP messages were senr and enc h could cou~ll tlie nutuber of corresponding P and V opcralions and ~nsjntain the setnaphore invariant. Uncortuilately, broadcast i s no1 an atomic operation, Messages broadcasl by two dil'l'ercnt processes might be reccjved by others in different orders. Moreover, a message with a smaller ti rnestamp might be receivcd after a message with a larger timestamp. However, differenl messages broidcast by one proccss will be recei\red by [lie oLher processes in rhe order they were broadcasr by rhe liI-a1 p~.occss.and Lhcse messages will also have increasjng ti ~.nesta[nps.These prope~.ticsfollow from the filcls that (1) execulion of broadcast is the same as concurt'ent execulion or send-which we assume psovides ortlcl-ed, reliable delivery-and (2) a process increases its logical clock after every cornmunication evenl.
9.5 Broadcast Algorithms
455
The fact lhat consecuti\e messages sen1 by every process ha\/e itlc~casing tjlncstamps gives us a way Lo make synchroniiralion decisions. Suppose a process's message queue m q contains a message m with Limestamp ts. Then, once che process has received a nlessage with a larger timestamp from every othci-process. i t is assured [hat it will never- see a message with a slnaller 1irnesLarnp. A t this point, message m is said to be fully cccktowledged. Moreover: once rn is ft~llyacl~nowledged,the11all orher messages i n h-ont of il in m q will also be fi~lly ti~nestamps.Thus the par1 of mq conacknowledged sjnce ihey all have sm.alle~taining fully acknowledged messages is a aablo prefix: no MW messages will ever be insertccl into i t . WI~enevcra process receives a POP or VOP message, we will have i l broadcast an acknowledgement (ACK) message. These are broadcast so that every process sees them. The ACK messages have timeslamps as usual, but they are not stored in the message queues. They are used sirnply to delelaminewheli a regular me.ssage in m q has beco~uefully acknowledged. (If we did nut usc ACK mrssages, a process coulcl not determine that a message was Fi~llyacltnowletlged urllil it received a later POP or VOP message lrom every other process: this would slotv lhe algorithm clown and would lead lo deadlock i f s o w user did no1 wall1 to execute P 01- v operations.) To complete the imple~nentationof distri butecl semaphores: each process uses 21 local variable s to repl.escnt the value of clle sem;lphore. When 21 pcocess gets an ACK message, it updales Lhe stable prefix ol' its message queue mq. For every VOP Inessage, the process increments a ancl dele.les the VOP rnessage. Ir then examines Lhe POP messages in ti~nestarrlpo~,ds~-. IF s > 0, lhe process deccernenis s and deletes the POP message. I n shorl, e;ich process maintains the following PI-edici~te, which is its loop invarjanr: DSEM:
s
>= o
A
m q i s ordered by tilnestnnips in lnessages
The POP messages arc processed i n the order i n which tllcy appear in the srable prelix so that every process makes the same decision about the order in which P operations co~nplete. Even ~Iioughtbe processes inight be at difkrent stages in handling POP and VOP messages, each one will I~andlefully acknowlcdgecl messages in t.he same order. The algorirl21n for distributed sernaphorcs appears in Fig~u-e9.13. The user processes are rcgular application processeh. There is one helper process for each user, and the helpers iliteract with each other in orcler to ilnple~ncnt[lie P and v opetntions. A user process iniriales a P or v operation by communjcating wit.11 its helper: j n the case of a P operarion. the t~serwaits u111iI i t s helpel says it can proceed. Each helper broadcasts POP. VOP, and ACK messages ro rhc other 11elpel.s and manages its local rnessage queue ah described above. All nlessages to
456
Chapter 9
Paradigms for Process Interaction t y p e kind = enum (reqP, reqv, VOP, POP, ACK) ; chan semog[n] (int sender; kind k; int timestamp); chan go [ n ] (int timestamp) ;
process Userli = 0 to n-1) i n t I c = 0, ts;
{
.-. # ask my helper to do V ( s ) send semop [i] (i, reqV, lc) ; lc = Ic+l;
- -. # ask my helper to do P(s), then wait for permission send semop[i] (i, r e q P , I c ) ; lc = lccl: receive go[i] (ts); lc = max(lc, ts+l): lc = lc+l;
1
process Helper[i = 0 to 11-11 { q u e u e mq = new queue(int, kind, int); # message queue int lc = 0 , s = 0; # logical clock and semaphore int sender, ts; kind k; # values in received messages while (true) ( # loop invariant DSEM receive semopril (sender, k, ts) ; lc = max(lc, ts+l); lc = lc+l; if (k = = reqP) { broadcast semop(i, POP, lc); lc = lc+l; else if (k == reqV) ( broadcast semop(i, VOP, lc); IC = Ic+l; ) else if (k == POP or k == VOP) ( inscrl (sender, k , ts) at appropriate place in mq; broadcast semop(i, ACK, lc) ; lc = lctl; 1 else { # k == ACK record that another ACK has been seen; f o r (all f i ~ l l yacknowletlged VOP message$ in mq) ( remove the rnessage from mq; a = s + l; 1 f o r ( a l l Fully acknowledged POP messages in mq st s > 0) { remove the message from mq ; s = a - 1 ; if (sender == k ) # my user's P request ( send go[i] (lc): lc = lc+l; 1
1
Figure 9.13  Distributed semaphores using a broadcast algorithm.
helpers are sent or broadcast to the semop array of channels. As shown, every process maintains a logical clock, which it uses to place timestamps on messages.

We can use distributed semaphores to synchronize processes in a distributed program in essentially the same way we used regular semaphores in shared-variable programs (Chapter 4). For example, we can use them to solve mutual exclusion problems, such as locking files or database records. We can also use the same basic approach, broadcast messages and ordered queues, to solve additional problems; the Historical Notes and the Exercises describe several applications.

When broadcast algorithms are used to make synchronization decisions, every process must participate in every decision. In particular, a process must hear from every other in order to determine when a message is fully acknowledged. This means that broadcast algorithms do not scale well to interactions among large numbers of processes. It also means that such algorithms must be modified to cope with failures.
9.6 Token-Passing Algorithms

This section describes token passing, yet another process-interaction paradigm. A token is a special kind of message that can be used either to convey permission or to gather global state information. We illustrate the use of a token to convey permission with a simple, distributed solution to the critical section problem. Then we illustrate gathering state information by developing two algorithms for detecting when a distributed computation has terminated. The next section presents an additional example (see also the Historical Notes and the Exercises).
9.6.1 Distributed Mutual Exclusion

Although the critical section problem arises primarily in shared-variable programs, it also arises in distributed programs whenever there is a shared resource that at most one process at a time can use, for example a communication link to a satellite. Moreover, the critical section problem is often a component of a larger problem, such as ensuring consistency in a distributed file or database system.

One way to solve the critical section problem is to employ an active monitor that grants permission to access the critical section. For many problems, such as implementing locks on files, this is the simplest and most efficient approach. A second way to solve the problem is to use distributed semaphores, implemented
Figure 9.14  A token ring of helper processes.
as shown in the previous section. That approach yields a decentralized solution in which no one process has a special role, but it requires exchanging a large number of messages for each semaphore operation since each broadcast has to be acknowledged. Here we solve the problem in a third way by using a token ring. The solution is decentralized and fair, as is a solution using distributed semaphores, but it requires the exchange of far fewer messages. Moreover, the basic approach can be generalized to solve other synchronization problems.

Let User[1:n] be a collection of application processes that contain critical and noncritical sections. As usual, we need to develop entry and exit protocols that these processes execute before and after their critical section. In addition, the protocols should ensure mutual exclusion, avoid deadlock and unnecessary delay, and ensure eventual entry (fairness).

Since the user processes have other work to do, we do not want them also to have to circulate the token. Thus we will employ a collection of additional processes, Helper[1:n], one per user process. These helper processes form a ring, as shown in Figure 9.14. One token circulates between the helpers, being passed from Helper[1] to Helper[2] and so on to Helper[n], which passes it back to Helper[1]. When Helper[i] receives the token, it checks to see whether its client User[i] wants to enter its critical section. If not, Helper[i] passes the token on. Otherwise, Helper[i] tells User[i] it may enter its critical section, then waits until User[i] exits; at this point Helper[i] passes the token on. Thus the helper processes cooperate to ensure that the following predicate is always true:

DMUTEX:  ( User[i] is in its CS ⇒ Helper[i] has the token )
         ∧  there is exactly one token
chan token[1:n](), enter[1:n](), go[1:n](), exit[1:n]();

process Helper[i = 1 to n] {
  while (true) {                 # loop invariant DMUTEX
    receive token[i]();          # wait for token
    if (not empty(enter[i])) {   # does user want in?
      receive enter[i]();        # accept enter msg
      send go[i]();              # give permission
      receive exit[i]();         # wait for exit
    }
    send token[i%n + 1]();       # pass token on
  }
}

process User[i = 1 to n] {
  while (true) {
    send enter[i]();             # entry protocol
    receive go[i]();
    critical section;
    send exit[i]();              # exit protocol
    noncritical section;
  }
}

Figure 9.15  Mutual exclusion with a token ring.
The program is shown in Figure 9.15. The token ring is represented by an array of token channels, one per helper. For this problem the token itself carries no data, so it is represented by a null message. The other channels are used for communication between the users and their helpers. When a helper holds the token, it uses empty to determine whether its user wishes to enter its critical section; if so, the helper sends the user a go message and then waits to receive an exit message.

The solution in Figure 9.15 is fair, assuming as usual that processes eventually exit critical sections. This is because the token continuously circulates, and when Helper[i] has it, User[i] is permitted to enter if it wants to do so. As programmed, the token moves continuously between the helpers. This is in fact what happens in a physical token-ring network. In a software token ring, however, it is probably best to add some delay in each helper so that the token moves more slowly around the ring. (See Section 9.7 for another token-based exclusion algorithm in which tokens do not circulate continuously.)
This algorithm assumes that failures do not occur and that the token is not lost. Since control is distributed, however, it is possible to modify the algorithm to cope with failures. The Historical Notes describe algorithms for regenerating a lost token and for using two tokens that circulate in opposite directions.
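As a concrete illustration, here is a minimal Python sketch of the same ring structure using threads and queues in place of processes and channels; the variable and function names are invented, and the short sleep stands in for the delay suggested above. It is only a sketch of the idea in Figure 9.15, not a faithful translation.

# Minimal sketch of the token-ring structure of Figure 9.15 using Python
# threads and queues.  One queue per helper plays the role of token[i];
# enter/go/exit are also queues.  Names are illustrative only.

import threading, queue, time

N = 3
token = [queue.Queue() for _ in range(N)]    # token channel per helper
enter = [queue.Queue() for _ in range(N)]
go    = [queue.Queue() for _ in range(N)]
exit_ = [queue.Queue() for _ in range(N)]

def helper(i):
    while True:
        token[i].get()                       # wait for token
        if not enter[i].empty():             # does my user want in?
            enter[i].get()
            go[i].put(None)                  # give permission
            exit_[i].get()                   # wait for exit
        time.sleep(0.01)                     # slow the token down a little
        token[(i + 1) % N].put(None)         # pass token on

def user(i, rounds=3):
    for r in range(rounds):
        enter[i].put(None)                   # entry protocol
        go[i].get()
        print(f"user {i} in critical section (round {r})")
        exit_[i].put(None)                   # exit protocol

for i in range(N):
    threading.Thread(target=helper, args=(i,), daemon=True).start()
users = [threading.Thread(target=user, args=(i,)) for i in range(N)]
for u in users: u.start()
token[0].put(None)                           # inject the single token
for u in users: u.join()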
9.6.2 Termination Detection in a Ring

It is simple to detect when a sequential program has terminated. It is also simple to detect when a concurrent program has terminated on a single processor: every process is blocked or terminated, and no I/O operations are pending. However, it is not at all easy to detect when a distributed program has terminated. This is because the global state is not visible to any one processor. Moreover, even when all processors are idle, there may be messages in transit between processors.

There are several ways to detect when a distributed computation has terminated. This section develops a token-passing algorithm, assuming that all communication between processes goes around a ring. The next section generalizes the algorithm for a complete communication graph. Additional approaches are described in the Historical Notes and the Exercises.

Let T[1:n] be the processes (tasks) in some distributed computation, and let ch[1:n] be an array of communication channels. For now, assume that the processes form a ring and that all communication goes around the ring. In particular, each process T[i] receives messages only from its own channel ch[i] and sends messages only to the next channel ch[i%n + 1]. Thus T[1] sends messages only to T[2], T[2] sends only to T[3], and so on, with T[n] sending messages to T[1]. As usual we also assume that messages from every process are received by its neighbor in the ring in the order in which they were sent.

At any point in time, each process is active or idle. Initially, every process is active. It is idle if it has terminated or is delayed at a receive statement. (If a process is temporarily delayed while waiting for an I/O operation to terminate, we consider it still to be active since it has not terminated and will eventually be awakened.) After receiving a message, an idle process becomes active. Thus a distributed computation has terminated if the following two conditions hold:
DTERM:  every process is idle  ∧  no messages are in transit
A message is in transit if it has been sent but not yet delivered to the destination channel. The second condition is necessary because when the message is delivered, it could awaken a delayed process. Our task is to superimpose a termination-detection algorithm on an arbitrary distributed computation, subject only to the above assumption that the processes
in the computation communicate in a ring. Clearly termination is a property of the global state, which is the union of the states of the individual processes plus the contents of the message channels. Thus the processes have to communicate with each other in order to determine if the computation has terminated.

To detect termination, let there be one token, which is passed around in special messages that are not part of the computation proper. The process that holds the token passes it on when it becomes idle. (If a process has terminated its computation, it is idle but continues to participate in the termination-detection algorithm.) Processes pass the token using the same ring of communication channels that they use in the computation itself. When a process receives the token, it knows that the sender was idle at the time it sent the token. Moreover, when a process receives the token, it has to be idle since it is delayed receiving from its channel and will not become active again until it receives a regular message that is part of the distributed computation. Thus, upon receiving the token, a process sends the token to its neighbor, then waits to receive another message from its channel.

The question now is how to detect that the entire computation has terminated. When the token has made a complete circuit of the communication ring, we know that every process was idle at some point. But how can the holder of the token determine if all other processes are still idle and that there are no messages in transit?

Suppose one process, T[1] say, initially holds the token. When T[1] becomes idle, it initiates the termination-detection algorithm by passing the token to T[2]. After the token gets back to T[1], the computation has terminated if T[1] has been continuously idle since it first passed the token to T[2]. This is because the token goes around the same ring that regular messages do, and messages are delivered in the order in which they are sent. Thus, when the token gets back to T[1] there cannot be any regular messages either queued or in transit. In essence, the token has "flushed" the channels clean, pushing all regular messages ahead of it.

We can make the algorithm and its correctness more precise as follows. First, associate a color with every process: blue (cold) for idle and red (hot) for active. Initially all processes are active, so they are colored red. When a process receives the token, it is idle, so it colors itself blue, passes the token on, and waits to receive another message. If the process later receives a regular message, it colors itself red. Thus a process that is blue became idle, passed the token on, and has remained idle since passing the token.

Second, associate a value with the token indicating how many channels are empty if T[1] is still idle. Let token be this value. When T[1] becomes idle, it
Global invariant RING:  T[1] is blue ⇒
    ( T[1] ... T[token+1] are blue  ∧
      ch[2] ... ch[token%n + 1] are empty )

actions of T[1] when it first becomes idle:
    color[1] = blue;  token = 0;  send ch[2](token);

actions of T[2], ..., T[n] upon receiving a regular message:
    color[i] = red;

actions of T[2], ..., T[n] upon receiving the token:
    color[i] = blue;  token++;  send ch[i%n + 1](token);

actions of T[1] upon receiving the token:
    if (color[1] == blue)
        announce termination and halt;
    color[1] = blue;  token = 0;  send ch[2](token);

Figure 9.16  Termination detection in a ring.
colors itself blue, sets token to 0, and then sends the token to T[2]. When T[2] receives the token, it is idle and ch[2] might be empty. Hence, T[2] colors itself blue, increments token to 1, and sends the token to T[3]. Each process T[i] in turn colors itself blue and increments token before passing it on.

These token-passing rules are listed in Figure 9.16. As indicated, the rules ensure that predicate RING is a global invariant. The invariance of RING follows from the fact that if T[1] is blue, it has not sent any regular messages since sending the token, and hence there are no regular messages in any channel up to where the token resides. Moreover, all these processes have remained idle since they saw the token. Thus if T[1] is still blue when the token gets back to it, all processes are blue and all channels are empty. Hence T[1] can announce that the computation has terminated.
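The token-passing rules of Figure 9.16 are small enough to express directly as code. The following Python sketch (illustrative names; processes are numbered from 0 rather than 1) applies the rules to a ring in which every process has gone idle, so the token makes one clean circuit and the first process announces termination.

# Minimal sketch (not the book's program) of the rules in Figure 9.16.
# The ring state is just each process's color plus the token's counter.

RED, BLUE = "red", "blue"

def make_ring(n):
    return {"n": n, "color": [RED] * n, "token": 0}

def receive_regular(ring, i):
    # A regular message makes the receiver active again.
    ring["color"][i] = RED

def start_detection(ring):
    # The initiator (index 0 here) becomes idle: color itself blue,
    # zero the token, and send it to its neighbor.
    ring["color"][0] = BLUE
    ring["token"] = 0
    return 1                                  # token now heading to process 1

def receive_token(ring, i):
    # Returns the next token holder, or None if the initiator announces
    # termination because it stayed blue for a whole circuit.
    if i == 0:
        if ring["color"][0] == BLUE:
            return None                       # every process blue, channels empty
        ring["color"][0] = BLUE
        ring["token"] = 0
    else:
        ring["color"][i] = BLUE
        ring["token"] += 1
    return (i + 1) % ring["n"]

# Tiny driver: nothing disturbs the idle processes, so one circuit suffices.
ring = make_ring(4)
holder = start_detection(ring)
while holder is not None:
    holder = receive_token(ring, holder)
print("terminated with token =", ring["token"])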
9.6.3 Termination Detection in a Graph

In the previous section, we assumed all communication goes around a ring. In general, the communication structure of a distributed computation will form an arbitrary directed graph. The nodes of the graph are the processes in the computation; the edges represent communication paths. There is an edge from one process to another if the first process sends to a channel from which the second receives.
Here we assume that the communication graph is complete, namely that there is an edge from every process to every other. As before, there are n processes T[1:n] and channels ch[1:n], and each process T[i] receives from its private input channel ch[i]. However, now any process can send messages to ch[i]. With these assumptions, we can extend the previous termination-detection algorithm as described below. The resulting algorithm is adequate to detect termination in any network in which there is a direct communication path from each processor to every other. It can readily be extended to arbitrary communication graphs and multiple channels (see the Exercises).

Detecting termination in a complete graph is more difficult than in a ring because messages can arrive over any edge. For example, consider the complete graph of three processes shown in Figure 9.17. Suppose the processes pass the token only from T[1] to T[2] to T[3] and back to T[1]. Suppose T[1] holds the token and becomes idle; hence it passes the token to T[2]. When T[2] becomes idle, it in turn passes the token to T[3]. But before T[3] receives the token, it could send a regular message to T[2]. Thus, when the token gets back to T[1], it cannot conclude that the computation has terminated even if it has remained continuously idle.

The key to the ring algorithm in Figure 9.16 is that all communication goes around the ring, and hence the token flushes out regular messages. In particular, the token traverses every edge of the ring. We can extend that algorithm to a complete graph by ensuring that the token traverses every edge of the graph, which means that it visits every process multiple times. If every process has remained continuously idle since it last saw the token, then we can conclude that the computation has terminated.

As before, each process is colored red or blue, with all processes initially red. When a process receives a regular message, it colors itself red. When a process receives the token, it is blocked waiting to receive the next message on its
Figure 9.17  A complete communication graph.
input channel. Hence the process colors itself blue (if it is not already blue) and passes the token on. (Again, if a process terminates its regular computation, it continues to handle token messages.)

Any complete directed graph contains a cycle that includes every edge (some nodes may need to be included more than once). Let c be a cycle in the communication graph, and let nc be its length. Each process keeps track of the order in which its outgoing edges occur in c. Upon receiving the token along one edge in c, a process sends it out over the next edge in c. This ensures that the token traverses every edge in the communication graph.

Also as before, the token carries a value indicating the number of times in a row the token has been passed by idle processes and hence the number of channels that might be empty. As the above example illustrates, however, in a complete graph a process that was idle might become active again, even if T[1] remains idle. Thus, we need a different set of token-passing rules and a different global invariant in order to be able to conclude that the computation has terminated.

The token starts at any process and initially has value 0. When that process becomes idle for the first time, it colors itself blue and then passes the token along the first edge in cycle c. Upon receiving the token, a process takes the actions shown in Figure 9.18.
Global invariant GRAPH:  token has value V ⇒
    ( the last V channels in cycle c were empty  ∧
      the last V processes to receive the token were blue )

actions of T[i] upon receiving a regular message:
    color[i] = red;

actions of T[i] upon receiving the token:
    if (token == nc)
        announce termination and halt;
    if (color[i] == red)
        { color[i] = blue;  token = 0; }
    else
        token++;
    set j to index of channel for next edge in cycle c;
    send ch[j](token);

Figure 9.18  Termination detection in a complete graph.
If the process is red when it receives the token, and hence was active since last seeing it, the process colors itself blue and sets the value of token to 0 before passing it along the next edge in c. This effectively reinitiates the termination-detection algorithm. However, if the process is blue when it receives the token, and hence has been continuously idle since last seeing the token, the process increments the value of token before passing it on.

The token-passing rules ensure that predicate GRAPH is a global invariant. Once the value of token gets to nc, the length of cycle c, then the computation is known to have terminated. In particular, at that point the last nc channels the token has traversed were empty. Since a process passes the token only when it is idle, and since it increases token only if it has remained idle since last seeing the token, all channels are empty and all processes are idle. In fact, the computation had actually terminated by the time the token started its last circuit around the graph. However, no process could possibly know this until the token has made another complete cycle around the graph to verify that all processes are still idle and that all channels are empty. Thus the token has to circulate a minimum of two times around the cycle after any activity in the computation proper: first to turn processes blue, and then again to verify that they have remained blue.
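The per-process rule in Figure 9.18 can likewise be written as a short piece of code. The following Python sketch (illustrative names; processes numbered from 0) drives the token around a closed walk that covers every edge of a three-process complete graph and reports which process announces termination; in a real system the walk would of course be interleaved with the computation's own messages.

# Minimal sketch (not the book's program) of the token rules of Figure 9.18.
# 'cycle' is a closed walk through every edge of the complete graph;
# the token value counts consecutive hops past blue (idle) processes.

RED, BLUE = "red", "blue"

def detect_termination(n, cycle):
    # cycle: list of process indices, starting and ending at the same process,
    # that traverses every edge; nc = len(cycle) - 1 hops per circuit.
    nc = len(cycle) - 1
    color = [RED] * n
    token = 0
    pos = 0                                   # index into the cycle
    color[cycle[0]] = BLUE                    # first holder becomes idle
    while True:
        pos += 1                              # pass token along the next edge
        i = cycle[pos % nc]
        if token == nc:
            return i                          # this process announces termination
        if color[i] == RED:                   # active since it last saw the token
            color[i] = BLUE
            token = 0
        else:
            token += 1

# Complete graph on three processes; this walk uses each directed edge once.
cycle = [0, 1, 2, 0, 2, 1, 0]
print("termination announced at process", detect_termination(3, cycle))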
9.7 Replicated Servers

The final process-interaction paradigm we describe is replicated servers. A server, as usual, is a process that manages some resource. A server might be replicated when there are multiple distinct instances of a resource; each server would then manage one of the instances. Replication can also be used to give clients the illusion that there is a single resource when in fact there are many. We saw an example of this earlier in Section 8.4, where we showed how to implement replicated files.

This section illustrates both uses of replicated servers by developing two additional solutions to the dining philosophers problem. As usual there are five philosophers and five forks, and each philosopher requires two forks in order to eat. This problem can be solved in three ways in a distributed program. Let PH be a philosopher process and let W be a waiter process. One approach is to have a single waiter process that manages all five forks, the centralized structure shown in Figure 9.19 (a). The second approach is to distribute the forks, with one waiter managing each fork, the distributed structure shown in Figure 9.19 (b). The third approach is to have one waiter per philosopher, the decentralized structure shown in Figure 9.19 (c). We presented a centralized solution earlier in Figure 8.6. Here, we develop distributed and decentralized solutions.
Figure 9.19  Solution structures for the dining philosophers: (a) centralized, (b) distributed, (c) decentralized.
9.7.1 Distributed Dining Philosophers

The centralized dining philosophers solution shown in Figure 8.6 is deadlock-free, but it is not fair. Moreover, the single waiter process could be a bottleneck because all philosophers need to interact with it. A distributed solution can be deadlock-free, fair, and not have a bottleneck, but at the expense of a more complicated client interface and more messages.

Figure 9.20 contains a distributed solution programmed using the multiple primitives notation of Section 8.3. (This leads to the shortest program; however, the program can readily be changed to use just message passing or just rendezvous.) There are five waiter processes; each manages one fork. In particular, each waiter repeatedly waits for a philosopher to get the fork and then to release it.

Each philosopher interacts with two waiters to obtain the forks it needs. However, to avoid deadlock, the philosophers cannot all execute the identical program. Instead, the first four philosophers get their left fork then the right, whereas the last philosopher gets the right fork then the left. The solution is thus very similar to the one using semaphores in Figure 4.7.
module Waiter[5]
  op getforks(), relforks();
body
  process the_waiter {
    while (true) {
      receive getforks();
      receive relforks();
    }
  }
end Waiter

process Philosopher[i = 0 to 4] {
  int first = i, second = i+1;
  if (i == 4)
    { first = 0;  second = 4; }
  while (true) {
    call Waiter[first].getforks();
    call Waiter[second].getforks();
    eat;
    send Waiter[first].relforks();
    send Waiter[second].relforks();
    think;
  }
}

Figure 9.20  Distributed dining philosophers.
The distributed solution in Figure 9.20 is fair because the forks are requested one at a time and invocations of getforks are serviced in the order they are called. Thus each call of getforks is serviced eventually, assuming philosophers eventually release forks they have acquired.
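For comparison, here is a minimal Python sketch of the same asymmetric acquisition order using threads and locks in place of waiter processes; all names are illustrative. Because one philosopher acquires its forks in the opposite order, the circular wait that causes deadlock cannot arise.

# Minimal sketch (not the book's program): dining philosophers with locks.
# Philosophers 0..3 take the left fork first; philosopher 4 takes the
# right fork first, breaking the circular wait.

import threading, time, random

forks = [threading.Lock() for _ in range(5)]

def philosopher(i, rounds=3):
    first, second = i, (i + 1) % 5
    if i == 4:
        first, second = (i + 1) % 5, i       # last philosopher reverses the order
    for _ in range(rounds):
        with forks[first]:
            with forks[second]:
                print(f"philosopher {i} eating")
                time.sleep(random.uniform(0, 0.01))
        time.sleep(random.uniform(0, 0.01))  # think

threads = [threading.Thread(target=philosopher, args=(i,)) for i in range(5)]
for t in threads: t.start()
for t in threads: t.join()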
9.7.2 Decentralized Dining Philosophers

We now develop a decentralized solution that has one waiter per philosopher. The process-interaction pattern is similar to that in the replicated file servers in Figures 8.14 and 8.15. The algorithm employed by the waiter processes is another example of token passing, with the tokens being the five forks. Our solution can be adapted to coordinate access to replicated files or to yield an efficient solution to the distributed mutual exclusion problem (see the Exercises).
module Waiter[t = 0 to 4]
  op getforks(), relforks();                  # for philosophers
  op needL(), needR(), passL(), passR();      # for waiters
  op forks(bool,bool,bool,bool);              # for initialization
body
  op hungry(), eat();                  # local operations
  bool haveL, dirtyL, haveR, dirtyR;   # status of forks
  int left = (t-1) % 5;                # left neighbor
  int right = (t+1) % 5;               # right neighbor

  proc getforks() {
    send hungry();      # tell waiter philosopher is hungry
    receive eat();      # wait for permission to eat
  }

  process the_waiter {
    receive forks(haveL, dirtyL, haveR, dirtyR);
    while (true) {
      in hungry() ->
            # ask for forks I don't have
            if (!haveR) send Waiter[right].needL();
            if (!haveL) send Waiter[left].needR();
            # wait until I have both forks
            while (!haveL or !haveR)
              in passR() -> haveR = true; dirtyR = false;
              [] passL() -> haveL = true; dirtyL = false;
              [] needR() st dirtyR ->
                    haveR = false; dirtyR = false;
                    send Waiter[right].passL();
                    send Waiter[right].needL();
              [] needL() st dirtyL ->
                    haveL = false; dirtyL = false;
                    send Waiter[left].passR();
                    send Waiter[left].needR();
              ni
            # let philosopher eat, then wait for release
            send eat();  dirtyL = true;  dirtyR = true;
            receive relforks();
      [] needR() ->     # neighbor needs my right fork (its left)
            haveR = false; dirtyR = false;
            send Waiter[right].passL();
      [] needL() ->     # neighbor needs my left fork (its right)
            haveL = false; dirtyL = false;
            send Waiter[left].passR();
      ni
    }
  }
end Waiter

process Philosopher[i = 0 to 4] {
  while (true) {
    call Waiter[i].getforks();
    eat;
    call Waiter[i].relforks();
    think;
  }
}

process Main {    # initialize the forks held by waiters
  send Waiter[0].forks(true, true, true, true);
  send Waiter[1].forks(false, false, true, true);
  send Waiter[2].forks(false, false, true, true);
  send Waiter[3].forks(false, false, true, true);
  send Waiter[4].forks(false, false, false, false);
}

Figure 9.21  Decentralized dining philosophers.
Each fork is a token that is held by one of two waiters or is in transit between them. When a philosopher wants to eat, he asks his waiter to acquire two forks. If the waiter does not currently have both forks, the waiter interacts with neighboring waiters to get them. The waiter then retains control of the forks while the philosopher eats.

The key to a correct solution is to manage the forks in such a way that deadlock is avoided. Ideally, the solution should also be fair. For this problem, deadlock could result if a waiter needs two forks and cannot get them. A waiter certainly has to hold on to both forks while his philosopher is eating. But when the philosopher is not eating, a waiter should be willing to give up his forks. However, we need to avoid passing a fork back and forth from one waiter to another without its being used.
The basic idea for avoiding deadlock is to have a waiter give up a fork that has been used, but to hold onto one that it just acquired. Specifically, when a philosopher starts eating, his waiter marks both forks as "dirty." When another waiter wants a fork, if it is dirty and not currently being used, the first waiter cleans the fork and gives it up. The second waiter holds onto the clean fork until it has been used. However, a dirty fork can be reused until it is needed by the other waiter.

This decentralized algorithm is given in Figure 9.21. (It is colloquially called the "hygienic philosophers" algorithm because of the way forks are cleaned.) The solution is programmed using the multiple primitives notation described in Section 8.3, because it is convenient to be able to use all of remote procedure call, rendezvous, and message passing.

When a philosopher wants to eat, he calls the getforks operation exported by his waiter module. The getforks operation is implemented by a procedure to hide the fact that getting forks requires sending a hungry message and receiving an eat message. When a waiter process receives a hungry message, it checks the status of the two forks. If it has both, it lets the philosopher eat, then waits for the philosopher to release the forks.

If the waiter process does not have both forks, it has to acquire those it needs. The needL, needR, passL, and passR operations are used for this. In particular, when a philosopher is hungry and his waiter needs a fork, that waiter sends a need message to the waiter who has the fork. The other waiter accepts the need message when the fork is dirty and not being used, and then passes the fork to the first waiter. The needL and needR operations are invoked by asynchronous send rather than synchronous call, because deadlock can result if two waiters call each other's operations at the same time.

Four variables in each waiter are used to record the status of the forks: haveL, haveR, dirtyL, and dirtyR. These variables are initialized by having the Main process call the forks operation in the Waiter modules. Initially, waiter zero holds two dirty forks, waiters one to three each hold one dirty fork, and waiter four holds no fork.

In order to avoid deadlock, it is imperative that the forks be distributed asymmetrically and that they all be dirty. For example, if every waiter initially has one fork and all philosophers want to eat, each waiter could give up the fork he has and then hold on to the one he gets. If any fork is initially clean, then the waiter that holds it will not give it up until after his philosopher has eaten; if the philosopher terminates or never wants to eat, another philosopher could wait forever to get the fork.

The program avoids starvation by having waiters give up forks that are dirty. In particular, if one waiter wants a fork that another holds, he will eventually get
it. If the fork is dirty and not in use, the second waiter immediately passes it to the first waiter. If the fork is dirty and in use, eventually the other philosopher will quit eating, and hence the other waiter will pass the fork to the first waiter. If the fork is clean, it is because the other philosopher is hungry, the other waiter just got the fork, or the other waiter is waiting to get a second fork. By similar reasoning, the other waiter will eventually get the second fork, because there is no state in which every waiter holds one clean fork and wants a second. (This is another reason why asymmetric initialization is imperative.)
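The heart of the algorithm is the rule for deciding whether to surrender a fork. The following Python sketch (the Fork class and the function names are invented for illustration, and the messaging between waiters is omitted) captures just that rule: a fork is handed over only when it is held, dirty, and not in use, and it is cleaned as it is passed.

# Minimal sketch (not the book's program) of the fork-transfer rule used by
# the waiters in Figure 9.21.

class Fork:
    def __init__(self, held, dirty):
        self.held, self.dirty, self.in_use = held, dirty, False

def request(fork):
    # Called when the neighboring waiter asks for this fork.
    # Returns True if the fork is passed to the requester.
    if fork.held and fork.dirty and not fork.in_use:
        fork.held = False
        fork.dirty = False           # the fork is cleaned as it is passed
        return True
    return False                     # keep it: clean or in-use forks are retained

def start_eating(left, right):
    assert left.held and right.held
    left.in_use = right.in_use = True
    left.dirty = right.dirty = True  # both forks become dirty

def stop_eating(left, right):
    left.in_use = right.in_use = False

# Example: a fork that has just been used (dirty, no longer in use) is given
# up on request; once passed, it is no longer held here.
left_fork  = Fork(held=True, dirty=True)
right_fork = Fork(held=True, dirty=True)
start_eating(left_fork, right_fork); stop_eating(left_fork, right_fork)
print("neighbor gets the fork now?", request(right_fork))   # True
print("and again right away?", request(right_fork))         # False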
Historical Notes
All the paradigms described in this chapter were developed between the mid-1970s and the mid-1980s. During that decade, there was a plethora of activity refining, analyzing, and applying the various paradigms. Now the focus is more on using them. Indeed, many problems can be solved in more than one way, hence using more than one paradigm. We describe some of these below; additional examples are given in the Exercises and in Chapter 11.

The manager/worker paradigm was introduced by Gentleman [1981], who called it the administrator/worker paradigm. Carriero and Gelernter have called the same idea a distributed bag of tasks. They present solutions to several problems in Carriero et al. [1986] and Carriero and Gelernter [1989]; the solutions are programmed using their Linda primitives (Section 7.7). Finkel and Manber [1987] use a distributed bag of tasks to implement backtracking algorithms. The manager/worker paradigm is now commonly used in parallel computations, where the technique is sometimes called a work pool, processor farm, or work farm. Whatever the term, the concept is the same: several workers dynamically dividing up a collection of tasks.

Heartbeat algorithms are routinely used in distributed parallel computations, especially for grid computations (see Section 11.1). The author of this book created the term heartbeat algorithm in the late 1980s because that phrase seems to characterize the actions of each process: pump (send), contract (receive), prepare for the next cycle (compute), then repeat. Per Brinch Hansen [1995] calls it the cellular automata paradigm, although that term seems better to describe a kind of application than a programming style. In any event, people do not usually use any name to refer to the now canonical send/receive/compute programming style; they just say that processes exchange information.

In Section 9.2 we presented heartbeat algorithms for the region-labeling problem from image processing and for the Game of Life, which was invented by mathematician John Conway in the 1960s. Image processing, cellular automata, and the related topic of genetic algorithms are covered in some detail by
Wilkinson and Allen [1999]. Brinch Hansen [1995] also describes cellular automata. Fox et al. [1988] cover numerous applications and algorithms, many of which are programmed using the heartbeat style.

The concept of a software pipeline goes back at least as far as the introduction of Unix in the early 1970s. The concept of a hardware pipeline goes back even further, to the early vector processors in the 1960s. However, the use of pipelines as a general parallel computing paradigm is more recent. Pipeline and ring (closed pipeline) algorithms have now been developed to solve many problems. Examples can be found in most books on parallel computing. A few good sources are Brinch Hansen [1995], Fox et al. [1988], Quinn [1994], and Wilkinson and Allen [1999]. Hardware pipelines are covered in detail in Hwang [1993]. The behavior of both software and hardware pipelines is similar, so their performance can be analyzed in similar ways.

The concept of a probe/echo paradigm was invented simultaneously by several people. The most comprehensive early work is offered by Chang [1982], who presents algorithms for several graph problems, including sorting, computing biconnected components, and knot (deadlock) detection. (Chang called them echo algorithms; we use the term probe/echo to indicate that there are two distinct phases.) Dijkstra and Scholten [1980] and Francez [1980] use the same paradigm, without giving it a name, to detect termination of a distributed program.

We used the network topology problem to illustrate the use of a probe/echo algorithm. The problem was first described by Lamport [1982] in a paper that showed how to derive a distributed algorithm in a systematic way by refining an algorithm that uses shared variables. That paper also showed how to deal with a dynamic network in which processors and links might fail and then recover. McCurley and Schneider [1986] systematically derive a heartbeat algorithm for solving the same problem.

Several people have investigated the problem of providing reliable or fault-tolerant broadcast, which is concerned with ensuring that every functioning and reachable processor receives the message being broadcast and that all agree upon the same value. For example, Schneider et al. [1984] present an algorithm for fault-tolerant broadcast in a tree, assuming that a failed processor stops executing and that failures are detectable. Lamport et al. [1982] show how to cope with failures that can result in arbitrary behavior, so-called Byzantine failures.

Logical clocks were developed by Lamport [1978] in a now classic paper on how to order events in distributed systems. (Marzullo and Owicki [1983] describe the problem of synchronizing physical clocks.) Schneider [1982] developed an implementation of distributed semaphores similar to the one in Figure 9.13; his paper also shows how to modify this kind of algorithm to deal with
failures. The same basic approach, broadcast messages and ordered queues, can also be used to solve additional problems. For example, Lamport [1978] presents an algorithm for distributed mutual exclusion, and Schneider [1982] presents a distributed implementation of the guarded input/output commands of CSP described in Section 7.6. The algorithms in these papers do not assume that messages are totally ordered. However, for many problems it helps if broadcast is atomic, namely that every process sees messages that have been broadcast in exactly the same order. Two key papers on the use and implementation of reliable and atomic communication primitives are Birman and Joseph [1987] and Birman et al. [1991].

Section 9.6 showed how to use token passing to implement distributed mutual exclusion and termination detection, and Section 8.4 showed how to use tokens to synchronize access to replicated files. Chandy and Misra [1984] use token passing to achieve fair conflict resolution, and Chandy and Lamport [1985] use tokens to determine global states in distributed computations.

The token-passing solution to the distributed mutual exclusion problem given in Figure 9.15 was developed by LeLann [1977]. That paper also shows how to bypass a node on the ring if it should fail and how to regenerate the token if it should become lost. LeLann's method requires knowing maximum communication delays and process identities. Misra [1983] later developed an algorithm that overcomes these limitations by using two tokens that circulate around the ring in opposite directions.

The distributed mutual exclusion problem can also be solved using a broadcast algorithm. One way is to use distributed semaphores. However, this requires exchanging a large number of messages since every message has to be acknowledged by every process. More efficient broadcast-based algorithms for mutual exclusion are described in Raynal [1986].
Deadlock detection in a distributed system is similar to termination detection. Several people have also developed probe/echo and token-passing algorithms for this problem. For example, Chandy et al. [1983] give a probe/echo algorithm. Knapp [1987] surveys deadlock detection algorithms in distributed database systems.

Section 8.4 described how to implement replicated files using tokens or weighted voting. Distributed semaphores can also be used, as described by Schneider [1980]; his approach can be made fault tolerant using the techniques described in Schneider [1982]. To make token or lock algorithms fault tolerant, one can regenerate the token or lock as described in LeLann [1977] and Misra [1983]. When there are multiple tokens and a server crashes, the other servers need to hold an election to determine which one will get the lost tokens. Raynal [1988] describes several election algorithms.

Several times we have mentioned the issue of coping with failures, and several of the above papers show how to make specific algorithms fault tolerant. However, a full treatment of fault-tolerant programming is beyond the scope of this book. Schneider and Lamport [1985] give an excellent overview of fault-tolerant programming and describe several general solution paradigms. Jalote [1994] is devoted to fault tolerance in distributed systems. Bacon [1998] and Bernstein and Lewis [1993] each contain several chapters on distributed systems and aspects of fault tolerance. Both these books also describe transactions and recovery in distributed database systems, one of the main applications that invariably requires fault tolerance. An excellent source for the state of the art (as of 1993) on all aspects of distributed systems is a collection of papers edited by Sape Mullender [1993] and written by an all-star cast of experts.
References

Bacon, J. 1998. Concurrent Systems: Operating Systems, Database and Distributed Systems: An Integrated Approach, 2nd ed. Reading, MA: Addison-Wesley.

Bernstein, A. J., and P. M. Lewis. 1993. Concurrency in Programming and Database Systems. Boston: Jones and Bartlett.

Birman, K. P., and T. A. Joseph. 1987. Reliable communication in the presence of failures. ACM Trans. on Computer Systems 5, 1 (February): 47-76.

Birman, K. P., A. Schiper, and A. Stephenson. 1991. Lightweight causal and atomic group multicast. ACM Trans. on Computer Systems 9, 3 (August): 272-314.

Brinch Hansen, P. 1995. Studies in Computational Science. Englewood Cliffs, NJ: Prentice-Hall.

Carriero, N., and D. Gelernter. 1989. How to write parallel programs: A guide to the perplexed. ACM Computing Surveys 21, 3 (September): 323-58.

Carriero, N., D. Gelernter, and J. Leichter. 1986. Distributed data structures in Linda. Thirteenth ACM Symp. on Principles of Prog. Langs., January, pp. 236-42.

Chandrasekaran, S., and S. Venkatesan. 1990. A message-optimal algorithm for distributed termination detection. Journal of Parallel and Distributed Computing 8: 245-93.

Chandy, K. M., L. M. Haas, and J. Misra. 1983. Distributed deadlock detection. ACM Trans. on Computer Systems 1, 2 (May): 144-56.

Chandy, K. M., and L. Lamport. 1985. Distributed snapshots: Determining global states of distributed systems. ACM Trans. on Computer Systems 3, 1 (February): 63-75.

Chandy, K. M., and J. Misra. 1984. The drinking philosophers problem. ACM Trans. on Prog. Languages and Systems 6, 4 (October): 632-46.

Chang, E. J.-H. 1982. Echo algorithms: depth parallel operations on general graphs. IEEE Trans. on Software Engr. 8, 4 (July): 391-401.

Dijkstra, E. W., W. H. J. Feijen, and A. J. M. van Gasteren. 1983. Derivation of a termination detection algorithm for distributed computations. Information Processing Letters 16, 5 (June): 217-19.

Dijkstra, E. W., and C. S. Scholten. 1980. Termination detection for diffusing computations. Information Processing Letters 11, 1 (August): 1-4.

Finkel, R., and U. Manber. 1987. DIB - a distributed implementation of backtracking. ACM Trans. on Prog. Languages and Systems 9, 2 (April): 235-56.

Fox, G. C., M. A. Johnson, G. A. Lyzenga, S. W. Otto, J. K. Salmon, and D. W. Walker. 1988. Solving Problems on Concurrent Processors, Volume I. Englewood Cliffs, NJ: Prentice-Hall.

Francez, N. 1980. Distributed termination. ACM Trans. on Prog. Languages and Systems 2, 1 (January): 42-55.

Gentleman, W. M. 1981. Message passing between sequential processes: the reply primitive and the administrator concept. Software - Practice and Experience 11: 435-66.

Hwang, K. 1993. Advanced Computer Architecture: Parallelism, Scalability, Programmability. New York: McGraw-Hill.

Jalote, P. 1994. Fault Tolerance in Distributed Systems. Englewood Cliffs, NJ: Prentice-Hall.

Knapp, E. 1987. Deadlock detection in distributed databases. ACM Computing Surveys 19, 4 (December): 303-28.

Lamport, L. 1978. Time, clocks, and the ordering of events in distributed systems. Comm. ACM 21, 7 (July): 558-65.

Lamport, L. 1982. An assertional correctness proof of a distributed algorithm. Science of Computer Prog. 2, 3 (December): 175-206.

Lamport, L., R. Shostak, and M. Pease. 1982. The Byzantine generals problem. ACM Trans. on Prog. Languages and Systems 4, 3 (July): 382-401.

LeLann, G. 1977. Distributed systems: Towards a formal approach. Proc. Information Processing 77, North-Holland, Amsterdam, pp. 155-60.

Maekawa, M., A. E. Oldehoeft, and R. R. Oldehoeft. 1987. Operating Systems: Advanced Concepts. Menlo Park, CA: Benjamin/Cummings.

Marzullo, K., and S. S. Owicki. 1983. Maintaining the time in a distributed system. Proc. Second ACM Symp. on Principles of Distr. Computing, August, pp. 295-305.

McCurley, E. K., and F. B. Schneider. 1986. Derivation of a distributed algorithm for finding paths in directed networks. Science of Computer Prog. 6, 1 (January): 1-9.

Misra, J. 1983. Detecting termination of distributed computations using markers. Proc. Second ACM Symp. on Principles of Distr. Computing, August, pp. 290-94.

Morgan, C. 1985. Global and logical time in distributed algorithms. Information Processing Letters 20, 4 (May): 189-94.

Mullender, S., ed. 1993. Distributed Systems, 2nd ed. Reading, MA: ACM Press and Addison-Wesley.

Quinn, M. J. 1994. Parallel Computing: Theory and Practice. New York: McGraw-Hill.

Rana, S. P. 1983. A distributed solution of the distributed termination problem. Information Processing Letters 17, 1 (July): 43-46.

Raynal, M. 1986. Algorithms for Mutual Exclusion. Cambridge, MA: MIT Press.

Raynal, M. 1988. Distributed Algorithms and Protocols. New York: Wiley.

Schneider, F. B. 1980. Ensuring consistency in a distributed database system by use of distributed semaphores. Proc. of Int. Symp. on Distributed Databases, March, pp. 183-89.

Schneider, F. B. 1982. Synchronization in distributed programs. ACM Trans. on Prog. Languages and Systems 4, 2 (April): 125-48.

Schneider, F. B., D. Gries, and R. D. Schlichting. 1984. Fault-tolerant broadcasts. Science of Computer Prog. 4: 1-15.

Schneider, F. B., and L. Lamport. 1985. Paradigms for distributed programs. In Distributed Systems: Methods and Tools for Specification, An Advanced Course, Lecture Notes in Computer Science, vol. 190. Berlin: Springer-Verlag, pp. 431-80.

Wilkinson, B., and M. Allen. 1999. Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers. Englewood Cliffs, NJ: Prentice-Hall.
Exercises

9.1 Section 9.1 described one way to represent sparse matrices and showed how to multiply two sparse matrices using a distributed bag of tasks.
(a) Suppose matrix a is represented by rows as described. Develop code to compute the transpose of a.

(b) Implement and experiment with the program in Figure 9.1. First use input matrices that you have created by hand and for which you know the result. Then write a small program to generate large matrices, perhaps using a random number generator. Measure the performance of the program for the large matrices. Write a brief report explaining the tests you ran and the results you observed.
9.2 Section 3.6 presented a bag-of-tasks program for multiplying dense matrices.
(a) Construct a distributed version of that program. In particular, change it to use the manager/worker paradigm.

(b) Implement your answer to (a), and implement the program in Figure 9.1. Compare the performance of the two programs. In particular, generate some large, sparse matrices and multiply them together using both programs. (The
programs will, of course, represent the matrices differently.) Write a brief report explaining your tests and comparing the time and space requirements of the two programs.

9.3 The adaptive quadrature program in Figure 9.2 uses a fixed number of tasks. (It divides the interval from a to b into a fixed number of subintervals.) The algorithm in Figure 3.21 is fully adaptive: it starts with just one task and generates as many as are required.
(a) Modify the program in Figure 9.2 to use the fully adaptive approach. This means that workers will never calculate an area using a recursive procedure. You will have to figure out how to detect termination!

(b) Modify your answer to (a) to use a threshold T that defines the maximum size of the bag of tasks. After this many tasks have been generated, when a worker gets a task from the bag, the manager should tell the worker to solve the task recursively and hence not to generate any more tasks.
(c) Implement the program in Figure 9.2 and your answers to (a) and (b), then compare the performance of the three programs. Conduct a set of experiments, then write a brief report describing the experiments you ran and the results you observed.

9.4 Quicksort is a recursive sorting method that partitions an array into smaller pieces and then combines them. Develop a program to implement quicksort using the manager/worker paradigm. Array a[1:n] of integers is local to the manager process. Use w worker processes to do the sorting. When your program terminates, the result should be stored in the administrator's array a. Do not use any shared variables. Explain your solution and justify your design choices.
9.5 The eight-queens problem is concerned with placing eight queens on a chess board in such a way that none can attack another. Develop a program to generate all 92 solutions to the eight-queens problem using the manager/worker paradigm. Have an administrator process put eight initial queen placements in a shared bag. Use w worker processes to extend partial solutions; when a worker finds a complete solution, it should send it to the administrator. The program should compute all solutions and terminate. Do not use any shared variables.
9.6 Consider the problem of determining the number of words in a dictionary that contain unique letters, namely the number of words in which no letter appears more than once. Treat upper and lower case versions of a letter as the same letter. (Most Unix systems contain one or more online dictionaries, for example in /usr/dict/words.)
Write a distributed parallel program to solve this problem. Use the manager/workers paradigm and w worker processes. At the end of the program, the manager should print the number of words that contain unique letters, and also print all those that are the longest. If your workers have access to shared memory, you may have them share a copy of the dictionary file, but if they execute on separate machines, they should each have their own copy.
9.7 The Traveling Salesman Problem (TSP). This is a classic combinatorial problem, and one that is also practical, because it is the basis for things like scheduling planes and personnel at an airline company. Given are n cities and a symmetric matrix dist[1:n,1:n]. The value in dist[i,j] is the distance from city i to city j, e.g., the airline miles. A salesman starts in city 1 and wishes to visit every city exactly once, ending back in city 1. The problem is to determine a path that minimizes the distance the salesman must travel. The result is to be stored in a vector bestpath[1:n]. The value of bestpath is to be a permutation of integers 1 to n such that the sum of the distances between adjacent pairs of cities, plus the distance back to city 1, is minimized.

(a) Develop a distributed parallel program to solve this problem using the manager/workers paradigm. Make a reasonable choice for what constitutes a task; there should not be too many, nor too few. You should also discard tasks that cannot possibly lead to a better result than you have currently computed.
(b) An exact TSP solution has to consider every possible path, but there are n! of them. Consequently, people have developed a number of heuristics. One is called the nearest neighbor algorithm. Starting with city 1, first visit the city, say c, nearest to city 1. Now extend the partial tour by visiting the city nearest to c. Continue in this fashion until all cities have been visited, then return to city 1. Write a program to implement this algorithm.
ited city with minimum distance lo some cily in h e partial lour. and inscrl rllal city between a pair of cities alre;tdp in [lie lour so that the inserlion causes the mini.cnurn jncrease in tlie Lotal length of the parlial tour. Wrile a pl-ogmm to i~npleinerltthis algorilhm. (d) A third traveli~igsalesman lxeunsric is to partition the plane of cilies into
strips. each ol which contains some bounded number, B of cities. Worker processes in p a ~ ~ l l find e l ~ninirnalcost lours from one end of the strip to lhe othe~a.
480
Chapter 9
Paradigms for Process Interaction
In odd-numbercd strips the lours sllould go from lhc top to the hotton?: in e\lennumberecl sb'ips they sliould go from Ihe bouoln to the top. Once tours have hwn founcl li)r all ships, they al-e connected togelher.
(r) C00lpa1-e the perl:ol-mance and accuracy 01' your programs for parts (a) tl~rough(d). What: are their esecutio~lLinieb? I-low good or had is the ap111.oxinlale solution rl~alis gencraled? Flow mucll larger a problem can you solve using lie appl.oxirnate algo~.ithmsttliir~ the e x x r algorithm'? Expel-imenr with several rours o f v a r i o ~ ~siz,cs. s Writ,e a report explaining the tests you corlductecl and the resu Its you nbse~.ved. "There are several additio~xal heurisljc algorithms and local opli~nijlnrion tzc}~niquesFor solving the travel in2 hnlesnlan problem. For cxa~nple.lhece are tecIi.niques called cutting planes and simul.ated annealing. Start by finding good c nlgo~:.itlims,wr.ite a refe~.erlceson the TSP. Then pick one 01'more of ~ h hette~' program to iniplenlent it? ;und conduct a se1~ie.sof espesi~ncn~s to see how well it perConns (both in terms of execution time and 11ow good a solution it genel'ates).
(f)
9.8 Figure 9.3 co~ltainsa program for [lie region-labeling prublcm.
(a) Irr~plcmen~ t J ~ cprogram using thc MPI, com~nur icntion 1ibrary. (b) Modify :y/ou~answer to ( a ) to get rid of b e separate coordinator process.
Hint: Use MPl's global co~n~nunication pri~nit.ives. (c) Compare the percormance of the I.wo prognlms. Develop a set o l sample images. then delennine the per{-ounance of each program for those images and different nu~nbel-sof workers. Write a lrpolt rl~alexplains and a~~alyzes your results.
9.9
Sectjon 53.2 describes ~ h cregion-labeling PI-oble~n.Consider a diffei-ent image proccssi ng problem caHecl a~zootlzing. Assi~meimages are repcescnted by a ~nabix01' pixels as i n Sectio~i9.2. The goal tlow is to smooth an image by rernoving "spiltes" :~tld "l-ough cdges." In yartict~lar,start w i 111 the i ~ipulimage, the11 modify llie image by unlighling (sctring lo 0) all lit pi.xels /hat do not have (11 I ~ N .d~rrc!ighho?s. I Each pixel in the original image should be considercd indepentlently. so yo11 will need to place thc new image in a new matrix. Now use dle new image 21nd repeal this algorithm. unlighting all (ncw) pixcls that do not have at leabl d neighhol-s. Keep going until tlzerc) are no chn/lge.r betr4)ee~rzho current and new: irrroge. (This Incans th;u you canoot know in advance how many times lo repeal the s~noollii ng algori Ilim.)
(a) Write a piu-allcl program Tor solvilig this problem; use Llle heartbeat paradigm. Your prograln should uve w worker processes. Divide t11e image into
Exercises
481
w erlual-sized st~.ipsand asaig~ione worker process l o each strip. Assume n is a multiple of w.
(b) Test your pr-oglxln (01- tli Werent images. di [rerent nu ~nbersof u~orkel.s.and different values of d. Write a brief rcpol-1 explaining your tests and rew~lts. 9.10 Figi~re0.4 con~ainsa program for the Garne of Lire. Modi'ly the pmgr:\m to use w worker processes. and have each worker manage either a strip or. block of cells. Imple~nentyour progra~nusing a discl-iburet1 programming language or a sequel)rial language and a subroutine library such as MPI. Display the oulpur on a g~.aphicaldisplay clcvice. Experi~nerll will1 your progral,n, Ihe~i write u b v i d repol'l explaining your experiments and the results you observe. Does the game ever converge?
0.I I The Game of Life has very siniple organisms and rules. Devise a more complicaled galnc [hat can bc nodel led using cellular aulo~natn:bul do not ~nalteit too compljcaretl! For example, nod el interactions between sharks and lish. between rabbits and foxes. or betweeu coyoles and roadrunners. Or model something like tllc burning of trees and brusli in a I'urcsc f re. Pick a problem Lo rnodel and devise an inif i l l set of iules: you will probably wan1 to usc rando~nness10 make interactions pi'obnl~i1ist.i~ mthe~than fixed. Then i~nplcmenly o ~ urnodel as a cellular autoinalon. experi~nenlwith ir, ancl modify the rules xo tlial the outcome i s no1 trivial. For example. yo^^ mighc strive for population balance. WI-ile a report describing yoill. garne arjd tlic I-esul1s you observe. 0.12 Section 9.4 describes ;I probelecho algorilhm for computing rl~etopology or a nelwork. This prohle~iican also be solvcd ~ ~ s i ral ghearlbeat algoritlin~. Assume as in Section 9.4 that a process can communicate only wit11 jfs neighbors, and [.hat initially each process knows only aboul those neighbors. Design a heartbeat nlgorilhnl Lhat has each process repca~edlyexchange informa~ionwith its neighboy-s. Wllcn Ihe pl'ogcam terrninaces, cvet-y PJ-oceshsllould know the lopology of the entire neLwotk. You will need lo figure out what [lie processes shoulcl exchange and 11ow rliey can tell bhen lo lenninale.
9.13 Suppose n2 processes are arranged in a square grid. Each process can communicate only wilh the nciglibors to ~ h cleft :and right, arid above and below. (PISOcesses on the corncrs have only Lwo neighbors; othcl.s on the edges of the grid have ~hreeneighbors.) Every process has a local integer value v. Write a heartbeat a1go1.itlim lo computc llle suln of tile n2 values. When your program terrniilares. each process sbould know the sum.
482
Chapter 9
Paradigms for Process Interaction
9.14 Scction 9.3 descl-ibes two algorilhms
1'01-
disl.ributed matrix inulliplication.
(a) Modify the algori.lbms so that each uses w worlter processes, where w i h much e s and n that make Lhe arithmetic easy.) s ~ ~ ~ athan l l an. (Pick v a l ~ ~ ol'w
(b) Compare the performance, on paper, of your answers to (a). For given values of w and n, what is the total number of messages required by each program? Some of the messages can be in transit at the same time; for each program, what is the best case for the longest chain of messages that cannot be overlapped? What are the sizes of the messages in each program? What are the local storage requirements of each program?

(c) Implement your answers to (a) using a programming language or a subroutine library such as MPI. Compare the performance of the two programs for different size matrices and different numbers of worker processes. (An easy way to tell whether your programs are correct is to set both source matrices to all ones; then every value in the result matrix will be n.) Write a report describing your experiments and the results you observe.

9.15 Figure 9.7 contains an algorithm for multiplying matrices by blocks.
(a) Show the layout of the values of a and b after the initial rearrangement for n = 6 and for n = 8.

(b) Modify the program to use w worker processes, where w is an even power of two and a factor of n. For example, if n is 1024, then w can be 4, 16, 64, or 256.
Each worker is responsible for one block of the matrices. Give all the details for the code.

(c) Implement your answer to (b), for example using C and the MPI library. Conduct experiments to measure its performance for different values of n and w. Write a report describing your experiments and results. (You might want to initialize a and b to all ones, because then the final value of every element of c will be n.)
9.16 We can use a pipeline to sort n values as follows. (This is not a very efficient sorting algorithm, but what the heck, this is an exercise.) Given are w worker processes and a coordinator process. Assume that n is a multiple of w and that each worker process can store at most n/w + 1 values at a time. The processes form a closed pipeline. Initially, the coordinator has n unsorted values. It sends these one at a time to worker 1. Worker 1 keeps some of the values and sends others on to worker 2. Worker 2 keeps some of the values and sends others on to worker 3, and so on. Eventually, the workers send values back to the coordinator. (They can do this directly.)
(a) Develop code for the coordinator and the workers so that the coordinator gets back sorted values. Each process can insert a value into a list or remove a value from a list, but it may not use an internal sorting routine.

(b) How many messages are used by your algorithm? Give your answer as a function of n and w. Be sure to show how you arrived at the answer.
9.17 Given are n processes, each corresponding to a node in a connected graph. Each node can communicate only with its neighbors. A spanning tree of a graph is a tree that includes every node of the graph and a subset of the edges. Write a program to construct a spanning tree on the fly. Do not first compute the topology. Instead, construct the tree from the ground up by having the processes interact with their neighbors to decide which edges to put in the tree and which to leave out. You may assume processes have unique indexes.

9.18 Extend the probe/echo algorithm for computing the topology of a network (Figure 9.12) to handle a dynamic topology. In particular, communication links might fail during the computation and later recover. (Failure of a processor can be modeled by failure of all its links.) Assume that when a link fails, it silently throws away undelivered messages.
Define any additional primitives you need to detect when a failure or recovery has occurred, and explain briefly how you would implement them. You might also want to modify the receive primitive so that it returns an error code if the channel has failed. Your algorithm should terminate, assuming that eventually failures and recoveries quit happening for a long enough interval that every node can agree on the topology.
9.19 Given are n processes. Assume that the broadcast primitive sends a message from one process to all n processes and that broadcast is both reliable and totally ordered. That is, every process sees all messages that are broadcast and sees them in the same order.

(a) Using this broadcast primitive (and receive, of course), develop a fair solution to the distributed mutual exclusion problem. In particular, devise entry and exit protocols that each process executes before and after a critical section. Do not use additional helper processes; the n processes should communicate directly with each other.

(b) Discuss how one might implement atomic broadcast. What are the problems that have to be solved? How would you solve them?
9.20 Consider the following three processes, which communicate using asynchronous message passing:
    chan chA(...), chB(...), chC(...);

    process A {
        ...
        send chC(...); send chB(...);
        receive chA(...); send chC(...);
        receive chA(...);
    }

    process B {
        ...
        send chC(...); receive chB(...);
        receive chB(...); send chA(...);
    }

    process C {
        ...
        receive chC(...); receive chC(...);
        send chA(...); send chB(...);
        receive chC(...);
    }
Assume that each process has a logical clock that is initially zero, and that it uses the logical clock update rules in Section 9.5 to add timestamps to messages and to update its clock when it receives messages.
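For reference, one common formulation of Lamport-style logical clock updates is sketched below in C; the exact rules to apply when solving the exercise are those given in Section 9.5, and the function names here are illustrative only.

    /* Illustrative sketch of logical clock updates (one common formulation). */
    int lc = 0;                         /* this process's logical clock */

    int timestamp_for_send(void) {      /* call when executing a send */
        int ts = lc;                    /* message carries the current clock value */
        lc = lc + 1;                    /* then the clock is advanced */
        return ts;
    }

    void update_on_receive(int ts) {    /* call when a message with timestamp ts arrives */
        lc = (lc > ts + 1) ? lc : ts + 1;
        lc = lc + 1;
    }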
What are the final values of the logical clocks in each process? Show your work; the answer may not be unique.

9.21 Consider the implementation of distributed semaphores in Figure 9.13.
(a) Assume there are four user processes. User one initiates a V operation at time 0 on its logical clock. Users two and three both initiate P operations at time 1 on their logical clocks. User four initiates a V operation at time 10 on its logical clock. Develop a trace of all the communication events that occur in the algorithm and show the clock values and timestamps associated with each. Also trace the contents of the message queues in each helper.

(b) In the algorithm, both user processes and helper processes maintain logical clocks, but only the helper processes interact with each other and have to make decisions based on a total ordering. Suppose the users did not have logical clocks and did not add timestamps to messages to their helpers. Would the algorithm still be correct? If so, clearly explain why. If not, develop an example that shows what can go wrong.

(c) Suppose the code were changed so that users broadcast to all helpers when they want to do a P or V operation. For example, when a user wants to do a V, it would execute

    broadcast semop(i, VOP, lc)
and then update its logical clock. (A user would still wait for permission from its helper after broadcasting a POP message.) This would simplify the program by getting rid of the first two arms of the if/then/else statement in the helpers. Unfortunately, it leads to an incorrect program. Develop an example that shows what can go wrong.
9.22 The solution to the distributed mutual exclusion problem in Figure 9.15 uses a token ring and a single token that circulates continuously. Assume instead that every Helper process can communicate with every other, i.e., the communication graph is complete. Design a solution that does not use circulating tokens. In particular, the Helper processes should be idle except when some process User[i] is trying to enter or exit its critical section. Your solution should be fair and deadlock-free. Each Helper should execute the same algorithm, and the regular processes User[i] should execute the same code as in Figure 9.15. (Hint: Associate a token with every pair of processes, namely with every edge of the communication graph.)
9.23 Figure 9.16 presents a set of rules for termination detection in a ring. Variable token counts the number of idle processes. Is the value really needed, or do the colors suffice to detect termination? In other words, what exactly is the role of token?
9.24 Drinking Philosophers Problem [Chandy and Misra 1984]. Consider the following generalization of the dining philosophers problem. An undirected graph G is given. Philosophers are associated with nodes of the graph and can communicate only with neighbors. A bottle is associated with each edge of G. Each philosopher cycles between three states: tranquil, thirsty, and drinking. A tranquil philosopher may become thirsty. Before drinking, the philosopher must acquire the bottle associated with every edge connected to the philosopher's node. After drinking, a philosopher again becomes tranquil.
Design a solution to this problem that is fair and deadlock-free. Every philosopher should execute the same algorithm. Use tokens to represent the bottles. It is permissible for a tranquil philosopher to respond to requests from neighbors for any bottles the philosopher may hold.

9.25 You have been given a collection of processes that communicate using asynchronous message passing. A diffusing computation is one in which one main process starts the computation by sending messages to one or more other processes [Dijkstra and Scholten 1980]. After first receiving a message, another process may send messages.
Design a signaling scheme that is superimposed on the computation proper and that allows the main process to determine when the computation has terminated. Use a probe/echo algorithm as well as ideas from Section 9.6. (Hint: Keep counts of messages and signals.)
9.26 The distributed termination detection rules in Figure 9.18 assume that the graph is complete and that each process receives from exactly one channel. Solve the following exercises as separate problems.
(a) Extend the token-passing rules to handle an arbitrary connected graph.

(b) Extend the token-passing rules to handle the situation in which a process receives from multiple channels (one at a time, of course).
9.27 Consider the following variation on the rules in Figure 9.18 for termination detection in a complete graph. A process takes the same actions upon receiving a regular message. But a process now takes the following actions when it receives the token:

    if (color[i] == blue) token = blue;
    else token = red;
    color[i] = blue;
    set j to index of channel for next edge in cycle c;
    send ch[j](token);
In other words, the value of token is no longer modified or examined. Using this new rule, is there a way to detect termination? If so, explain when the computation is known to have terminated. If not, explain why the rules are insufficient.

9.28 Figure 8.15 shows how to implement replicated files using one lock per copy. In that solution, a process has to get every lock when it wants to update the file. Suppose instead that we use n tokens and that a token does not move until it has to. Initially, every file server has one token.
(a) Modify the code in Figure 8.15 to use tokens instead of locks. Implement read operations by reading the local copy and write operations by updating all copies. When a client opens a file for reading, its server needs to acquire one token; for writing, the server needs all n tokens. Your solution should be fair and deadlock-free.

(b) Modify your answer to (a) to use weighted voting with tokens. In particular, when a client opens a file for reading, its server has to acquire readWeight tokens; for writing, the server needs writeWeight tokens. Assume that readWeight and writeWeight satisfy the conditions at the end of Section 8.4. Show in your code how you maintain timestamps on files and how you determine which copy is current when you open a file for reading.

(c) Compare your answers to (a) and (b). For the different client operations, how many messages do the servers have to exchange in the two solutions? Consider the best case and the worst case (and define what they are).
Chapter 10: Implementations
This chapter describes ways to implement the various language mechanisms described in Chapters 7 and 8: asynchronous and synchronous message passing, RPC, and rendezvous. We first show how to implement asynchronous message passing using a kernel. We then use asynchronous messages to implement synchronous message passing and guarded communication. Next we show how to implement RPC using a kernel, rendezvous using asynchronous message passing, and finally rendezvous (and multiple primitives) in a kernel. The implementation of synchronous message passing is more complex than that of asynchronous message passing because both send and receive statements are blocking. Similarly, the implementation of rendezvous is more complex than that of RPC or asynchronous message passing because rendezvous has both two-way communication and two-way synchronization.

The starting point for the various implementations is the shared-memory kernel of Chapter 6. Thus, even though programs that use message passing, RPC, or rendezvous are usually written for distributed-memory machines, they can readily execute on shared-memory machines. It so happens that the same kind of relationship is true for shared-variable programs. Using what is called a distributed shared memory, it is possible to execute shared-variable programs on distributed-memory machines, even though they are usually written to execute on shared-memory machines. The last section of this chapter describes how to implement a distributed shared memory.
10.1 Asynchronous Message Passing

This section presents two implementations of asynchronous message passing. The first adds channels and message-passing primitives to the shared-memory kernel of Chapter 6. This implementation is suitable for a single processor or a shared-memory multiprocessor. The second implementation extends the shared-memory kernel to a distributed kernel that is suitable for a multicomputer or a networked collection of separate machines.
10.1.1 Shared-Memory Kernel

Each channel in a program is represented in a kernel by means of a channel descriptor. This contains the heads of a message list and a blocked list. The message list contains queued messages; the blocked list contains processes waiting to receive messages. At least one of these lists will always be empty, because a process is not blocked if there is an available message, and a message is not queued if there is a blocked process.

A descriptor is created by means of the kernel primitive createChan. This is called once for each chan declaration in a program, before any processes are created. An array of channels is created either by calling createChan once for each element or by parameterizing createChan with the array size and calling it just once. The createChan primitive returns the name (index or address) of the descriptor.

The send statement is implemented using the sendChan primitive. First, the sending process evaluates the expressions and collects the values together into a single message, stored typically on the sending process's execution stack. Then sendChan is called; its arguments are the channel name (returned by createChan) and the message itself. The sendChan primitive first finds the descriptor of the channel. If there is at least one process on the blocked list, the oldest process is removed from that list and the message is copied into the process's address space; that process's descriptor is then inserted on the ready list. If there is no blocked process, the message has to be saved on the descriptor's message list. This is necessary because send is nonblocking, and hence the sender has to be allowed to continue executing.

Space for the saved message can be allocated dynamically from a single buffer pool, or there can be a communication buffer associated with each channel. However, asynchronous message passing raises an important implementation issue: what if there is no more kernel space? The kernel has two choices: halt the program due to buffer overflow, or block the sender until there is enough
buffer space. Halting the program is a drastic step since free space could soon become available, but it gives immediate feedback to the programmer that messages are being produced faster than they are being consumed, which usually indicates an error. On the other hand, blocking the sender violates the nonblocking semantics of send and complicates the kernel somewhat, since there is an additional cause of blocking; then again, the writer of a concurrent program cannot assume anything about the rate or order in which a process executes. Operating system kernels block senders, and swap blocked processes out of memory if necessary, since they have to avoid crashing. However, halting the program is a reasonable choice for a high-level programming language.

The receive statement is implemented by the receiveChan primitive. Its arguments are the name of the channel and the address of a message buffer. The actions of receiveChan are the dual of those of sendChan. First the kernel finds the descriptor for the appropriate channel, then it checks the message list. If the message list is not empty, the first message is removed and copied into the receiver's message buffer. If the message list is empty, the receiver is inserted on the blocked list. After receiving a message, the receiver unpacks the message from the buffer into the appropriate variables.

A fourth primitive, emptyChan, is used to implement the function empty(ch). It simply finds the descriptor and checks whether the message list is empty. In fact, if the kernel data structures are not in a protected address space, the executing process could simply check for itself whether the message list is empty; a critical section is not required since the process only needs to examine the head of the message list.

Figure 10.1 contains outlines of these four primitives. These primitives are added to the single-processor kernel in Figure 6.2. The value of executing is the address of the descriptor of the currently executing process, and dispatcher is a procedure that schedules processes on the processor. The actions of sendChan and receiveChan are very similar to the actions of P and V in the semaphore kernel in Figure 6.5. The main difference is that a channel descriptor contains a message list, whereas a semaphore descriptor merely contains the value of the semaphore.

The kernel in Figure 10.1 can be turned into one for a shared-memory multiprocessor using the techniques described in Section 6.2. The main requirements are to store kernel data structures in memory accessible to all processors, and to use locks to protect critical sections of kernel code that access shared data.
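Before looking at the figure, here is a rough C rendering of what a channel descriptor along these lines might look like; the type and field names are assumptions made for this sketch, not the kernel's actual definitions.

    /* Hypothetical layout of a channel descriptor (names are illustrative). */
    typedef struct msg_node {
        struct msg_node *next;
        int length;
        char data[];                 /* flexible array member holding the message */
    } msg_node;

    typedef struct proc_desc proc_desc;   /* process descriptor, defined elsewhere */

    typedef struct chan_desc {
        msg_node  *msg_head, *msg_tail;         /* queued messages               */
        proc_desc *blocked_head, *blocked_tail; /* receivers waiting for a message */
        /* invariant: at least one of the two lists is always empty */
    } chan_desc;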
    int createChan(int msgSize) {
        get an empty channel descriptor and initialize it;
        set return value to the index or address of the descriptor;
        dispatcher();
    }

    proc sendChan(int chan; byte msg[*]) {
        find descriptor of channel chan;
        if (blocked list empty) {        # save message
            acquire buffer and copy msg into it;
            insert buffer at end of message list;
        } else {                         # give message to a receiver
            remove process from blocked list;
            copy msg into the process's address space;
            insert the process at end of ready list;
        }
        dispatcher();
    }

    proc receiveChan(int chan; result byte msg[*]) {
        find descriptor of channel chan;
        if (message list empty) {        # block receiver
            insert executing at end of blocked list;
            store address of msg in descriptor of executing;
            executing = 0;
        } else {                         # give receiver a stored message
            remove buffer from message list;
            copy contents of buffer into msg;
        }
        dispatcher();
    }

    bool emptyChan(int chan) {
        bool r = false;
        find descriptor of channel chan;
        if (message list empty)
            r = true;
        save r as the return value;
        dispatcher();
    }

Figure 10.1   Asynchronous message passing in a single-processor kernel.
10.1.2 Distributed Kernel

We now show how to extend the shared-memory kernel to support distributed execution. The basic idea is to replicate the kernel, placing one copy on each machine, and to have the different kernels communicate with each other using network communication primitives.

Each channel is stored on a single machine in a distributed program. For now, assume that a channel can have any number of senders but that it has only one receiver. Then the logical place to put a channel's descriptor is on the machine on which the receiver executes. A process executing on that machine accesses the channel as in the shared-memory kernel. However, a process executing on another machine cannot access the channel directly. Instead, the kernels on the two machines need to interact. Below we describe how to change the shared-memory kernel and how to use the network to implement a distributed program.

Figure 10.2 illustrates the structure of a distributed kernel. Each machine's kernel contains descriptors for the channels and processes located on that machine. As before, each kernel has local interrupt handlers for supervisor calls (internal traps), timers, and input/output devices. The communication network is a special kind of input/output device. Thus each kernel has network interrupt handlers and contains routines that write to and read from the network.
Figure 10.2   Distributed kernel structure and interaction. [Figure: two machines, each with application processes, kernel primitives, and descriptors inside the kernel, communicating over the network.]
As a concrete example, an Ethernet is typically accessed as follows. An Ethernet controller has two independent parts, one for writing and one for reading. Each part has an associated interrupt handler in the kernel. A write interrupt is triggered when a write operation completes; the controller itself takes care of network access arbitration. A read interrupt is triggered on a processor when a message for that processor arrives over the network.

When a kernel primitive, executing on behalf of an application process, needs to send a message to another machine, it calls kernel procedure netWrite. This procedure has three arguments: a destination processor, a message kind (see below), and the message itself. First, netWrite acquires a buffer, formats the message, and stores it in the buffer. Then, if the writing half of the network controller is free, an actual write is initiated; otherwise the buffer is inserted on a queue of write requests. In either case netWrite returns. Later, when a write interrupt occurs, the associated interrupt handler frees the buffer containing the message that was just written; if its write queue is not empty, the interrupt handler then initiates another network write.

Input from the network is typically handled in a reciprocal fashion. When a message arrives at a kernel, the network read interrupt handler is entered. It first saves the state of the executing process. Then it allocates a new buffer for the next network input message. Finally, the read handler unpacks the first field of the input message to determine the kind and then calls the appropriate kernel primitive. [Footnote: An alternative approach to handling network input is to employ a daemon process that executes outside the kernel ...]

Figure 10.3 contains outlines for the network interface routines. These include the network interrupt handlers and the netWrite procedure. The netRead_handler services three kinds of messages: SEND, CREATE_CHAN, and CHAN_DONE. These are sent by one kernel and serviced by another, as described below. The kernel's dispatcher routine is called at the end of netWrite_handler to resume execution of the interrupted process. However, the dispatcher is not called at the end of netRead_handler, since that routine calls a kernel primitive (depending on the kind of input message) and that primitive in turn calls the dispatcher.

For simplicity, we assume that network transmission is error-free and hence that messages do not need to be acknowledged or retransmitted. We also ignore the problem of running out of buffer space for outgoing or incoming messages; in practice, the kernels would employ what is called flow control to limit the number of buffered messages. The Historical Notes section cites literature that describes how to address these issues.

Since a channel can be stored either locally or remotely, a channel name now needs to have two fields: a machine number and an index or offset.
    type mkind = enum(SEND, CREATE_CHAN, CHAN_DONE);
    bool writing = false;           # status of network write
    other variables for the write queue and transmission buffers;

    proc netWrite(int dest; mkind kind; byte data[]) {
        acquire buffer;
        format message and store it in the buffer;
        if (writing)
            insert the message buffer on the write queue;
        else {
            writing = true;
            start transmitting the message on the network;
        }
    }

    netWrite_handler: {             # entered with interrupts inhibited
        save state of executing;
        free the current transmission buffer;
        writing = false;
        if (write queue not empty) {    # start another write
            remove first buffer from the queue;
            writing = true;
            start transmitting the message on the network;
        }
        dispatcher();
    }

    netRead_handler: {              # entered with interrupts inhibited
        save state of executing;
        acquire new buffer;
        prepare network controller for next read;
        unpack first field of input message to determine kind;
        if (kind == SEND)
            remoteSend(channel name, buffer);
        else if (kind == CREATE_CHAN)
            remoteCreate(rest of message);
        else                        # kind == CHAN_DONE
            chanDone(rest of message);
    }

Figure 10.3   Network interface routines.
The machine number indicates where the descriptor is stored; the index indicates where to find the descriptor in that machine's kernel. We also need to augment the createChan primitive so that it has an additional argument indicating the machine on which the channel is to be created. Within createChan, the kernel checks this argument. If the creator and channel are on the same machine, the kernel creates the channel as in Figure 10.1. Otherwise the kernel blocks the executing process and transmits a CREATE_CHAN message to the remote machine. That message includes the identity of the executing process. Eventually the local kernel will receive a CHAN_DONE message indicating that the channel has been created on the remote machine. This message contains the channel's name and indicates the process for which the channel was created. As shown in Figure 10.3, when netRead_handler receives this message, it calls a new kernel primitive, chanDone, which unblocks the process that asked to have the channel created and returns the channel's name to it.

On the other side of the network, when a kernel daemon receives a CREATE_CHAN message, it calls the remoteCreate primitive. That primitive creates the channel and then sends a CHAN_DONE message back to the first kernel. Thus, to create a channel on a remote machine, we have the following sequence of steps:

1. An application process invokes the local createChan primitive.
2. The local kernel sends a CREATE_CHAN message to the remote kernel.

3. The read interrupt handler in the remote kernel receives the message and calls the remote kernel's remoteCreate primitive.

4. The remote kernel creates the channel and then sends a CHAN_DONE message back to the local kernel.

5. The read interrupt handler in the local kernel receives the message and calls chanDone, which awakens the application process.

The sendChan primitive also needs to be changed in a distributed kernel. However, sendChan is much simpler than createChan since the send statement is asynchronous. In particular, if the channel is on the local machine, sendChan takes the same actions as in Figure 10.1. If the channel is on another machine, sendChan transmits a SEND message to that machine. At this point, the executing process can continue. When the message arrives at the remote kernel, that kernel calls the remoteSend primitive, which takes essentially the same actions as the (local) sendChan primitive. The only difference is that the incoming message is already stored in a buffer, and hence the kernel does not need to allocate a new one.
Figure 10.4 contains outlines for the primitives of the distributed kernel. The receiveChan and emptyChan primitives are the same as in Figure 10.1, as long as each channel has only one receiver and the channel is stored on the same machine as the receiver. However, if this is not the case, then additional messages are needed to communicate between the machine on which a receiveChan or empty primitive is invoked and the machine on which the channel is stored. This communication is analogous to that for creating a channel: the local kernel sends a message to the remote kernel, which executes the primitive and then sends the result back to the local kernel.
    type chanName = rec(int machine, index);

    chanName createChan(int machine) {
        chanName chan;
        if (machine is local) {
            get an empty channel descriptor and initialize it;
            chan = chanName(local machine number, address of descriptor);
        } else {
            netWrite(machine, CREATE_CHAN, executing);
            insert descriptor of executing on delay list;
            executing = 0;
        }
        dispatcher();
    }

    proc remoteCreate(int creator) {
        chanName chan;
        get an empty channel descriptor and initialize it;
        chan = chanName(local machine number, address of descriptor);
        netWrite(creator, CHAN_DONE, chan);
        dispatcher();
    }

    proc chanDone(int creator; chanName chan) {
        remove descriptor of process creator from the delay list;
        save chan as return value for creator;
        insert the descriptor of creator at the end of the ready list;
        dispatcher();
    }

    proc sendChan(chanName chan; byte msg[*]) {
        if (chan.machine is local)
            same actions as sendChan in Figure 10.1;
        else
            netWrite(chan.machine, SEND, msg);
        dispatcher();
    }

    proc remoteSend(chanName chan; int buffer) {
        find descriptor of channel chan;
        if (blocked list empty)
            insert buffer on message list;
        else {
            remove process from blocked list;
            copy message from buffer to the process's address space;
            insert the process at the end of the ready list;
        }
        dispatcher();
    }

    proc receiveChan(int chan; result byte msg[*]) {
        same actions as receiveChan in Figure 10.1;
    }

    bool emptyChan(int chan) {
        same actions as emptyChan in Figure 10.1;
    }

Figure 10.4   Distributed kernel primitives.
10.2 Synchronous Message Passing
497
space (Section 7.7). The Historical Notas at the end ol: this chapter give references Ibr decentralized irnplernentations; see also Ihe cxel-ciscs at the end o f this cllapler.
10.2.1 Direct Communication Using Asynchronous Messages Assirme tJla1 yo11 have been given a collcc~ionof n processes that comlnunic~~te using synchrono~~s message passing. The sending sidc i n ;L co~nmunication narnes the intended receiver, but h e receiving side call accept a niessage from any sender. In pal-ticulilr. a source process s sends a message to a destination p~'ocess D by execl~ri 113 synch-send(D, expressions);
The destination pi-ocess waits to receive 21 message frotn any source by execi~ling synch~receive(source, variables);
Once both plucesses arrivc a1 thesc slaternents, tlie identity of the sender and the \~alueso l the expressions are transmitted as a message ti-om process s to process D; these values are then slored i n source and variables, rcspectively. Tlie receiver thus learns the identity of llie sender. We can i.mplemen1 the above primitives using asy~~chronor~s message passing by employing three arrays of channels: sourceReady, des tReady. and transmit. The firs1 two are used Lo exchange control sig~lals,the rI1it.d is used for data ~ransmissir>n.The clin~inelsaye used as shown in Figure 10.5. A I-eceiving process waits for a ~nessageon its elemenl of the sourceReady array; thc niessagc identifies the sender. The receiver tliel) lells d ~ esender lo go ;!head. Finally. the lliessage itself is ~ransniitted. Tlie code i n Figure 10.5 handles sending to a specific tlestina~ionwhile ~cccivingfrom any source. JI'both sides :ilways have to name eacll other, tlic~)we could get ritl 0.1' the sourceReady cllannels h Figurc 10.5 and liave [lie receiver simply scnd a signal to the source when tbc receiver is ready for n message. Tlie remaining sends and receives tu.e surlicient ro synchl-oni.zethe two processes. On the orher hand. il' a process doing a receive has lhe option of cilher naming tlie sourcc or accepiing lllessages Crorn ally source, the siluation is inore co~nplic~~ted. (Tlie MPI library supporrs this option.) This is bccause we either have to have one channel for each communication path and poll the cllannels, or the rccei\fing process has to cxamine each nlessage ancl save the ones it is no1 yet ready to accept. We leave to the readw modifying Ilie irnple~ne~lta~ion to liandlc ll~iscase (see the exercises at tlie end of this cl~spler-).
498
Chapter 10
Implementations
shnre.tl variables: chan sourceReady [nl (int); chan destReady In] ( ) ; chan transmit[nl (byte msgr*]);
# source ready # destination ready # data transmission
Syt\~Ii~~onouh send executed by source process s: gal.her expraessiooaillto a message buffer b; send sourceReady[DI (S); receive dest~eady[S]0: send transmit [Dl (b);
# tell D t h a t I am ready # wait for D to be ready # send the message
Synchronous receive exe.cuted by destin~ttionprocess D: int source; byte buffer[BUFSIzEl; receive sourceReady[D](source); # wait for any sender send destReady t source 1 ( ) ; # tell source I'm ready receive tranemit[Dl(buffer); # get the message unpack the I~uffrlinto (he variables;
Figure 10.5
Synchronous communication using asynchronous messages.
10.2.2 Guarded Communication Using a Clearinghouse

You are again given a collection of n processes, but assume now that they communicate and synchronize using the input and output statements of CSP (Section 7.6). Recall that these have the forms
# input statement
~ e s inat t ion !port i expression^) :
# output statement
The statements ntntch when (he illput stalemeilt is executed by process Desti nation, tbe output statement is executed by process source, the port rlalnes are identical, rhere are as many expressions as variables, and they have the same types.
CSP also introduced guarded communication, which provides nondeterministic communication order-. Recall that a guwdec! communicauon statement has the form
wllere B is an optional Boolean expression (guard), c is an input or oulput
10.2 Synchronous Message Passing
499
slaternents are itsed sratemenl, and s is a stateinelii list. Guarded co1nmunicalio11 wi thjrr i f a~itldo stnlemenls to choosc a~norrgseveral possible coinmunicarions. The key lo imple~nenlinginput, oulput, and guardcd stalelneuts is to pair up processes thal wan1 lo cxecutc matching comm.unic;ition stnlc~nents.We will use a "cleariclghouse" process to play the role o f matchmalter. Suppose regul;lr process pi wants to execute an output statcnient wilh pj as destination and that p j wants to execute an input statement with P, as sour.ce. Assume thac the port name and the messape types also match. Tl~cseprocesses inta-act wit11 rl~e clearinghouse ancl each olher as illustrated in Figure 10.6. Holh P, and pj send a message to the clearinghouse; this message describes h e des.ired con~muni~~tion. The clea~.inghousesaves Ihe first of these mess:lges. M1heli it ~eceivestlie secor)d, it retrieves the first and determines that thc two processes want to execute malchitig statements. The clearirlghoi~sethen sends I-eplicsto both processes. Alter geuing tlie reply. P, sends the expressions in its outpul statement to P,. \vhich 1-eceives[hen) into the vari.ables i n its input slatejrlent. A( this poiill, each pl-ocess starks executing the code following i1.s cokn~.rl~inicalion slaleiuent. To realize lhe pcogratn structure in Figure 10.6. we need channels for each processes communicalioa path. One channel is used I'or message!, from ~cgi~lar to 11ie clearinghouse; these contain te~nl,platesto desci-ibe poss,ible matches. We also need one reply channel for each regular process; these will be used for messages back From the clearinghouse. Finally, we need one data channel i.01.each regulnr process that contains input statements; these will be used hy ocher l e g i ~ l ~ u processes. h a 1 is an integer between 1 Let each regular process have a unique ide~~tjry and n. These will be used lo index reply and data channels. Reply lnessages specify a direction f'or a co~nniunicationand tJw identity of the o l h e ~process. A message on a data channel i s sent as an array of bytes. Wt assurne LIlat the message itself ih sell-describjng-namcly, that it coiitai~~s tags that allow the I-eceiver to determine the types of data in the message.
Figure 10.6
\nteraction pattern with clearinghouse process.
500
Chapter 10
Implementations
When ~.egulal-processes reach inpul, output, or guarded comrriiinication slaremcnts, rliey send tentplules lo the clearinghouse. These are ~isedto seleci matching pairs of statements. Each te~nplatehas four fields: direction, source, destination, p o r t
The direction is OUT for an output srate~nentand IN (!or an input statemelit. Source and destination are rhe identities ol' the sender and inle~ldedreceiver (for output) or illtended sender and receiver (for inpul). The port is an inte~erthat urliqi~elyidentifies the port and hence the data types in rhe input and output statements. There nlusl be one of these for each difl'erenl kind of port in the source program. This rncaos every explicit port identilier needs to be assigned a unique inreger value. as does every anonymous port. (Recall that j>ort names appear io the sou~.ceprogram: ltence numbers can be assigned starically at compile lime.) Figure 10.7 conlains cleclaralions OF the shared data types and coinmiine l s the code that regular processes execute when thcy reach outication c h a ~ l ~ ~and pul and input scatell-renls. For unguarded colnrnur~icatio~l statements., a process sends n single template to the clearinghouse, (hen waits for a reply. The clearingllouse will send a reply when il finds a match, as discussed below. After gclting its reply, the source process sends the expressions in the output stacement to the destination process, which stores them in the variables in the input statemenl. When a process uses a guarded cornmunicatioi~sratement, jt f rsl needs to evaluate each guard. For each guard that is true, the process constructs a Lemplate arid inserts it into set I;. After evaluating all guards, the process sends t to tbe clearinghouse and then waics for a reply. (If t is empty, the process just continues.) When he process receives a reply, i t mill indicate which ocher process has becn. pajred with this one and the direction in which com~nunjcatio~i is to occur. If the dil-ection is OUT, the process sends a message to the other one; orl~envisei t waits to receive data fro111 the other. The process then selects rlte appropliate guarded statement and executes it. (We assume that direct ion and who are sufficient EO deler~ninewhich guarded communication stalernent was the one matched by the clearinghouse; in general, ports and types will also be required.) Figure 10.8 contains the clewinghouse process cH. Array pending contains one ser: of templates for each regular process. If pending [il is not empty, then regular process i is blocked while waiting for a matching com~nunication statement. When CH receives a new sel t , it first looks a1 one of the templates ro determine which process s sent t. (If [he djrection i n a templarc is OUT, the11 Lhe source is s; if the direction is IN, then the dcstinatlon is 8 . ) The clearinghouse then compares ele~nentsof t with templates in pending to sce if there is a match. Because of the way we have constructcd templates. two match if the
    type direction = enum(OUT, IN);
    type template = rec(direction d; int source; int dest; int port);
    type Templates = set of template;
    chan match(Templates t);
    chan reply[1:n](direction d; int who);
    chan data[1:n](byte msg[*]);

    output statement not in a guard:
        Templates t = template(OUT, myid, destination, port);
        send match(t);
        receive reply[myid](direction, who);
            # direction will be OUT and who will be destination
        gather expressions into a message buffer;
        send data[who](buffer);

    input statement not in a guard:
        Templates t = template(IN, source, myid, port);
        send match(t);
        receive reply[myid](direction, who);
            # direction will be IN and who will be myid
        receive data[myid](buffer);
        unpack the buffer into local variables;

    guarded input or output statement:
        Templates t = ∅;               # set of possible communications
        for [ boolean expressions in guards that are true ]
            insert a template for the input or output statement into set t;
        send match(t);                 # send matches to clearinghouse
        receive reply[myid](direction, who);
        use direction and who to determine which guarded
            communication statement was the one that matched;
        if (direction == IN) {
            receive data[myid](buffer);
            unpack the buffer into local variables;
        } else {                       # direction == OUT
            gather expressions into a message buffer;
            send data[who](buffer);
        }
        execute appropriate guarded statement S;

Figure 10.7   Protocols for regular processes.
    # global types and channels as declared in Figure 10.7

    process CH {
        Templates t, pending[1:n] = ([n] ∅);
        # if pending[i] != ∅, then process i is blocked
        while (true) {
            receive match(t);          # get a new set of templates
            look at some template in t to determine sender s;
            for [ each template in t ] {
                if (there is a matching pair in some pending[i]) {
                    if (s is the source) {
                        send reply[s](OUT, i);
                        send reply[i](IN, s);
                    } else {           # s is the destination
                        send reply[s](IN, i);
                        send reply[i](OUT, s);
                    }
                    pending[i] = ∅;
                    break;             # get out of the for loop
                }
            }
            if (no matching pair was found)
                pending[s] = t;
        }
    }

Figure 10.8   Centralized clearinghouse process.
Because of the way we have constructed templates, two match if the directions are opposite, the ports are identical, and the source and destination are identical. If CH finds a match with some process i, it sends replies to both s and i; the replies tell each process the identity of the other and the direction in which they are to communicate. In this case, CH then clears pending[i], since process i is no longer blocked. If CH does not find a match for any template in t, it saves t in pending[s], where s is the sending process.

An example will help clarify how these protocols work. Suppose we have two processes A and B that want to exchange data by executing the following guarded communication statements:
    process A {
        int a1, a2;
        if B!a1 -> B?a2;
        [] B?a2 -> B!a1;
        fi
    }

    process B {
        int b1, b2;
        if A!b1 -> A?b2;
        [] A?b2 -> A!b1;
        fi
    }

When A starts executing its if statement, it builds a set with two templates:
    { (OUT, A, B, p2), (IN, B, A, p1) }
Here we assume that p1 is the identity of A's port and p2 is the identity of B's port. Process A then sends these templates to the clearinghouse. Process B takes similar actions and sends the following set of templates to the clearinghouse:

    { (OUT, B, A, p1), (IN, A, B, p2) }
When the clearinghouse gets the second set of templates, it sees that there are two possible matches. It picks one, sends replies to A and B, and then throws away both sets of templates. Processes A and B now execute the matching pair of communication statements selected by the clearinghouse. Next they proceed to the communication statements in the bodies of the selected guarded statements. For these, each process sends one template to the clearinghouse, waits for a reply, and then communicates with the other process.

If the matchmaking done by the clearinghouse process always looks at pending in some fixed order, some blocked processes might never get awakened. However, a simple strategy will provide fairness, assuming the application program is deadlock free. Let start be an integer that indicates where to start searching. When CH receives a new set of templates, it first examines pending[start], then pending[start+1], and so on. Once process start gets the chance to communicate, CH increments start to the next process whose pending set is not empty. In this way, start will continually cycle around the processes, assuming process start is not blocked forever, and thus each process will periodically get the chance to be checked first. A rough sketch of this fair scan appears below.
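The following C fragment is a minimal sketch of that round-robin search. The constant NPROC and the helpers pending_nonempty() and try_match() are assumptions made only for illustration; they are not part of the clearinghouse program in Figure 10.8.

    /* Hedged sketch of the clearinghouse's fair search over pending sets. */
    #define NPROC 8

    extern int pending_nonempty(int i);   /* does process i have pending templates? */
    extern int try_match(int i);          /* try to match new templates against pending[i] */

    static int start = 1;                 /* where the next search begins */

    int find_match(void) {
        for (int k = 0; k < NPROC; k++) {
            int i = (start - 1 + k) % NPROC + 1;   /* scan start, start+1, ..., wrapping */
            if (pending_nonempty(i) && try_match(i)) {
                if (i == start) {
                    /* start got its chance; advance start to the next process
                     * whose pending set is not empty (if any) */
                    for (int j = 1; j < NPROC; j++) {
                        int cand = (start - 1 + j) % NPROC + 1;
                        if (pending_nonempty(cand)) { start = cand; break; }
                    }
                }
                return i;                 /* matched with process i */
            }
        }
        return 0;                         /* no match; caller saves the new templates */
    }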
10.3 RPC and Rendezvous

This section shows how to implement RPC in a kernel, rendezvous using asynchronous message passing, and multiple primitives, including rendezvous, in a kernel. The RPC kernel illustrates how to handle two-way communication in a kernel. The implementation of rendezvous using message passing shows the extra communication that is required to support rendezvous-style synchronization. The multiple primitives kernel shows how to implement all of the various communication primitives in a single, unified way.
10.3.1 RPC in a Kernel

Since RPC supports only communication, not synchronization, it has the simplest implementation. Recall that, with RPC, a program consists of a collection of modules that contain procedures and processes. The procedures (operations) declared in the specification part of a module can be invoked by processes executing in other modules. All parts of a module reside on the same machine, but different modules can reside on different machines. (We are not concerned here with how a programmer specifies where a module is located; one such mechanism was described in Section 8.7.) Processes executing in the same module interact by means of shared variables, and they synchronize using semaphores. We assume that each machine has a local kernel that implements processes and semaphores as described in Chapter 6, and that the kernels contain the network interface routines given in Figure 10.3. Our task here is to add kernel primitives and routines to implement RPC.

There are three possible relations between a caller and a procedure:

- They are in the same module, and hence on the same machine.
- They are in different modules but are on the same machine.

- They are on different machines.

In the first case, we can use a conventional procedure call. There is no need to enter a kernel if we know at compile time that the procedure is local. The calling process can simply push value arguments on its stack and jump to the procedure; when the procedure returns, the calling process can pop results off the stack and continue.

For intermodule calls, we can uniquely identify each procedure by a (machine, address) pair, where machine indicates where the procedure body
is stored and address is the entry point of the procedure. We can then implement call statements as follows:

    if (machine is local)
        execute a conventional call to address;
    else
        rpc(machine, address, value arguments);
To use a conventional procedure call, the procedure must be guaranteed to exist. This will be the case if procedure identities cannot be altered and if modules cannot be destroyed dynamically. Otherwise, we would need to enter the local kernel to verify that the procedure exists before making a conventional call.

To execute a remote call, the calling process needs to send value arguments to the remote machine, then block until results are returned. When the remote machine receives a CALL message, it creates a process to execute the procedure body. Before that process terminates, it calls a primitive in the remote kernel to send results back to the first machine.

Figure 10.9 contains kernel primitives for implementing RPC; it also shows the new network read interrupt handler. The routines use the netWrite procedure of the distributed kernel for asynchronous message passing (Figure 10.3), which in turn interacts with the associated interrupt handler. The following events occur in processing a remote call:

1. The caller invokes the rpc primitive, which sends the caller's identity, procedure address, and value arguments to the remote machine.
2. The remote kernel's read interrupt handler receives the message and calls handle_rpc, which creates a process to service the call.

3. The server process executes the body of the procedure, then invokes rpcReturn to send results back to the caller's kernel.

4. The read interrupt handler in the caller's kernel receives the return message and calls handleReturn, which unblocks the caller.
In the handle_rpc primitive, we assume that there is a list of previously created descriptors for processes that will service calls. This speeds up handling a remote call since it avoids the overhead of dynamic storage allocation and descriptor initialization. We also assume that each server process is coded so that its first action is to jump to the appropriate procedure and its last action is to call kernel primitive rpcReturn.
    declarations of network buffers, free descriptors, delay list;

    netRead_handler: {          # entered with interrupts inhibited
        save state of executing;
        acquire new buffer;
        prepare network controller for next read;
        unpack first field of input message to determine kind;
        if (kind == CALL)
            handle_rpc(caller, address, value arguments);
        else                    # kind == RETURN
            handleReturn(caller, results);
    }

    proc rpc(int machine, address; byte args[*]) {
        netWrite(machine, CALL, (executing, address, args));
        insert descriptor of executing on delay list;
        dispatcher();
    }

    proc handle_rpc(int caller, address; byte args[*]) {
        acquire free process descriptor;
        save identity of caller in it;
        put address in a register for the process;
        unpack args and push them onto the stack of the process;
        insert process descriptor on ready list;
        dispatcher();
    }

    proc rpcReturn(byte results[*]) {
        retrieve identity of caller from descriptor of executing;
        netWrite(caller's machine, RETURN, (caller, results));
        put descriptor of executing back on free descriptor list;
        dispatcher();
    }

    proc handleReturn(int caller; byte results[*]) {
        remove descriptor of caller from delay list;
        put results on caller's stack;
        insert descriptor of caller on ready list;
        dispatcher();
    }

Figure 10.9   Kernel routines for implementing RPC.
10.3.2 Rendezvous Using Asynchronous Message Passing

We now see how to implement rendezvous using asynchronous message passing. Recall that there are two partners in a rendezvous: the caller, which invokes an operation using a call statement, and the server, which services the operation using an input statement. In Section 7.3 we showed how to simulate a caller (client) and server using asynchronous message passing (see Figures 7.4 and 7.5). Here we extend that simulation to implement rendezvous.

The key to implementing rendezvous is implementing input statements. Recall that an input statement contains one or more guarded operations. Execution of an input statement delays a process until there is an acceptable invocation, that is, one serviced by the input statement and for which the synchronization expression is true. For now we will ignore scheduling expressions.

An operation can be serviced only by the process that declares it, so we can store pending invocations in that process. There are two basic ways we can store invocations: have one queue per operation or have one queue per process. (There is in fact a third choice that we will use in the next section for reasons that are explained there.) We will employ one queue per process since that leads to a simpler implementation. Also, many of the examples in Chapter 8 employed a single input statement per server process. However, a server might use more than one input statement, and these might service different operations. In this case, we might have to look at invocations that could not possibly be selected by a given input statement.

Figure 10.10 illustrates an implementation of rendezvous using asynchronous message passing. Each process C that executes call statements has a reply channel from which it receives results from calls. Each process S that executes input statements has an invoke channel from which it receives invocations. Each server process also has a local queue, pending, that contains invocations that have not yet been serviced. An invocation message contains the caller's identity, the operation being called, and the value arguments from the call statement.

To implement an input statement, server process S first looks through pending invocations. If it finds one that is acceptable (the input statement services that operation and the synchronization expression is true), then S removes the oldest such invocation from pending. Otherwise, S receives new invocations until it finds one that is acceptable, saving those that are not yet acceptable. Once S has found an acceptable invocation, it executes the body of the guarded operation and then sends a reply to the caller.

Recall that a scheduling expression affects which invocation is selected if more than one is acceptable.
    shared channels:
        chan invoke[1:n](int caller, opid; byte values[*]);
        chan reply[1:n](byte results[*]);

    call statement in process C to operation serviced by process S:
        send invoke[S](C, opid, value arguments);
        receive reply[C](result variables);

    input statement in process S:
        queue pending;              # pending invocations
        examine queue of pending invocations;
        if (some invocation is acceptable)
            remove oldest acceptable invocation from pending;
        else                        # get another invocation and check it
            while (true) {
                receive invoke[S](caller, opid, values);
                if (this invocation is acceptable)
                    break;
                else
                    insert (caller, opid, values) in pending;
            }
        execute the appropriate guarded operation;
        send reply[caller](result values);

Figure 10.10   Rendezvous using asynchronous message passing.
We can implement scheduling expressions by extending the implementation in Figure 10.10 as follows. First, a server process needs to know about all pending invocations in order to schedule them. These are the invocations in pending and others that might be queued on its invoke channel. So, before looking at pending, server S needs to execute

    while (not empty(invoke[S])) {
        receive invoke[S](caller, opid, values);
        insert (caller, opid, values) in pending;
    }
Second, if the server finds an acceptable invocation in pending, it needs to look through the rest of pending to see if there is another invocation of the same operation that is also acceptable and that minimizes the value of the scheduling expression. If so, the server removes that invocation from pending and services
it instead of the first one. The loop in Figure 10.10 does not need to change, however. If pending does not contain an acceptable invocation, the first one that the server receives is trivially the one that minimizes the scheduling expression.
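To make the pending-queue logic concrete, the following C sketch shows one possible shape of the server's search for an acceptable invocation. It is an illustration under assumptions, not code from the text: chan_receive, invoke_chan, and the predicate acceptable are hypothetical stand-ins for the channel primitives and the synchronization expression, and scheduling expressions are ignored, as in Figure 10.10.

  #include <stdbool.h>
  #include <stdlib.h>

  typedef struct invocation {
      int caller, opid;
      unsigned char values[64];        /* value arguments; size is arbitrary */
      struct invocation *next;
  } invocation;

  /* hypothetical stand-ins for the channel primitives and the guard test */
  extern void chan_receive(int chan, invocation *inv);
  extern bool acceptable(const invocation *inv);
  extern int  invoke_chan;

  static invocation *pending = NULL;   /* queue of saved, unserviced invocations */

  /* Return the oldest acceptable invocation, receiving new ones if necessary. */
  invocation *next_acceptable(void) {
      /* first scan pending for the oldest acceptable invocation */
      for (invocation **p = &pending; *p != NULL; p = &(*p)->next)
          if (acceptable(*p)) { invocation *inv = *p; *p = inv->next; return inv; }

      /* otherwise keep receiving, saving invocations that are not yet acceptable */
      for (;;) {
          invocation *inv = malloc(sizeof *inv);
          chan_receive(invoke_chan, inv);
          if (acceptable(inv)) return inv;

          invocation **tail = &pending;          /* append to preserve FIFO order */
          while (*tail != NULL) tail = &(*tail)->next;
          inv->next = NULL;
          *tail = inv;
      }
  }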
10.3.3 Multiple Primitives in a Kernel

We now develop a kernel implementation of the multiple primitives notation described in Section 8.3. This combines aspects of the distributed kernels for message passing and RPC and the implementation of rendezvous using asynchronous message passing. It also illustrates one way to implement rendezvous in a kernel.

With the multiple primitives notation, operations can be invoked in two ways: by synchronous call statements or by asynchronous send statements. Operations can also be serviced in two ways: by procedures or by input statements (but not by both). Thus the kernel needs to know how each operation is invoked and serviced. We assume that a reference to an operation is represented by a record with three fields. The first indicates how the operation is serviced. The second identifies the machine on which the operation is serviced. For an operation serviced by a proc, the third field gives the entry point of the procedure, as in the kernel for RPC. For an operation serviced by input statements, the third field gives the address of an operation descriptor (see the next page).

With rendezvous, each operation is serviced by the process that declares it. Hence in the implementation of rendezvous we employed one set of pending invocations per server process. With the multiple primitives notation, however, an operation can be serviced by input statements in more than one process in the module that declares it. Thus server processes in the same module potentially need to share access to pending invocations. We could employ one pending set per module, but then all processes in the module would compete for access to the set even if they do not service the same operations. This would lead to delays waiting to access the set and to unnecessary overhead examining invocations that could not possibly be acceptable. Consequently, we will employ multiple sets of pending invocations, one per operation class as defined below.

An operation class is an equivalence class of the transitive closure of the relation "serviced by the same input statement." For example, if operations a and b appear in an input statement, they are in the same class. If a and c appear in a different input statement (which would have to be in the same module), then c is also in the same class as a and b. In the worst case, every operation in a module is in the same class. In the best case, each operation is in its own class (e.g., if each is serviced only by receive statements).
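The text does not prescribe how these classes are computed; one natural way, sketched below in C purely as an illustration, is to run a union-find over the operations, merging the operations serviced by each input statement. The module and the operation numbering in main are made up for the example.

  #include <stdio.h>

  #define NOPS 6                 /* operations are numbered 0..NOPS-1 */
  static int parent[NOPS];

  static int find(int x) {       /* class representative, with path halving */
      while (parent[x] != x) { parent[x] = parent[parent[x]]; x = parent[x]; }
      return x;
  }

  static void join(int a, int b) { parent[find(a)] = find(b); }

  int main(void) {
      for (int i = 0; i < NOPS; i++) parent[i] = i;

      /* hypothetical module: one input statement services operations 0 and 1,
         another services 1 and 2, and operation 3 is serviced only by receive */
      join(0, 1);
      join(1, 2);                /* so 0, 1, and 2 end up in the same class */

      for (int i = 0; i < NOPS; i++)
          printf("operation %d is in class %d\n", i, find(i));
      return 0;
  }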
A reference to an operation that is serviced by input statements contains a pointer to an operation descriptor. This in turn contains a pointer to an operation-class descriptor. Both descriptors are stored on the machine on which the operation is serviced. The class descriptor contains the following information:

  lock - used for mutually exclusive access
  pending list - pending invocations of operations in the class
  new list - invocations that arrive while the class is locked
  access list - processes waiting for the lock
  waiting list - processes waiting for new invocations to arrive
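As an illustration only, the class descriptor might be laid out in C roughly as follows; the queue and list types are placeholders for whatever representations the kernel already provides.

  #include <stdbool.h>

  typedef struct queue queue;   /* invocation queue, assumed supplied by the kernel */
  typedef struct list  list;    /* process-descriptor list, assumed supplied        */

  typedef struct op_class {
      bool   locked;            /* at most one process examines pending at a time    */
      queue *pending;           /* pending invocations of operations in this class   */
      queue *new_invocations;   /* invocations that arrive while the class is locked */
      list  *access_list;       /* processes waiting to acquire the lock             */
      list  *wait_list;         /* processes waiting for new invocations to arrive   */
  } op_class;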
The lock is used to ensure that at most one process at a time is examining pending invocations. The other fields are used as described below.

The call and send statements are implemented as follows. If an operation is serviced by a procedure on the same machine, then a call statement is turned into a direct procedure call. A process can determine this by looking at the fields of the operation reference defined above. If an operation is on another machine or it is serviced by input statements, then a call statement executes the invoke primitive on the local machine. Independently of how an operation is serviced, a send statement executes the invoke primitive.

Figure 10.11 contains the code for the invoke primitive and two kernel routines it uses. The first argument indicates the kind of invocation. For a CALL invocation, the kernel blocks the executing process until the invocation has been serviced. Then the kernel determines whether the operation is serviced locally or on a remote machine. If it is serviced remotely, the kernel sends an INVOKE message to the remote kernel. The localInvoke routine gets executed by the kernel that services the operation.

type howInvoked = enum(CALL, SEND);
type howServiced = enum(PROC, IN);
type opRef = rec(howServiced how; int machine, opid);
proc invoke(howInvoked how; opRef op; byte values[*]) {
  if (how == CALL)
    insert executing on call delay list;
  if (op.machine is local)
    localInvoke(executing, how, op, values);
  else {   # machine is remote
    netWrite(machine, INVOKE, (executing, how, op, values));
    dispatcher();
  }
}
proc localInvoke(int caller; howInvoked inv; opRef op; byte values[*]) {
  if (op.how == PROC) {
    get free process descriptor;
    if (inv == CALL)
      save identity of caller in the descriptor;
    else   # inv == SEND
      record that there is no caller (set caller field to zero);
    set program counter for the process to op.address;
    push values onto process stack;
    insert descriptor on ready list;
  }
  else {   # op.how == IN
    look up class descriptor for the operation;
    if (inv == CALL)
      append(opclass, caller, op.opid, values);
    else   # inv == SEND
      append(opclass, 0, op.opid, values);
  }
  dispatcher();
}

proc append(int opclass, caller, opid; byte values[*]) {
  if (opclass is locked) {
    insert (caller, opid, values) into new invocations list;
    move processes (if any) from wait list to access list;
  }
  else {   # opclass not locked
    insert (caller, opid, values) into pending list;
    if (wait list not empty) {
      move first process to ready list;
      move other processes to access list;
      set the lock;
    }
  }
}
Figure 10.11
Invocation primitives.
"I'lie 1ocalInvoke rouljne checks ro see i'f an operalion is serviced by a proccdure. 11' so. it grabs a I'ree process tlescriplor. and dispaiches a sccver process to execute the i~~ocedure. as happened ir-r the RPC Iccmel. The kernel also records in tllc descriptor whether the operalion was called. This information is used Iirre~.to clelel-mine whether Lhel-i: is a caller rhal ~leed?; to bc awakened when the procedure returns. Lf all operalion is serviced by ru~ilipul statement, localInvoke cxamines class descriplor. I f tlie class is locked-because a process i s executing an input statelnenr ant1 sxa~niningpending invocations-llle kernel savcs Ihe invucation in (he list of new invocatio~lsand moves arly processes waiting for new in\jocutions to the accehh list of the class. If tl-te class is not loc.ked, [he kernel saves the invocation in t h e pending sel and then checks the waiting list. (The access list is eml~cywliencver rhc class i s not locked.) If some process is waiting lilr new invocations, onc waiting process is awakened, the lock is set., and any o11ie1-waiting processcs are n~ovedto the access list. When il process iinishes cxccilling a procedure, i t calls kernel primitive procDone, shocvn in Figiur 10.12. This pi.iiniiive bees the process descripto~. iuncl then awnkcns the caller i f Lhcl-e is one. 'Thc awakencaller I-outine is execu(.ed by the Iternel on wlzich the c;~llel-resides. For inpul slaleluents, a process execures Lhe code sllown in Figure 10.13. Thitt code hen calls thc input statement priniitivea sl~ownin Figi~rc10.14. A pyocess first :~cqui~.es exclusive access to the class descriptor ol' the opcl.ation and
proc procDone(byte results[*]) {
  put executing back on free descriptor list;
  look up identity of caller (if any) in process descriptor;
  if (caller is local)
    awakenCaller(caller, results);
  else if (caller is remote)
    netWrite(caller's machine, RETURN, (caller, results));
  dispatcher();
}

proc awakenCaller(int caller; byte results[*]) {
  remove descriptor of caller from call delay list;
  put results on caller's stack;
  insert descriptor on ready list;
}
Figure 10.12
Return primitives.
startIn(opclass);
while (true) {   # loop until an acceptable invocation is found
  search pending list of opclass for acceptable invocation;
  if (found one)
    break;   # exit the loop
  waitNew(opclass);
}
remove the invocation from pending list of opclass;
execute the appropriate guarded operation;
inDone(opclass, caller, result values);
Figure 10.13
Input statement code executed by a process.
A process first acquires exclusive access to the class descriptor of the operation and then searches the pending invocations for one that is acceptable. If no pending invocation is acceptable, the process calls waitNew to delay until there is a new invocation. This primitive might return immediately if a new invocation arrives while the process is searching the pending list, and hence while the class descriptor is locked. Once a process finds an acceptable invocation, it executes the appropriate guarded operation and then calls inDone. The kernel awakens the caller, if there is one, then updates the class descriptor. If new invocations arrived while the input statement was executing, they are moved to the pending set, and any processes waiting for new invocations are moved to the access list. Then, if some process is waiting to access the class, one waiting process is awakened; otherwise, the lock is cleared.

This implementation of input statements handles the most general case. It can be optimized for the following, fairly frequent special cases:

If all operations in a class are serviced by just one process, then the access list is not needed, since the server will never have to wait to acquire the lock. (The lock itself is still needed, however, so that the append routine can determine whether to insert a new invocation in the new or pending lists.)

If an operation is in a class by itself and is serviced by receive statements or by input statements that do not contain synchronization or scheduling expressions, then pending invocations are serviced in FIFO order, as they are with message passing. To handle this case, we can add a kernel primitive that delays the server until there is a pending invocation and then returns it to the server; there is then no need for the server to lock the class and search the pending list.
proc startIn(int opclass) {
  if (opclass locked)
    insert executing on access list for opclass;
  else
    set the lock;
  dispatcher();
}

proc waitNew(int opclass) {
  look up descriptor of operation class opclass;
  if (new invocations list not empty)
    move new invocations to pending list;
  else {
    insert executing on wait list of opclass;
    if (access list empty)
      clear the lock;
    else
      move first process from access list to ready list;
  }
  dispatcher();
}

proc inDone(int opclass, caller; byte results[*]) {
  if (caller is local)
    call awakenCaller(caller, results);
  else if (caller is remote)
    netWrite(caller's machine, RETURN, (caller, results));
  if (new invocations list of opclass not empty) {
    move invocations from new to pending list;
    move processes (if any) from waiting list to access list;
  }
  if (access list of opclass not empty)
    move first descriptor to ready list;
  else
    clear the lock;
  dispatcher();
}
Figure 10.14
Input statement primitives.
If an operation is effectively a semaphore, meaning it has no parameters or return value, is invoked by send, and is serviced by receive, then it can be implemented by a semaphore.

These optimizations lead to significant performance improvements.
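As an illustration of the last case, here is a minimal C sketch, not taken from the text, using a POSIX counting semaphore: send on such an operation becomes sem_post and receive becomes sem_wait.

  #include <semaphore.h>

  static sem_t op;                  /* one semaphore per semaphore-like operation */

  void init_op(void)    { sem_init(&op, 0, 0); }   /* no pending invocations yet  */
  void send_op(void)    { sem_post(&op); }         /* send op()    acts as V(op)  */
  void receive_op(void) { sem_wait(&op); }         /* receive op() acts as P(op)  */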
10.4 Distributed Shared Memory

As noted in the introduction to this chapter, shared-memory machines are usually programmed using shared variables, and distributed-memory machines are usually programmed using message passing (or RPC or rendezvous). However, it is straightforward to support message passing on shared-memory machines, as we showed in Section 10.1. It is also possible, although less straightforward, to support shared variables on distributed-memory machines. Here we summarize how this is done; the Historical Notes and References give pointers to detailed descriptions.

Recall that in a shared-memory multiprocessor, every processor can access every memory location (see Section 1.2). Because memory access time is much larger than the cycle time of processors, multiprocessors employ caches to improve performance. Each processor has one or more levels of cache that contain copies of the memory locations most recently referenced by the code executed by that processor. A cache is organized as a collection of cache lines, each of which contains one or more contiguous words of memory. The hardware implements memory consistency protocols to keep the contents of caches consistent with each other and with primary memory.

A distributed shared memory (DSM) is a software implementation of these same concepts on a distributed-memory machine. In particular, a DSM provides a virtual address space that is accessible to every processor. This address space is typically organized as a set of pages, which are distributed among the local memories of the processors. There might be one copy of each page, or it might be replicated, with copies on more than one processor. When a process references an address on a nonlocal page, it has to acquire a copy of the page. A page consistency protocol is used to manage the movement of pages and their contents.

The rationale for providing a DSM is that most programmers find it easier to write concurrent programs using shared variables rather than message passing. This is due in part to the fact that shared variables are a familiar concept in
sequential programming. However, a DSM introduces overhead to service page faults, to send and receive pages, and to wait for a remote page to arrive. The challenge is to minimize the overhead. Below we describe how a DSM is implemented; the implementation itself is another example of a distributed program. Then we describe some of the common page consistency protocols and how they support data access patterns in application programs.
10.4.1 Implementation Overview

A DSM is a software layer that lies between an application program and an underlying operating system or special-purpose kernel. Figure 10.15 shows the overall structure. The address space of each processor (node) consists of both shared and private sections. The shared variables in an application program are stored in the shared section. The code and private data of the processes are stored in the private section. Thus, we can think of the shared section as containing the heap for the program, with the private sections containing the code segments and process stacks. The private section of each node also contains a copy of the DSM software and the node's operating system or communication kernel.

The DSM implements and manages the shared section of the address space. This is a linear array of bytes that is conceptually replicated on each node. The shared section is divided into individually protected units, each of which resides on one or a few of the nodes. Most commonly, the units are fixed-size pages, although they could be variable-size pages or individual data objects. Here we will assume fixed-size pages. The pages are managed in a way analogous to a paged virtual memory on a single processor. In particular, a page is either resident (present) or not. If present, it is read only or readable and writable.
Figure 10.15   Structure of a distributed shared memory system. (Each node holds private code and data, the application, the DSM subsystem, and the OS/kernel.)
Each shared variable in an application is mapped into an address in the shared section, and hence it has the same address on all nodes. The pages in the shared section are initially distributed among the nodes in some fashion. For now, we will assume that there is exactly one copy of each page and that it is both readable and writable on the node that has the copy. When a process references a shared variable on a resident page, it accesses the variable directly. However, when it references a shared variable on a nonresident page, a page fault occurs. The page fault is handled by the DSM software, which determines where the page is located and sends a message to that node requesting the page. The second node marks the page as no longer resident and sends it back to the first node. When the first node receives the page, it updates the page's protection, then returns to the application process. As in a virtual memory system, the application process reissues the instruction that caused the page fault; the reference to the shared variable now succeeds.

To illustrate the above points, consider the following simple example:

process P1 {
  x = 1;
}

process P2 {
  y = 1;
}
Two nodes have been given. Process P1 executes on the first node and P2 on the second. Assume that the shared variables are stored on a single page that initially resides on the first node. This scenario is shown at the top of Figure 10.16. The processes could execute in parallel or in either order, but assume that P1 executes first and assigns to x. Because the page containing x is currently located on node 1, the write succeeds and the process terminates. Now P2 executes. It attempts to write y, but the page containing y is not resident, so the write causes a page fault. The fault handler on node 2 sends a request for the page to node 1, which sends the page back to node 2. After receiving the page, node 2 restarts process P2; the write to y now succeeds. So in the final state, the page is resident on the second node and both variables have been updated.

When a DSM is implemented on top of a Unix operating system, protection for shared pages is set using the mprotect system call. The protection for a resident page is set to READ or to READ and WRITE; the protection for a nonresident page is set to NONE. When a nonresident page is referenced, a segmentation violation signal (SIGSEGV) is generated.
                 Node 1                      Node 2
  initial state  x = 0, y = 0
  1.             write x, no fault
  2.                                         write y, page fault
  3.                                         send request to Node 1
  4.             send page to Node 2
  5.                                         receive page
  6.                                         write y, no fault
  final state                                x = 1, y = 1

Figure 10.16   Page fault handling in a DSM.
The page-fault handler in the DSM catches this signal. It sends a page-request message using Unix communication primitives (or customized ones). The arrival of a message at a node generates an IO signal (SIGIO). The handler for IO signals determines the type of message (page request or reply) and takes the appropriate action. The signal handlers need to execute as critical sections because a new signal could occur while one is being handled; for example, a request for a page could arrive while a page fault is being handled.

A DSM can be either single-threaded or multithreaded. A single-threaded DSM supports only one application process per node. When that process incurs a page fault, it delays until the fault has been resolved. A multithreaded DSM supports more than one application process per node, so when one process causes a page fault, another can be executed while the first is waiting for the fault to be resolved. It is easier to implement a single-threaded DSM because there are fewer critical sections. However, a multithreaded DSM has a much better chance to mask the latency of a remote page reference, and hence is likely to have much better performance.
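To make these mechanisms concrete, here is a minimal C sketch, not taken from the text, of how a DSM layer might set page protections with mprotect and catch SIGSEGV. The routine fetch_remote_page is a hypothetical stand-in for the DSM's page-request protocol, and error handling is omitted.

  #include <signal.h>
  #include <stddef.h>
  #include <stdint.h>
  #include <sys/mman.h>
  #include <unistd.h>

  extern void fetch_remote_page(void *page_base);   /* hypothetical DSM routine */

  static long page_size;

  static void fault_handler(int sig, siginfo_t *info, void *ctx) {
      (void)sig; (void)ctx;
      /* round the faulting address down to its page boundary */
      uintptr_t addr = (uintptr_t)info->si_addr;
      void *page_base = (void *)(addr & ~((uintptr_t)page_size - 1));

      fetch_remote_page(page_base);                  /* obtain the page contents */
      mprotect(page_base, page_size, PROT_READ | PROT_WRITE);   /* now resident */
  }

  void dsm_init(void *shared_base, size_t shared_len) {
      page_size = sysconf(_SC_PAGESIZE);

      /* nonresident pages start with protection NONE, so any access faults */
      mprotect(shared_base, shared_len, PROT_NONE);

      struct sigaction sa;
      sa.sa_sigaction = fault_handler;
      sa.sa_flags = SA_SIGINFO;
      sigemptyset(&sa.sa_mask);
      sigaction(SIGSEGV, &sa, NULL);
  }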
10.4.2 Page Consistency Protocols

The performance of an application executed on top of a DSM is affected by the efficiency of the DSM itself. This includes whether the DSM is able to mask page access latency as well as the efficiency of the signal handlers and especially the communication protocols. The performance of an application is also greatly
dependent upon how pages are managed, namely, upon what page consistency protocol is employed. Below we describe three of the possibilities: migratory, write invalidate, and write shared.

In the example in Figure 10.16, we assumed there was exactly one copy of the page. When another node needs it, the copy moves. This is called the migratory protocol. The contents of the page are obviously consistent at all times because there is only one copy. However, what happens if two processes on different nodes merely want to read variables on the page? Then the page will keep bouncing between them in a process called thrashing.

The write invalidate protocol allows pages to be replicated while they are being read. Each page has an owner. When a process tries to read a remote page, it acquires a read-only copy from the owner. The owner's copy is also at that time marked as read-only. When a process tries to write into a page, it gets a copy (if necessary), invalidates the other copies, and then writes into the page. In particular, the page fault handler on the node doing the write (1) contacts the owner, (2) gets back the page and ownership of it, (3) sends invalidate messages to nodes that have copies (they set the protection on their copy to NONE), (4) sets the protection on its copy to READ and WRITE, and then (5) resumes the application process.

The write invalidate protocol is very efficient for pages that are read-only (after they are initialized) and for pages that are infrequently modified. However, it leads to thrashing when false sharing occurs. Consider again the scenario in Figure 10.16. Variables x and y are not shared by the two processes, but they reside on the same page and that page is shared. Hence, the page would move between the two nodes in Figure 10.16 in the same way with the write invalidate protocol as it does with the migratory protocol. False sharing can be eliminated by placing variables such as x and y on different pages; this can be done statically at compile time or dynamically at run time.

Alternatively, false sharing can be tolerated by using the write shared protocol, which allows multiple concurrent writers to a page. For example, when process P2 executing on node 2 in Figure 10.16 writes into y, node 2 gets a copy of the page from node 1 and both nodes have permission to write into it. Obviously, the copies will become inconsistent. At application-specified synchronization points, such as barriers, the copies are merged into a single copy. If there was false sharing on the page, the merged copy will be correct and consistent. However, if there was true sharing on the page, the merged copy will be some nondeterministic combination of the values that were written. To handle this, each node keeps a list of the changes it makes to a write-shared page; these lists are then used to make a merged copy that reflects a possible interleaving of the writes made by different processes.
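The following C fragment is a rough sketch, not from the text, of steps (1) through (5) of write-fault handling under the write invalidate protocol. Every helper routine and type here (node_t, lookup_owner, request_page_and_ownership, copy_holders, send_invalidate, page_size) is a hypothetical stand-in for the DSM's ownership bookkeeping and messaging layer.

  #include <stddef.h>
  #include <sys/mman.h>

  typedef int node_t;
  #define NO_NODE (-1)

  extern long   page_size;
  extern node_t lookup_owner(void *page_base);
  extern void   request_page_and_ownership(node_t owner, void *page_base);
  extern node_t copy_holders(void *page_base, int i);   /* i-th holder, or NO_NODE */
  extern void   send_invalidate(node_t node, void *page_base);

  void handle_write_fault(void *page_base) {
      node_t owner = lookup_owner(page_base);           /* (1) contact the owner */
      request_page_and_ownership(owner, page_base);     /* (2) page and ownership move here */

      /* (3) nodes holding read-only copies set their protection to NONE */
      for (int i = 0; copy_holders(page_base, i) != NO_NODE; i++)
          send_invalidate(copy_holders(page_base, i), page_base);

      /* (4) the local copy becomes readable and writable */
      mprotect(page_base, page_size, PROT_READ | PROT_WRITE);
      /* (5) on return, the faulting write is reissued and now succeeds */
  }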
Historical Notes

Implementations of communication primitives have existed as long as the primitives themselves. The Historical Notes of Chapters 7 through 9 mention several papers that describe new primitives; many of those papers also describe how to implement them. Two good general references are Bacon [1998] and Tanenbaum [1992]. These books describe communication primitives and implementation issues in general and also give case studies of important operating systems.

In the distributed kernel in Section 10.1, we assumed that network transmission was error free and we ignored buffer management and flow control. Books on computer networks describe how to handle these issues; see, for example, Tanenbaum [1988] or Peterson and Davie [1996].

The centralized implementation of synchronous message passing (Figures 10.6 to 10.8) was developed by this author. Several people have developed decentralized implementations that do not have a centralized coordinator. Silberschatz [1979] assumes processes form a ring. Van de Snepscheut [1981] considers hierarchical systems. Bernstein [1980] presents an implementation that works for any communication topology. Schneider [1982] presents a broadcast algorithm that essentially replicates the pending sets of our clearinghouse process (Figure 10.8). Schneider's algorithm is simple and fair, but it requires a large number of messages since every process has to acknowledge every broadcast. Buckley and Silberschatz [1983] give a fair, decentralized algorithm that is a generalization of Bernstein's; it is more efficient than Schneider's but is much more complex. (All the above algorithms are described in Raynal [1988].) Bagrodia [1989] describes yet another algorithm that is simpler and more efficient than Buckley and Silberschatz's.

In Section 10.3, we presented implementations of rendezvous using asynchronous message passing and a kernel. The performance of rendezvous is in general quite poor relative to other synchronization mechanisms because it is more complex. However, in many cases it is possible to transform programs so that rendezvous is replaced by less expensive mechanisms such as procedures and semaphores. For example, Roberts et al. [1981] show how to transform many instances of Ada's rendezvous mechanisms. McNamee and Olsson [1990] present a more extensive set of transformations and analyze the speed-up that is gained, which is up to 95 percent in some cases.

The concept of a distributed shared memory (DSM) was invented by Kai Li in a doctoral dissertation at Yale University supervised by Paul Hudak. Li and Hudak [1989] describe that work, which introduced the write invalidate page consistency protocol. Li made a landmark contribution because at the time practically everyone believed that it was impossible to simulate shared memory
using message passing with anything close to reasonable performance. DSMs are now fairly commonplace and are even supported by many supercomputer manufacturers. Two of the more important recent DSM systems are Munin and TreadMarks, which were developed at Rice University. Carter, Bennett, and Zwaenepoel [1991] describe the implementation and performance of Munin, which introduced the write shared protocol. Amza et al. [1996] describe TreadMarks; Lu et al. [1997] compare the performance of PVM and TreadMarks. Tanenbaum [1995] gives an excellent overview of DSMs, including Li's work, Munin, and others. For even more recent work, see the special issue on DSMs in Proceedings of the IEEE (March 1999).
References

Amza, C., A. L. Cox, S. Dwarkadas, P. Keleher, H. Lu, R. Rajamony, W. Yu, and W. Zwaenepoel. 1996. TreadMarks: Shared memory computing on networks of workstations. IEEE Computer 29, 2 (February): 18-28.

Bacon, J. 1998. Concurrent Systems: Operating Systems, Database and Distributed Systems: An Integrated Approach, 2nd ed. Reading, MA: Addison-Wesley.

Bagrodia, R. 1989. Synchronization of asynchronous processes in CSP. ACM Trans. on Prog. Languages and Systems 11, 4 (October): 585-97.

Bernstein, A. J. 1980. Output guards and non-determinism in CSP. ACM Trans. on Prog. Languages and Systems 2, 2 (April): 234-38.

Buckley, G. N., and A. Silberschatz. 1983. An effective implementation for the generalized input-output construct of CSP. ACM Trans. on Prog. Languages and Systems 5, 2 (April): 223-35.

Carter, J. B., J. K. Bennett, and W. Zwaenepoel. 1991. Implementation and performance of Munin. Proc. 13th ACM Symposium on Operating Systems Principles, October: pp. 152-64.

Li, K., and P. Hudak. 1989. Memory coherence in shared virtual memory systems. ACM Trans. on Computer Systems 7, 4 (November): 321-59.

Lu, H., S. Dwarkadas, A. L. Cox, and W. Zwaenepoel. 1997. Quantifying the performance differences between PVM and TreadMarks. Journal of Parallel and Distributed Computing 43, 2 (June): 65-78.
McNamee, C. M., and R. A. Olsson. 1990. Transformations for optimizing interprocess communication and synchronization mechanisms. Int. Journal of Parallel Programming 19, 5 (October): 357-87.

Peterson, L. L., and B. S. Davie. 1996. Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann.

Raynal, M. 1988. Distributed Algorithms and Protocols. New York: Wiley.

Roberts, E. S., A. Evans, C. R. Morgan, and E. M. Clarke. 1981. Task management in Ada: a critical evaluation for real-time multiprocessors. Software: Practice and Experience 11: 1019-51.

Schneider, F. B. 1982. Synchronization in distributed programs. ACM Trans. on Prog. Languages and Systems 4, 2 (April): 125-48.

Silberschatz, A. 1979. Communication and synchronization in distributed programs. IEEE Trans. on Software Engr. SE-5, 6 (November): 542-46.

Tanenbaum, A. S. 1988. Computer Networks, 2nd ed. Englewood Cliffs, NJ: Prentice-Hall.

Tanenbaum, A. S. 1992. Modern Operating Systems. Englewood Cliffs, NJ: Prentice-Hall.

Tanenbaum, A. S. 1995. Distributed Operating Systems. Englewood Cliffs, NJ: Prentice-Hall.

van de Snepscheut, J. L. A. 1981. Synchronous communication between asynchronous components. Inf. Proc. Letters 13, 3 (December): 127-30.
Exercises 10.1 Consider the distsjbuced kernel in Figures 10.3 and 10.4.
(a) Extend (he i~nplemenlarionto allow a channel Lo have nlultiple receivers. In particular, change the receivechan and emptyChan prinlili~~s SO that a process on one machine can access a cliallnel stored on another machine. (b) Modii'y the kernel so sendChan is what is called semtsynchroi~ows.In parriculu, when a process invokes eendchan, it should delay until the message has been ql~euedon the channel (or given to a receiver)-even if the channel is slored on anolher machine.
(c) Add termination detecrio~icode to the kernel. Ignore pending 1/0;hence a computation has terminated wl~enall ready lists are empty and Lhe network is idle.
10.2 The implementation of synchronous message passing in Figure 10.5 assumes that both the source and destination processes name each other. It is more common for a destination process to be able either to specify the source or to accept a message from any source. Assume processes are numbered from 1 to n and that 0 is used by a receiving process to specify that it wants a message from any source; in the latter case, the input statement sets source to the identity of the process that sent the output message. Modify the communication protocols in Figure 10.5 to handle this situation. As in the figure, you are to use asynchronous message passing as a foundation for implementing the above form of synchronous message passing.

10.3 Develop a kernel implementation of the synchronous message passing primitives synch_send and synch_receive defined at the start of Section 10.2. First develop a single-processor kernel. Then develop a distributed kernel having the structure shown in Figure 10.2. You may borrow any routines you might need from Figures 10.3 and 10.4.
10.4 Assume that you have been given processes P[1:n], each initially having one value a[i] of an array of n values. The following program uses synchronous message passing; it is written in the CSP notation described in Section 7.6. In the program, each process sends its value to all the others. When the program terminates, every process has the entire array of values.

process P[i = 1 to n] {
  int a[1:n];     # a[i] assumed to be initialized
  bool sent[1:n] = ([n] false);
  int recvd = 0;
  do [j = 1 to n] (i != j and not sent[j]); P[j]!a[i] ->
       sent[j] = true;
  [] [j = 1 to n] (i != j and recvd < n-1); P[j]?a[j] ->
       recvd = recvd+1;
  od
}

The quantifiers in the arms of the do statement indicate that there are n copies of each arm, one for each value of quantifier variable j.
(a) Provide a trace of one possible sequence of messages that would be sent if this program is implemented using the centralized clearinghouse process of Figure 10.8. Assume n is three. Show the messages that would be sent by the processes and by the clearinghouse, and show the contents at each stage of the clearinghouse's pending set of templates.
(b) What is the total number of messages sent to and from the clearinghouse in your answer to (a)? What is the total number as a function of n for larger values of n?
10.5 The algorithm in Figure 9.13 gives a fair, decentralized implementation of semaphores. It uses broadcast, timestamps, and totally ordered message queues.

(a) Using the same kind of algorithm, develop a fair, decentralized implementation of synchronous message passing. Assume both input and output statements can appear in guards. (Hint: Generalize the centralized implementation in Section 10.2 by replicating the clearinghouse's pending set of templates.)

(b) Illustrate the execution of your algorithm for the program containing processes A and B near the end of Section 10.2.
10.6 The kernel in Figures 10.11 to 10.14 implements the multiple primitives notation defined in Section 8.3.

(a) Suppose a language has just the rendezvous mechanisms defined in Section 8.2. In particular, operations are invoked only by call and serviced only by in statements. Also, each operation is serviced only by the one process that declares it. Simplify the kernel as much as possible so that it implements just this set of mechanisms.

(b) In Ada, the equivalent of the in statement is further restricted as follows: Synchronization expressions cannot reference formal parameters, and there are no scheduling expressions. Modify your answer to (a) to implement this restricted form of in statement.
10.7 Consider the Linda primitives defined in Section 7.7.

(a) Develop an implementation of the primitives by modifying the centralized clearinghouse process in Figure 10.8. Also show the actions that regular processes would take for each of the Linda primitives (as in Figure 10.7).

(b) Develop a distributed kernel implementation of the Linda primitives. The level of detail should be comparable to that in Figures 10.3 and 10.4.
10.8 Figure 10.16 illustrates the actions of a DSM that uses the migratory page consistency protocol. In the example, there are two processes. The first executes on Node 1 and writes variable x; the second process executes on Node 2 and writes variable y. Both variables are stored on the same page, and that page is initially stored on Node 1.
(a) Develop a trace of the actions of a DSM that uses the write invalidate protocol. Assume that the process that writes x executes first.
(b) Repeat (a) assuming that the process that writes y executes first.
(c) Develop a trace of the actions of a DSM that uses the write shared protocol. Assume that the process that writes x executes first.
(d) Repeat (c) assuming that the process that writes y executes first.

(e) Repeat (c) assuming that the processes execute concurrently.
(f) When a DSM uses the write shared protocol, a page becomes inconsistent if two or more processes write into it. Assume that pages are made consistent at barrier synchronization points. Extend your answer to part (e) to include a barrier after the two writes, then extend your trace with the actions the nodes should take to make the two copies of the page consistent.