Advances
in COMPUTERS VOLUME 45
This Page Intentionally Left Blank
Advances in
COMPUTERS Emphasizing Parallel Programming Techniques EDITED BY
MARVIN V. ZELKOWITZ Department of Computer Science and Institute for Advanced Computer Studies University of Maryland College Park, Maryland
VOLUME 45
ACADEMIC PRESS San Diego London Boston New York Sydney Tokyo Toronto
This book is printed on acid-free paper. Copyright 0 1997 by ACADEMIC PRESS All Rights Reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Academic Press 525 B Street, Suite 1900,San Diego, California 92101-4495,USA http:llw ww.apnet.com Academic Press Limited 24-28 Oval Road, London N W 1 7 D X ,UK http:llwww.hbuk.co.uWapl ISBN 0-12-012145-X A catalogue for this book is available from the British Library
Typeset by Mathematical Composition Setters Ltd, Salisbury, UK Printed in Great Britain by Hartnolls Ltd, Bodmin, Cornwall
97 98 99 00 01 02 EB 9 8 7 6 5 4 3 2 1
Contents CONTRIBUTORS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . PREFACE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
ix xv
Control in Multi-threaded Information Systems Pablo A . Straub and Carlos A . Hurtado
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Process Model Control Specification . . . . . . . . . . . . . . . . PetriNets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Simple Control Property . . . . . . . . . . . . . . . . . . . . 5 . A Theory of Threads of Control . . . . . . . . . . . . . . . . . . 6 . Applications of Thread Theory . . . . . . . . . . . . . . . . . . . 7 . Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Appendix: Proofs of Theorems . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1. 2. 3. 4.
2 7 13 17 25 33 46 47 50
Parallelization of DOALL and DOACROSS Loops . a Survey
.
A . R. Hurson. Joford T . Lim. Krishna M Kavi and Ben Lee
1. 2. 3. 4. 5. 6. 7.
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Loop-scheduling Algorithms for DOALL Loops . . . . . . . . . . Comparative Analysis of DOALL Loop-scheduling Schemes . . .
DOALLLoopSchedulingonNUMAMultiprocessors. . . . . . . Comparison of Affinity-scheduling Schemes . . . . . . . . . . . .
DOACROSS Loop Scheduling . . . . . . . . . . . . . . . . . . . Summary and Conclusions . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
54 55 59 63 67 71 90 101
Programming Irregular Applications: Runtime Support. Compilation and Tools Joel Saltz. Gagan Agrawal. Chialin Chang. Raja Das. Guy Edjlali. Paul Havlak. Yuan-Shin Hwang. Bongki Moon. Ravi Ponnusamy Shamik Sharrna Alan Sussman and Mustafa Uysal
.
1. 2. 3. 4.
.
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . CHAOS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Compilation Methods . . . . . . . . . . . . . . . . . . . . . . . . Runtime Support for Pointer-based Codes: CHAOS+ + . . . . . . V
106 108 117 124
vi
CONTENTS
5. Interoperability Issues: Meta-Chaos
................
6 . Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
135 143 148 149
Optimization Via Evolutionary Processes Srilata Rarnan and L. M. Patnaik
1. 2. 3. 4.
5. 6. 7. 8. 9.
Lntroduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Evolutionary Strategies (ESs) and Evolutionary Programming (EP) Genetic Algorithms (GAS) . . . . . . . . . . . . . . . . . . . . . Extensions to Genetic Algorithms . . . . . . . . . . . . . . . . . Other Popular Search Techniques . . . . . . . . . . . . . . . . . . Some Optimization Problems . . . . . . . . . . . . . . . . . . . . Comparison of Search Algorithms . . . . . . . . . . . . . . . . . Techniques to Speed up the Genetic Algorithm . . . . . . . . . . Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
156 160 162 168 177 184 192 193 193 194
Software Reliability and Readiness Assessment Based on the Non-homogeneous Poisson Process Arnrit L. Goel and Kune-Zang Yang
1. Introduction and Background . . . . . . . . . . . . . . . . . . . 2 . Software Reliability and Readiness Assessment . . . . . . . . . . 3. NHPP and its Properties . . . . . . . . . . . . . . . . . . . . . . 4. Trend Testing for Software Failure Data . . . . . . . . . . . . . . 5 . Parameter Estimation for NHPP Models Using Laplace Trend Statistic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6. Software Reliability Evaluation . . . . . . . . . . . . . . . . . . 7 . Readiness Assessment . . . . . . . . . . . . . . . . . . . . . . . 8. Readiness Analysis of a Commercial System to . . . . . . . . . . 9. Readiness Analysis for an Air Force System . . . . . . . . . . . 10. Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
198 202 214 220
225 234 241 244 254 263 264
Computer-Supported Cooperative Work and Groupware Jonathan Grudin and Steven
E. Poltrock
1. TheCSCWForum . . . . . . . . . . . . . . . . . . . . . . . . . 2. Research and Development Contexts . . . . . . . . . . . . . . . 3. From Small-Group Applications to Organizational Systems . . . 4. CSCW in North America, Europe and Asia . . . . . . . . . . . . 5. Groupware Typologies . . . . . . . . . . . . . . . . . . . . . . .
270 272 276 278 282
CONTENTS
vi i
6. Communication Technologies . . . . . . . . . . . . . . . . . . . 285 7 . Shared-information-space Technologies . . . . . . . . . . . . . . 291 304 8 . Coordination Technologies . . . . . . . . . . . . . . . . . . . . 9 . Challenges to Groupware Development and Use . . . . . . . . . 309 10. New Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . 311 11. Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . 313 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314 Technology and Schools Glen L . Bull
1. 2. 3. 4.
Technology and Schools . . . . . . . . . . . . . . . . . . . . . . 322 Trends in Educational Computing . . . . . . . . . . . . . . . . . 323 Diffusion of Innovation . . . . . . . . . . . . . . . . . . . . . . . 335 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
AUTHORINDEX . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
357
SUBJECTINDEX . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
365
CONTENTS OF VOLUMES IN THISSERIES
. . . . . . . . . . . . . . . . 377
This Page Intentionally Left Blank
Contributors Gagan Agrawal is an assistant professor of Computer and Information Sciences at the University of Delaware. He received a Bachelor degree in Computer Science and Engineering from the Indian Institute of Technology, Kanpur, in 1991, and M.S. and Ph.D. degrees from the University of Maryland, College Park, in 1994 and 1996, respectively. His research interests are in compiler optimization techniques, distributed memory compilation, parallel and distributed systems and runtime support design. Glen Bull is an associate professor of Instructional Technology in the Curry School of Education at the University of Virginia. He served as director of Teacher-LINK, a regional K-12 Internet system, and designed (with Tim Sigmon) one of the nation’s first statewide telecomputing networks, Virginia’s Public Education Network (PEN). He is a founding member and past president of the Virginia Society for Technology in Education, and is currently president of the Society for Information Technology and Teacher Education. Chialin Chang received his B.S. in Computer Science from National Taiwan University, Taiwan, in 1987 and his M.S. in Computer Science from UCLA in 1991. He is pursuing his Ph.D. in Computer Science at the University of Maryland, College Park, where he is currently a research graduate assistant in the High Performance Systems Software Laboratory. His research interests include parallel and distributed computing, compiler and operating system runtime supports, and high-performance databases. Raja Das is an assistant professor in the College of Computing at Georgia Institute of Technology, Atlanta, GA. He has worked as a staff scientist at the Institute of Computer Applications in Science and Engineering, NASA LaRC, and as a Research Associate in the Department of Computer Science at the University of Maryland, College Park. His research interests are in the areas of compilers for high performance workstations and parallel architectures, interactive environments and out of core compilation techniques. Dr. Das received a B.S. in Mechanical Engineering from Jadavpur University in 1984, an M.S. in Mechanical Engineering from Clemson University in 1987, and a Ph.D. in Computer Science from the College of William and Mary in May 1994.
IX
X
CONTRIBUTORS
Guy Edjlali is a post-doctoral Research Associate with the Department of Computer Science and the Institute for Advanced Computer Studies at the University of Maryland, College Park. He is working on development of runtime support for heterogeneous environments. Dr Edjlali received a B.S. in Computer Science from the Burgundy University in 1989, an M.S. in Computer Science from the Pierre and Marie Curie (Paris 6) University in 1991, and a Ph.D. in Computer Science from the Pierre and Marie Curie (Paris 6) University, Paris, France in 1994. Amrit Goel received the B.S. degree from Agra University B.Eng. from the University of Roorkee and the M.S. and Ph.D. from the University of Wisconsin, Madison. His Ph.D. was in Engineering with a minor in Statistics. He is a professor of Electrical and Computer Engineering and a member of the Computer and Information Science Faculty at Syracuse University. He also taught at the University of Wisconsin, Madison and was a visiting professor at the University of Maryland, College Park, and the Technical University of Vienna. He has served on the editorial board of Computer. In 1979 and 1980 he received the P.K. McElroy Award from IEEE/RAMS. His current interests are in software reliability and testing, fault-tolerant software engineering, machine learning algorithms, and software metrics. He was a distinguished visitor of the JEEE Computer Society and was elected a Fellow of IEEE for his contributions to the reliability of computer software.
Jonathan Grudin is an associate professor of Information and Computer Science at the University of California, Irvine. He works in the Computers, Organizations, Policy and Society (CORPS) group. He earned degrees in Mathematics at Reed College and Purdue University, followed by a Ph.D. in Cognitive Psychology from the University of California, San Diego. He has been active in the ACM Computer and Human Interaction (SIGCHI) and Computer-Supported Cooperative Work (CSCW) organizations from their inceptions. He is currently interested in identifying factors that contribute to successful use of groupware applications, and in the indirect effects of these technologies on individuals, organizations, and societies. Paul Havlak is a Lecturer in the Department of Computer Science at Rice University in Houston, Texas. As a Research Associate at the University of Maryland from 1993 through April 1996, he collaborated on compiler research at the High Performance Systems Software Laboratory. His research interests include improved compiler understanding and restructuring of programs through language extensions and symbolic and interprocedural analysis. Dr Havlak completed his Ph.D. degree in Computer Science at Rice University, Houston, Texas, in 1994. He designed global
CONTRIBUTORS
xi
and interprocedural methods for symbolic analysis and dependence testing in the PFC vectorizer and the Parascope programming environment, predecessors to the D System. A. R. Hurson is a professor of Computer Science and Engineering at the Pennsylvania State University. His research for the past 14 years has been directed toward the design and analysis of general as well as special purpose computer architectures. He has published over 1.50 technical papers in areas including database systems and database machines, multidatabases, object oriented databases, computer architecture, parallel processing, dataflow architectures, and VLSI algorithms. Dr. Hurson served as the Guest CoEditor of special issues of the IEEE Proceedings on Supercomputing Technology, the Journal of Parallel and Distributed Computing on Load Balancing and Scheduling, and the Journal of Integrated Computer-aided Engineering on Multidatabase and Interoperable Systems. He is also the cofounder of the IEEE Symposium on Parallel and Distributed Processing. Professor Hurson has been active in various IEEElACM conferences. He served as a member of the IEEE Computer Society Press Editorial Board and an IEEE distinguished speaker. Currently, he is serving on the IEEE/ ACM Computer Sciences Accreditation Board.
Carlos A. Hurtado received a degree in Industrial Engineering and a Master in Engineering Science (Computer Science) from the Catholic University of Chile in 1995. He is currently at the Department of Computer Science at the Catholic University of Chile, where he has taught courses on computer languages and formal methods. His research interests include formal methods for the modeling and analysis of information systems, coordination issues, and collaborative work. Yuan-Shin Hwang received the B.S. and M.S. in electrical engineering from the National Tsing Hua University, Hsinchu, Taiwan, in 1987 and 1989, respectively. He is a Ph.D. candidate in computer science at the University of Maryland, College Park, where he is currently a research assistant in the High Performance Systems Software Laboratory. His research interests include parallel and distributed computing, parallel architectures and compilers, and runtime support for sparse and unstructured scientific computations targeted to massively parallel supercomputers. Krishna M. Kavi is a professor of Computer Science and Engineering at the University of Texas at Arlington. For two years he was a Program Manager at the National Science Foundation, managing operating systems, and programming languages and compilers programs in the CCR division. He was an IEEE Computer Society Distinguished Visitor and is currently on the editorial board of the IEEE Transactions on Computers. His research
xi i
CONTRIBUTORS
interests span computer systems architecture (dataflow systems, cache memories, multithreading, microkernels), formal specification of concurrent processing systems, performance modeling and evaluation, load balancing and scheduling of parallel programs. He has published over 100 technical papers on these topics. He received his B.E. (Electrical) from the Indian Institute of Science, and M.S. and Ph.D. (Computer Science) from the Southern Methodist University.
Ben Lee received the B.E. in Electrical Engineering in 1984 from the Department of Electrical Engineering at the State University of New York at Stony Brook, and the Ph.D. in Computer Engineering in 1991 from the Department of Electrical and Computer Engineering, the Pennsylvania State University. Since joining the ECE Department at Oregon State University in 1991, he has taught a number of courses in computer engineering. In 1994 he was awarded the Loyd Carter Award for Outstanding and Inspirational Teaching from the College of Engineering at Oregon State University. His research for the past 10 years has been directed towards the design and analysis of parallel architectures, including numerous technical papers in parallel processing, computer architecture, program partitioning and scheduling, and multithreaded systems. Joford T. Lim is a Ph.D. candidate in the Computer Science and Engineering Department at the Pennsylvania State University. His research interests are in the area of loop scheduling, program allocation, parallel processing, and computer architecture. He has published several papers on DOACROSS loop scheduling. Joford Lim received his B.S. in Electronics and Communications Engineering from De La Salle University, Manila, in 1985, and his M.S. in Computer Engineering from the Pennsylvania State University in 1993.
Bongki Moon received his B.S. and M.S. degrees in Computer Engineering from Seoul National University, Korea, in 1983 and 1985, respectively. He is pursuing his Ph.D. degree in Computer Science at the University of Maryland, College Park, where he is currently a research graduate assistant in the High Performance Systems Software Laboratory. From 1985 to 1990, he worked for Samsung Electronics Corp. in Korea in the Communication Systems division. His current research interests include high performance spatial databases, data mining, and parallel and distributed processing. L.M. Patnaik obtained the Ph.D. in 1978 in the area of real-time systems, and D.Sc. in 1989 in the areas of computer systems and architectures, both from the Indian Institute of Science, Bangalore, India. Currently, he is a professor with the Electrical Sciences Division of the same Institute and directs the Microprocessor Applications Laboratory. His research interests are in the areas of computer architecture, parallel and distributed computing,
CONTRIBUTORS
...
Xlll
real-time systems, neural networks, genetic algorithms, CAD for VLSI circuits, and mobile computing. In these areas he has published over 300 papers in refereed international journals and conference proceedings and has co-edited seven books. He is a Fellow of the IEEE and serves on the editorial boards of ten international journals. He has served as the Program/ General Chair and Member, Steering Committee, of many IEEE sponsored international conferences.
Steven E. Poltrock is a Senior Principal Scientist in the Research and Technology organization of Boeing Information and Support Services. He manages the Workgroup Integration Technology program, leading projects that introduce and evaluate groupware and workflow technologies. He earned degrees in Engineering from the California Institute of Technology, Mathematics from UCLA, and Cognitive Psychology from the University of Washington. He has conducted research in perception, cognition, mathematical psychology, and human-computer interaction. He has researched and written about collaborative user interface design and development practices and about deployment of groupware systems. Ravi Ponnusamy received the Ph.D. in Computer Science from Syracuse University in 1994, and his B.E. in Computer Science and Engineering from Anna University, Madras, India, in 1987. His research interests include parallel I/O, parallelizing compilers, supercomputer applications and performance evaluation. He has been designing and developing toolkits and techniques for High Performance Fortran compilers to produce efficient parallel code for large-scale scientific applications. Srilata Raman is a Senior Staff Engineer in the Unified Design Systems Laboratory of Motorola in Austin, Texas. She holds a Ph.D. degree in Electrical Engineerihg from the University of Illinois at Urbana-Champaign. Her research interests include optimization algorithms, computer-aided design of VLSI circuits, and parallel algorithms for VLSI CAD. She serves in the Technical Program Committee of IEEE conferences and the editorial board of an international journal. She is a member of the IEEE, IEEE Computer Society and ACM. Joel H. Saltz is an Associate Professor with the Department of Computer Science and the Institute for Advanced Computer Studies (UMIACS), and the Director of the High Performance Systems Software Laboratory at the University of Maryland at College Park. He leads a research group whose goal is to develop tools and compilers to produce efficient multiprocessor code for irregular scientific problems, i.e., problems that are unstructured, sparse, adaptive or block structured. He collaborates with a wide variety of applications researchers from areas such as computational fluid dynamics,
xiv
CONTRIBUTORS
computational chemistry, computational biology, environmental sciences, structural mechanics, and electrical power grid calculations.
Shamik D. Sharma received his B.Tech. in Computer Science at the Indian Institute of Technology, Kharagpur, in 1991. He is pursuing his Ph.D. in Computer Science at University of Maryland, College Park, where he is currently a research graduate assistant in the High Performance Systems Software Laboratory. His research interests include compiler and runtime support for parallel scientific applications, operating system support for parallel architectures and distributed computing over the Internet. Pablo A. Straub received a degree in Industrial Engineering from the Catholic University of Chile in 1985. He received his Ph.D. from the University of Maryland at College Park in 1992. He is currently an Assistant Professor of Computer Science at the Catholic University of Chile, where he has been since 1985. His research interests include software engineering, formal methods, information systems, and business process models. Alan Sussman is a Research Associate with the Department of Computer Science and the Institute for Advanced Computer Studies at the University of Maryland, College Park. He is working on developing effective compiler and runtime techniques for parallelizing various types of irregular and adaptive applications, and on various methods for supporting distributed applications with large 1 / 0 and communication requirements. He received a B.S.E. in Electrical Engineering and Computer Science from Princeton University in 1982 and a Ph.D. in Computer Science from Camegie Mellon University in 1991. Mustafa Uysal is a Ph.D. candidate in the Computer Science Department of the University of Maryland, College Park. His research interests include parallel and distributed computing, high performance and scalable 1 / 0 architectures and systems for workstation clusters, operating systems. He received a B.S. in Computer Science from Bilkent University, Turkey, in 1992 and an M.S. in Computer Science from the University of Maryland in 1995. Kune-Zang Yang received the B.S. in Electrical Engineering from TsingHua University, Taiwan, in 1982 and the M.S. and Ph.D. in Computer Engineering from Syracuse University, NY, in 1991 and 1996, respectively. He was a software system engineer at Chung Shan Institute of Science and Technology, Taiwan, during 1984 to 1988. His current research interests are software reliability and metrics, artificial neural networks and pattern recognition.
Preface Advances in Computers, first published in 1960, is the longest running anthology in the computer industry. The goal is to highlight those computerrelated technologies that have the most impact on our lives today. Topics range from software, to the underlying hardware, to how computers affect the social fabric of our society today. This volume in the series is no exception. We present a variety of topics that are affecting the information technology industry today and will continue to have an impact in the years to come. The first three chapters all look at the problem of multiple computer systems. They discuss how to divide a program across several machines in order to allow this parallelism to speed up overall program execution, by simultaneously executing different parts of the program on different processors. In the first chapter, Pablo A. Strdub and Carlos Hurtado discuss “Control in Multi-Threaded Information Systems.” For simple program designs, such as on a personal computer, a computer executes from the first statement of a program until the last statement, and then the program terminates. At any time, only one statement is executing. However, with more powerful machines, in order to solve complex problems more quickly, several processors may be executing at the same time, each processing a different part of the program. Understanding which sections are in control at any time and coordinating the execution behavior across multiple machines is a major design problem in large-scale applications. Straub and Hurtado discuss these control issues and present their theory on parallel control flow. The second chapter, “Parallelization of DOALL and DOACROSS Loops - A Survey” by A. R. Hurson, Joford T. Lim, Krishna M. Kavi and Ben Lee, continues the discussion of parallel program execution that began in the preceding chapter. Most program execution time is spent in executing loops, so mechanisms to allow for multiple processors to simultaneously execute different paths through a loop at the same time would greatly speed up program execution. They discuss the concept of static and dynamic processor allocation via concepts they call the DOALL and DOACROSS loops. In chapter 3, “Programming Irregular Applications: Runtime Support, Compilation, and Tools” by Professor Joel Saltz and his colleagues at the University of Maryland, concepts similar to the preceding chapters are explored. Given data stored as large irregular arrays, what dynamic
xv
xv i
PREFACE
techniques can be developed to allow for efficient processing of this data across networks of such machines? For data that cannot be represented as regular arrays, dynamic programming techniques are more efficient than statically developed optimization algorithms. Using the CHAOS system as a model, they describe algorithms for processing such arrays. In chapter 4, Srilata Raman and L. M. Patnaik discuss genetic algorithms in “Optimization Via Evolutionary Processes.” Their basic problem is optimizing a function for a solution. Typically, a “cost function” is developed, and then a solution with the best cost functional value is chosen. A genetic algorithm is a stochastic search algorithm based upon the principles of biological evolution. They discuss genetic algorithms and then survey several variations of such algorithms and discuss their search properties. In chapter 5 , Amrit Goel and Kune-Zang Yang present “Software Reliability and Readiness Assessment Based on the Non-homogeneous Poisson process.” A major problem in software design is to understand the reliability of the system that is produced. Simply stated, how long will the system execute before it fails due to an error in the software? Various techniques, developed from hardware reliability theory, have been applied to this software problem. Goel and Yang survey many of the common reliability models and then discuss their own extensions to these models using non-homogeneous Poisson processes. Increasingly, with the spread of the Internet, worldwide networks, and intranets within a corporation, software development for a given product may be spread over a large geographical area. Coordinating the activities of this group of individuals to produce a single well-designed product has become known as Computer-Supported Cooperative Work (CSCW). In “Computer-Supported Cooperative Work and Groupware”, Jonathan Grudin and Steven E. Poltrock discuss these concepts and provide a broad overview of the current trends in CSCW developments. In the final chapter, Glen Bull discusses “Technology and Schools.” There is a constant stream of writing decrying the lack of computers in the primary and secondary grades, from kindergarten through twelfth grade (the socalled K-12 grades). However, what would a school do with a computer if it had one or more? In this chapter, Bull discusses many of the options available to the K-12 computer specialist and discusses many of the problems faced in trying to use such technology effectively. Having the hardware is not the main problem; understanding how computers can aid in education is the major effort. I would like to thank the authors for contributing their time and expertise in writing their chapters. This book has taken about 18 months to complete, from the time the authors wrote their manuscripts, had them reviewed, then revised, and then had this book produced. I have enjoyed working with them
PREFACE
xvii
in creating this volume. If you have any suggestions for future topics to discuss, you can reach me at
[email protected]. I hope you find this volume of interest.
MARVINV. ZELKOWITZ
This Page Intentionally Left Blank
Control in Multi-threaded Information Systems* PABLO A. STRAUB AND CARLOS A. HURTADO Depto. de Ciencia de la Cornputacion Universidad Catolica de Chile Santiago, Chile
Abstract Information systems design has traditionally dealt with both data modeling and process modeling. Regarding process modeling, most design techniques based on structured design or object-oriented design specify possible data flows and interactions among components, but are not very precise in specifying system control flow. On the other hand, the introduction of computer networks has enabled technologies like work flow automation and ED1 to coordinate collaborative work in and across organizations, allowing process re-engineering to shorten process times by introducing parallelism. The introduction of these technologies requires a precise specification of control for a parallel process. Moreover, process specifiers are not necessarily computer scientists, nor are they willing or able to learn complex languages. Several languages have been developed to specify control in worwlow systems. Most languages specify control using diagrams similar both to traditional single-threaded control flow diagrams and CPM charts. These languages can represent both choice and parallelism. But this combination of language constructs, required for modeling processes, can introduce control anomalies, like useless work or even worse, deadlock. This paper is a first treatment of multi-threaded control flow in information processes. It presents common language constructs and some extensions, semantics, analysis methods, and a theory of threads of control which is used to analyze process models and to define the semantics for some of the language extensions. The formalization is given using Petri nets.
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.1 Automated Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2 Process Model Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.3 Control Anomalies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.4 Contents of this Article . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2. Process Model Control Specification . . . . . . . . . . . . . . . . . . . . . . . . . .
2 3 4 6 7 7
*This work is funded in part by COMCYT through project FONDECYT 1940677.
1 ADVANCES IN COMPUTERS, VOL 45
Copynght 0 1997 by Academic R e v Ltd All nghts of reproduction in any form reserved
2
PABLO A. STRAUB AND CARLOS A . HURTADO
2.1 Basic Control Constructs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 CICN: A Standard Control Language . . . . . . . . . . . . . . . . . . . . . . . 2.3 Advanced Control Constructs . . . . . . . . . . . . . . . . . . . . . . . . . . . 3. PetriNets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1 Place/Transition Nets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 Free-choice Nets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3 Partial Order Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4 Behavioral Properties and Invariants . . . . . . . . . . . . . . . . . . . . . . . 4 . The Simple Control Property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.1 The Control Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2 Petri Net Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.3 Behavioral Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 . A Theory of Threads of Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.1 Thread Labels and Threads . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2 Threads and Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 . Applications of Thread Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.1 BaseModel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.2 Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.3 Alternatives within a Thread . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.4 Alternatives between Multiple Threads . . . . . . . . . . . . . . . . . . . . . . 6.5 Unbalanced Connectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.6 Summary: Incremental Case Composition . . . . . . . . . . . . . . . . . . . . 6.7 Dealing with Unspecified Situations . . . . . . . . . . . . . . . . . . . . . . . 6.8 General Unbalanced Connectors . . . . . . . . . . . . . . . . . . . . . . . . . 7. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Appendix: Proofs of Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8 10 11 13 13 14 15 16 17 17 18 21 25 25 27 31 33 34 36 37 38 40 43 44 44 46 47 50
1. Introduction Organizations are distributed. interactive. parallel systems. that handle incomplete and inconsistent information . From the perspective of this definition. it is possible to realize that computer science can help to understand and support work within organizations. for parallelism. interaction. data handling. etc., are clearly in its realm . In fact. in the last decade or so the coordination of collaborative work has been automated by so-called collaborative systems. which are based on techniques loosely identified by the term Computer-Supported Collaborative Work (CSCW) [21] . The idea of process automation traces its origins back to the invention of the assembly line in the beginnings of this century . Taylor’s theories on rationalization of functions within organizations led to the definition of organizational processes. defined as sets of interrelated functions performed by several individuals. Only recently. with the development of inexpensive networked computers. has the possibility of automating coordination of work by many people been realized .
CONTROL IN MULTI-THREADED INFORMATION SYSTEMS
3
1.1 Automated Processes Automated processes are usually more complex than similar manual processes because of added cases and parallelism. Automated process models usually have more cases (Le., choices of possible execution paths) than manual processes, because manual processes-followed by intelligent people-can be changed during process execution if need arises. In fact, most procedures from organizational procedure manuals are just a sequence of unconditionally performed steps. While most organizational processes have intrinsic parallelism due to the relative independence of some activities, manual procedures usually cannot take advantage of it, because coordination becomes too complex. Automatically coordinated processes, on the other hand, can safely handle complex parallel procedures. Another dimension of complexity that can be handled in automatically coordinated process models is the structure into models and submodels, with complex call relations, even including recursion. Many organizational processes are fairly rigid or at least there is a preestablished set of possible choices. They are called routine processes. There are also many organizational activities that cannot easily be structured in terms of fixed procedures. They are called non-routine processes. Most processes fall between these two categories, so they are usually executed as routine processes, but may sometimes include non-routine parts. It is not possible to create a meaningful process model for a completely non-routine process. Of course, automation of routine processes is simpler than that of non-routine processes (see Table I). Support for nonroutine processes has been an active line of research and development. Some systems use messaging to handle exceptions by supporting informal communication, which is initiated once the process is off its normal course of action [S].Another approach is to explicitly model the interactions among actors involved in the process [ 141. However, those aspects pertaining to the control flow of a process during and after exception handling have not been treated in depth. In particular, TABLEI ATTRIB~JTES OF AUTOMATEDPROCESSES
Kind of process Attnbute Process definition Tool support
Routine
_
Eimple good
Semi-routine ~ hard fau
_
Non-routine impractical poor
4
PABLO A. STRAUB AND CARLOS A. HURTADO
when processes have parallelism not every state is acceptable because of the possibility of control anomalies, like deadlock.
1.2 Process Model Components To reliably automate a process, the process has to be formally defined, that is, there must be a formal procedure or process model, usually written using a graphical language. The process model is enacted (i.e., executed) creating particular processes from the same model. Process models comprise function, data, behavior, organization, and other process features [lo]. These features can be dealt with separately by a family of related process modeling languages (e.g. Kellner [lo] used the Statemate family of languages [8] to model processes). A complete process model must describe not only the activities performed, but also their sequencing, parallelism, resources, organizational context, etc. Kellner [ 101 identifies four main perspectives of process models: function, behavior, data, and organization; these perspectives cover most process aspects. Kellner’s process perspectives are present in the representation of information processes under new automation technologies, because information processes can be modeled at different levels. In this work we describe these process models using a so-called generic model, which is based on several specific process modeling languages and techniques [7]. In the generic process language, there are four related submodels for a complete process model: control model, context-independent model, organizational model, and context-dependent model. The relationship among these models is shown in Fig. 1. 0
Control model. The control model is basically a graph whose nodes are
Context-dependent model Context-independent model
7,
Grganizationd model scripts role assignment object references
FIG.1. The four submodels of the generic process language.
-
CONTROL IN MULTI-THREADED INFORMATION SYSTEMS
5
activities and connectors. Activities represent basic units of work: they are either atomic activities, calls to other models, or special. The control model is a transition diagram with constructs resembling control flow diagrams (CFD) and critical path method (CPM) charts. Control is a subset of process behavior. Process behavior is defined as “when and how they are performed through feedback loops, iteration, complex decision-making conditions, entry and exit criteria, and so forth” [lo]. From the point of view of coordination theory [ 131, control models are the specification of causal interdependencies between activities. That is, the control model represents partial orders between the activities of the different processes for a given process model (as opposed to total orders or sequences, due to parallelism). The control model does not represent functionality, and other aspects like resource sharing, timing, etc., even though all these aspect do determine actual process behavior. Like programming languages, process modeling languages include both basic control constructs-like selection, iteration, and synchronization-and more advanced constructs-like recursive calls to submodels and activity replication-some of which will be described in this chapter. 0
Context-independent model. The context-independent model is an extension to the control model, which adds local data and adds a functional description. That is, this model adds the description of what data is passed from activity to activity, and how data is changed in the activities. This model is independent of the organizational context in which it is executed, not unlike the way in which a program written in a high-level language can be executed in different computers and operating systems.
0
Organizational model. The organizational model includes classes of objects, types of objects, types of object relationships, actual objects and actual relations between objects. Each class has a defined set of operations or methods for objects of the class. There are two distinguished object classes called actors and roles and also a relationship between actors and roles. This model represents resources-people, machines, data-and resource organization-organizational structures, data structures-.
0
Context-dependent model. The context-dependent process model comprises the context-independent process model, the organizational model, and their relationships. This model assigns roles to activities, references to organizational objects, and scripts which call organizational object methods. Executors are related to activities. Object
6
PABLO A. STRAUB AND CARLOS A. HURTADO
references are treated like data. Scripts are mainly used to fetch data from organizational objects before executing the activity and to store data in them after the activity executes; scripts are thus related to activities.
1.3 Control Anomalies As was previously mentioned, the capacity to handle complex processes with parallelism and choice is the main difference between process execution under automatic coordination and process execution under manual coordination. However, even though parallelizing a business process might ‘‘dramaticallyreduce cycle times and the resultant costs” [17], the existence of both parallelism and choices might lead to anomalous behavior, like deadlock or the execution of useless activities. Thus, some process modeling languages constrain the forms of parallelism (e.g., parbegin and parend) [2] that can only describe correct behavior. Alas, languages that do allow general forms of parallelism do not test for incorrect behavior, a notion that is not even well defined. Sequential processes cannot have behavioral anomalies (except infinite loops). On the other hand, parallel processes without choice cannot have control anomalies. Thus, it is not surprising that the usual way to avoid these anomalies is by defining simple-minded process models that inhibit the natural parallelism of activities and abstract away the handling of exceptional cases. Oversimplification is especially relevant when dealing with complex processes. While there are many real-world simple business process models, a study cited in [ 3 ]on the modeling of 23 processes of the US Department of Defense included 17 process models with more than 50 activities, and 3 of them had more than 200. There are three main approaches to find control anomalies: (1) Build a model and then verify its properties. One way to verify is building the space state and exhaustively checking the properties (e.g., a deadlock is an inappropriate final state). Another way is finding net invariants; so-called place invariants and transition invariants can be computed by solving systems of linear equations derived from the net [12]. (2) Build a model that is correct by construction, because all grammatical rules used guarantee correct behavior. Abbati et al. [ 1] presents a grammar that introduces parallelism using a construction similar to parbegin-parend pairs. DeMichelis and Grasso [2] annotates the activities in a defined manner to the same effect. (3) A third approach is using only small models by abstracting models
CONTROL IN MULTI-THREADED INFORMATION SYSTEMS
7
and submodels; the intent is that it is more likely to have correct behavior if the models are simpler. For example, [9] suggests models with less than 10 activities. The first approach does not explain why a particular model has control anomalies nor which specific part of the model is the culprit. The second approach works by limiting the forms of parallelism; in particular, it is impossible to model all the parallelism in a PERT chart. The third approach is just a rule of thumb, that may or may not work and inhibits parallelism. Besides, in addition to the need of having correct control in a process model, exception handling poses the additional problem of ensuring correctness after an exception is handled, even if due to the exception the model needs to be changed at run time.
1.4
Contents of this Article
In this article we are concerned with the modeling and analysis of control, the definition of a notion of control correctness, a theoretical framework for these purposes that we call the theory of threads of control, and applications of the theory. Section 2 describes constructs for control specification languages in general and the particular language used in this article. Section 3 introduces those aspects of the theory of Petri nets that will be used in latter sections. The main contributions of this work are in Sections 4 to 6. Section 4 formally defines CICN and its mapping into Petri nets, and then describes a series of control anomalies and control properties that characterize control correctness. Section 5 introduces an algebraic theory of threads of control that explains control anomalies. Two applications of the theory of threads are presented in Section 6: an incremental development method and a new advanced language construct called the unbalanced connector, which is used in the development method. Finally, Section 7 puts these results into perspective.
2. Process Model Control Specification Most graphical languages to specify control flow have similar constructs. Thus, instead of doing a detailed language survey we will analyze language constructs on a small sample of representative languages. Language constructs are classified as either basic or advanced. While this classification might seem arbitrary at first, it is based on whether or not the semantics of a construct can be expressed using place/transition nets (as in Section 4.2) using a bounded number of nodes.
8
PABLO A. STRAUB AND CARLOS A. HURTADO
In most process modeling languages, control is expressed by a directed graph with several kinds of nodes. One such kind of node is an activity node that must be present in one form or another in all languages. For example, control flow diagrams (CFD) have three kinds of nodes: statements, twoway conditionals, and two-way joins' (a case-like statement needs also n-way conditionals and joins). Even though edges are not always drawn with arrow heads, they are directed and we will call them arrows; if the arrow head is not present we will assume the edge is directed from left to right.
2.1
Basic Control Constructs
In this section we present basic control constructs like sequencing, choice, introduction of parallelism, and synchronization. We also present abstraction and simple procedure calls. These constructs constitute a meaningful set in which to write control process models. Models written using only basic constructs are called basic models-although they might be very complex. 0
0
Sequencing. In almost all languages, sequencing between activities is expressed using an arrow. Thus, an arrow from an activity A to an activity B means that B can start execution only after A has finished execution; if A is the only predecessor of B , usually finishing A executions is the only precondition that enables B.2 In fact, this is true of both CFD and the critical path method (CPM), as well as most other modeling languages. Arrows do not express sequencing in data flow diagrams or similar languages like SADT or IDEFO [16]. This is a common source of confusion on the semantics of this kind of languages, which do not specify control flow, but merely constrain it. Choice. There are two common ways to represent choice: by special choice nodes or implicit in activities. In CFDs choice is represented by a diamond-shaped node to split control flow and the joining of two arrows in one (sometimes the arrows join in a dot). Languages like CICN use or-nodes, which might be either or-split nodes or or-join nodes, both of them drawn with clear circles. Informally, the behavior of an or-split is that control reaching the node flows to just one of the outputs, and the behavior of an or-join is that control reaching an input flows through the output.
'Joins in CFDs are implicitly represented as the joining of two arrows, or explicitly represented as a small filled circle. 'As far as control is concerned, i.e., abstracting away from resource utilization and other behavioral conditions.
CONTROL IN MULTI-THREADED INFORMATION SYSTEMS
0
0
0
9
A combined or-join-split node is present in several languages. In fact, in several languages in which activity nodes allow multiple inputs and multiple outputs, the activity itself behaves as an or-node, with implicit choice. That is, the activity is enabled by just one input and at the end it produces just one output. Iteration. Iteration is not a primitive construct in graphical languages. It is defined using choice constructs, so it is not discussed here, except to note that the possibility of infinite loops is not a concern in control flow: knowing which branch of an or-node is selected is a behavioral problem, but not a control flow problem. If a loop is forced to be infinite because there is no path that leaves the loop, then this is a control flow problem. Parallelism. Like choice, parallelism is represented either by special nodes or implicit in the activities. CICN uses and-splits to introduce parallelism (a thread of control is divided in two or more parallel threads) and and-joins to synchronize threads (two or more threads are replaced by a single thread). A combined and-join-split node is also usual. In the critical path method (CPM), for instance, all activities behave like an and-node, that is, the activity starts when all its predecessors finish, and upon finalization control flows to all outputs. This semantics is very convenient for CPM charts that express parallelism, but cannot express choice, for there is only one kind of node: the activity itself. Simple abstraction. When processes are complex, it should be possible to abstract parts of them in subprocesses. Conversely, given a process, it should be possible to refine it adding more detail. For example, an activity might be replaced by another process. If a process P has an activity A that is refined by a process P' we say that P calls P'. Thus, complex processes are represented by a set of processes and a calls relation, with one root process that directly or indirectly calls all others3 If the culls relation does not allow cycles it represents simple abstraction. In other words, the abstraction is simple if there are no recursive calls. Simple abstraction has a semantics known as the copy rule [19, page 2881, which is basically the semantics of macro expansion. Most commercial and experimental work-flow systems that support abstraction have this semantics, and thus disallow recursion.
'Lamb [ l l ] recognizes several relations between program elements in addition to the culls relation, like the uses and defines relations. In process models, usually the defines relation is empty (i.e. there are no local definitions) and the uses relation is equal to culls (i.e. all uses are calls and vice versa).
10
PABLO A. STRAUB AND CARLOS A. HURTADO
This is not surprising for two reasons: the semantics is easily understood, and it is not always apparent why a process modeler might need recursion. Even simple abstraction is not very simple when abstracting from a multithreaded process. If an activity A is refined by a process P' it must be the case that there is a one-to-one mapping from inputs and outputs of A to those of P ' , or else there will be dangling arcs once the copy rule is applied. In the particular case that both A and P' have a single input and a single output, semantics is simple (e.g., CICN). When more than one input and output is allowed, abstracting control is powerful but might be misleading. There are several possible ways in which abstraction takes place: there is a fixed or semantics (e.g., Rasp/VPL, Copa), that is, activities are single-input-singleoutput, there is a fixed and semantics (e.g. CPM), there are both or and and semantics, there is a general propositional semantics (e.g. P I ) .
2.2
CICN: A Standard Control Language
Information Control Nets is a family of models developed at the University of Colorado for information system analysis, simulation, and implementation 151. The ICN is a simple but mathematically rigorous formalism that has similarity to Petri nets. ICNs are actually a family of models which have evolved to incorporate control flow, data flow, goals, actors, roles, information, repositories, and other resources. This integrated model is well adapted to the traits of a generic process model as described in the introduction. ICNs have been studied in academia and applied in industrial workflow products. The control ICN (CICN) is a simple, known, and representative language for the specification of control in ICNs models. A CICN is a graph in which nodes are labeled as an activity, as an or-node, or as an and-node. There are two distinguished activities start and exit, without a predecessor and without a successor, respectively. Other activities have one predecessor and one successor. Usually or-nodes and and-nodes have either one predecessor and more than one successor (a split) or more than one predecessor and one successor (a join). An or-split represents a decision point. Graphically, activities are depicted as labeled ovals, and-nodes are depicted as dark circles, and or-nodes as clear circles. As un example, consider the credit application process in Fig. 2, whose activities are explained in Table II. The execution of this process begins with the start node, and then executes activities A, B, C and D in parallel. After both C and D are completed, activity G is executed in parallel with the
CONTROL IN MULTI-THREADED INFORMATION SYSTEMS
11
n
FIG. 2. A mortgage loan process model in CICN notation. Activities are represented by ovals and and-nodes are represented by black circles. TABLEI1 ACW?ITIES OF THE MORTGAGELOANPROCESS Activity
Description
Start
Fill out application form Verify creditworthiness of customer Register life insurance Set up expense account Identify property to buy Get legal data for property Verify legal status of property Appraise property and verify value is okay Notify customer of outcome
A B C D E F G exit
sequence E, F. When A, B, G and F are done, the process executes the exit node and stops.
2.3
Advanced Control Constructs
Like some programming languages, some process modeling languages have an array of relatively sophisticated control constructs. Among these advanced constructs there are exception handling constructs, which define complex state transitions, but whose semantics is expressible using P/T nets. We will show in Section 6 how the theory of threads provides a framework to define a very general state transition, called an unbalanced connector. In this section we briefly mention two other kinds of constructs: recursive calls between process models and replication of activities.
12
PABLO A. STRAUB AND CARLOS A. HURTADO
0
Recursion. Consider negotiation processes between a provider and a requester. Using software such as the Coordinator system [14], the state of a negotiation between a person A acting as supplier and another person B acting as consumer can be maintained and transitions can be recorded by the computer. It is usual that as a result of a negotiation process the supplier requests something from another supplier C, that is, the negotiation process is intrinsically recursive. If process models are not recursive, the subordinated negotiation process will not have any formal link to the original process. Changes in the state of one of these processes might affect the other. For example, a renegotiation between B and A might trigger a renegotiation between A and C as a subprocess. With recursion, this trigger is automatic. Implementing recursion requires separate states for the execution of called models; the state of the computation of a model must be linked to the state of execution of the calling model. This is simpler than in a language such as Pascal where separate dynamic and static links are needed, because process models do not have local submodels (much as C functions cannot define local functions). On the other hand, recursion in the presence of parallelism cannot be implemented simply with a stack and a single current instruction pointer, as in most sequential programming languages. The semantics of recursion cannot be expressed using P/T nets, because the copy-rule as semantics of procedure calls cannot be applied [19]. High-level nets can be used to express recursion.
0
Replication. Replication of activities occurs when an activity must be performed several times in parallel, as opposed to several times in sequence. If “several” means a fixed number n of times, it can be modeled by an and-split followed by n copies of the activity followed by an and-join. If n is not fixed, a new construct is needed. In languages like Rasp/VPL replication is denoted by an icon that looks like a stack of regular activity icons. But does replication occur in practice? Yes, for instance, consider a software development process, where a set of modules needs to be implemented by possibly different people. This can be modeled by a replicated activity. Another use for replicated activities is defining a single activity performed by an unspecified number of people, like a meeting. Replication can be implemented in a workflow system by a loop that creates all required parallel processes; to synchronize these processes, an integer semaphore initialized to n might be used. Again, the semantics of replication cannot be expressed with a fixed P/T net, but can be expressed using high-level nets.
CONTROL IN MULTI-THREADED INFORMATION SYSTEMS
13
3. Petri Nets Petri nets are a mathematical formalism developed in the early 1960s from the work of C.A. Petri in 1962, in which he formulated a basic theory on the communication of asynchronous components of systems. Petri nets are a generalization of transition-state diagrams from automata theory. Unlike ST diagrams, the global state is defined as the union of local states which enable transitions in the net by themselves. Having a distributed state allows the expression of not only choices but also parallelism within a system. Petri nets have been widely used to describe the semantics of many process modeling languages and workflow systems (see, e.g., [ l ,5,18, 25,28,29]). Its success is due to the possibility of formally analyzing and simulating dynamic aspects of systems. There are many kmds of Petri nets. They are classified in two groups: low-level nets and high-level nets. In both groups, a system model is presented by a graph with two kinds of nodes (places and transitions) and a set of tokens. In the first group, tokens represent boolean conditions that enable the firing of transitions. Upon a firing of a transition, the tokens that enable the transition are deleted and new tokens are created, as if tokens were moving through the net. Important kmds of low-level nets are elementary nets and place/transition nets. In the second group, tokens represent not just boolean conditions, but complex data items or predicates. Important kinds of high-level nets are colored Petri nets, predicate/transition nets, and environment/transition nets.
3.1 Place/Transition Nets Place/transition nets are adequate to model basic control constructs. However, they are not useful to model more complex forms of control like recursion and replication of activities, let alone other process model issues like functionality, timing, resource utilization, etc. In this section we will show some basic aspects of P/T nets, which will be used to base the semantics and analysis of basic control models. A P/T net is a directed graph with two kinds of nodes, called places and transitions. A net is a triple N = ( P , T , F ) , where P is a set of places, T is a set of transitions, and F is a set of edges from places to transitions and from transitions to places, that is, F C ( P x T ) U (T x P ) . Places and transitions are called nodes. A node cannot be both a place and a transition, i.e., P n T = 0. The flow relation F defines for each node x E P U T a set of successors, denoted x', and a set of predecessors, denoted 'x. Nets are drawn as follows: places are represented by circles, transitions
14
PABLO A. STRAUB AND CARLOS A. HURTADO
p2
t2
d b2
P4
(a)
(b) FIG.3. (a) A Petri net; (b) one of its processes.
are represented by rectangles, and the flow relation is represented by a set of directed arcs (arrows). For example, for the net in Fig. 3(a), p = { P I , P ~ ~ P ~ , P T~ =, P{ f i~? fIz ~t f 3 1 , and F = { ( P i , f i ) , ( f i , P 3 ) 9 ( P 3 r f 3 ) , ( f 3 , P ~ ) (, P 2 9 f 3 ) v ( P 2 r f 2 ) r ( f 2 9 P 4 ) I .
A path in a net N is a non-empty sequence n = x o x l ... x, such that ( x i - l ,x i ) E F , for 1 C is k. A net is said to be connected if every pair of nodes ( x , y) belongs to the symmetric, transitive, reflexive closure of F , i.e. (x, y) E ( F U F - I ) " . A net is strongly connected if for every pair of nodes ( x , y) there is a path from x to y. A P/T system is defined by a net and an initial marking or state, where a marking is a mapping from places to numbers (i.e., the count of tokens in where N = (P.T , F ) is a net and each place). A P/T system is a pair ( N , Mi) M i:P + N, is the initial marking. For example, the initial marking in Fig. 3(a) is M i =( p i + + 1, p2++1,~ ~ - ~0 ~, - 0~ ~, - 0 ) . If in a marking M all input places of a transition t have tokens, the marking enables t. If a transition is enabled at a marking M , t can fire and produce a marking M' by removing one token from each input place and producing one token in each output place. A marlung M is reachable from a marking M' if there is a sequence of firings enabled at M' whose final marking is M.A marking M is reachable if it is reachable from Mi.
3.2
Free-choice Nets
In a P/T net, synchronization is represented by a transition having more than one input place, and choice is represented by a place having more than one output transition. In general, choice and synchronization could be mixed. For example, in the net in Fig. 3(a), choice at place p 2 is interfered with by synchronization at transition f3, for choice depends on the existence of a token in place p 3 . Free-choice Petri nets [4] form a subclass of the P/T-nets, in which choice in a place is not interfered with by other parts of the system. This means that choice and synchronization do not mix, or that choices are in a sense free. A sufficient condition for free choice is that places with more
CONTROL IN MULTI-THREADED INFORMATION SYSTEMS
15
than one output transition (where choices are made) are not connected to transitions with more than one input place (where synchronization occurs). While this condition is not necessary for free choice as defined by [4],it is satisfied by the systems considered in this paper. The importance of a net being free-choice is that many interesting properties can be decided in polynomial time.
3.3 Partial Order Behavior A sequence of enabled transitions in a system describes executions of the net. These sequences, however, do not determine causal dependencies between transition occurrences. To describe causal dependencies, an alternative representation of executions is needed, in which an execution is a partial order of firings (as opposed to a sequence, which is a total ordering). This partial order is also described by a net called a causal net. A causal net has no choice nor cycles. A causal net is a place/transition net N' = ( B , E , F ' ) , where B is a set of places, E is a set of transitions, F' is the flow relation, each place has at most one predecessor and at most one successor, i.e. V b E B : # ' b s 1 A # b * G 1, and finally the net is acyclic, that is, there are no paths with two occurrences of the same node. For example, the causal net of Fig. 3(b) defines a process for the system in Fig. 3 ( 4 , where q(b,)= p i . q ( e i ) = t l , d b z ) = p 2 , 4 ( b 3 )= p 3 , q ( e d = t 3 , q ( b J = Ps. A process is represented by a causal net which is related to the execution of a P/T system.
Definition 3.1 (Process) A process is a tuple n = ( B , E , F ' , q ) where ( B , E , F ' ) is an acyclic place/transition-net without choices, q is a total function from elements of n to elements of a P/T net N = ( P , T , F ) , and the following holds: q ( B )c p A q ( E ) , V e € E : q ( ' e ) = ' q ( e ) A q ( e ' ) =q(e)'. The first condition relates places and transitions of the process with those in the system. The second condition implicitly defines F' in terms of F. The initial state in N before the execution of the process is defined by the number of places without predecessors in the process that corresponds to each place in N :
M,(p ) = # { b E B l p = q ( b ) ,'b= 0 } . Likewise, the state reached in N after the execution of the process is defined
16
PABLO A. STRAUB AND CARLOS A. HURTADO
by the number of places without successors in the process that correspond to each place in N .
M,( p ) = #( b E B 1 p = q ( b ) , b' = 0 ) .
3.4 Behavioral Properties and Invariants There are several important concepts and properties related to the behavior of a P/T system. Some of them are defined here. 0
0
0
0
A deadlock in a system is a reachable state in which no transition is enabled. A system is deadlock-free if for any reachable state M there is a transition t , enabled at M . A system is live if for any reachable state M and for any transition t , there is a state M', reachable from M that enables t. A system is n-bounded if for any reachable state M and for any place p , M ( p ) s n. A system is bounded if there is such an n. A system is safe if it is 1-bounded.
A comprehensive study of these and other properties is in [4]. The dynamic behavior of a P/T system depends upon the net structure and its initial marking. The influence of the structure is very important, because it holds under every initial markmg. Moreover, the structure can be studied independently from the initial marking. Invariants are properties that hold in all reachable states of a system. Different kinds of nets have different kinds of invariant predicates over net states. In particular, functions that map places to rational numbers which depend only on the net structure have been studied extensively. These functions determine an invariant property of a system: the weighted sum of the tokens in the net is constant under all reachable states. By a slight abuse of notation, these functions are known as place invariants or S-invariants. Given a net, a place invariant is a rational solution to the homogeneous set of linear equations
c,,.,,I ( P ) = c,,;, I ( P ) where the unknown variable I is a vector whose dimension is the number of places in the net. The fundamental property of a state invariant I is given by the following equation, which defines the conservation property described in the preceding paragraph:
C P EI(p p ) x Mi(p ) = &, I( p ) x M ( p ) = constant where M iis the initial marking and M is any reachable marking.
CONTROL IN MULTI-THREADED INFORMATION SYSTEMS
17
S-invariants are strongly related to behavioral properties of Petri nets. Among other things, the existence of an invariant in which all components are positive ensures that the net is bounded. We will see (Section 5) that a special kind of non-numeric invariant is related to notions of correctness in control models.
4. The Simple Control Property Figure 4 shows a simple process model to approve a loan for a home. The first activity, a , is the customer’s application, then and-node x splits execution in two parallel activities ( b , credit approval and c, mortgage approval). After each activity, a choice is made (at or-nodes u and v). If both activities are successful, they synchronize in and-node y and the process proceeds to the exit node, so that the credit might be issued. Of course, not all applications are approved; if the credit is not approved the mortgage approval activity c becomes useless; in a realistic situation this activity is composed of several other (now useless) activities. Moreover, upon finishing the mortgage approval activity and-node y will attempt (and fail) to synchronize with the credit approval activity b. On the other hand, if both the credit and the mortgage are rejected, the process will produce two tokens in the w or-node, both of which will reach the exit node (the process exits twice!).
4.1
The Control Model
A CICN is a directed graph with three lunds of nodes: activity nodes, ornodes, and and-nodes. There are two special activities: the start node and the exit node. A node that is not an activity node is a control node.
Definition 4.1 (CICN) The control model is a directed graph (A, 0, N , start, exit, R ) , where A is a set of activities, 0 is a set of or-nodes, N is a set of and-nodes, start is the start node, exit is the exit node, and R is the flow relation. The set of all nodes is V = A U 0 U N .
FIG. 4. Example of a CICN net with a potentially anomalous loan approval process model:
Or-nodes are represented by white circles.
18
PABLO A. STRAUB AND CARLOS A. HURTADO
The following conditions hold: 0
0 0 0 0
Start and exit are activities, i.e., (start, exit) A . Activities and control nodes disjoint, i.e., A r l 0 = A n N = 0 f~ N = 0. R is a relation on nodes, i.e., R C V x V. Start has no predecessors; exit has no successors. For every node x there is a path from start to exit that includes x , i.e. Vx E V: start R*x A x R* exit
where R * is the reflexive and transitive closure of R. The semantics of CICN can be expressed directly, without the use of Petri nets [ 5 ] . A marked CICN is a CICN along with a function m from nodes and edges to the naturals, i.e., unlike P/T nets, all nodes and arcs might be marked. The initial marking has a single token in the start node. In general, marked edges enable the start of a node and marked nodes represent activities in execution and thus enable the termination of the node. Thus, the initial marking enables the termination of the start node, which will mark the edge following the start node. Or-nodes require one token in one of the incoming edges to start; upon termination they add a token to one of its outgoing edges. And-nodes require one token from each incoming edge to start, and produce one token in each outgoing edge upon finishing. Activities require one token in their incoming edge to start, and upon termhation produce one token in their outgoing edge. While not part of CICN, the most usual semantics for activities with several inputs and outputs is that of an or-node. 4.2
Petri Net Semantics
The behavior of CICN can be modeled by a P/T net by translating each of the elements in the net into part of a P/T net as in Fig. 5 and then connecting these parts. In CICN edges can be marked, hence we translate each CICN edge into two P/T net edges with a place in the middle. CICN nodes can also be marked and have two changes of marking: start and termination. Hence each node will be translated into a sort of sequence: transitionplace-transition. An and-node is translated as a sequence of one transition connected to the place connected to one transition. An or-node with n incoming edges and m outgoing edges is translated into n transitions connected to a place connected to m transitions. A regular activity may be regarded as an or-node; if it has one incoming edge and one outgoing edge it translates into the same as an and-node. The sfurt (respectively, exit) activity has a special translation, as a sort of incomplete activity, because it does not have a predecessor (respectively, successor).
CONTROL IN MULTI-THREADED INFORMATION SYSTEMS
CICN
PIT-net
CICN
19
P/T-net
FIG.5. Translation of CICN components into P/T net components.
It is not hard to realize that the resulting P/T net has the same behavior as the CICN, because the correspondence between CICN marlungs and P/T system markings is trivial.
Definition 4.2 (Place/transition net of a CICN model) The place/transition net corresponding to a control model At is the place/ transition net N(&) = ( P . T , F ) where P=V UR T = ( a , I a E A - (start]] U (a,( a E A - {exit]]
[transitions of activities 1
u ( x0 1 0 E 0 , ( x , 0)E R ] u ( ox 1 0 E 0, (0,x) E R 1 u{n,InEN)U{nfInEN) F = { as-u I a E A - {start]1 U u-afI
[transitions of or-nodes I transitions of and-nodes I
a E A - ( e x i t )}
[within activities]
u (xoHOIOE 0, (x,o)ER)U (o-ox(oEO,
(0,X)ER)
[within or-nodes1 U(n,Hn(nEN]U(nHnf(nENJ [within and-nodes1 u ( x p(x,y) 1 X E A u N ) [from activities and and-nodes1 [from or-nodes] u b Y H ( 0 , Y ) I 0E 0 ) U I (x,Y ) + + x , 1 x E A U N ) [to activities and and-nodes] [to or-nodes1 u ((x, o ) + x 0 1 o E 0 ) An example is in Fig. 6(a) which shows the place/transition system corresponding to the model in Fig. 4. This translation creates only free-choice nets [ 2 5 ] , because the only places with more than one successor are those of or-nodes, but their
20
PABLO A. STRAUB AND CARLOS A. HURTADO
FIG.6. (a) Place/transition net corresponding to the loan approval in FIG. 4. (b) One of the processes corresponding to the net, representing the case in which the mortgage is found acceptable but the credit is rejected.
successors have only the place as predecessor. That is, when there is choice there is no synchronization. The P/T net for a given model JU becomes a P/T system if it has a defined initial marking.
Definition 4.3 (Placeltransition system of a ClCN model) A control model At has one place/transition system defined over the P/T net N(JU). The initial marking is
Call semantics. The translation above does not include the possibility of assigning a whole process model to an activity, i.e. having an activity whose execution comprises the execution of another process. This implies that there is a hierarchy of models. The semantics for simple calls can be expressed by the copy rule as in programming languages [19], that is the process model hierarchy is flattened before its conversion into Petri nets. Another possible translation for a call can be developed as a refinement of Petri nets. In that case, the structure of calls at the process model level is kept and there is a mapping between Petri nets. Figure 7 represents how an activity u in a model At is mapped to another model A ’ , in terms of Petri nets. The basic idea is that the place a in N ( A ) is refined into the whole net N ( A ’ ) .
21
CONTROL IN MULTI-THREADED INFORMATION SYSTEMS
...
Caller start
-0
a
exit
Caller’s translation
Called’s translation
0-43-0
Called
exit
FIG. 7. A caller model ht with an activity u that calls a called model A’.The figure shows the translations in terms of Petri nets and the refinement mapping of a into a whole net.
4.3
Behavioral Properties
One basic property of a good model is that it does not deadlock, that is, each process of the model reaches an output socket to produce a termination response to the environment. Inappropriate control structures can create deadlocks as in Fig. 8.
Definition 4.4 (Deadlock) A final marking M, is a deudlock in a process model if Mf(exit) = 0. Property 4.1 (Deadlock freedom) A model is deadlock-free if none of its final markings is a deadlock.
b
b
(4
(b)
FIG. 8. Models that deadlock; (a) guaranteed deadlock; (b) deadlocks when activity u chooses one path and b chooses the other. This is a distributed decision, i.e. a single decision must be taken by two independent executors.
22
PABLO A. STRAUB AND CARLOS A. HURTADO
Looking at Fig. 8(b) it seems that distributed decision is a structural property of a net. A relevant question is whether there is a class of nets that do not have distributed decision, i.e. local-choice nets. The property of being local-choice is not a structural property that can be checked by a simple predicate on the flow relation (as the free-choice property). Figure 9 shows a model that is not local-choice, but the and-nodes that may deadlock are far away from the activities that made the distributed decision. Process models can suffer from prescribing too much work, so that some activities are unnecessarily performed, because in some particular execution the exit of the model does not depend on them, i.e., the activity is useless. Useless activities are those that are unnecessarily performed in some particular execution, because there is no causal relation from the activity to the exit node. In other words, the process could produce the same output without executing the activity. For example, activity c is useless in the process pictured Fig. 6(b). If tokens are regarded as information placeholders, useless activities represent unused information. Useless activities are the consequence of choices that involve parallel activities. Given a process, a place is useless if there is no path from the place to the exit place. To define useless activities we need a notion of behavioral semantics of Petri nets that can represent causality, i.e. true parallelism as opposed to interleaving semantics. A parallel semantics of Petri nets represents executions of a net by a process, in which precedence represents causality (Section 3). In a process, a place represents a token, a transition represents a firing occurrence, and a path between two nodes represents a causal dependency between them. Useless activities are defined in terms of processes instead of final markings, because from a final marking it is impossible to know whether an activity was even executed. However, the existence of useless activities can be characterized in krms of final markmgs. A process model has no useless activities if and only if in all final markmgs all tokens are in the exit node [22].
FIG.9. This process model has distributed decision between activities A and B : executors of A and B are both to decide on the future of the process (decisions are representedby the ornodes immediately after the activities). If they take different decisions, the process deadlocks because the and-join will have two tokens on the same arc, instead of one token on each arc. The blobs represent submodels without control anomalies.
23
CONTROL IN MULTI-THREADED INFORMATION SYSTEMS
Definition 4.5 Given a process, a place is useless if there is no path from the place to the exit place. In other words, a place e within a process is useless if ( e ,exit) E F where F' is the transitive closure of F. An activity a is useless if there is a place e in a process for the model which is related to the activity, i.e., q ( e ) = a , and e is useless in the process. +,
Property 4.2 (Usefulness) A process model is useful if no activity is useless in any of its processes. For example, consider the process model from Fig. 10, whose translation into a Petri net is shown in Fig. 1 la. In one possible process of that net (Fig. 1lb), activity B is useless, hence the process model is not useful. Tokens in interior places (i.e. places that does not correspond to sockets) in a final marking that is not a deadlock define an overloaded markmg.
Definition 4.6 (Overloaded state) A final marking M f is overloaded if it is not a deadlock and there is an interior place p such that Mf(PI > 0.
C FIG. 10. A process model in CICN notation: activities are represented by ovals, and-nodes by black circles and or-nodes by white circles.
-0-DC)
exit
FIG. 1 1 . (a) The translation into a Petri net of the model in FIG. 10. (b) One of the processes of the Petri net, in which activity 8 is useless.
24
PABLO A. STRAUB AND CARLOS A. HURTADO
Useless activities are defined in terms of processes instead of final states, because from a final state it is impossible to know whether an activity was even executed. However, the existence of useless activities, can be characterized in terms of final states, as is stated in the following theorem. Theorem 4.1 A process model is useful overloaded markings and is deadlock-free.
if and only if it has no
Single-input-single-output is the most commonly accepted form of model abstraction, as used in ICN, VPL/Rasp, Action Workflow, and other languages. If a process is viewed as a black box, then if there is one token in the process, eventually there will be one token in the output. Given the semantics of activities in these languages, abstraction of a process as a compound activity is only possible if the model has single-input-single-output. Some languages [6,9] define other lunds of behavior, where and and or outputs are mixed, defining an unnecessarily complex control logic within activities. Hence, another property of a good process model is single-response, that is, each enaction of the process produces exactly one output (a singleresponse process model is by definition deadlock-free). If a process can produce more than one output in the same enaction we call the process multiple-response (Fig. 4).
Property 4.3 (Single response) A process model is singleresponse if all final markings M , satisfy M, (exit) = 1. It is multiple-response if there is a final marking M, such that M,(exit) > 1. The simple control property summarizes or-behavior.
Property 4.4 (Simple control) A model has simple control if the only final marking M , is
Mf@) =
1 i f p =exit 0 otherwise
Simple control implies that if a model begins with a single token in one of its input sockets, it ends with a token in one of its output sockets and there are no other tokens. Theorem 4.2 provides an alternative definition of simple control in terms of other properties. Theorem 4.2 A process model has simple control single-response and useful.
if and only if it is
A model with simple control is said to be behaviorally correct. There are two reasons to adopt this notion of correctness. First, from the above theorems there are no control anomalies, like deadlock, useless activities,
CONTROL IN MULTI-THREADED INFORMATION SYSTEMS
25
and multiple response. Second, a simple control model behaves like an activity; this allows consistent or-abstraction in which a process model can be used safely within a larger model without assumptions of its behavior (except for simple control).
4.4
Discussion
Basic constructs are all free-choice, i.e. they lead to free-choice Petri nets [25].This can be observed from the semantics of most languages to model process behavior. This allows simpler model analysis of control properties and the development of thread theory in the following section. Should choice and synchronization mix in BP models? In most situations no, because synchronization occurs within the model and choices are the result of the functionality of the activities which are executed without reference to the model’s state, that is, the state is not an input to the execution of an activity. We call this the principle of individual decisions; following this principle necessarily leads to free-choice nets. This principle is desirable from a language perspective, as it allows a sort of referential transparency with respect to the meaning of an activity. That is, the functionality of an activity is independent of the model in which the activity is embedded. This is needed if a system supports abstraction so that a model can be used as an activity in several other models. For example, in a banking application, there can be a process model for credit approval which is used as an activity in several process models for different kinds of loans. However, sometimes we want to model the situation in which an actor decides for another actor working in parallel, taking away his authority to choose. If we regard this authority as an order to make a decision, then this situation is a counterorder. Traditional modeling constructs do not allow one to specify this situation. The unbalanced connector and its supporting theory of threads is a simple, high-level and succinct construct to model this phenomenon. Simple control is related to some behavioral properties of free-choice Petri nets. In fact we prove in [22] that a model has simple control if and only if a connected free-choice net derived from the model is live and safe, as defined in Section 3. Because these properties can be determined in polynomial time for free-choice nets [4], this implies in turn that simple control in free-choice nets can be decided in polynomial time.
5. A Theory of Threads of Control We use a thread metaphor in which a thread is a set of strands for a single rope that goes from an input to an output in a process model. The metaphor
26
PABLO A. STRAUB AND CARLOS A. HURTADO
is best understood by considering a model with parallelism but no choice (e.g., a PERT chart). A single rope in which all strands are as long as the rope is passed from the input to the output by dividing the groups of strands into splits and uniting them in joins. In the metaphor, a thread is a set of strands. This metaphor can be extended to models that do have choice; in that case, whatever choices are made, all ropes that get to a choice point (i.e., an or-node) are part of the same thread. In other words, making a choice does not change the thread: only and-nodes change threads. The theory defines the concept of the thread of control ?#(n)of a node n, the subthread relation C, and thread addition Q. Threads are algebraic expressions whose operators are nodes in the net. The intuition behind the theory of threads is that threads are divided by and-splits and added by andjoins, in such a way that for every and-node the sum of the threads of its predecessors equals the sum of the threads of its successors. Each activity and or-nodes has one and only one thread. We have shown elsewhere that a model has no control anomalies like deadlock (i.e., no output), multiple response (i.e., more than one output), or useless activities, if and only if threads are well defined: the thread of the start node equals the thread of the exit node. The thread of a node x is denoted +(x), thread addition is denoted 8 , and the subthread relation is denoted For example, in Figure 12, the following relationships among the threads are true: v ( A ) = v ( o r l )= v(B)= v ( o 4 c v ( s t a r t ) = v ( e x i t > , v ( C >Q v ( o Q ?#(A)@w(C>= ?#(start), ?#(A)is not comparable to ?#(C). In Fig. 10 threads are not well defined. The reasoning is as follows: (a) It so that the and-node that joins should be the case that ?#(A)8 v ( B ) = ?#(or2) A and B is balanced. (b) It should be the case that v ( A ) = v ( o r l )because they are connected, and also that ?#(or,)= ?#(orz).But these equations are unsatisfiable because there is no equivalent of zero or negative threads. Thus, activity A and or-node or, belong to two different threads. This in turn implies that the model has behavioral anomalies, in this case the (unresolved) useless activity B which may lead to an extra token upon reaching the exit node.
c.
FIG. 12. A CICN model without choice and well-defined threads
CONTROL IN MULTI-THREADED INFORMATION SYSTEMS
27
The following goals are derived from the thread metaphor to characterize good models.
(1) An and-node joins input threads into a superthread and splits the superthread into output subthreads. ( 2 ) Two activities connected in sequence have the same thread. ( 3 ) Threads are executed sequentially, i.e., if two activities can be active at the same time, they belong to different threads. (4) If a thread is active (i.e. one of its activities is active), then its subthreads are inactive. If a subthread is active then the thread is inactive. (5) The start node and the exit node belong to the same thread, called p . (6) Every thread is a subthread of p . There is a strong relationship between threads and behavior. First, threads are a lund of non-numeric place invariant (i.e., the weighted sum of the tokens is constant). Second, a model has simple control if and only if every node belongs to one thread and the thread of the start node equals the thread of the exit node. Thus, the model in Figure 12 has simple control, while the model in Figure 10 does not. Moreover, if this condition does not hold, an explanation on the origin of the problem can be derived (e.g. distributed decision between A and B would be diagnosed for Figure 9, and a connection from a subthread to a superthread between the or-nodes would be diagnosed in Figure 10, page 23). Section 5.1 defines an algebra in which terms can be interpreted as threads satisfying all these goals. Section 5.2 shows that we can assign a unique thread to each place in a model if and only if the model has simple control. Moreover, if the model has no simple control, analysis of threads sheds light into the reasons for not having simple control.
5.1
Thread Labels and Threads
The definition of threads is done indirectly via thread labels. A thread label for a place represents one possible destiny or future history for a token in that place (i.e. a set of paths that is part of a process that begins with a token in p ) . Ldcewise, a thread label for a transition represents one possible destiny for a set of tokens which enable the transition. Because there are choices, hence several possible destinies, each node in the net is assigned a set of thread labels. Only those destinies that end with one token in the exit node of the model are considered. Thus, thread labels capture successful executions.
Definition 5.1 (Thread labels) The set of thread labels of a
28
PABLO A. STRAUB AND CARLOS A. HURTADO
process model At whose net is ( P , T , F ) , denoted L,, includes the following and nothing else: 0
0 0
every place or transition x in P U T ; thesymbolp; the label multiplication a @p, where a and /3 are labels; the label addition a o/?,where a and /? are labels; ( a ) ,where a is a label.
Label equality is defined by the following axioms: (1) (2) (3) (4)
addition is associative, a @ (/?8 y ) A ( a S B ) o y; addition is commutative, a Q p = /? Q a ; multiplication is associative, a QI (B @ y ) ( a @/?) y; multiplication distributes over addition, a @ (B o y ) a @/? Q a QI y , and ( a @ / ? ) @Gya @ y s / ? @ y .
As usual, multiplication has higher precedence than addition, and juxtaposition means multiplication, e.g. a o/?8 y = a 0 (/?@ y ) = a Q B ~ . The meaning of a label is a set of paths in the net. For places and transitions, these paths have length one. Label multiplication @ denotes a sort of relational cross product of paths, i.e. an element of the product is the catenation of an element from the multiplier with an element from the multiplicand. Label addition 8 denotes union of sets of paths. It is easy to check that whenever two labels are equal then they denote the same sets of paths, because cross product and union satisfy axioms 1 to 4. A label for a place represents one future history of a token in that place. If there are two or more tokens in a process, the set of futures of the process does not include all possible combinations of futures of these tokens, because their futures might eventually interact (e.g., by synchronization). Labels are said to be consistent if whenever they refer to the same place, the future history of a token at that place is the same, i.e. decisions taken at common places are the same. The definition of label consistency is syntactic if labels are expressed in a normal form.
Definition 5.2 (Label normal form) A label is in normal form if it is written as a sum of factors, without parentheses nor multiplication symbols. Any label can be normalized, by distributing multiplication successively and then dropping all parentheses and multiplication symbols.
Definition 5.3 (Consistent labels) A set X of normalized labels is consistent if for each place p that occurs in the labels, either all occurrences
CONTROL IN MULTI-THREADED INFORMATION SYSTEMS
29
of p are the rightmost symbol of some label, or there is a transition t such that all occurrences of p are right-multiplied by t. In other words, a given place has one successor in all the labels, or no successor in all the labels.
Definition 5.4 (Model labeling) The labeling of a model A is a function z : P u T+2: from nodes of the net to sets of labels, defined by 0
the only label for the place y that corresponds to the exit node is the place itself
d P ) = {PI. 0
Labels of a transition t are computed by adding a consistent set of labels, one from each successor of t , and pre-multiplying it by t : z ( t ) = ( t m ( a , CB. .. 0 a,) 1 t'
= ( p , , .. ., p , ) A
(A:=, 0
a , € t ( y ! ) ) A ( a ,,..., a , ) isconsistent).
Labels of a place p # exit are computed by pre-multiplying labels from its successors by p . z( Y ) = U r e p . { p & a I a E z(t)I
For example, part of the labeling of the model in Fig. 6(a) is shown in Table E is a shorthand for exit. There is an equivalence relation for thread labels, which is the base of the definition of a thread. 111, where
Definition 5.5 (Label equivalence) Label equivalence, denoted G , is the least equivalence relation that satisfies the axioms for equality and the following axioms: (6) OPE.,p m c 8 a a ( 7 ) p + p , if p is the place of the exit node. TABLE111 LABELING FOR
FIG. 6(a)
30
PABLO A. STRAUE AND CARLOS A. HURTADO
Intuitively, axiom ( 5 ) represents the firing of a transition. Applying this axiom changes the interpretation of labels, though: they are not always paths in the original net (they are paths in a net that may be collapsed in parts). Finally, axiom (6) says that should the model be associated with an activity a in another calling model, p denotes q(4)in the other model. A thread is an equivalence class of thread labels. A threading is a total function from the places of a model to threads, such that the start node is mapped to the same thread as the exit node, i.e. p.
Definition 5.6 (Thread and threading) A thread is a non-empty equivalence class of labels. A labeling t such that all labels assigned to a given place p are equivalent and the start node belongs to the thread p , i.e., (b'a,B € z( p ) : a
defines a threading $J: P + 2 ;
B ) A (3E
start) :I& p )
such that W ( p ) = [ a [ 38 E t( p ) :a t P}.
Notation. We usually denote a thread by a label (one of its members), e.g., q(p ) = a means a E q(p ) . Likewise, operations on labels are extended to operations on threads. Thread equality is simply set equality; hence, with a little abuse of notation, we write a = B to mean that both labels belong to the same thread, i.e., a B. For example, the threading of the model in Fig. 12 is as shown in Table IV, in which fy is the name of the start transition of y, p o is the name of the place between the final transition of or2 and t,, and p c is the name of the place between the final transition of c and ty. Now, we cannot derive a threading for Fig. 6(a). Consider the partial labeling shown in Table llI. Node Y has two labels that must be equivalent. The first label (the one going to translation t9) can be simplified to p, because all transitions in the label have only one predecessor. The second cannot be simplified to p; in fact its simplification is vt7yp,which is different from p because transition t7 has more than one predecessor and no addition is present in the label.
Definition 5.7 (Subthread, superthread) A thread a is a subTABLEIV THREADING OF MODELOF FIG. 12
Place p exit, start, and,, andz A , or,, B , or2
C
Thread v( P )
P PJ,P P
CONTROL IN MULTI-THREADED INFORMATION SYSTEMS
thread of a thread B, denoted a p, if there is thread d such thata or a is a subthread of one of the addends of B, i.e.
31 =B@S,
a C B W (36, y : a = 6 8 B ) V ( 3 y , E : B = y @E A a c y ) . The inverse relation is called superthread. Our definition of threads meets the six goals at the beginning of this section. This is because threadings can be regarded as positive non-numeric invariants, hence properties of invariants apply to threads. Theorem 5 . I Given a model whose connected net is ( P , T , F ) , its connected net has an extra transition t, such that {exit}= ' t , and t; = { start}. Then, i f the model has a threading I),for every transition t E T U 1 f , 1 , Q P E . ,I ) ( P ) = Q , , ; + ( P I .
If the function I)is rational-valued, as opposed to thread-valued, and the symbol Q is replaced by a summation symbol, the above equation is the standard definition for so-called S-invariants in Petri nets. The characteristic property of S-invariants is the fact that the weighted sum of tokens remains constant. This property also holds for threadings. Thus, if a model has a threading, the summation of the active threads (those that have a token) is ,a, in any reachable marking M.
5.2 Threads and Behavior Given a set of compatible labels whose summation is equivalent to p , these labels represent a (partial) process. The construction of the process from the labels is rather simple. The importance of this construction is that the final state implied by the process has a single token, in place exit, that is the set of labels represents a correct process. Furthermore, it can be proven that a process model has a threading if and only if it has simple control.
5.2.1 labels and processes Different labels are related to different processes. In fact a set of labels satisfying certain constraints defines a process. Given a consistent set [ a , ,..., a,,} of normalized labels taken from a labeling z, such that QY,i a, Ap, then it is possible to build a causal net ( B , E , F ' ) with only one place without successors. The construction has the following four steps:
S1. Let a , = 6,,@ @ 6 , ,where , 6,)has no operators. Each 6, is an odd-length string of alternating places and transitions, whose last symbol is the place of the exit node. - . a
32
PABLO A. STRAUB AND CARLOS A. HURTADO
S2. The set B of places in the causal net is obtained from the places of the Q,. S3. The set E of transitions is obtained from the transitions of the 6,. S4. A node x is a successor of a node y if there is a 6, of the form axyp, where a and p are possibly empty strings. The following lemmas claim that the net so constructed is a causal net and also a process of the original net N ( A ) , where the initial state is the one that marks all states related to the labels in the set. This process ends in a state whose only token is in the exit node (i.e. a state that satisfies the simple control condition). Lemma 5. I lfall labels have no cycles (hence arefinite), the net defined by the four-step construction is a causal net, i.e. ( a ) it has no cycles; ( b ) each place has at most one successor and ( c ) each place has at most one predecessor. Lemma 5.2 I f all labels have no cycles, the causal net of the four-step construction is a process of N(A)that begins in a state with one token in each place related to the labels (and no more tokens) and ends in a state with one token in an output socket (and no more tokens).
5.2.2 Threads and simple control The purpose of this section is to relate simple control and threadings. Theorem 5.2 uses the results of the previous Section to prove that a model that has a threading has simple control. Then a series of lemmas are introduced to prove in Theorem 5.3 that a model with simple control has a threading. Theorem 5.2 (Threading implies simple control) .A has a threading 11, then A has simple control.
If a process
model
We want to prove that simple control and the existence of a threading are equivalent. To prove that all models with simple control have a threading we use some properties of connected free-choice place/transition nets. Lemma 5.3 I f a model has simple control all places have at least one label. Lemma 5.4 If a model has simple control, all labels for a given place are equivalent, i.e.,
vp E P :v 1I , I ,
E z( p ) : 1 1 A
12
Theorem 5.3 (Simple control implies threading) A model .A that has simple control has a threading.
CONTROL IN MULTI-THREADED INFORMATION SYSTEMS
33
There are three possible causes for not having a balanced threading, which can be interpreted in terms of behavioral properties. First, there might be a place with no labels. If a place in the model has no label, this implies there is a proper trap in the net (a proper trap is a set of places that once they receive a token they always have some tokens [4]) and the model has an overloaded state, unless a deadlock upstream impedes reaching the trap. Second, there might be two unequivalent labels for a place. If there is a place with unequivalent labels, this means that this place is an activity whose output sockets are connected to different threads: this implies either deadlock or overloaded state. Third it might be the case that the thread of the start node is not p; in that case it is not possible to reach the proper final state, because the threading is an invariant.
6. Applications of Thread Theory The main two applications of the theory of threads are the development of consistent models and the definition of a new modeling construct called the unbalanced connector. These two applications are related, because the unbalanced connector is basically an and-node with a special semantics that guarantees consistency. There are basically two approaches to build consistent models: (1) build a model, then deduce that the model is consistent (otherwise modify the model); or (2) inductively develop the model using a set of primitives that guarantee consistency.
Proving consistency of a model. The usual method to deduce properties like simple control is based on a standard Petri net analysis, the reachability graph, which shows all reachable states for the net. There are many useful properties derivable from the reachability graph. Unfortunately the size of the graph might in general be exponential on the size of the net, so some methods to reduce the graph are needed.4 Another method to prove simple control is by computing a labeling and showing that it is (or it is not) a balanced threading. An algorithm to compute a threading is derivable from the definition. The idea is to assign labels to the outputs and compute labels of predecessors. Whenever a place has more that one label, they are proven equivalent by term r e d ~ c t i o n .If ~ they are no1 equivalent, then there is no 4The rank theorem [4] shows that the well formedness property of free-choice nets can be checked in polynomial time. This property is intimately related to simple control, hence it can be proven that simple control is also polynomial in free choice nets [22,25,29]. ‘Because thread addition commutes, unless care is taken, attempts at automatic proofs can lead to infinite iteration.
34
0
PABLO A. STRAUB AND CARLOS A. HURTADO
threading and the model has no simple control. If they are, one label is kept as a representative and others are deleted, thus the algorithm does not compute full labels but it does compute a threading. Of course, full label sets can be computed if desired; they give details about all possible futures of the computation. The best method is of course to have a small net: this can be accomplished if complex processes are described in terms of simpler processes, analyzing first subprocesses and then the composition of these subprocesses. Building consistent modeZs. It is possible to use a context-free grammar to model the primitives in the second approach. If a model can be generated using a grammar with a set of constructions, it is possible to prove by structural induction that the model has simple control, provided that each rule preserves simple control. One possible set of rules is sequential composition, alternative composition, and parallel composition. In addition, we have as an axiom the fact that each atomic activity has simple control. These rules are sound, although they are not complete (there are process models with simple control that cannot be parsed).
In what follows we present a variant of the grammar approach called the incremental case composition method, where instead of using a contextfree grammar the model is built incrementally by adding alternatives to a socalled base model. Each alternative is added by performing an operation on the model and creating a new model. The base model has simple control and each operation preserves this property, hence no control anomalies will occur. This method is not complete in the sense that there are models with simple control that cannot be obtained using the method. Creating a complete method in this sense is an open problem. 6.1
Base model
A base model is a model that describes one execution scenario, i.e., there are no alternatives. Most corporate procedure manuals describe processes without alternatives, i.e., they describe base models. Because behavioral inconsistency is the consequence of inappropriate use of parallelism and choice and the base model has no choices, it has no control anomalies. Figure 2 shows the base model for the loan example; it assumes there are no problems whatsoever with the credit so every activity is performed successfully and once. A process model is built beginning with a base model that represents the
CONTROL IN MULTI-THREADED INFORMATION SYSTEMS
35
"normal" case (whatever that means for the process developer). Then the model is enhanced by adding alternatives. Each alternative or case has a condition (specified as an or-split), a controlling activity or set of activities, and a next state (specified as a connection). At each step, the method checks that no behavioral anomalies are added.
Definition 6.1 (Base model ) it has no or-nodes.
A behavioral model is a base model if
An interesting property of a base model is that the only process denoted by the model is the same net as N(JU), because the model has no choices. This implies that those activities not related by F are concurrent. In P/Tnet parlance, a base model is a process of itself [20]. Moreover, the base model is the onfy process of JU (defining q as the identity function). +
Theorem 6. I
Every base model hus simple control.
Because every base model has simple control, it has a balanced threading
q.To compute the threading, the net is executed backwards as a P/T-system ( P , T , F - I ) , initially defining q(exit) = p and upon firing a transition t defining
q @ ) : = p8 t 8eqG ;~ ( q )for p The following algorithm gives details.
E 't.
Algorithm: Threading of a base model Input Output Local variable
The place/transition net ( P , T , F ) of a base model At The threading i+ defined for each node of the net T ' , the set of processed transitions, and P ' , the set of processed places Loop invariant b'x E P' U T', y E P U T : x F*y w( y ) is well defined.
*
(1) Let q(exit):=,u. (2) Let P' := [ exit 1, and let T' := M. (3) While T' # T do (a) Let t be an element of T - T' such that t' 2 P'. (b) ForeachpE't, let q ( ~ ) : = p 8 f t @ ~q. (, q. ) (c) Let P' := P' U ' t , and let T' := T' U ( t ) Some comments on the algorithm: 0
This algorithm computes a labeling that has one label for each net element. Because every place has one predecessor, only one label is defined, so the labeling is a threading.
36
PABLO A. STRAUB AND CARLOS A. HURTADO
0
It is easy to show that the loop invariant is established before the loop, and that the loop invariant, together with the negated loop condition and the connectivity of the net establish the desired output. The fact that the invariant is indeed invariant is also simple, provided that there is always one transition t that satisfies the condition of statement 3(a). But that is the case, because the set of places p in P' such that ' p is not in T' represent a state of the system (i.e. it is a maximal cut in the process) and there must be a t whose firing leads to that state.
0
The threading computed by executing the net as above is an invariant. The only initial state has just one token in start, because the model has simple control. Hence, it must be the case that ly(sturt)= p , that is, ly is a threading.
6.2 Exceptions The base model describes a fixed procedure to handle a process. An exception is defined as any situation in which the base model does not apply. Exceptions are an intrinsic part of process modeling. This is specially true when modeling non-routine processes, but most routine processes do have exceptions, e.g., due to different cases or incorrect data. As in programs, exception handling in processes is done with regular branching and iteration, and also with special mechanisms that perform global control state changes, e.g., by manipulating the stack. In general, an exception comprises three parts:
(1) a control state and a condition in which the exception is raised; (2) a new control state resumed after the exception is handled; (3) an optional process that handles the exception. For example in an Ada program, the exceptional condition might be an attempt to divide by zero, the controlling activity is the exception handler, and the new control state is derived from the actions of the exception handler. We have shown that without exceptions, models have no behavioral anomalies. It is easy to prove that without parallelism there are no behavioral anomalies. Adding alternatives in the presence of parallelism can create behavioral anomalies. Consider a connection from a thread y to a thread B. If B < y the connection leads to deadlock. On the other hand, if y c B the connection may lead to an overloaded state or to multiple response. In the following subsections we show how exceptions can be added to a base model preserving the simple control property. The method adds
CONTROL IN MULTI-THREADED INFORMATION SYSTEMS
37
alternatives to a base model in such a way the the weighted sum of the tokens is always p , hence the process ends correctly and preserves the simple control property.
6.3
Alternatives within a Thread
Let .A be a model that is being developed and 1/1 its threading. In this section we describe how to extend JM by adding an alternative. The resulting model At.' becomes the new version of the model. A new alternative is specified by giving: (1) an or-node where the exception is raised; (2) An optional exception handler given by either a new activity a ; ( 3 ) an or-node where control returns after exception handling.
The addition of an alternative can be constructed with three more basic operations on the model: adding an or-node; adding a connection between two or-nodes; adding an activity.
Operation 6.1 (Adding an or-node) The new model resulting from the addition of an or-node o within a connection c = SH d is defined by Table V. Adding an or-node always preserves the simple control property. Lemma 6. I Adding an or-node o within a connection c = s + + dalways preserves the threading of existing nodes. The threading is extended by definingpsi (0)= q ( c ) .
Operation 6.2 (Adding a connection) The new model resulting from the addition of a connection c = s ~ between d or-nodes is defined by Table VI. TABLEV ADDITION OF AN OR-NODE Set
New value
A' := A 0' := 0 u I 0 ) N' := N R':= R U {s ++o,o++dJ - { s ~
d
)
38
PABLO A. STRAUB AND CARLOS A. HURTADO
TABLEVI ADDINGA CONNECTION BETWEEN OR-NODES Set New value A' := A 0' := 0
N' := N R':=RU[s-d]
This operation preserves the simple control property and the threading if and only if the source and destination belong to the same thread. Lemma 6.2 Adding a simple connection c = SH d preserves the threading of existing nodes if and only if s and d belong to the same thread. The threading is extended by defining V ( c ):= q ( d ) .
Operation 6.3 (Adding an activity) The new model resulting from the addition of an activity a within a connection c = s~ d is defined by Table VII. Like adding an or-node, adding an activity always preserves simple control. Lemma 6.3 Adding an activity a in a connection c = s o d always preserves the threading of existing nodes. The threading is extended by defining * ( a ) = ~ ( c ) .
6.4 Alternatives between Multiple Threads Sometimes adding a simple connection is not enough. In a multiple connection a new and-node n is added and instead of returning control to a single or-node, control is returned to a set D = { d , , ...,d k ]of or-nodes. The addition of a multiple connection can be constructed with three more TABLEVII ADDINGAN ACTIVITY Set New value
CONTROL IN MULTI-THREADED INFORMATION SYSTEMS
39
basic operations on the model: 0 0
0
adding an or-node; adding a multiple connection from an or-node to several or-nodes, through an and-node; adding an activity.
Of these, the first and third are already analyzed.
Operation 6.4 (Adding a multiple connection) The new model resulting from the addition of a multiple connection from an or-node s to a set D = { d , , ..., d k ] of or-nodes through an and-node n is defined by Table VIII. This operation preserves the simple control property if and only if the thread of control of the source equals the summation of the threads of control of the destinations. Lemma 6.4 Adding a multiple connection preserves the threading of existing nodes, if and only if q ( s ) =@;=I
@(dJ
The threading is extended by defining ly(sn):= @ ( S > v ( n > : =v(s) q(ndi):=q(di)
for 1 G i c k .
The operation of adding a multiple connection was defined by adding an and-node that has exactly one predecessor. If an and-node with more than one predecessor is added, the resulting model is necessarily not deadlock-free. In fact it leads to the phenomenon that we call distributed decision (Figure 8(b), Figure 9), in which the execution path to follow must be decided by two or more executors, but they all must make the same decision.

TABLE VIII. ADDITION OF A MULTIPLE CONNECTION
Theorem 6.2 (Distributed decision) If an and-node with more than one input is added to a model with simple control, the resulting model is not deadlock-free.
6.5 Unbalanced Connectors
Not all exceptional conditions can be handled by adding alternatives with balanced connections. While exceptions are raised locally in a single thread, other parallel threads of control might be involved; e.g., an exception may need to abort activities in other threads of control. An unbalanced connector is used to abort activities; it always has one source and one or more destinations.

Information processes need exception handling facilities to succinctly describe many possible outcomes of a process. In principle, all exception handling can be done using conditionals and iteration, by adding tests for all possible events at every relevant point in the model. The idea of special exception-handling constructs is to factor out those tests. For example, if it is known that because of some event a whole range of activities becomes irrelevant, these activities should be canceled. Furthermore, if an exception cannot be handled at the level where it is detected, some form of exception propagation must occur. To enrich information process control languages we can draw from ideas prevalent in the programming languages community. However, because of multi-threading, it is not obvious that ideas from languages such as Ada, Lisp or C can be mapped to information processes. Thus, it is not surprising that few languages have exception-handling constructs. In Rasp/VPL and in WAM [15], when a token reaches the end of a model, all activities within the model are deleted. This semantics ensures that there cannot be multiple response, i.e., more than one output (defined in Section 4).

An explicit construct to abort the execution of unneeded activities is the unbalanced connector [25]. The unbalanced connector is a sort of and-node in which one predecessor is explicitly shown, and other predecessors are implicit (all successors are explicitly shown). The semantics can be informally expressed as "take a token from the input and other tokens from the model so that the number of tokens is just right". The meaning of "just right" is such that it ensures that at the end the model will produce one token on the exit node and there will be no other tokens left. The semantics of the unbalanced connector is based on the theory of threads.

Unbalanced connectors are different from regular and-nodes and thus their translation into P/T-nets is different. The semantics of unbalanced connectors can be expressed using Petri nets. In the example from Figure 13, the unbalanced connector is represented in Figure 14 by three transitions, which produce a token in the place of the following or-node, and consume a token from the connector's incoming edge and a token from the places of either B, its incoming edge, or its outgoing edge. The translation ensures that extra tokens are consumed by one of the transitions corresponding to the connector.

FIG. 13. A process model with an unbalanced connector corresponding to the model in Fig. 9.
FIG. 14. Semantics of the unbalanced connector in Fig. 13.

An unbalanced connector produces dangling activities, i.e., those whose output is not needed. Dangling activities are those whose thread is less than the threads of the destinations. In Figure 13, B is dangling. Dangling activities violate the invariant, so they must be aborted (this requires a user interface where users are advised that their pending activities have been interrupted; we do not delve into these matters here). In fact, dangling activities are very much like useless activities, so if they were not aborted they would lead to overloaded states.

In principle, Petri nets are useful to model unbalanced connectors, but they are not practical because the number of transitions needed to represent an unbalanced connector can grow exponentially with the number of nodes in the net, due to the state explosion that may occur. In a realistic loan approval example [25], the CICN net had 24 nodes, 3 of which were unbalanced connectors, and the corresponding Petri net had 1266 nodes; for comparison, the CICN net without unbalanced connectors has 20 nodes and
the corresponding Petri net has 35 nodes. On the other hand, the intuitive semantics of the unbalanced connector is relatively simple: "abort now all activities that are not required to produce output", or more technically "delete all tokens in places whose thread is a subthread of the destination of the unbalanced connector and put one token in the destination". An implementation can use the second intuitive semantics, with the subthread relation computed off-line when the process model is compiled into an internal form. Formally, the semantics of a deterministic unbalanced connector c is given in terms of its translation into a P/T-net, i.e., by extending the definition of N(M) (Definition 4.2) as follows.
Definition 6.2 (Extension of Definition 4.2) The connection from an or-node s to a set of or-nodes D = {d_1, ..., d_k} through an unbalanced connector c is translated by adding nodes to the P/T-net, once regular nodes are translated. Let P_1, ..., P_n be all possible sets of dangling places in the connection at any state M such that M(•c) > 0. The translation will add one place p and n + 1 transitions {t, t_1, ..., t_n}, connected as follows:

•p = {t},  p• = {t_1, ..., t_n},
•t = {s},  t• = {p},
•t_i = P_i ∪ {p},  t_i• = D.
An important property of the semantics for an unbalanced connector is that the transitions of the connector are all balanced. Thus, even if the model seems unbalanced, the P/T-net that defines its semantics is balanced. However, an important property is lost: the nets are no longer free-choice. For example, in Figure 14 the representation for the unbalanced connector shows a place with three successors, each of them with a different set of predecessors, hence the net is not free-choice. That is, unbalanced connectors mix choice and synchronization. This means that choices are constrained by the global state, i.e. they are no longer local. However, this is precisely the intended meaning: in the presence of an exception an executor might unexpectedly lose control of its thread. Lemma 6.5 Adding a multiple connection with an unbalanced connector preserves the threading, if and only if
ψ(s) ⊑ ψ(d_1) ⊕ ... ⊕ ψ(d_k).

The threading is extended by defining ψ(p) := ψ(s).
6.6 Summary: Incremental Case Composition

The method to build a model by incremental case composition is as follows.
(1) Identify one possible case of the model. This case might be the most likely or the one that is considered normal.
(2) Create a base model by enumerating all activities of this case and identifying the precedence relation. This process is the same as creating a critical path method (CPM) chart.
(3) Repeat, until all possible cases are covered:
    (a) Identify a condition and place of evaluation of the condition.
    (b) If a handling process is needed, develop it using this method or use one that is already developed.
    (c) Determine the new state of the process after the exception is handled.
    (d) Check that the connection is feasible. If not, this exception cannot be added to the model.

Feasibility of a connection can be checked automatically if the system keeps the threading as the model is being developed.

For example, to develop the loan approval process, the base model shown in Figure 2 is the case in which all activities succeed (this might not be the most frequent case). Figure 15 is the final model, which was built by adding several cases to the base model. The first case was the possibility of not approving the credit: this case added an or-node (o1) after activity A, an unbalanced connector (a4), and an or connector immediately before exit. The second case was the possibility of errors in the legal data of the property: this case added a loop in activities E and F, with two new or-nodes. The third case had to do with the possibility of the property not being appraised as valuable enough to cover the credit: this case added an activity H ("to notify the customer") and an unbalanced connector (a5). The fourth case involved letting the customer decide upon this notification whether or not to use the credit for another property: this case added or-node o4 after activity H and an unbalanced connector (a6). Because all steps in the process keep the threading, it can be proven that the model has simple control.

FIG. 15. The complete loan approval process, with four exceptional cases.

Theorem 6.3 Models built with the described method have simple control.
6.7 Dealing with Unspecified Situations
In most business processes it is impossible to know in advance how to handle every possible situation. It is likely that regardless of how complete the model might be, at some point none of the pre-programmed alternatives apply. Thus, if a process modeling system insists on the model being complete before enaction, once an unforeseen situation arises either the problem will not be solved (“sorry, madam, the computer doesn’t allow that”) or it will be very hard to keep the computer up to date; that is, at this point the workflow system would be more hindrance than help. A system should allow addition of unspecified exceptions at run time. When such an unforeseen exception arises, the executor of the activity manually solves the problem and creates a connection to bring the process to a desired state. The system checks that the connection is feasible; if the connector is infeasible it is rejected, else execution continues as if the connection existed before enaction. Because the connection is feasible no behavioral anomalies will ensue.
6.8 General Unbalanced Connectors

A generalization of the unbalanced connector includes the possibility of producing extra tokens instead of deleting them [26]. This happens if the sum of the threads of the incoming arcs is greater than the thread of the outgoing arc. A further generalization is allowing incomparable threads, so that some tokens are deleted while others are produced. Producing tokens in a thread poses the problem that there are several places in the same thread: which place should be chosen? If the choice is random, then we have a nondeterministic system with the power to decide on the work performed by people. We rule that out. Fortunately, there is a better choice: produce the token in the place right before the connector that will be
waiting for it. This connector can be determined by looking at the thread expression. Figure 16 shows the use of such a kind of unbalanced connector, depicted with dangling arcs for both predecessors and successors. The extra token is produced in such a way that it minimizes work along its thread, i.e. no work is implicitly assigned due to the unbalanced connector. In the example, when or-node or2 chooses the lower path to iterate, activity D is aborted, activities C, D, and E are re-executed, but B is executed just once. While the semantics of general unbalanced connectors implies the deletion and addition of tokens to ensure that the weighted sum of tokens is equal to μ, this is not always feasible. That is, general unbalanced connectors cannot always be added from any or-node to any other set of or-nodes. The constraint is that the destinations must add up to something less than or equal to μ. This implies in turn that no thread will be active twice, and no thread will be active if a subthread is active.
FIG. 16. A generalized unbalanced connector and its semantics in terms of Petri nets.

Definition 6.3 (Feasible connector) A connection from an or-node s to a set of or-nodes D = {d_1, ..., d_k} using a generalized unbalanced connector n is feasible if and only if ψ(d_1) ⊕ ... ⊕ ψ(d_k) ⊑ μ. Note that because ⊑ is a partial order it might be the case that these threads are incomparable.
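The feasibility test can be checked mechanically if the thread sum ⊕ and the partial order ⊑ are available as functions, as in the following hedged sketch; both operations are passed in because their concrete representation is not fixed here, and incomparable threads simply make the test fail.

def connection_is_feasible(psi, destinations, mu, oplus, leq):
    """Feasibility of a generalized unbalanced connection (sketch of the
    constraint stated above).  `oplus` and `leq` stand for the thread sum ⊕
    and the partial order ⊑ of the thread algebra; both are assumed to be
    supplied by the caller.  Because ⊑ is only a partial order, `leq` may
    return False in both directions for incomparable threads, in which case
    the connection is rejected."""
    total = None
    for d in destinations:
        total = psi[d] if total is None else oplus(total, psi[d])
    return total is not None and leq(total, mu)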
7. Conclusion
The modeling of process control is a non-trivial task, especially when processes are complex. These problems are not conveniently handled in the tools we are aware of, even though it does not make much sense to analyze efficiency, timing, data flow, data availability, and resource utilization if the model behaves incorrectly. Traditional solutions to the problem of ensuring correct behavior do not seem adequate. The use of context-free grammars to define sets of allowable models ensures correctness, but overly constrains the range of expressible models, inhibiting the natural parallelism in processes. The verification of models based on reachability graphs (i.e. finding all reachable states) or on Petri net invariants are computationally tractable methods for free-choice nets (all basic control constructs lead to free-choice nets [25]), but they do not give clues on the causes of behavioral anomalies and possible corrections.

In this article we have identified a series of relevant control properties, including deadlock freedom, useless activities, consistent abstraction, etc. These properties are related and form the basis for a notion of behavioral correctness in a model. Process semantics of Petri nets helps to determine the causal relations between activities in a process, i.e., whether two activities are independent and are executed in parallel, or they are executed in sequence. This paper defines the concept of useless activity using process semantics. Surprisingly, there are situations in which the possibility of having a useless activity cannot be avoided [26]. In this case, once an activity is determined to be useless it can be aborted.

An algebraic formalization of the rather elusive concept of thread of control was given. The thread algebra is a suitable framework to understand behavioral properties; in fact, there are strong relationships between threads and behavior: behavior is correct if and only if threads of control are properly mixed in the model, a notion that has been formally defined. We have recognized several applications of thread theory, applicable to several other languages.

One application of thread theory is the incremental composition method to develop process models by iteratively adding exceptions to a so-called base
model. At each step, an appropriate mix of threads is kept in the control model, hence preserving the simple control property. This method can also be applied to the problem of handling an unforeseen exceptional condition at run time. A thorough understanding of control allows the modification of an executing process due to unanticipated exceptions, guaranteeing that the modified model will not have behavioral anomalies: it suffices to (incrementally) recompute the threading for the new model, using the techniques of Section 6.

Using the theory of threads we have extended basic control models adding so-called unbalanced connectors that mix threads in a controlled way [23]. Thread theory is used to identify which activities should be canceled. This is a generalization of the rule in VPL/Rasp that once a token is put into an output socket of the model (an output socket of a model is the equivalent of an exit node) all pending activities within the model are canceled. While the semantics in terms of Petri nets is complex, the informal semantics can be simply stated as "abort now all activities that are not required to produce output". The theory of threads identifies those places based on the threads of predecessors and the successor of the unbalanced connector.

A further generalization of the unbalanced connector not only deletes extra tokens, but also adds those that are now needed. The generalized unbalanced connector has an even more complex semantics in terms of Petri nets, but its informal semantics is still simple: "abort now all activities that are not required to produce output and create tokens so as to avoid deadlock". Again, the theory of threads identifies those places that will be affected.

ACKNOWLEDGMENTS

This paper has been improved by the comments of one anonymous referee.
Appendix: Proofs of Theorems

This appendix includes most proofs. In some cases where the full proofs are lengthy and do not give much insight, only a sketch of the proof is provided and the reader is referred to the original proofs.
Proof of Theorem 4.1 A process model is useful if and only if it has no overloaded markings and is deadlock-free. (If.) Let π be a complete process that corresponds to one execution of the
P/T-system of a CICN model. Assume π has a useless place φ(b). Then there is no path from b to a place corresponding to exit; that is, if φ(x) = exit then (b, x) ∉ F*. Then for every successor b' of b there is no path that leads to exit. Because π is a complete process (hence finite), there must be at least one successor b'' of b that has no successors. The state denoted by π is such that M(φ(b'')) > 0, and φ(b'') ≠ exit. That is, the final state of π is either overloaded or a deadlock. (Only if.) Assume the model has an overloaded state or a deadlock. In any case there must be a place p ≠ exit and a process π whose final state M satisfies M(p) > 0. From the definition of reachable state of a process, there must be a place b ∈ B such that φ(b) = p and b• = ∅. Then p is a useless place in π. □
Proof of Theorem 4.2 A process model has simple control if and only if it is single-response and useful. (If.) From Theorem 4.1 the model has no overloaded states nor deadlocks; hence, in the final state, M(p) = 0 for every place p ≠ exit. Because it has single response, M(exit) = 1 in the final state, so the model has simple control. (Only if.) Immediate from the definition of simple control. □
Proof of Theorem 5.1 Given a model whose connected net is (P, T, F), the connected net has an extra transition t_c such that •t_c = {exit} and t_c• = {start}. Then, if the model has a threading ψ, for every transition t ∈ T ∪ {t_c},

⊕_{p ∈ •t} ψ(p) = ⊕_{p ∈ t•} ψ(p).

We first prove that equality for t_c and then for the other transitions. For t_c, we have •t_c = {exit} and t_c• = {start}. From the definition of label equivalence, ψ(exit) = μ, and from the definition of labeling, ψ(start) = μ. Let t ∈ T be a transition different from t_c. Because all labels in a given place are equivalent, the proof can be done by choosing any label from each predecessor and successor of t. Let a_1, ..., a_n be a compatible set of labels, each for one of the successors of t. Let p be a predecessor of t. Then a label for p is

p ⊕ t ⊕ (a_1 ⊕ ... ⊕ a_n),

and a summation of labels from the predecessors of t reduces, because of rules 4 and 7, to the summation of the labels of the successors of t.
Proof of Lemma 5.1 If all labels have no cycles (hence are finite), the net defined by the four-step construction is a causal net, i.e. (a) it has no cycles; (b) each place has at most one successor; and (c) each place has at most one predecessor. For a proof of this lemma, please refer to [7]. The proof hinges on the fact that if the net had a place with more than one successor, then the labels would not be consistent. Now, if no place has more than one successor, a cycle would be an infinite cycle, only obtainable from an infinite label or from an infinite set of finite labels, but all labels are finite. Finally, if there were a place with more than one predecessor, the labels would be irreducible to μ by the given rules. □

Proof of Lemma 5.2 If all labels have no cycles, the causal net of the four-step construction is a process of N(M) that begins in a state with one token in each place related to the labels (and no more tokens) and ends in a state with one token in an output socket (and no more tokens). For a proof of this lemma, please refer to [7]. The proof hinges on the fact that all labels end in the exit node and that if the labels add up to μ they must be reducible to μ. □

Proof of Theorem 5.2 If a process model M has a threading ψ then M has simple control.
Because the threading is a place invariant (proved in Theorem 5.1), every reachable state M must satisfy ψ(M) = μ. Let {a_1, ..., a_n} be a set of consistent labels, one from each place marked in M (these labels exist: just take for each successor transition the same label). Because the labeling is a threading, this set of labels adds up to μ, hence it is possible to build a process that ends with a single token in the exit node. Hence, given any reachable state it is possible to reach an adequate final state (i.e. no reachable state is a deadlock, overloaded, or multiple response). Thus the model has simple control. □
Proof of Lemma 5.3 If a model has simple control, all places have at least one label. For a proof of this lemma, please refer to [7]. The proof is based on two other lemmas, which state: (1) if the connected net has a trap (in Petri net theory, a trap is a set X of places such that X• ⊆ •X) that does not include start, then the model has no simple control; (2) if there is a labeling
that includes a place without labels, then the net has a trap that does not include start. □
Proof of Lemma 5.4 If a model has simple control, all labels for a given place are equivalent, i.e. for all p ∈ P and all labels l_1, l_2 of p, l_1 ≅ l_2. Assume there is a place p with two non-equivalent labels l_1 ≇ l_2, such that all successors of p only have equivalent labels (i.e. p is the first offending place discovered by a labeling algorithm working backwards). Let M be a reachable state that marks p. This state must exist because if the net has simple control, the net is live [22, 29]; because the net is safe (or 1-bounded), M(p) = 1. Then, let M' be a state obtained from M by firing transitions that do not consume the token in p, until no more transitions can be fired without consuming that token. Because the model has simple control, from M' all processes end in a state with a single token in the exit place. Thus, it is possible to choose one label from each place marked by M' such that the summation of all labels is μ. In particular, we might choose l_1 to be in that set. From the set of labels we can construct a process. In this process all places but those marked by M' are successors (under F*) of p. Now, because the election of l_1 was arbitrary ... □
Proof of Theorem 5.3 A model M that has simple control has a threading. From Lemma 5.3 all places have a label. From Lemma 5.4 all labels in a given place are equivalent. To prove that the labeling defines a threading we only need to prove that a label a for the start node is equivalent to μ. But that must be the case because, from the second part of the proof of Theorem 5.1, if all labels are equivalent the threads are an invariant. Now, because the model has simple control, the threads of all final states add up to μ, hence the thread of the initial state must also be μ. (This can be seen more clearly by executing the net backwards; the invariant property holds both ways.) □

REFERENCES
1. Abbati, D., Caselli, S., Conte, G., and Zanichelli, F. (1993). Synthesis of GSPN Models for Workload Mapping on Concurrent Architectures. Proceedings of the International Workshop on Petri Nets and Performance Models.
2. De Michelis, G., and Grasso, M. A. (1993). How to put cooperative work in context: Analysis and design requirements. In Issues of Supporting Organizational Context in CSCW Systems (L. Banon and K. Schmidt, Eds). 31 August.
3. Dennis, A. R., Hayes, G. S., and Daniels, R. M. (1994). Re-engineering business process modeling. Proceedings of the Twenty-Seventh Annual Hawaii International Conference on System Sciences.
4. Desel, J., and Esparza, J. (1995). Free-choice Petri Nets. Tracts in Theoretical Computer Science 40, Cambridge University Press, Cambridge.
5. Ellis, C. A., and Keddara, K. (1995). Dynamic Change within Workflow Systems. University of Colorado Technical Report, July.
6. Gula, J. A., and Lindland, O. A. (1994). Modeling cooperative work for workflow management. 6th International Conference on Advanced Information Systems Engineering, CAiSE, June.
7. Hurtado, C. A. (1995). Modelación y Análisis de Procesos de Información. Tesis de Magíster, Pontificia Universidad Católica de Chile, Santiago (in Spanish).
8. Harel, D., Lachover, H., Naamad, A., Pnueli, A., Politi, M., Sherman, R., Shtull-Trauring, A., and Trakhtenbrot, M. R. (1990). STATEMATE: A Working Environment for the Development of Complex Reactive Systems. IEEE Trans. on Software Engineering, 16(4).
9. Integration Definition for Functional Modeling (IDEF0). National Institute of Standards and Technology, USA, 1992.
10. Curtis, W., Kellner, M. I., and Over, J. (1992). Process modeling. Communications of the ACM, 35(9).
11. Lamb, D. A. (1988). Software Engineering: Planning for Change. Prentice-Hall, Englewood Cliffs, N.J.
12. Lauterbach, K. (1987). Linear algebraic techniques for place/transition nets. Lecture Notes in Computer Science, 255.
13. Malone, T. W., and Crowston, K. (1994). The Interdisciplinary Study of Coordination. ACM Computing Surveys, 26(26).
14. Medina-Mora, R., Winograd, T., Flores, R., and Flores, F. (1992). The Action Workflow approach to workflow management technology. Proceedings of CSCW, November.
15. Messer, B., and Faustmann, G. (1995). Efficient Video Conference via Workflow Management Systems. Workshop "Synergie durch Netze", Universität Magdeburg, October. (English translation by the authors.)
16. National Institute of Standards and Technology (NIST) (1993). Integration Definition for Function Modeling (IDEF0). FIPS Pub 183, NIST, December.
17. Parry, M. (1994). Reengineering the Business Process. The Workflow Paradigm, Future Strategies Inc., 1994.
18. Peters, L., and Schultz, R. (1993). The application of petri-nets in object-oriented enterprise simulations. Proceedings of the 27th Annual Hawaii International Conference on System Sciences, 1993.
19. Pratt, T. W., and Zelkowitz, M. V. (1996). Programming Languages Design and Implementation. Prentice-Hall, Englewood Cliffs, N.J.
20. Reisig, W. (1985). Petri Nets: An Introduction. Springer-Verlag, Berlin.
21. Robinson, M. (Ed.) (1991). Computer Supported Cooperative Work: Cases and Concepts. Proceedings of Groupware '91. Software Engineering Research Center.
22. Straub, P., and Hurtado, Carlos A. (1995). The simple control property of business process models. XV International Conference of the Chilean Computer Science Society, Arica, Chile, 30 October-3 November.
23. Straub, P., and Hurtado, Carlos A. (1995). A theory of parallel threads in process models. Technical Report RT-PUC-DCC-95-05, Computer Science Department, Catholic University of Chile, August (URL ftp://ftp.ing.puc.cl/pub/escuela/dcc/techReports/rt95-05.ps).
24. Straub, P., and Hurtado, Carlos A. (1996). Understanding behavior of business process models. In Coordination Languages and Models, First International Conference, Coordination '96, LNCS 1061, Springer, Cesena, Italy, April 15-17.
25. Straub, P., and Hurtado, Carlos A. (1996). Business process behavior is (almost) free-choice. In Computational Engineering in Systems Applications, Session on Petri Nets for Multi-agent Systems and Groupware, Lille, France, July 9-12.
26. Straub, P., and Hurtado, Carlos A. (1996). Avoiding useless work in workflow systems. International Conference on Information Systems Analysis and Synthesis, ISAS '96, International Institute of Informatics and Systemics, 14269 Lord Barclay Dr., Orlando, USA, 22-26 July.
27. Swenson, K. D. (1993). Visual support for reengineering work processes. Proceedings of the Conference on Organizational Computing Systems, November.
28. Touzeau, P. (1996). Workflow procedures as cooperative objects. In Computational Engineering in Systems Applications, Session on Petri Nets for Multi-agent Systems and Groupware, Lille, France, 9-12 July.
29. van der Aalst, W. M. P. (1995). A class of Petri nets for modeling and analyzing business processes. Computing Science Report 95/26, Dept. of Computing Science, Eindhoven University of Technology, August.
30. Workflow Management Coalition (1994). Glossary. Document no. TC00-0011, 12 August.
Parallelization of DOALL and DOACROSS Loops - a Survey

A. R. HURSON, JOFORD T. LIM
The Pennsylvania State University
Department of Computer Science and Engineering
University Park, PA

KRISHNA M. KAVI
The University of Texas at Arlington
Department of Computer Science and Engineering
Arlington, TX

BEN LEE
Oregon State University
Department of Electrical and Computer Engineering
Corvallis, OR
Abstract

Since loops in programs are the major source of parallelism, considerable research has been focused on strategies for parallelizing loops. For DOALL loops, iterations can be allocated to processors either statically or dynamically. When the execution times of individual iterations vary, dynamic schemes can achieve better load balance, albeit at a higher runtime scheduling cost. The inter-iteration dependencies of DOACROSS loops can be constant (regular DOACROSS loops) or variable (irregular DOACROSS loops). In our research, we have proposed and tested two loop allocation techniques for regular DOACROSS loops, known as Staggered distribution (SD) and Cyclic Staggered (CSD) distribution. This article analyzes several classes of loop allocation algorithms for parallelizing DOALL, regular, and irregular DOACROSS loops.
1. Introduction
2. Loop-scheduling Algorithms for DOALL Loops
   2.1 Self-scheduling
   2.2 Fixed-size Chunking
   2.3 Guided Self-scheduling
   2.4 Factoring
   2.5 Trapezoid Self-scheduling
3. Comparative Analysis of DOALL Loop-scheduling Schemes
4. DOALL Loop Scheduling on NUMA Multiprocessors
   4.1 Affinity Scheduling
   4.2 Partitioned Affinity Scheduling
   4.3 Locality-based Dynamic Scheduling
5. Comparison of Affinity-scheduling Schemes
6. DOACROSS Loop Scheduling
   6.1 The Regular DOACROSS Model
   6.2 Comparison of DOACROSS Scheduling Schemes
   6.3 Irregular DOACROSS Loop Scheduling
   6.4 Comparison of Irregular DOACROSS Scheduling Schemes
   6.5 Other Research
7. Summary and Conclusions
References

1. Introduction
Since loops are the largest source of parallelism (Polychronopoulos et al., 1986), considerable attention has been paid to the partitioning and allocation of loop iterations among processors in a multiprocessor environment. The key goal is to maximize parallelism while minimizing the processor load imbalances and network communication. The literature abounds with scheduling algorithms for loops. These algorithms can be categorized as static and dynamic (Krothapalli and Sadayappan, 1990). In static scheduling, the division of iterations among the processors is determined prior to the execution time. This results in a low runtime scheduling overhead. On the other hand, static scheduling can cause unbalanced distribution of load among the processors if the execution times of individual iterations vary. The variance in execution can result from conditional statements (Hummel et al., 1992), or because of interference from the environment (the operating system, switching between iterations or time-sharing with other programs). Dynamic scheduling determines the division of iterations among processors at runtime. Some algorithms may dynamically reassign iterations to different processors based on the progress made by processors on previously assigned iterations. Thus, dynamic schemes can achieve better load balance, but this comes at the expense of runtime scheduling overhead. Loops can be categorized as sequential loops, vector loops (DOALL), and loops of intermediate parallelism (DOACROSS) (Cytron, 1986). For a DOALL loop, all N iterations of the loop can be executed simultaneously.
When there is a sufficient number of processors, all iterations can be executed in parallel. But with a finite number of processors, iterations are divided among the processors. When iterations of a loop must be executed completely sequentially (sequential loops), no improvement can be gained by using multiple processors. However, some loops may exhibit intermediate levels of parallelism permitting some overlapped execution among iterations. The DOACROSS loops model proposed by Cytron (1986) can mimic sequential loops, vector loops and loops with intermediate levels of parallelism. Iterations may be either data- or control-dependent on other iterations. Control dependencies are caused by conditional statements. Data dependence appears in the form of sharing results computed by other iterations. Data dependence can be either lexically forward (data from higher indices used by iterations with lower indices) or lexically backward (data from lower indices used by iteration with higher indices). Normally, lexically forward dependencies (LFD) do not contribute to delays in executing loop iterations. Sometimes a lexically backward dependence (LBD) can be transformed into a lexically forward dependence by reordering the statements of the loop, provided the statements do not form a dependence cycle (Cytron, 1986). DOACROSS loops where the LBD cannot be transformed into LFD lead to delays in executing successive iterations. Such loops are the subject of most research. This chapter presents an introduction to several loop allocation techniques and analyses these techniques for their complexity, scheduling overhead, communication cost, processor utilization and expected speedup. Section 2 surveys DOALL loop scheduling algorithms and Section 3 compares these algorithms. Section 4 presents affinity scheduling schemes for DOALL loops, while Section 5 compares these techniques. Regular and irregular DOACROSS loop scheduling algorithms are presented and analysed in Section 6.
2. Loop-scheduling Algorithms for DOALL Loops
Static scheduling schemes assign a fixed number of loop iterations to each processor. For a loop with N iterations executed on P processors, each processor will receive a total of ⌈N/P⌉ iterations. Variations on how these iterations are distributed among the available processors lead to different algorithms. Block scheduling or static chunking (SC) assigns iterations 1 through ⌈N/P⌉ to the first processor, iterations ⌈N/P⌉ + 1 through 2⌈N/P⌉ to the second processor, and so on. Cyclic scheduling allocates iterations i, i + P, i + 2P, ..., to processor i (1 ≤ i ≤ P).
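For concreteness, the two static assignments can be written down directly; the following Python sketch (with 1-based iteration indices, an assumption) is only illustrative.

import math

def static_chunking(N, P):
    """Block scheduling: iterations 1..ceil(N/P) to processor 1, the next
    ceil(N/P) to processor 2, and so on."""
    size = math.ceil(N / P)
    return {p: list(range((p - 1) * size + 1, min(p * size, N) + 1))
            for p in range(1, P + 1)}

def cyclic(N, P):
    """Cyclic scheduling: processor i gets iterations i, i+P, i+2P, ..."""
    return {p: list(range(p, N + 1, P)) for p in range(1, P + 1)}

# e.g. static_chunking(10, 3) -> {1: [1,2,3,4], 2: [5,6,7,8], 3: [9,10]}
#      cyclic(10, 3)          -> {1: [1,4,7,10], 2: [2,5,8], 3: [3,6,9]}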
When the execution times of individual iterations vary, static chunking leads to different processors performing different amounts of work, and finishing their computations at different times. For example, when the execution times of iterations monotonically decrease (i.e., triangular iteration space), the chunks containing smaller iteration indices consume more time than chunks containing iterations of higher indices. In such a case, the execution time of the DOALL loop is bounded by the completion times of the earlier chunks. Thus static chunking could perform suboptimally, and cause under-utilization of processor resources (Hummel et al., 1992). Since cyclic scheduling assigns consecutive iterations to different processors, a better load balance across processors is achieved. The main advantage of static scheduling methods is their simplicity (hence small scheduling overhead). No runtime overhead is incurred by such methods since all scheduling decisions are made at compile time. This implies the availability of information on loop bounds and number of processors. When such information is not known statically (or changes dynamically), static scheduling methods lead to unbalanced workload among processors. Dynamic scheduling schemes are proposed to address the limitations of static methods. Typically, shared variables and critical sections are used to control the distribution of iterations to idle processors. Thus, an idle processor locks the shared variable and obtains an iteration (or a set of iterations). This leads to runtime overhead, both in terms of the time required to access the shared variable (including communication cost and synchronization cost), and the time needed to compute the next schedule. Complex dynamic schemes with high scheduling costs and large communications costs may negate any performance gained. In order to simplify the analysis, we will assume that the scheduling cost per scheduling step for all dynamic schemes will be the same, and is given by T_sched = 2C + t_sched, where C is the communication cost for accessing a shared variable as well as the communication cost for returning an updated value to the shared variable, and t_sched is the time required to calculate the chunk size. Some of the dynamic scheduling algorithms are discussed below.
2.1 Self-scheduling
Self-scheduling (SS) (Tang and Yew, 1986) is a dynamic scheme that schedules iterations of a loop, one at a time. An idle processor obtains a new iteration and executes it. Hence, processors finish at nearly the same time and the workload is balanced. However, since this method requires N scheduling steps (one for each iteration), the overall scheduling cost may be unacceptable. In addition, processors may have to contend with synchronization delays
in accessing shared variables. For example, with P processors attempting to obtain a loop iteration, one processor must wait (P - 1)T_sched waiting for all the other processors to access and update the shared variable. The average wait time for P processors is given by P(P - 1)T_sched/2. With N iterations, the average wait time per processor is given by N(P - 1)T_sched/2.
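The shared-variable protocol of self-scheduling can be sketched as follows; the lock stands for the critical section whose contention is analysed above, and everything apart from the standard-library threading lock is illustrative.

import threading

class SelfScheduler:
    """One-iteration-at-a-time self-scheduling (sketch).  Each idle worker
    locks the shared index, takes the next iteration, and unlocks."""
    def __init__(self, n_iterations):
        self.next = 1
        self.n = n_iterations
        self.lock = threading.Lock()

    def get_work(self):
        with self.lock:                 # the 2C + t_sched critical section
            if self.next > self.n:
                return None             # loop finished
            i = self.next
            self.next += 1
            return i

def worker(sched, body):
    while (i := sched.get_work()) is not None:
        body(i)                         # execute iteration i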
2.2 Fixed-size Chunking
In an attempt to reduce the number of scheduling steps needed, Fixed-size chunking (FS) schedules a fixed number of iterations to each idle processor (as opposed to one iteration in SS) (Kruskal and Weiss, 1985). This reduces scheduling overhead, but the trade-off is increased load imbalance due to coarser task granularity. It is often difficult to determine the optimal number of iterations to schedule at each step. Small chunks increase the number of scheduling steps (hence scheduling overhead), while large chunks may cause imbalanced load across processors. Kruskal and Weiss (1985) have proposed a scheme to calculate an optimal chunk size based on the number of iterations, the number of processors, the standard deviation of the execution times of individual iterations, and the scheduling overhead. Since it is often difficult to determine the variance among the iteration execution times before executing them and because the variance may depend on the environment of the processor to which they are assigned, this method is not practical for real applications. Several schemes have been proposed to minimize the limitations suffered by both self-scheduling and fixed-size chunking (Hummel et al., 1992; Polychronopoulos and Kuck, 1987; Tzen and Ni, 1991). These schemes are based on scheduling chunks with decreasing number of iterations. Typically, larger chunks are initially scheduled, reducing the scheduling overhead, while smaller chunks are subsequently scheduled to smooth any load imbalances resulting from previous assignments.
2.3 Guided Self-scheduling
In guided self-scheduling (GSS), the size of the chunk scheduled to the next idle processor is ⌈R/P⌉, where R is the number of remaining iterations (Polychronopoulos and Kuck, 1987). Thus, the chunk size varies from ⌈N/P⌉ iterations down to one iteration. This algorithm allocates large chunks at the beginning of a loop's execution to reduce the scheduling overhead. Smaller chunks are then allocated as the number of remaining iterations to be executed decreases. The last P - 1 chunks consist of one iteration that can be used to balance the load, thus increasing the likelihood that all processors finish at the same time. A feature of GSS is that approximately two thirds of
the remaining iterations are allocated over every P chunks (Hummel et al., 1992). For example, if there are N = 100 iterations to be executed on a four-processor system, the sizes of the chunks are: 25, 19, 14, 11, 8, 6, 5, 3, 3, 2, 1, 1, 1, 1. It should be noted that GSS addresses the problem of uneven starting times of processors resulting from the delays in acquiring the chunks. Simulations involving constant-length iterations and uneven processor starting times, as well as iterations with variable-length running times, were conducted and found that GSS performs better than the SS method (Polychronopoulos and Kuck, 1987). The number of scheduling steps required for GSS, in the best case, is P, when the number of iterations N is approximately equal to the number of processors P. Otherwise, the maximum number of scheduling steps is P⌈H_⌈N/P⌉⌉, where H_n = ln(n) + γ + 1/(2n) is the nth harmonic number and γ = 0.5772157 is Euler's constant (Polychronopoulos and Kuck, 1987). For large N this approximates to P⌈ln(N/P)⌉ (Yue and Lilja, 1994a). The number of scheduling steps required for GSS is more than that for FS, but less than that for SS. Although GSS often achieves a balanced load when iteration execution times vary widely, it is still possible that some initial chunks (due to their large sizes) do not complete by the time all other chunks have completed.
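The GSS chunk-size rule is easy to reproduce; the short generator below, a sketch rather than the published algorithm, yields exactly the sequence quoted above for N = 100 and P = 4.

import math

def gss_chunks(N, P):
    """Guided self-scheduling: each idle processor takes ceil(R/P) of the
    R remaining iterations (sketch)."""
    R = N
    while R > 0:
        chunk = math.ceil(R / P)
        yield chunk
        R -= chunk

# list(gss_chunks(100, 4)) == [25, 19, 14, 11, 8, 6, 5, 3, 3, 2, 1, 1, 1, 1]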
2.4 Factoring
Factoring was specifically designed to handle iterations with widely varying execution times (Hummel et al., 1992). Similar to GSS, this scheduling strategy uses variable and decreasing chunk sizes. At each round, factoring schedules half of the remaining iterations into P equal sized chunks. In other words, each chunk contains ⌈R/(2P)⌉ iterations, where R is the number of unscheduled iterations. Factoring allocates smaller initial chunks than GSS, hence alleviating one of the main problems of GSS. The chunk sizes for N = 100 iterations to be executed on a four-processor system are: 4 chunks with 13 iterations each, 4 chunks with 6 iterations each, 4 chunks with 3 iterations each, 4 chunks with 2 iterations each, and finally 4 single-iteration chunks. The chunk size for factoring is determined by:

K_j = ⌈(1/2)^(j+1) N/P⌉,  j ≥ 0,   (1)
where K_j is the chunk size for factoring step j, N is the total number of iterations, and P is the number of processors. The number of scheduling steps can be determined by setting K_j to one and solving equation (1) for j, the number of factoring steps. However, since factoring schedules P
equal size chunks per batch (factoring step), the total number of scheduling steps is approximately equal to P⌈1.44 ln(N/P)⌉ (Yue and Lilja, 1994a). As can be seen, the number of scheduling steps for factoring is 1.44 times that for GSS. However, it has been shown that factoring performs better than GSS when the iteration execution times vary significantly (Hummel et al., 1992).
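The same can be done for factoring, using the remaining-iterations form of the rule (each batch of P chunks has size ⌈R/(2P)⌉); the sketch below reproduces the N = 100, P = 4 example above.

import math

def factoring_chunks(N, P):
    """Factoring: at each round, half of the remaining iterations are split
    into P equal chunks of ceil(R/(2P)) iterations (sketch)."""
    R = N
    while R > 0:
        chunk = math.ceil(R / (2 * P))
        for _ in range(P):
            if R <= 0:
                break
            take = min(chunk, R)
            yield take
            R -= take

# list(factoring_chunks(100, 4)) ==
#   [13, 13, 13, 13, 6, 6, 6, 6, 3, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1, 1]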
2.5 Trapezoid Self-scheduling
Trapezoid self-scheduling (TSS) is another scheme that is developed for loops with varying iteration execution times (Tzen and Ni, 1991), by using variable and decreasing chunk sizes. TSS attempts to reduce the synchronization cost of obtaining work by individual processors by simplifying the scheduling computations in the critical section. TSS uses a simple linear function to determine the size of the chunk allocated at each step. This method will rely on a programmable number for the size of the first and final chunks, f and l. The sizes of the chunks between successive scheduling steps are decreased by s = (f - l)/(C - 1), where C = ⌈2N/(f + l)⌉ is the number of chunks to be scheduled. Thus, the first chunk size is c_1 = f, and the second is c_2 = c_1 - s. In general, c_{i+1} = c_i - s. The typical values for the first and last chunks are f = N/(2P) and l = 1 (Tzen and Ni, 1991). The number of scheduling steps for trapezoid self-scheduling is equal to the total number of chunks C, which ranges from 2P to 4P. For large N, the total number of scheduling steps is approximately equal to 4P (Yue and Lilja, 1994a). TSS allocates smaller initial chunks than GSS, and requires fewer scheduling steps than factoring.
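A sketch of the TSS chunk sequence is given below; how fractional values of the decrement s are rounded is not specified here, so the rounding in this example is an assumption and implementations may differ in the exact chunk sizes they produce.

import math

def tss_chunks(N, P, f=None, l=1):
    """Trapezoid self-scheduling: chunk sizes decrease linearly from f to l
    (sketch; rounding of the linear decrement is an assumption)."""
    f = max(f if f is not None else N // (2 * P), 1)   # typical first chunk
    C = math.ceil(2 * N / (f + l))                     # number of chunks
    s = (f - l) / max(1, C - 1)                        # linear decrement
    remaining, c = N, float(f)
    while remaining > 0:
        chunk = min(remaining, max(1, round(c)))
        yield chunk
        remaining -= chunk
        c -= s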
3. Comparative Analysis of DOALL Loop-scheduling Schemes

The advantages and disadvantages of the various scheduling algorithms are summarized in Table I. As can be seen, fixed-size chunking requires the smallest number of scheduling steps while self-scheduling requires the most. Fixed chunking is more efficient since the chunk sizes can be determined at compile time. Unlike fixed chunking, self-scheduling balances the load on processors more evenly; however, the N scheduling steps needed may offset any performance gains. Since the processor must access a shared variable to obtain work, SS also adds delays due to network and memory contention. Factoring requires more scheduling steps than GSS, but the chunk size is computed less frequently (every P steps instead of every step in GSS). Factoring allocates more smaller chunks than GSS in order to balance the load, accounting for the increased number of scheduling steps. The earlier chunks in GSS may take longer to execute than all other chunks, leading to unbalanced load, particularly when the execution time of iterations decreases with increasing indices.
TABLE I. COMPARATIVE ANALYSIS OF DOALL SCHEDULING ALGORITHMS

Self-scheduling (SS)
  Scheduling steps: N (N = number of iterations)
  Chunk size: 1
  Advantages: Can balance the workload well.
  Disadvantages: Requires N scheduling steps. Should only be used in systems in which the overhead for accessing shared variables is small. Chances of network and memory contention are very high; contention for network and memory becomes a major problem.

Fixed-size chunking (FS)
  Scheduling steps: P (P = number of processors)
  Chunk size: ⌈N/P⌉
  Advantages: Requires the minimum number of scheduling steps. Chunk size can be determined at compile time or during run time before the loop is executed.
  Disadvantages: May not balance the workload very well, especially if the variance in iteration execution times is large.

Guided self-scheduling (GSS)
  Scheduling steps: P⌈ln(N/P)⌉
  Chunk size: ⌈R/P⌉ (R = number of remaining iterations)
  Advantages: Trade-off between load balancing and scheduling overhead. Number of scheduling steps between SS and FS, and tries to handle variations in iteration times by balancing the workload.
  Disadvantages: An early chunk could be so large that it does not complete by the time all other chunks have completed. The current chunk size must be calculated at every step.

Factoring
  Scheduling steps: P⌈1.44 ln(N/P)⌉
  Chunk size: ⌈R/(2P)⌉
  Advantages: Allocates more smaller chunks than GSS in order to balance the workload. Chunk size only needs to be calculated every P steps.
  Disadvantages: Requires more scheduling steps than GSS.

Trapezoid self-scheduling (TSS)
  Scheduling steps: 4P
  Chunk size: c_{i+1} = c_i - s, where s = (f - l)/(C - 1), C = ⌈2N/(f + l)⌉ and c_1 = f; the typical values for the first and last chunks are f = N/2P and l = 1 (Yue and Lilja, 1994a)
  Advantages: The chunk size decreases linearly, hence the difference between the current chunk and the next chunk is constant. The calculation of the current chunk size can be performed in parallel, eliminating the need for a critical section. Fewer scheduling steps than GSS and factoring when the iteration-to-processor ratio is larger than 55 and 16, respectively (Yue and Lilja, 1994a).
  Disadvantages: The algorithm for computing the chunk size is fairly complex. Allocates larger portions of remaining work to later chunks, which may generate large load imbalances for the last few scheduling steps.
It has been shown that when the ratio of the number of iterations to the number of processors is larger than 55, TSS requires fewer scheduling steps (4P steps) than that required by GSS, and when the ratio is 16, TSS requires fewer scheduling steps than factoring (Yue and Lilja, 1994a). This is true because the next chunk size differs from the current chunk size by a constant, and thus the scheduling computation is simpler. In TSS, even later chunks may remain large, potentially causing load imbalance. GSS and factoring, on the other hand, guarantee that the last P chunks contain only one iteration per chunk. These small chunks can be used to balance the finishing times of all processors.

Performance of GSS, factoring, self-scheduling, and static chunking has been simulated on the RP3 multiprocessor platform for several benchmark loops (Hummel et al., 1992). This study shows that factoring is scalable, and unlike GSS, its performance is resistant to variance in iteration execution time. In another study, it was shown that GSS did not perform well when the variance in iteration execution times is large (e.g., adjoint convolution programs) (Yue and Lilja, 1994a). GSS assigns too much work at the beginning of the execution and does not save enough work at the end for balancing the load. Factoring and TSS balance the workload better than the other methods. These studies also suggest that none of the algorithms perform well when N is small, since there is insufficient work to offset the overhead of scheduling. Since the scheduling overhead is minimal for static chunking and fixed-size chunking, they perform better when the variance among iteration execution times is small. Table II shows the number of iterations assigned to a processor at each scheduling step for GSS, fixed-size chunking (FS), factoring, and TSS.

TABLE II. NUMBER OF ITERATIONS ASSIGNED TO A PROCESSOR AT EACH SCHEDULING STEP WITH N = 1000, P = 4

GSS:       250, 188, 141, 106, 79, 59, 45, 33, 25, 19, 14, 11, 8, 6, 4, 3, 3, 2, 1, 1, 1, 1
FS:        250, 250, 250, 250
Factoring: 125, 125, 125, 125, 63, 63, 63, 63, 31, 31, 31, 31, 16, 16, 16, 16, 8, 8, 8, 8, 4, 4, 4, 4, 2, 2, 2, 2, 1, 1, 1, 1
TSS:       f = 125, l = 1
4. DOALL Loop Scheduling on NUMA Multiprocessors
The loop-scheduling algorithms discussed in the previous sections assumed a shared memory with uniform memory access costs, and hence our discussion did not take data locality into consideration. In this section we will introduce scheduling schemes designed for shared memory systems with non-uniform memory access (NUMA), where the memory access cost increases with the distance between the processor and the memory. Such scheduling methods should consider the location of the data to improve the performance of parallel loops. Loop iterations can be viewed as having an affinity to the processor which contains the required data (Markatos and LeBlanc, 1992). To exploit processor affinity, loop iterations are normally scheduled on processors that contain the required data either in their local memories or cache memories. Such an assignment can significantly reduce execution times, by as much as 30-60% (Subramaniam and Eager, 1994).
4.1 Affinity Scheduling

Affinity scheduling (AFS) is an algorithm which attempts to balance the workload, minimize the number of synchronization operations, and exploit processor affinity (Markatos and LeBlanc, 1992). The affinity of a loop iteration to a particular processor is due to: (i) the same data is repeatedly used by successive executions of a loop iteration (e.g., a parallel inner loop within an outer sequential loop), and (ii) the data is not removed from the local memory or cache before it is reused. In AFS, the iterations of a loop are divided into chunks of ⌈N/P⌉ iterations and each chunk is statically assigned to a different processor. When a processor becomes idle, it takes the next chunk of 1/P of the iterations from its local work queue and executes them. When a processor completes its assigned iterations, it finds a heavily loaded processor, and takes a 1/P fraction of that processor's unexecuted iterations and executes them. The initial assignment of chunks to processors in AFS is deterministic. That is, the ith chunk of iterations is always assigned to the ith processor. Normally, this ensures that repeated executions of the loop will access data that is local to the processor. The AFS assumes a balanced load at initial assignment and assigns an equal number of iterations to all processors. Each processor can access its local work queue independent of other processors. As load imbalances occur due to variances in iteration execution times, iterations are migrated from heavily loaded processors to lightly loaded processors. Such migrations can cause the data to be migrated twice: from the heavily loaded processor to the lightly loaded processor to balance the work load, and back to the heavily loaded processor for the purpose of maintaining the original affinities. This in turn could lead to penalties due to cache reload and negate any performance gained from processor affinities. It should be remembered that the overhead is incurred only when load imbalances occur. The synchronization costs associated with accesses to the local and remote work queues are the same and equal to O(P log(N/P²)). Hence, AFS incurs at most a cost of O(P log(N/P²) + P log(N/P²)) in synchronization operations or scheduling steps on each work queue (Markatos and LeBlanc, 1992). AFS offers higher performance than the dynamic scheduling schemes previously discussed, since synchronization operations on local work queues are usually less expensive than global synchronization operations. Moreover, network traffic is reduced since processors independently schedule iterations from their local work queues.
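The two ingredients of AFS, the deterministic initial assignment and the take-1/P-locally-or-steal-1/P rule, can be sketched as follows; the queue representation and helper names are assumptions of the example.

import math
from collections import deque

def afs_initial_queues(N, P):
    """Affinity scheduling, initial deterministic assignment: the i-th chunk
    of ceil(N/P) consecutive iterations always goes to processor i
    (0-based indices assumed)."""
    size = math.ceil(N / P)
    return [deque(range(i * size, min((i + 1) * size, N))) for i in range(P)]

def afs_get_work(queues, me, P):
    """An idle processor takes 1/P of its local queue; if the local queue is
    empty it steals 1/P of the most loaded processor's unexecuted iterations."""
    q = queues[me]
    if not q:
        victim = max(range(P), key=lambda i: len(queues[i]))
        if not queues[victim]:
            return []                       # nothing left anywhere
        q = queues[victim]
    take = max(1, math.ceil(len(q) / P))
    return [q.popleft() for _ in range(min(take, len(q)))]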
4.2 Partitioned Affinity Scheduling

For loops with widely varying execution times, two affinity scheduling algorithms have been proposed (Subramaniam and Eager, 1994). These
algorithms are based on the assumption that iteration times vary in a correlated fashion, i.e., the execution time for the ith iteration is a function of i (for example, a linear function gives a "triangular" iteration space). In this case, a uniform initial allocation of iterations to all processors may result in an unbalanced load. These algorithms are discussed below.
4.2.1 Dynamic Partitioned Affinity Scheduling
The dynamic partitioned affinity scheduling (DPAS) algorithm is very similar to AFS, except that it balances the load by readjusting the sizes of the allocated partitions on subsequent executions of a loop. This, in turn, reduces the cache reload due to migration of work that occurs in AFS. The algorithm keeps track of the number of iterations that were actually executed by each processor and computes the distribution of iterations for subsequent scheduling steps. The algorithm consists of three phases.

(1) Loop initialization phase: As in AFS, this phase partitions the loop iterations into chunks of N/P iterations. This chunk size is not used for further execution steps.
(2) Loop execution phase: A processor removes 1/P of the iterations from its local work queue and executes them. If a processor's work queue is empty, it finds a heavily loaded processor, removes 1/P of the iterations from this processor and executes them. An array called executed is used for keeping track of the actual number of iterations executed by each processor.
(3) Re-initialization phase: This phase performs the adjustment to the size of the initial chunks by calculating a new chunk size to be assigned to each processor. Assuming that processors are numbered from 0 to P - 1, the new chunk size is computed as:
when i = 0 partition-start[i] = 0; partition-end[i] = executed[i] - 1; when i > 0 partition-start[i] = partition-end[i - 1] + 1; partition-end[i] = partition-start[i] + executed[i] - 1; By dynamically changing chunk sizes, the DPAS is cap,-,: of hanc ing imbalances in workloads resulting from varying iteration execution times. The scheduling overhead for DPAS is less than that for AFS, since the synchronization costs associated with remote work queues will decrease on each subsequent execution of the loop. Eventually, the only synchronization operations needed are those associated with local work queues.
66
A. R. HURSON ETAL.
4.2.2 Wrapped Partitioned Affinity Scheduling The wrapped partitioned a@nity scheduling (WPAS) aims to rectify the load imbalances of GSS. Iterations are allocated in a wrapped-around fashion whereby a processor is assigned iterations that are at a distance P (the number of processors in the system) from each other. The implementation of WPAS is very similar to that of AFS, except for the wrapped allocation of iterations to a processor. An example of a wrapped allocation of a loop with 18 iterations indexed from 0 to 17, and for 4 processors is shown below (Subramaniam and Eager, 1994). processor 1: processor 2: processor 3: processor 4:
2 7 4 5
6 11 8 9
10
14
3
15 12 13
0 16
1
17
The wrapped assignment of iterations results in assigning consecutive iterations to distinct processors, thus violating spatial locality. Since cache often misses load blocks of data that may belong to multiple iterations, processors may not be able to take advantage of the data localities resulting from large cache blocks. It is difficult to partition the data to fit the wrapped allocation and yet take advantage of large cache blocks. When the number of processors is small, it is possible that a large block will cache data belonging to successive iterations assigned to the same processor, thus exploiting cache localities.
4.3
Locality-based Dynamic Scheduling
AFS and DPAS assume that the data locality can be exploited only when the data is partitioned and distributed in blocks. The locality-based dynamic scheduling (LDS) algorithm (Li et al., 1993), on the other hand, takes data placement into account, by requiring processors to first execute those iterations for which data is locally available. Thus, the LDS can adapt to any data partitioning methods, including cyclic or block-cyclic. In LDS, the data space is assumed to be partitioned to reside on P processors. When a processor is ready to execute the next chunk, it computes the chunk size as rR/(2P)1. This creates chunks about half as large as those in GSS. The processor must then decide which iterations of the chunk to execute. Unlike other dynamic scheduling algorithms, processors in LDS do not execute iterations of a chunk in the order of the indices. For example, if the data distribution is cyclic, a processor may execute iterations in the following order: p + P , p + 2P, ...,p f S,*P, where S,, is the chunk size assigned to the processor. If data distribution is
DOALL AND DOACROSS LOOPS
67
block-cyclic, the processor will execute iteration in the following order p * B + 1, p * B + 2, ..., p * B + S , , where B is the block size. As with other affinity scheduling methods, in LDS, an idle processor acquires work from a heavily loaded processor. The number of synchronization operations for LDS is O ( P log N ) . Unlike AFS, DPAS, and W A S , in LDS each processor must access a central work queue or location to obtain the size of the next chunk. This can lead to network traffic and synchronization delays.
5. Comparison of Affinity-scheduling Schemes Table I11 summarizes the advantages and disadvantages of the various affinity-scheduling schemes discussed in the previous section. In AFS, DPAS, and W A S , each processor independently schedules iterations from its local chunk. Thus, AFS, DPAS and WPAS do not require a central queue and they reduce the overhead due to network congestion and synchronization delays. Furthermore, since the chunk size to be scheduled on each processor is fixed, there is no need for each processor to calculate a chunk size, leading to a low scheduling overhead. Each processor needs only to remember which iterations are unexecuted. However, in LDS each processor must coordinate with a central queue to schedule iterations. A processor cannot compute the size of the next chunk since chunk sizes are computed dynamically. Scheduling can thus create a potential bottleneck in terms of network congestion or access to a central queue. The synchronization delays from accessing a central queue could force processors to remain idle between scheduling steps. The performance can be somewhat improved by pre-calculating the chunk sizes for all scheduling steps. Scheduling steps of AFS can be expressed as (Markatos and LeBlanc, 1992):
O ( P log(N/P2) + P log(N/P2))
(2)
The first part of this equation is associated with accesses to the local work queue, and the second part shows the accesses to remote work queues. DPAS incurs less scheduling overhead than AFS, since the chunk sizes are dynamically adjusted after the initial allocation. This reduces the contribution of the second part of the above equation for subsequent executions of the loop. However, each processor must keep track of the actual number of iterations already executed. The dynamic computation of the chunk size may be amortized across a large number of iterations executed by each processor. The overhead due to LDS was found to be insignificant compared to AFS (Subramaniam and Eager, 1994). WPAS incurs similar scheduling overhead to AFS with respect to the first part of the overhead equation shown above,
TABLEI11 COMPARISON OF AFFINITY SCHEDULING ALGORITHMS
Algorithm
Scheduling steps
Chunk size
Advantages
Disadvantages
o
d
,b,c Cache-reload overhead incurred only when load imbalance arises.
Affinity scheduling (AFS) Dynamic partitioned affinity scheduling (DPAS)
0
Wrapped partitioned affinity scheduling (WPAS)
Same as DPAS
n b r ,
1
A cache reload may occur for each execution of the loop since different iterations may migrate on different executions of the loop. d
f
Incurs less scheduling overhead than AFS. Improved initial load balance compared to AFS . Performs well for loops with triangular workload.
Requires several executions of the sequential outer loop in order for the partition to converge (iterations > = 4).
u b c
The data has to be partitioned in the same manner as the iterations in order to get the best performance.
9
,
Incurs the lowest scheduling overhead. It avoids assigning all the time consuming iterations to a single processor, minimizing load imbalance. Total number of migrations is significantly less than both AFS and DPAS. Very effective for loops with rectangular workloads, and performs well for triangular workloads (Subramaniam and Eager, 1994).
Localitybased dynamic scheduling (LDS 1
Chunk sizes can be determined before execution to reduce overhead. Data placement is taken into account by always having the processor first execute those iterations for which the data is local to the processor.
O ( P log N )
~~
a
Each processor has to dynamically obtain the next chunk size from a central work queue (scheduling is serialized). Requires more scheduling steps than the other three schemes. Scheduling steps cost more, hence more overheads incurred compared to the other three schemes.
~
Each processor independently schedules iterations from its local partition (scheduling done in parallel). Fixed chunk size, hence no calculation is needed, resulting in low scheduling overhead. Majority of the scheduling is inexpensive, since it accesses the local work queue. Memory locality can only be exploited when data is also partitioned and distributed in blocks.
70
A. R. HURSON H A L .
since both schemes assign the same number of initial iterations to processors. The second part of the equation, which is associated with the migration of iterations from other processors, would be less for W A S . Since WPAS assigns consecutive iterations to distinct processors, it avoids assigning all the time-consuming iterations to a single processor and minimizes the chances of load imbalance. The number of iterations to be migrated due to load imbalance would be less than those for AFS. Even though consecutive iterations are not scheduled on the same processor, scheduling in WPAS is similar to scheduling a loop with a stride of P. It was shown in Subramaniam and Eager (1994) that this additional overhead is negligible, and that the total number of migrations for WPAS is significantly less than those in either AFS or DPAS, both when the load is balanced and when the load is unbalanced. LDS incurs O ( P log N ) scheduling steps (Li et al., 1993). In addition to more scheduling steps, each step of LDS is more expensive since each processor must access a central work queue to obtain the size of the next chunk to be executed. The other three affinity schemes described can perform the majority of the scheduling actions in parallel since they need only to access a local work queue. AFS and DPAS assume that the data is partitioned into blocks. These schemes partition the iteration space into blocks and assign a block of consecutive iterations to each processor. This implies that the memory locality can only be exploited when the data is also partitioned and distributed in blocks, which is normally the case. However, for W A S , it is more difficult to manage the data locality with a wrapped assignment of iterations. LDS takes data placement into account, by requiring processors to first execute those iterations for which data is locally available. The data placement must be known prior to scheduling of iterations to obtain good performance with LDS. Load balancing is inherent in all the schemes discussed since idle processors steal work from heavily loaded processors. However, such migration of iterations may defeat the data locality advantages of affinity scheduling (Lilja, 1994a). Performance results have shown that WPAS is very effective for loops with rectangular workloads (Subramaniam and Eager, 1994), where the execution time of a set of iterations remains the same, while the next set of iterations have a smaller execution time. This is because W A S avoids assigning all the time-consuming iterations to a single processor. Results also show that both W A S and DPAS perform well for loops in which execution times of iterations decrease linearly (triangular workload). This is due to the fact that these two algorithms start with a good initial load balance and minimize migration of work, leading to a reduced cache reload cost. Although DPAS appears to perform best among the four schemes, it does have some limitations. DPAS takes several executions of the outer loop
DOALL AND DOACROSS LOOPS
71
before the sizes of the partitions converge for the inner DOALL loop. It was found that convergence is not possible unless the outer sequential loop is executed at least four times (Subramaniam and Eager, 1994). When the number of inner loop iteration changes with each outer loop execution, DPAS must compute new adjustments for chunk size (since previous adjustment would be based on a different number of inner loop iterations). All four affinity scheduling schemes rely on processors snooping on other processors for finding additional work. This implies that these affinity schemes are suitable only for bus-based systems.
6. DOACROSS Loop Scheduling Chen and Yew (1991) have used an event-driven simulator to measure the parallelism inherent in application programs. Six real application programs from the PERFECT benchmark suite were used in their study. They observed that the loss of parallelism after serializing DOACROSS loops was very significant. This supports the need for good schemes for the parallel execution of DOACROSS loops. DOACROSS loops can be classified as regular and irregular loops. In a regular DOACROSS loop, dependence distances are constant while the dependence distance varies from iteration to iteration in irregular DOACROSS loops. Regular DOACROSS loops are easier to parallelize than irregular loops.
6.1 The Regular
DOACROSS Model
Cytron (1986) developed a DOACROSS model for the execution of loops with some degree of parallelism among various iterations. Consider a single loop L with s statements (S,, Sz, .. ., S,) and N iterations. If T ( S , ,S,) is the execution time of statements Si through S, (iGj),then the DOACROSS model has d = 0 for vector loops, d = T ( S , , S,) for sequential loops, and 0 < d < T ( S , ,S , ) for loops with intermediate parallelism. In this model, each iteration is assigned to a virtual processor and execution of two successive virtual processors is delayed by d time units. This is similar to cyclic scheduling discussed earlier. In general, the delay d can range from zero (the vector loop case) to T (the sequential loop case), where T is the execution time of one iteration of the loop. The total execution time for a DOACROSS loop L with N iterations for an unbounded number of processors is:
T E ( L )= ( N - l)d + T
(3)
72
A. R. HURSON ETAL.
When there are only P processors, the execution time is (Polychronopoulos and Banerjee, 1987):
T E ~ ( L ) (=r N / P l - l ) m =
( ~ , ~ d ) + ( ( ~ - i ) m o d ~ )(4) d + ~
In Section 1, it was stated that data dependence can be either lexically forward (data from higher indices is used by iterations with lower indices) or lexically backward (data from lower indices is used by iterations with higher indices). Normally, lexically forward dependencies (LFD) do not contribute to delays in executing loop iterations. Sometimes a lexically backward dependence (LBD) can be transformed into a lexically forward dependence by reordering the statements of the loop if the statements do not form a dependence cycle (Cytron, 1986). DOACROSS loops where the LBD cannot be transformed into LFD lead to delays in executing successive iterations. Hence, we focus our attention on regular DOACROSS loops with LBD. Consider two statements of a loop, S4 and S8, where S, lexically precedes &. Statement S, of iteration Z, computes the data used by statement S, of iteration I,. The semantics of sequential programs guarantee that iteration Z2 is executed before iteration I,. If these two iterations were assigned to different processors, a delay must be introduced for executing iteration I , , such that statement S, of iteration I , executes before statement S, of iteration I,, in order to satisfy the dependence. Hence, a delay d equal to 5 statements must be introduced to iteration I , . This loop example exhibits a lexically backward dependence. The DOACROSS loop of the example shown in Figure 1 has N = 8 iterations, a delay d = 4, and a loop execution time T = 10. The parallel execution of the loop on three processors takes 38 time units, resulting in a speed-up of 2.1. Communication cost among processors is not included in this model. The overall execution time depends on the communication cost due to interiteration dependencies (Su and Yew, 1991). For a shared memory system, the delay d should include not only the delay due to the lexically backward dependence (LBD), but also the delays in accessing shared variables. For distributed memory systems data must be shared using messages which take several orders of magnitude longer than a processor execution cycle. It has been reported that the Intel iPSC/l, iPSC/2, and iPSC/860 hypercubes have communication/executionratios of 26, 59, and 1000 respectively, while the ratio for the nCube 3200 and 6400 hypercubes are 30 and 107, respectively (Dunigan, 1991). Performance studies on CM-5 also show that some classes of problems are communication limited on that machine (Kwan et al., 1993). Using a balance factor b = tcom/tcomp, t,,,,,,,, = fmd - fcomp, a system is communication limited if b a 1. The Laplace solver on a 256 node partition has resulted in balance factors ranging from 2.1 1 for an 8192 x 8192 mesh size to 14.38 for a 64 x 64 mesh size. If we let d be the delay due to the
DOALL AND DOACROSS LOOPS
73
LBD, and C be the total communication and synchronization cost (communication delay in the sequel) incurred, then the execution time of a DOACROSS loop with N iterations on P processors is
T E ~ ( L=) (rN/Pl- 1) max { T , ~ ( + dc)1 + ( ( N - 1) mod ~ ) ( +d c)+ T (5) Here for every delay d due to LBD, a comrnunication/synchronization cost C is added, increasing the dependency delay from d to d i C. For the example of Fig. 1 , if we assume a communication delay C = 6, the total parallel execution time becomes 80 time units, leading to a speed-up of 1. Increasing or decreasing the number of processors will not change the execution time. Larger values for C will make a parallel execution of the DOACROSS loop ineffective and lead to under-utilization of processors as they remain idle between the termination of an iteration and the initiation of the next assigned iteration. Dynamic scheduling schemes such as GSS and factoring are not effective in scheduling DOACROSS loops. When chunks (a number of consecutive iterations) are assigned to processors, iterations in successive chunks must wait for the completion of all iterations in the preceding chunks. Since chunk sizes are greater than one, the delay among processors assigned successive DOACROSS I = 1 , 8 {d= 4)
:1 1 0
END P1
P2
P3
:I 8
12
32 36
40L FIG 1. Allocation of a DOACROSS loop.
74
A. R. HURSON ETAL.
iterations is now equal to ( n - l)T + d + C , where n is the size of the chunk assigned to the previous processor. The total execution time of the DOACROSS loop shown in Fig. 1 using either GSS or Factoring is 56 when C = 0, and 80 when C = 6. Both schemes reduce the amount of communication overhead when compared to cyclic scheduling, at the expense of reduced parallelism. They perform worse than the cyclic method when the communication cost is zero; but with non-zero communication cost, they perform no worse than cyclic scheduling. The execution time for the same example using static chunking is 68 when C = 0, and 80 when C = 6. Thus, static chunking performs better when the communication cost is significant, since it only incurs ( P - 1) communication delays.
6.7.7
Pre-synchronized Scheduling (PSS)
Krothapalli and Sadayappan (1990) proposed a dynamic scheduling scheme called pre-synchronized scheduling (PSS) for eliminating processor idle cycles that result from scheduling schemes such as GSS and Factoring. Here, iterations are scheduled only when their data dependencies and synchronization requirements are met. Loop iterations are uniquely identified using indices, and a ready queue of enabled iterations is maintained by a global control unit (GCU). An idle processor gets an id from the ready queue and executes it. When the execution is complete, successor loop iterations that are enabled are added to the ready queue. A DOACROSS loop is divided into two separate loops and scheduled separately. The two loops correspond to a T - d portion of the loop that can be executed in parallel, and a d portion that must wait for synchronization from previous iterations. This method introduces a scheduling/synchronization overhead equal to 2N, resulting from the fact that we now have 2N loop iterations to schedule. The performance of the PSS is largely dependent on two factors: (1) how the ready queue is implemented, e.g., FIFO, priority; and (2)the scheduling cost. Even though the T - d portion is a parallel loop (akin to DOALL), it is necessary to schedule the iterations in proper to facilitate an interleaved execution of iterations from the T - d portion and the d portion of a DOACROSS loop. For the example of Fig. 1, the best performance achievable with PSS is 38, when the scheduling cost is ignored. This is comparable with that achieved by the Cyclic scheduling scheme. However, the PSS scheme incurs significant scheduling costs. We need to include a communication (C) each time an idle processor obtains the id of a loop iteration from the ready queue and a communication (C) to update the ready list when a processor completes a loop iteration. Since PSS schedules 2N loop iterations, we have a 4CN communication cost. For example, with C = 6, the execution time for the loop shown in Fig. 1 will become 146.The
DOALL AND DOACROSS LOOPS
75
cost can be reduced by assigning several loop ids (i.e. a chunk) each time an idle processor accesses the ready queue. However, it is difficult to arrive at an optimal chunk size.
6.1.2 Staggered Distribution Scheme A staggered distribution scheme (SD) was originally developed for multithreaded dataflow multiprocessors (Hurson er al., 1994a; Lim et al., 1992). Here loop iterations are unevenly distributed among processors in order to mask the delay caused by data dependencies and communication. Performance studies have indicated that this scheme is effective for loops containing large degrees of parallelism among iterations. It has been observed that near optimal speed-up can be attained even in the presence of communication delays. In order to use SD for the DOACROSS loop, the loop is separated into two loops, in a manner similar to that of the PSS method. The first loop (the T - d portion) is scheduled according to the following policy: the iterations assigned to PE, succeed the iterations assigned to PE,-, and PE, is assigned m more iterations than PE,_,. This results in a more iterations assigned to higher numbered processors. For example, with six iterations and three processors we may assign 1, 2 and 3 iterations respectively to the three processors. The main objective of the SD scheme is to mask the delay due to data dependencies and communication in executing the second (d-portion) loop iterations by assigning more T - d loop iterations. The number of additional iterations assigned to PELis given by
where nz-,is the number of iterations allocated to PE,_,, T is the execution time of one iteration, d is the delay and C is the inter-processor communication cost. The total number of iterations n, allocated to PE, would be
Thus, the staggered distribution masks delays resulting from the lexically backward among loop iterations and the communication delays involved in transmitting the dependent data among processors since later processors execute more ( T - d) loop iterations. The performance of this scheme can be fine tuned (to accommodate different communication costs) by selecting an appropriate number of iterations n , , assigned to the first processor. Note that equation (7) also determines the maximum number of processors needed to
76
A. R. HURSON ETAL.
execute the DOACROSS loop with N iterations. The synchronization overhead is only ( P - l)*C, which is smaller than that incurred by cyclic scheduling and PSS methods. For the example shown in Fig. 1, we use a distribution of 2-3-3 iterations when C = 0, giving 44 units of execution t h e and a speed-up of 1.82. When C = 6 we use a distribution of 1-2-5 iterations, giving 50 units of execution and a speed-up of 1.6. Staggered distribution accounts for different communication costs by selecting an appropriate n,. The Staggered scheme however, distributes an uneven load among processors; heavily loading later processors.
6.1.3 Cyclic Staggered Distribution (CSD)
A modified version of the staggered scheme called cyclic staggered distribution (CSD) was proposed to address the uneven load distribution (Hurson et al., 1994b). CSD will also handle loops with varying iteration execution times. The CSD has been found to be effective when the number of processors is less than those needed by the staggered scheme (rnaxpe). Unlike using n , iterations, CSD will start with one iteration assigned to the first processor, and n, iterations to the remaining P - 1 processors based on equation (7). The remaining iterations are redistributed to all P processors based on the staggered allocation. Note that the delay that must be masked by higher numbered processors now is smaller than that in the original SD approach, since some loop iterations would already have completed due to prior staggered allocation. Thus a smaller number of additional iterations are assigned to PE,, as compared to equation (7). The number of iterations n, for this new scheme will be (8)
where n, is the number of iterations previously allocated to processor PE,. This approach results in a more balanced load and improved speed-up than the original staggered scheme on P processors. When the execution times of loop iterations vary, CSD can use estimated worst case iteration execution time (possibly augmented by runtime support to adjust these estimates with actual execution times) in determining the distribution for the second and subsequent passes.
6.2 Comparison of DOACROSS Scheduling Schemes Table IV compares the characteristics of the four DOACROSS loop allocation schemes discussed. Cytron’s cyclic scheduling scheme for
DOALL AND DOACROSS LOOPS
77
TABLEIV COMPARISON OF DOACROSS SCHEDULING ALGOrUTHMS Algorithm
Advantages
Disadvantages
Cyclic scheduling (cyclic)
Exploits the parallelism present in a DOACROSS loop.
Does not take into consideration the effect of inter-processor communication cost. Overhead increases linearly as a function of ( n - 1)(C + d). Offers low hardware utilization.
Pre-sy nchronized scheduling (PSS)
Balances the load and eliminates busy waiting period. Iterations are scheduled when their synchronization requirements are met.
Introduces scheduling overhead equal to 4CN. No implementation details on the ready queue management were presented. The performance of this scheme is unacceptable if the scheduling cost is significant.
Staggered distribution scheme (SD)
Considers the effect of both delay Produces an unbalanced load among processors, with the (d) and communication cost (C). higher numbered processors Automatically controls and determines the maximum number receiving the larger amount of of processors required for work. efficient execution of the loop based on the physical characteristicsof the loop and the underlying machine architecture -higher resource utilization. Lowest scheduling overhead.
Cyclic staggered distribution (CSD)
Balances the load by cyclically Advantages are only possible if the number of PEs available is less assigning the remaining iterations to processors, and at the same time than rnnrpe (Hurson et ol., masking out the effect of both 1994b). delays due to LBD and communication. Increases the amount of communication delay, relative to the SD, but simulation results have shown that it still improves the performance and offers a higher speed-up (Hurson et al., 1994b).
78
A. R. HURSON ETAL.
TABLEV N W E R OF ITERATIONS ASSIGNED TO A PROCESSOR AT EACH SCHEDULING STEP WITH T = 10, d = 2, n = 500, = 5 , P =4.
c
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47..500
Total execution time
1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3..
1
1 1 1 1 1 1 1 1 1 1
1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3503
1 2 4 6 7 9 10 11 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 6 -
1298
DOALL AND DOACROSS LOOPS
79
DOACROSS loops does not take into consideration the effect of interprocessor communication cost. When the communication delays are significant, the overhead of this scheme increases as a function of ( n - 1)(C + d ) , and is independent of the number of PEs. This scheme offers low hardware utilization as a result of the processor idle cycles between the termination and initiation of successive iterations assigned to the same PE. Re-synchronized scheduling (PSS) while attempting to balance the load and eliminate idle cycles, introduced scheduling overhead proportional to 2 N and a communication cost of 2C per iteration. The Staggered distribution scheme (SD) accounts both for the processor delays due to LBD and communication. This is achieved by assigning a varying number of loop iterations to processors. This scheme achieves better results than the previous algorithms, and utilizes an optimal number of processors. The major weakness of the staggered scheme is the uneven load assignment to processors. The cyclic staggered distribution (CSD) answers this load imbalance of SD. It should be noted that CSD results in larger communication delays for a loop than that with SD; however, the more balanced load of CSD leads to a better performance, particularly when the number of processors is less than the optimal number required for SD (Hurson ef al., 1994b). We have conducted simulation studies for determining the number of iterations assigned to processors at each scheduling step using the various schemes described. The results are shown in Table V. Static chunking was included for the sake of completeness and because it performed better than the cyclic scheme when the communication cost is significant. The total execution time for SC shown in Table IV was obtained by separating the loop into two separate loops as done in SD and CSD. Re-synchronized scheduling (PSS) was not included in Table V, since the authors have not suggested any strategies for managing the ready list, malung an accurate analysis difficult. As discussed earlier, PSS in general performs poorly when the communication costs are significant. Table V shows that the cyclic scheme has the worst performance, followed by static chunking. The cyclic staggered scheme (CSD) produced the best performance.
6.3 Irregular DOACROSS Loop Scheduling Regular DOACROSS loops have constant distance dependence patterns which can be determined during compile-time. For irregular DOACROSS loops, the dependence patterns are complicated and usually are not predictable at compile-time. An example of an irregular DOACROSS loop is shown in Fig. 2. Here the dependency between loop iterations is based on the content of arrays B and C. Hence, the dependence relation cannot be
80
A. R. HURSON ETAL.
DO I = l , N s p : A ( B ( I ) 1 :=. . . :=A(C( I )I+. sq : END
. .
.. .
FIG 2. Irregular DOACROSS loop.
determined until runtime. We need to consider approaches that are different from those used for regular DOACROSS loops to achieve good performance for irregular loops.
6.3.1 Pre-synchronized Scheduling The pre-synchronized scheduling (PSS) for scheduling presented earlier can also be used for irregular DOACROSS loops (Krothapalli and Sadayappan, 1990). In PSS, iterations are scheduled only when their synchronization requirements are met. Each iteration of a loop is uniquely identified by its index value. The dependence relations among iterations of a loop are represented as a directed graph called iteration space graph (ISG). The nodes in ISG represent the iterations while the edges show dependence relationships between iterations (Fig. 3). Edges are not differentiated for flow dependence, anti-dependence, and output-dependence. The number of predecessors of each iteration is computed from the ISG and stored in a trig-count array. Figure 3(b) shows six nodes in the ISG format, each corresponding to an iteration of the loop in Fig. 3(a). The edges in Fig. 3(b) represent inter-iteration dependencies corresponding to the values of array B shown in Fig. 3 (c). For instance, there is a flow-dependence from iteration 1 to iteration 3, because iteration 1 writes to location A(1) and iteration 3 reads from the same location. Similarly, there is an anti-dependence from iteration 2 to iteration 3 through the element A(3). Thus, the trig-count for iteration 3 is 2, and iteration 3 cannot start until both its predecessors, iterations 1 and 2 have completed. Initially, iterations without any predecessors (i.e., trig count = 0 , iterations 1, 2 and 4 in Fig. 3(b) are placed on the ready queue managed by a global control unit (GCU). An idle processor obtains an iteration id from the queue and executes the loop body for that index. Upon completion of the execution, the processor updates the trig-counts for all successor iterations. The GCU is informed of the updates by transmitting an instruction packet to the GCU. The GCU decrements the appropriate trig-count by one. The iterations with a zero trig-count is placed on the ready queue. The algorithm for generating the ISG can be found in Krothapalli and Sadayappan (1990). The algorithm executes a skeleton of the loop in two passes and generates a trace of the memory
DOALL AND DOACROSS LOOPS
81
(c) FIG 3. Illustration of irregular DOACROSS loop execution scheme: (a) an irregular loop;
(b) iteration space graph of Fig. 3(a); (c) values of B .
references. These memory traces are used for identifying data dependencies. In the first pass, flow and output-dependencies are identified, while antidependencies are identified in the second pass (by executing the skeleton to generate the reverse trace of references to memory locations). Obviously, the construction of the ISG at runtime introduces overhead. However, for irregular loops, some runtime analysis is necessary regardless of the actual scheduling strategy used for loop allocation. It is observed that in some scientific applications using iterative techniques, or applications that model the behavior of a structurally invariant physical system through time, the same dependencies among iterations exist for repeated executions. The overhead in computing the ISG for such applications can be amortized over the repeated executions of irregular loops. PSS incurs scheduling overhead proportional to N , and a communication cost of 2C for each iteration, leading to an overall cost of 2CN. An analysis of the trade-off between the overhead and the increased parallelism in executing irregular loops depends on the application itself. The PSS scheme requires a GCU to manage the ready queue and means of communicating updates of the trigcount array.
82
A. R. HURSON ETAL.
6.3.2 Run time Paralleliza tion Schemes Runtime parallelization schemes perform dependence analysis at runtime and, depending on the dependencies, executes the loop in parallel (Chen et al., 1994). For example, consider the case where the arrays B and C in Fig. 2 are not available until runtime. All three types of dependencies (flow, anti and output) between instances of statements S, and S, are possible. When B ( l ) = C(3) = B(4) = B ( 5 ) = x the following dependencies result:
Sp (1)flow
-+
Sq(3)anti -+Sp (4)output-+ Sp(5)
It is normally assumed that the values of B and C do not change during the execution of the loop. In general, runtime parallelization schemes require an inspector and an executor. The inspector determines the dependence relations among the data accesses, while the executor uses this information to execute iterations in parallel. If both the inspector and the executor are parallel algorithms, the scheme can take full advantage of parallel machines. The key to the success of these schemes is to reduce the communication overhead between the inspector and the executor.
The Zhu- Yew Runtime Parallelization Scheme (ZYRPS). Zhu and Yew proposed a runtime parallelization scheme that is general enough to handle any dependence pattern (Zhu and Yew, 1987). We will call this ZYRPS method. Using ZYRPS, the loop in Fig. 2 will be transformed into the form shown in Fig. 5. Figure 4 outlines the transformation. Two fields are associated with each element of array A : the data field stores the data value while the key field is used to order accesses to the array elements. Here, an iteration i is allowed to proceed only if all accesses to the array elements A ( B ( i ) ) and A ( C ( i ) )by all iterations j < i have been completed. The inspector determines the set of iterations that can proceed, by having all unexecuted iterations visit the array elements they need to access and store their own iteration number in the key field of these elements if it is less than the value already stored. After doing so, the numbers that are now remaining in the key fields are the numbers of the iterations that can proceed. In the executor phase, iterations check to see if the key field of the elements they need to access have values that are equal to their iteration indices. If so, no unexecuted predecessor exists and the loop iteration is allowed to proceed. Once iteration i completes, Done(i) is set to TRUE and the process continues until all iterations are executed. This approach has two limitations (Chen et al., 1994). First, the inspector cannot be reused across different invocations of the same loop, even if there is no change in dependencies, since the inspector and the executor are
03
DOALL AND DOACROSS LOOPS
R e o e a t u n t i l a l l i t e r a t i o n s h a v e been e x e c u t e d I N S P E C T O R PHASE
I n i t i a l i z e a l l key f i e l d s t o i n f i n i t y . f o r a l l u n e x e c u t e d i t e r a t i o n s ( 7 = i t e r a t i o n number) I f i t e r a t i o n number < key f i e l d o f A ( B ( i ) ) t h e n R e p l a c e t h e key f i e l d o f A ( B ( i ) ) w i t h i t e r a t i o n number I f i t e r a t i o n number < key f i e l d o f A ( C ( I ) ) then R e p l a c e t h e key f i e l d o f A ( C ( I ’ ) ) w i t h i t e r a t i o n number. “The key f i e l d s now c o n t a i n t h e ( l o w e s t ) i t e r a t i o n numbers t h a t a r e now a l l o w e d t o a c c e s s t h e s e a r r a y e l e m e n t s . A1 1 p r e d e c e s s o r i t e r a t i o n s h a v e already accessed t h e s e array elements.” EXECUTOR PHASE F o r a l l unexecuted i t e r a t i o n s ( i
I f i t e r a t i o n number
=
= i t e r a t i o n number) key f i e l d o f b o t h A ( B ( I ) ) and
A ( C ( 7 ) ) then
Execute loop body. “ T h e key f i e l d s o f b o t h a r r a y s must m a t c h t h e i t e r a t i o n number i n o r d e r f o r i t t o p r o c e e d . I f t h e y b o t h match then a l l p r e d e c e s s o r i t e r a t i o n s t h a t a c c e s s t h e s e a r r a y e l e m e n t s h a v e a l r e a d y been e x e c u t e d . ”
FIG 4. Pseudocode of transformed of loop Fig. 2 using the Zhu-Yew scheme.
tightly coupled. Second, the execution of iterations with dependencies cannot be overlapped. The executor checks the key fields of all the accesses needed by an iteration and executes an iteration only if all key fields contain a value that is equal to the iteration index. This limitation not only reduces the amount of parallelism present, but also causes unnecessary traffic since all key fields have to be inspected. For the example of Fig. 2, 3r memory accesses are required for each iteration, where r is the number of references to array A per iteration.
Chen‘s Runtime Parallelization Scheme (CRPS). In order to address the limitations of the previous scheme, Chen et al. (1994) proposed a new algorithm that reuses the inspector results across loop invocations and permits the overlap of dependent iterations. This is done by separating the
a4
A. R. HURSON ETAL.
Done ( 1: N ) = . FALSE. REPEAT UNTIL ( ( D o n e ( i ) . E Q . . T R U E )
for a l l i )
Inspe c t or Phase DOALL i = l , N A ( B ( i )1 . key=A ( C( i ) ) . k e y = END DOALL DOALL i = l , N I F (Done ( i 1 . E Q . . FALSE 1 “ t h e next t w o i n s t r u c t i o n s a r e atomic” c o r n p a r e & s t o r e {i f ( A ( B ( i 1 ) . k e y > ? ) { A ( B ( i 11. k e y c i ; I I c o r n p a r e & s t o r e l i f ( A ( C ( i) ) . k e y > i ) 1A ( C ( i 1 1 . k e y c i : I t END I F END DOALL
Executor Phase DOALL i = l . N I F ( D o n e ( i ) . E Q . .FALSE) I F ( ( A ( 5 ( i ) ) . k e y . E Q . i ) . A N D . ( ( A ( C ( i ) ) . k e y . E Q . 7 ) THEN
... ... ... . . . =A ( C ( i ) ) . d a t a + . . .
A(B(i)).data=
Done( ? ) = . T R U E END I F END I F EN0 DOALL END R E P E A T FIG 5. Transformed loop of Fig. 2 using the Zhu-Yew scheme.
inspector and executor phases. All the dependence information is gathered and stored in a table called Ticket by the Inspector. This information is then used in one or more executor phases. To reduce the time in building the Ticket, each processor builds the table in parallel. This method, however, is very expensive, since it requires some interprocessor communication. The algorithm tries to minimize interprocessor communication by constructing
DOALL AND DOACROSS LOOPS
85
the table first locally (focal inspector phase) and then combining the local tables during a global inspector phase. In the inspector phase, the references accessing the same location are ordered (i.e., serial execution order) while maintaining the original dependencies. Processors share the Ticket table, whose rows (i) correspond to iteration while the columns ( j ) make the order of the references correspond to a shared location. An example is shown in Fig. 6, which shows the Ticket table for the loop of Fig. 2, for an array reference A ( x ) , with the following dependence relationships:
B(l)= C(3) = C(4) = B(7)= B(9)= C(9) = C(11)= x The first column of the Ticket table represents the accesses to B ( i ) and the second column represents C ( i ) . The first access to B(l)corresponds to the Ticket(1,l)and an initial value of 0 is stored there. Ldcewise the second
1
0
2
3
1
4
2
5
i
6 7
3
8
9
4
5
10 11
6
FIG 6. Example Ticker table for the loop in Fig. 2.
86
A. R. HURSON H A L .
access C ( 3 ) corresponds Ticket(3,2) and a value of 1 is stored there. Similarly, a value of 2 in Ticket(4,2) corresponds to the third access for C(4), and so on. The objective is to store the order of accesses involved in this chain of dependencies and this order is enforced by the executor. In the executor phase, an extra field key is associated with each shared element to enforce the order of accesses. This field is initialized to 0 and after each access the value is updated to permit next access in the dependence chain. At any time, the key value indicates the permitted access as indicated by the Ticket table entries. When a processor is executing the i" S e t a l l key f i e l d s t o 0 . For e a c h i t e r a t i o n ( 7 ) Busy-wait u n t i l f i r s t a c c e s s key=Ticket t a b l e e n t r y ( T i c k e t ( i . 1 ) ) . A c c e s s t h e d a t a and e x e c u t e f i r s t p a r t o f l o o p b o d y . A ( B ( i ) ). d a t a = . . . I n c r e m e n t t h e key by o n e . Busy-wait u n t i l second a c c e s s key=Ticket t a b l e e n t r y ( T i c k e t ( 7 , Z ) ) . A c c e s s t h e d a t a and e x e c u t e s e c o n d p a r t o f l o o p b o d y . . . . = A ( C( i ) 1 . d a t a + . . . I n c r e m e n t t h e key by o n e . FIG 7. Pseudocode of the executor algorithm for the loop in Fig 2.
A ( : ).key=O DO i = l . N busy-wait i n g DO W H I L E ( A ( B ( 7 ) . k e y ! = T i c k e t ( 7.1) ) access t h e d a t a A ( B ( 7 ) ).data=. .. increment t h e key A ( B ( i 1 1. k e y + +
.... busy-waiting DO WHILE ( A ( C ( i ) ) . k e y ! = T i c k e t ( i . 2 ) access t h e data ... = A ( C ( i ) ) . d a t a + . . . i n c r e m e n t t h e key A ( C ( 7 ' ) ).key++
.... ENDDO
FIG 8. Executor algorithm for the loop in Fig.2.
1
DOALL AND DOACROSS LOOPS
87
iteration of a DOACROSS loop, for each access j in the iteration, Ticke t ( i , j ) gives the sequence number S. The processor must wait to access the shared structure A ( x ) until the key for this element becomes equal to S. For the example in Fig. 6, a processor executing iteration 4 will wait until the key value of A(C(4)) becomes 2 before accessing A(C(4)). The key is incremented to 3, permitting the access by the processor executing iteration 7. A pseudo algorithm and FORTRAN code for the executor algorithm are shown in Figs. 7 and 8. A SPMD form of the same algorithm can be found in (Chen et al., 1994). Static cyclic scheduling was chosen for scheduling iterations in the study.
6.4 Comparison of Irregular DOACROSS Scheduling
Schemes The characteristics of the three approaches for scheduling irregular DOACROSS loops are summarized in Table VI. Any approach for irregular DOACROSS loops requires some runtime analysis, which adds to the overhead of scheduling and synchronization delays. However, this overhead may be amortized over the repeated execution of the DOACROSS loop in some scientific applications that use iterative techniques and in applications that model the behavior of a structurally invariant physical system through time. In Re-synchronized scheduling (PSS), the runtime overhead is in the construction of the 1%. For runtime parallelization schemes, an inspector phase is necessary to determine the dependencies between iterations. PSS uses generated traces of memory references for determining data dependencies. Flow and output dependencies are first resolved (phase 1) and anti dependencies are resolved in the second phase. The two-phased approach for determining dependencies can lead to more complex implementation along with higher overhead. The two runtime parallelization schemes, on the other hand, use a single-phase algorithm and records accesses made by all iterations to capture all types of dependencies (viz., flow, anti and output). This reduces the complexity of the algorithm and overhead. Unlike the other approaches, the runtime overhead for ZYRPS cannot be amortized across multiple executions of an inner DOACROSS loop because the inspector and executor phases of the algorithm are tightly coupled, making it impossible to reuse dependence information across multiple executions of the loop. Chen’s scheme (CRPS) permits the overlapped execution of operations in dependent iterations, since the algorithm analyses dependencies based on accesses to shared structures. This ability to overlap dependent iterations may increase the amount of parallelism in the inspector and executor phases. In addition, it removes redundant operations in the inspector (unlike the
88
A. R. HURSON ETAL.
TABLE VI COMPARISON OF IRREGULAR DOACROSS SCHEDULING A L G O W S ~
Disadvantages Pre-synchronzied scheduling
(PSS 1
Zhu-Yew runtime parallelization scheme (ZYRPS 1
Runtime analysis phase is independent from their execution phase. If computations with the same dependenciesare repeatedly iterated over, the enhanced parallelism realized can offset the overhead of performing runtime analysis. No busy-waiting is introduced as well as unnecessary memory accesses.
The algorithm for generating the ISG introduces complexity and could increase overhead. No overlap of dependent operations.
The inspector phase is tightly Utilizes a single algorithm that coupled to its executor phase, simply checks the accesses making independent execution made by all the iterations in of both phases impossible. order to detect all three types of dependencies. This reduces Causes redundant traffic and requires the complexity in the algorithm. several memory accesses, since the inspector will inspect the iteration more times than is required. No overlap of dependent operations.
Similar to PSS, the parallelism Chen’s runtime can offset the overhead of parallelization scheme (CRPS) performing runtime analysis if the same computations are repeatedly iterated over, since the runtime analysis phase is independent from their execution phase. Similar to ZYRPS,utilizes a simple algorithm that checks the accesses made by all iterations in order to detect all types of dependencies. Only scheme that allows the overlap of dependent operations. Removes redundant operations of ZYRPS in the inspector phase.
Increased spin locking during execution. Deadlock is possible. Increased accesses to memory locations. Utilizes static cyclic scheduling, which might not be able to balance the load very well if there is a variance in iteration execution times.
DOALL AND DOACROSS LOOPS
89
ZYRPS). The main weakness of the CRPS algorithm is the delays that can result in waiting for the key field to match the access order. Deadlocks could occur in cases where the iterations are randomly assigned to processors, since all iterations could be waiting for their turn (possibly on different shared elements). The simplicity of PSS may outperform CRPS, even though PSS does not overlap the execution of dependent iterations. Unlike CRPS, a processor in the PSS approach obtains only an iteration that is ready for execution, thus eliminating the need for further synchronization on key fields (and avoid deadlocks). In Z Y R P S , the executor checks the key fields of all the accesses needed by an iteration and only executes an iteration if all key fields are equal to the iteration number. As mentioned earlier, this limitation not only reduces the amount of parallelism but also causes more repeated memory accesses than are really needed to inspect dependencies. The studies made by Chen utilized the static cyclic method for scheduling iterations, which may lead to load imbalances across processors. One could have used self-scheduling as done in PSS. The self-scheduling would have incurred a scheduling overhead proportional to 2CN (refer to Section 6.1.1). It was suggested (Chen et al., 1994) that the loops can be executed in a single-program-multiple-data form by distributing the iterations equally among the processors. However, a naive distribution of iterations could lead to load imbalances across processors, since the order of accesses to shared structures affects the order of execution of iterations. A study to compare CRPS with ZYRPS was performed using a set of parameterized loops running on a 32-processor Cedar shared-memory multiprocessor (Chen et al., 1994). Loops with varying number of iterations and references were used. The results show that CRPS yields speed-ups as high as 14 when the inspector is not reused and as high as 27 when the inspector is reused. CRPS consistently outperformed ZYRPS.
6.5 Other Research Since DOALL loops are easy to parallelize, several heuristics were proposed and studied. In addition to the approaches presented in Section 2, researchers have explored other dynamic scheduling of DOALL loops. It was believed that despite the runtime overhead incurred by dynamic scheduling approaches, dynamic scheduling of DOALL loops could lead to better execution times than those using static schemes. Exploiting parallelism among DOACROSS iterations is much more difficult because of the inter-iteration dependencies. Scheduling such loops must overcome communication and synchronization costs when dependent iterations are scheduled on different processors. Developing better synchronization
90
A. R. HURSON ETAL.
schemes for efficient execution of DOACROSS loops must be discovered. Su and Yew proposed a DOACROSS execution scheme which utilizes direct communication and static message passing (Su and Yew, 1991). This scheme exploits the nearest shared memory feature of distributed shared memory multiprocessors. In this method, either the producer writes (or sends) the data in the nearest shared memory module or the data is bound to a buffer location at compile time. The compiler can generate the necessary instructions for utilizing these features and execute DOACROSS loops in parallel with reduced communication costs. The researchers have also investigated conditions under which the message buffer size can be greatly reduced. Researchers are also investigating techniques for reordering statements in a DOACROSS loop to maximize the parallelism and to minimize interiteration dependencies. Since the optimal reordering is NP-complete, heuristics have been proposed (Chen and Yew, 1994b). Statement reordering may also reduce the amount of synchronization needed for accessing shared data items (Chen and Yew, 1994a; Krothapalli and Sadayappan, 1991).
7.
Summary and Conclusions
There has been considerable interest in parallelizing loops since they are the major source of program parallelism. In this chapter, we examined how loops with inter-iteration dependencies (DOACROSS), and without dependencies (DOALL) can .be executed in parallel. Both static and dynamic scheduling approaches were studied. The various approaches presented in this article were also compared for their complexity, scheduling overhead, communication cost, processor utilization, and expected speed-up. Yue and Lilja (1994a) measured performance of the different DOALL scheduling algorithms on two different types of loops. The first loop is a matrix multiplication program which is parallelized on the outer loop. The size of the parallel tasks is large and all the iterations have the same number of operations so that the variance in iteration execution times is small. The second loop is based on the adjoint-convolutionprocess. It is parallelized on the outer loop and, in contrast to the first loop, each parallel iteration has a different number of operations so that it has a large variance in iteration execution times. The results are shown in Figs. 9 and 10. The figures do not include performance of the self-scheduling scheme because it performs poorly on their system. The results from the first experiment (Fig. 9) show that all the algorithms performed similarly when N is large and the variance
91
DOALL AND DOACROSS LOOPS
2o I
---
FS
_ _ - _GSS _ l5 a
i
....... Factoring
-TSS .
-. ..
?
U
8
10
Linear
~
a
(0
5
0
1 5
0
I
I
10
15
20
Number of PEs
FIG 9. Performance of DOALL scheduling algorithms on matrix multiplication ( N = 300).
_______
01 0
FS GS5
I
5
10
15
I
20
Number of PEs
FIG 10. Performance of DOALL scheduling algorithms on adjoint convolution ( N = 100).
is small. Hence, the effect of load imbalance is not significant. They also found that fixed-sized chunking (FS) performed better than the others when N is small. On the other hand, the results of the second experiment (Fig. 10) show that if the variance is large, fixed-size chunking (FS) attains only half of the possible speed-up. Guided self-scheduling (GSS) also does not perform well as it assigns too much work at the beginning of the execution and does not save enough work at the end for balancing the load. Factoring and trapezoid self-scheduling (TSS) balance the workload better than the
92
A.
R. HURSON ETAL.
other schemes and attains significantly better speed-up. It should be noted that when the number of iterations is small, none of the scheduling approaches perform well, since there is insufficient work to offset the overhead due to scheduling and distribution of work. Based on these results, we can conclude that among the techniques investigated in the study of parallelizing iterations with varying execution times, fixed-size chunlung performs well when the variations in execution times and the number of iterations is small. On the other hand, factoring and TSS perform better when the variance is large. When loop iterations are scheduled across multiple processors, one must account for the distribution of the data needed by the iterations. Loop iterations frequently demonstrate an aflnity for a particular processor containing the needed data. By exploiting processor affinity better performance can be obtained since communication overhead in accessing needed data is reduced. Affinity scheduling methods also achieve a better workload by permitting idle processors to steal work from busy processors. However, this limits the scalability since processors must snoop (on a bus) to steal work. Performance measurements of affinity scheduling (AFS), dynamic partitioned affinity scheduling (DPAS), wrapped partitioned affinity scheduling (WPAS), and GSS using a synthetic application program and a real application (Jacobi iterative algorithm) were conducted by Subramaniam and Eager (1994). Three different cases were used for the synthetic application. The first case had a triangular workload, in which the iteration size decreases linearly. The second case had a rectangular workload, in which a fraction of the iterations are of a constant large size, while the remaining fraction has a constant smaller size. The third case has constant iteration sizes. The Jacobi iterative algorithm was used since it offers a significant amount of data locality that can be exploited and at the same time it also exhibits a significant amount of load imbalance. The results for the rectangular workload and Jacobi algorithm are shown in Figs. 11 and 12, respectively. From Fig. 11, one can conclude that WPAS offers the best performance. This is because WPAS avoids assigning all the timeconsuming iterations to a single processor. The same is also true for the Jacobi algorithm (Fig. 12)-Even though the performance of GSS and AFS has improved, W A S and DPAS still performed better. Furthermore, both WPAS and DPAS perform well when execution time for iterations decreases with the increasing index (triangular workload), and the three affinity scheduling algorithms (AFS, DPAS, and WPAS) exhibited the same performance for a balanced workload. Based on these results, we can conclude that of the three affinity scheduling schemes tested, W A S performs well for a rectangular workload, both W A S and DPAS equally
93
DOALL AND DOACROSS LOOPS
--- GSS _ - _ _AFS _
6-
1-
01 1
I
I
2
3
1
I
4 5 Number of PEs
I
I
1
6
7
8
FIG 11. Performance of affinity scheduling algorithms on rectangular workload
4.0
r
_--
(N= 128).
GSS
----- AFS ....... DPAS
-WPAS
0.5 0.0 I
1
1
I
I
2
3
4
I
I
I
5
6
7
8
Number of PEs
FIG 12. Performance of affinity scheduling algorithms on Jacobi algorithm (matrix size = 128 x 128).
perform better than AFS for triangular workloads, and all three schemes perform equally on balanced workloads. Unlike DOALL, iterations of DOACROSS loops must be executed in a predetermined order to maintain inter-iteration dependencies. As can be expected, the serialization of iterations leads to a significant loss of parallelism (Chen and Yew, 1991). DOACROSS loops can be either regular or irregular. In a regular DOACROSS loop, inter-iteration dependence distances are constant.
94
A. R. HURSON ETAL.
The staggered distribution (SD) scheme and the cyclic staggered distribution (CSD) attempt to mask the communication delays resulting from inter-iteration dependencies. This is achieved by assigning a monotonically increasing number of iterations to higher numbered processors. CSD is a modified version of SD that overcomes the load imbalance caused by SD. These schemes perform better than other scheduling methods for regular DOACROSS loops. Effectiveness of the staggered schemes has been simulated and compared against those of static chunking and cyclic scheduling (Hurson et al., 1994a, 1994b; Lim et al., 1992). The test-bed includes a representative loop with the execution time of T = 50 and loops 3, 5, 11, 13, and 19 of the Livermore loops, which have cross-iteration dependencies (Feo, 1988): Loop 3 is the standard inner product function of linear algebra, Loop 5 is taken from a tridiagonal elimination routine, Loop 11 is a first sum, Loop 13 is a fragment from a two-dimensional particle-in-cell code, and Loop 19 is a general Linear Recurrence Equation. In their simulation: (1) The inter-PE communication delays are varied based on the ratio of communication time to iteration execution time (C/ T ) . (2) Delays due to LBD are computed for various k values, where k is the fraction of delay d to the execution time of an iteration T , k = d / T .
Pre-synchronized scheduling was not considered, since the best-case performance of this scheme would be equivalent to cyclic scheduling.
FIG 13.
Maximum speed-up (MS), n = 2000, C / T = 0.2.
95
DOALL AND DOACROSS LOOPS
Figure 13 shows the maximum speed-up attained by SD and CYC schemes for n = 2000 and C / T = 0.2. The speed-up for SD is significantly better than CYC for all cases. The average parallelism (AP) of the loop (can also be considered the maximum speed-up of a loop) when k = 0.1 is equal to 9.9, which is very close to the speed-up attained by SD even with communication overhead. The speed-up for CYC is less than two and about one when k = 0.7. Other results show that the maximum speed-ups attained by CYC for C / T = 1.0 and up are all less than one. This means that the loops can obtain better performance if they were executed serially in one PE. The number of PEs required to realize maximum speedup for CYC is shown in Fig. 14. This number drops to two independent of k for (C/TaO.S. This is due to the fact that for C / T = 0.5, after two iterations, the communication delay would be equivalent to the execution time of one iteration T . Therefore, the third and fourth iterations can be executed in the same two processors without any additional delay. The cycle will be repeated for every pair of iterations-using more processors does not affect the performance. Table VII shows the speed-up of SD over SC and CYC when the Livermore loops were simulated. Timing values and inter-processor communication used in the simulation were based upon instruction and communication times for the nCUBE 3200 (Dunigan, 1991). The ratio of communication to instruction execution ( C / E ) for the 3200 is 30. Loop 19 consists of two loops. Hence, each loop was tested separately (19(1) and 19(2)). The number of iterations for each loop were based on the specification of each loop. Loops 3, 5 , and 13 were simulated for n = 1000,
5
4 v)
w
\
a
\
I
0
$ 3 n
‘\,C/T=O.2
~
\
s z
\
\ \
2
CiT= 0.5
1 (
1
0.2
0.3
0.4
0.5 k
0.6
0.7
0.8 0.9
FIG 14. Number of PEs to attain maximum speed-up for cyclic scheduling.
96
A. R. HURSON H A L .
TABLEVII
su(sc)
SPEED-UP OF STAGGERED DISTRmmON RELATIVE TO STATIC C H U ” G AND CYCLIC SCHEDULING s u ( c Y c ) FOR THE LIVERMORE LOOPS WlTH C / E = 30. ACTUAL NWER OF PES USED BY STAGGERED DISTRIBUTION IN PARENTHESES.
PE=4
PE=8
LOOP #
k
C/T
su (SC)
su (CYC)
s u (SC)
s u (CYC)
3 5 11 13 19 (1) 19 (2)
0.25 0.30 0.25 0.05 0.33 0.27
3.75 3.00 3.75 0.71 3.33 2.73
1.20 1.21 1.21 1.07 1.24 1.23
10.72 8.22 10.50 2.82 7.53 6.86
1.21 (7) 1.16 (6) 1.21 (7) 1.14 1.34 (4) 1.28 (5)
13.10 (7) 9.35 (6) 12.18 (7) 5.05 7.53 (4) 6.93 (5)
Loop 11 with n = 500, and Loops 19(1 & 2) with n = 100. Although the number of iterations for Loops 11 can reach a maximum of 1000, Hurson et al. felt that 500 iterations would give a different perspective from Loop 3, since they both have the same value of k. The speed-up for SD increases compared to SC and CYC as the C / T ratio increases and decreases as the value of k increases. There was not much speed-up for Loop 13, since it had a negligible delay. For Loops 3, 5 , 11 and 19(1 & 2) when PE = 8, the SD scheme utilized fewer PEs than the available number of PEs. These results show that SD offers better resource utilization. Furthermore, the number of PEs required also decreases as the communication cost increases. Effectiveness of the cyclic staggered scheme (CSD) was also simulated and compared against the original Staggered scheme (SD) using the same test-bed. As can be seen in Figs. 15 and 16, CSD performed better than SD regardless of the values of n, C / T , and k , especially when the number of PEs was halfway between 2 and maxpe - 1. Finally, CSD attained an almost linear speed-up for smaller number of PEs, even with delays due to LBD and communication cost. Since CSD outperforms SD, we can conclude that CSD comes even closer to the maximum speed-up possible for a particular loop. However, these advantages are made possible if the number of PEs available is less than maxpe. The performance of the Staggered schemes (SD & CSD) has also been evaluated by running loop 13 of the Livermore loops on an C U B E 2 multiprocessor. These schemes have been compared to static chunking (SC) and cyclic scheduling (CYC). Loop 13 was chosen due to its size and the fact that it has a large amount of exploitable parallelism (AP=4.29). Furthermore, it possesses a reasonable amount of delay that hinders the
97
DOALL AND DOACROSS LOOPS
4r
k = 0.3
FIG 15.
1
I
I
Comparative analysis of the staggered schemes, C / T = 3.0.
k = 0.3
I
2
I
3
I
I
4
Number of PEs FIG 16. Comparative analysis of the staggered schemes, C / T = 5.0.
ability of easily executing the loop in parallel. Fig. 17 shows that the SD scheme again attained better speed-up. Furthermore, the SD scheme utilizes less than 8 processors, since it controls the number of processors that are used effectively. The peak speed-up for SD was 2.723 utilizing 7 PEs which is a 36.5% speed-up reduction from the average parallelism of 4.29, and SC had a 46.85% speed-up reduction utilizing 8 PEs. Furthermore, as expected, cyclic scheduling is ineffective if the communication cost is significant.
98
A.
'F
R. HURSON ETAL.
,
,
4
6
,
,
,
12
14
16
,CYC, .....................................................................
0
2
8
10
Number of PEs
FIG 17. Speed-up for loop 13.
Figure 18 shows the speed-up of the two staggered schemes. In was seen in Fig. 17 that the number of PEs needed by SD to achieve maximum speedup (rnaxpe) was seven. Hence, in Fig. 18, the number of PEs utilized for CSD is maxpe - 1. Interestingly, unllke the previous results, CSD performed better than SD only when the number of processors are between 3 and 5 . This was due to the additional overhead incurred to implement the cyclic
2
3
4 Number of PEs
5
6
FIG 18. Speed-up of the staggered schemes for loop 13.
DOALL AND DOACROSS LOOPS
99
staggered scheme-each PE has to continuously check for more iterations to execute after execution of each chunk. This overhead is also the reason for the small performance gain of the cyclic staggered schemes over SD compared to their previous results. With these results in mind, we can conclude that the staggered schemes are very effective in the execution of DOACROSS loops. In irregular DOACROSS loops the inter-iteration dependencies cannot be resolved at compile time. Runtime analysis is needed to identify the dependence patterns. In some applications, the overhead due to the runtime analysis can be amortized across repeated execution of the DOACROSS loops since the dependencies computed once can be reused. Such cases are common in scientific applications and applications that model the behavior of structurally invariant physical systems. Re-synchronized scheduling (PSS) schedules only those iterations for which all synchronization requirements are met. This eliminates processor idle cycles when processors with assigned iterations are waiting for dependent data. Chen’s runtime parallelization scheme (CRPS) requires two phases for scheduling loop iterations. The inspector phase determines the dependence relationships among data accesses, while the executor phase uses this information to execute the iterations in parallel. CRPS allows the overlapped execution of operations among dependent iterations, thus permitting better overall execution times. This, however, may cause more delays due to spin-locks used by iterations waiting their turn to access shared data. CRPS uses static cyclic scheduling for distributing loop iterations among processors which may lead to load imbalances when iterations take varying amounts of execution times. A study to compare CRPS with the Zhu-Yew runtime parallelization scheme (ZYRPS) was performed using a set of parameterized loops running on a 32-processor Cedar shared-memory multiprocessor (Chen and Yew, 1994). Loops with varying number of iterations, iteration grain size ( W ) , and varying number of references ( r ) per iteration with different dependence patterns were simulated. Table VIII shows the speed-up of CRPS using 32 processors when both the inspector and executor are performed (the inspector is not reused)-a loop with long dependence chain and therefore low parallelism is referred to as a “mostly serial” loop. On the other hand, a loop with a short dependence chain has a large amount of parallelism and is referred to as a “mostly parallel” loop. The results show that CRPS yields speed-ups as high as 14 when the inspector is not reused. The best results were attained when the size of the loop body ( W ) is large and the number of accesses ( r ) is low. Also, as expected, performance is better if the dependence chains are short (mostly parallel). They have also shown that a speedup as high as 27 can be attained (Chen and Yew, 1994) if the results of the
100
A. R. HURSON ETAL.
TABLEVIII SPEED-UPOF Cms USING 32 PROCESSORS
Mostly serial loop
W
N = 1600
N = 3200
N = 1600
N = 3200
8
1.10 1.50 2.35 3.66
1.13 1.56 2.65 4.76
1.36 2.26 3.49 4.67
1.42 2.26 3.76 6.18
8
2.41 3.93 6.23 9.55
2.48 3.86 6.94 11.95
2.96 5.67 9.23 11.65
2.97 5.40 9.57 13.60
T
160 ,us (941 cycles)
Mostly parallel loop
14
(3765 cycles)
TABLEIX RATIO BETWEEN THE EXECUTION TME OF ZYRPS AND CRPS USING 32 PROCESSORS
Mostly serial loop
W
r
160 ,us
(941 cycles)
8 (3765 cycles)
Mostly parallel loop
N = 1600
N = 3200
N = 1600
N = 3200
31.70 12.93 4.57 1.58
37.55 13.96 5.05 1.66
7.22 2.77 1.04 0.72
7.92 2.86 1.12 0.91
3 1.69 13.04 4.49 1.71
37.35 13.13 4.76 1.85
7.25 2.88 1.27 0.95
7.61 2.79 1.27 1.oo
inspector analysis is reused across loop invocations. Table IX shows the ratio between the execution time of ZYRPS and CRPS. As can be concluded, CRF’S is nearly always faster than ZYRPS. Moreover, it is also relatively faster in the mostly serial loops-this is due to its ability to overlap execution of dependent iterations. In summary, it is clear that scheduling of DOALL loops is well understood, while efficient solutions for DOACROSS loops require further research.
DOALL AND DOACROSS LOOPS
101
ACKNOWLEDGMENT This work in part has been supported by the National Science Foundation under Grants MIP9622836 and MIP-9622593.
REFERENCES
Chen, D. K., and Yew, P. C. (1991). An empirical study on DOACROSS loops. Proceedings Supercomputing, pp. 620-632. Chen, D. K., and Yew, P. C. (1994a). Redundant synchronization elimination for DOACROSS loops. Proceedings 8th International Parallel Processing Symposium, pp. 477-48 1. Chen, D. K., and Yew, P. C. (1994b). Statement re-ordering for DOACROSS loops. Proceedings International Conference on Parallel Processing. Cytron, R. (1986). DOACROSS: beyond vectorization for multiprocessors. Proceedings International Conference on Parallel Processing, pp. 836 -844. Dunigan, T. H. (1991). Performance of the Intel iPSC/860 and Ncube 6400 hypercubes. Parallel Computing, 17, 1285-1302. Feo, J. T. (1988). An analysis of the computational and parallel complexity of the Livermore loops. Parallel Computing, 7,163- 185. Hummel, S . F., Schonberg, E., and Flynn, L. E. (1992). Factoring: a method for scheduling parallel loops. Communications ofthe ACM, 35(8), 90-101. Hurson, A. R., Lim, J. T., Kavi, K., and Shirazi, B. (1994a). Loop allocation scheme for multithreaded dataflow computers. Proceedings 8th International Parallel Processing Symposium, 3 16-322. Hurson, A. R., Lim,J. T., and Lee, B. (1994b). Extended staggered scheme: a loop allocation policy. Invited Paper, World IMACS Conference, pp. 1321-1325. Krothapalli, V. P. and Sadayappan, P. (1990). Dynamic scheduling of DOACROSS loops for multiprocessors. Proceedings Parbase-90: International Conference on Databases and Parallel Architectures, pp. 66-75. Kruskal, C., and Weiss, A. (1985). Allocating independent subtasks on parallel processors. IEEE Transactions on Software Engineering, SE-11(10), 1001- 1016. Kwan, T. T, Totty, B. K., and Read, D. A. (1993). Communication and computation performance of the CM-5. Proceedings International Conference on Supercomputing, 192-201. Li, H., Tandri, S . , Stumm, M., and Sevcik, K. C . (1993). Locality and loop scheduling on NUMA multiprocessors.Proceedings International Conference on Parallel Processing, 11, 140- 147. Lilja, D. J. (1994a). Exploiting the parallelism available in loops. IEEE Computer, 27(2), 13-26. Lim, J. T., Hurson, A. R., Lee, B., and Shirazi, B. (1992). Staggered distribution: a loop allocation scheme for dataflow multiprocessor systems. The Fourth Symposium on the Frontiers of Massively Parallel Computation, pp. 310-3 17. Markatos, E. P. and LeBlanc, T. J. (1992). Using processor affinity in loop scheduling on shared-memory multiprocessors. Proceedings Supercomputing, pp. 104- 113. Polychronopoulos, C. D., and Banerjee, U. (1987). Processor allocation for horizontal and vertical parallelism and related speedup bounds. IEEE Transactions on Computers, C-36 (4), 4 10-420. Polychronopoulos, C. D., and Kuck, D. J. (1987). Guided self-scheduling: a practical scheduling scheme for parallel supercomputers. IEEE Transactions on Compurers, C-36(12), 1425- 1439.
102
A. R. HURSON ETAL.
Polychronopoulos, C. D., Kuck, D. J., and Padua, D. A. (1986). Execution of Parallel Loops on Parallel Processor Systems. Proceedings International Conference on Parallel Processing, pp. 5 19-527. Su, H. M., and Yew, P. C. (1991). Efficient doacross execution on distributed shared-memory multiprocessors. Proceedings Supercornputing, 842-853. Subramaniam, S., and Eager, D. L. (1994). Affinity scheduling of unbalanced workloads. Proceedings Supercornputing, pp. 214-226. Tang, P. and Yew, P. C. (1986). Processor self-scheduling for multiple-nested parallel loops. Proceedings International Conference on Parallel Processing, pp. 528 -535. Tzen, T. H., and Ni, L. M. (1991). Dynamic loop scheduling for shared-memory multiprocessors. Proceedings International Conference on Parallel Processing, 11, 247-250. Yue, K. K., and Lilja, D. J. (1994a). Parallel Loop Scheduling for High-Performance Computers. Technical Report No. HPPC-94- 12, Department of Computer Science, University of Minnesota. Zhu, C. Q., and Yew, P. C. (1987). A scheme to enforce data dependence on large multiprocessor systems. IEEE Transactions on Software Engineering, SE-13,726-739.
FURTHER READING
Abraham, S. G., and Hudak, D. E. (1991). Compile-time partitioning of iterative parallel loops to reduce cache coherency traffic. IEEE Transactions on Parallel and Distributed Systems, 2(3), 318-328. Chen, D. K., and Yew, P. C. (1992). A scheme for effective execution of irregular DOACROSS loops. Proceedings International Conference on Parallel Processing, 11, 285-292. Cytron, R. (1987). Limited processor scheduling of doacross loops. Proceedings Infernational Conference on Parallel Processing, pp. 226-234. Hamidzadeh, B., and Lilja, D. J. (1994). Self-adjusting scheduling: an on-line optimization technique for locality management and load balancing. Proceedings International Conference on Parallel Processing, 11, 39-46. Hudak, D. E., and Abraham, S. G. (1992). Compile-time optimization of near-neighbor communication for scalable shared-memory multiprocessors. Journal of Parallel and Distributed Computing, 15, 368-381. Krothapalli, V. P., and Sadayappan, P. (1991). Removal of redundant dependencies in DOACROSS loops with constant dependencies. IEEE Transactions on Parallel and Distributed Systems, 2(3), 281 -289. Lilja, D. J. (1994). The impact of parallel loop scheduling strategies on prefetching in a shared-memory multiprocessor. IEEE Transactions on Parallel and Distributed Systems, 5(6), 573-584. Polychronopoulos, C. D. (1987a). Advanced loop optimizations for parallel computers. In Lecture Notes in Computer Science No. 297: Proceedings International Conference on Supercornputing, pp. 255-277. Polychronopoulos, C. D. (1987b). Automatic restructuring of Fortran programs for parallel execution. Proceedings 4th International DFVLR Seminar on Parallel Computing in Science and Engineering, pp. 107- 130. Rudolph, D. C., and Polychronopoulos, C. D. (1989). An efficient message-passing scheduler based on guided self scheduling. Proceedings International Conference on Supercomputing, pp. 50-61.
DOALL AND DOACROSS LOOPS
103
Saltz, J. H., and Mirchandaney, R. (1991).The preprocessed DOACROSS Loop. Proceedings International Conference on Parallel Processing, 11, 174- 179. Saltz, J. H.,Mirchandaney, R., and Crowley, K. (1989).The Doconsider loop. Proceedings, International Conference on Supercomputing, pp. 29-40. Saltz, J. H., Crowley, K, Mirchandaney, R., and Berryman, H. (1990).Runtime scheduling and execution of loops on message passing machines. Journal of Parallel and Distributed Computing, 8,303-312. Saltz, J. H.,Mirchandaney, R., and Crowley, K. (1991). Runtime parallelization and scheduling of loops. IEEE Transactions on Computers, 40(5), pp. 603-612. Tzen, T. H., and Ni, L. M. (1992).Data dependence analysis and uniformization for doubly nested loops. Proceedings International Conference on Parallel Processing, 11.91 -99. Yue, K. K.,and Lilja, D. J. (1994b).Parameter Estimation for a Generalized Parallel Loop Scheduling Algorithm. Technical Report No. HPPC-94-18,Department of Computer Science, University of Minnesota.
This Page Intentionally Left Blank
Programming Irregular Applications: Runtime Support, Compilation and Tools JOEL SALTZ CHlALlN CHANG GUY EDJLALI YUAN-SHIN HWANG BONGKI MOON RAVl PONNUSAMY SHAMIK SHARMA ALAN SUSSMAN AND MUSTAFA UYSAL UMIACS and Department of Computer Science University of Maryland College Park, M D
GAGAN AGRAWAL Department of Computer and Information Sciences University of Delaware Newark, DE
RAJA DAS College of Computing Georgia Institute of Technology Atlanta, GA
PAUL HAVLAK Department of Computer Science Rice University Houston, 7X
Abstract In this chapter, we present a summary of the runtime support, compiler and tools development efforts in the CHAOS group at the University of Maryland. The principal focus of the CHAOS group's research has been to develop tools,
105 ADVANCES IN COMPUTERS, VOL. 45
Copyright Q 1997 by Academic Press Ltd. All rights of reproduction in any fonn reserved.
106
JOEL S A L E ETAL .
compiler runtime support and compilation techniques to help scientists and engineers develop high-speed parallel implementations of codes for irregular scientific problems (i.e. problems that are unstructured. sparse. adaptive or block structured). We have developed a series of runtime support libraries (CHAOS. CHAOS++) that carry out the preprocessing and data movement needed to efficiently implement irregular and block structured scientific algorithms on distributed memory machines and networks of workstations . Our compilation research has played a major role in demonstrating that it is possible to develop data parallel compilers able to make effective use of a wide variety of runtime optimizations. We have also been exploring ways to support interoperability between sequential and parallel programs written using different languages and programming paradigms .
1. Introduction
.....................................
2. CHAOS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Runtime Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3 Summary of the Runtime System . . . . . . . . . . . . . . . . . . . . . . . 3 . Compilation Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1 Program Slicing-based Loop Transformations . . . . . . . . . . . . . . . . . 3.2 Interprocedural Partial Redundancy Elimination . . . . . . . . . . . . . . . . 4 . Runtime Support for Pointer-based Codes: CHAOS+ + . . . . . . . . . . . . . . 4.1 Mobile Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2 Globally Addressable Objects . . . . . . . . . . . . . . . . . . . . . . . . . 4.3 Building Distributed Pointer-based Data Structures . . . . . . . . . . . . . . 4.4 Data Movement Routines . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.5 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.6 CHAOS+ + : Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5 . Interoperability Issues: Meta-Chaos . . . . . . . . . . . . . . . . . . . . . . . . . 5.1 Meta-Chaos Mechanism Overview . . . . . . . . . . . . . . . . . . . . . . . 5.2 Data Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3 Communication Schedule Computation . . . . . . . . . . . . . . . . . . . . 5.4 Meta-Chaos Applications Programmer Interface (API) . . . . . . . . . . . . 5.5 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 . Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.1 Runtime Support for Irregular Problems . . . . . . . . . . . . . . . . . . . . 6.2 Runtime Support for Irregularly Coupled Regular Mesh Applications . . . . 6.3 Compiler Methods for Irregular Problems . . . . . . . . . . . . . . . . . . . 7 . summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
106 108 108 111 116
117 118 121 124 126 127 129 132 133 134 135 136 138 139 141 142 143 143 144 145 148 149
1. Introduction The last decade has seen the emergence of a variety of parallel computer architectures. A number of research groups and vendors have successfully
PROGRAMMING IRREGULAR APPLICATIONS
107
demonstrated that powerful parallel machines can be built from commodity hardware. While building these machines has become increasingly easy and inexpensive, programming such machines has remained a major challenge. The important tasks in providing better support for programming parallel machines can be summarized as follows: 0
0
0
0
carefully studying the features of important classes of parallel applications; developing runtime support systems for parallelizing these classes of applications; developing compilation techniques for incorporating such runtime support into compilers for parallel machines; developing techniques for interoperability of different classes of runtime support and programming paradigms.
The principal focus of the CHAOS group’s research has been to develop tools, compiler runtime support and compilation techniques to help scientists and engineers develop high-speed parallel implementations of codes for irregular scientific problems (i.e. problems that are unstructured, sparse, adaptive or block structured). We have developed a series of runtime support libraries (CHAOS, CHAOS++) that carry out the preprocessing and data movement needed to efficiently implement irregular and block structured scientific algorithms on distributed memory machines and networks of workstations [l, 34,531. Our compilation research has played a major role in demonstrating that it is possible to develop data parallel compilers able to make effective use of a wide variety of runtime optimizations. We have also been exploring ways to support interoperability between sequential and parallel programs written using different languages and programming paradigms. Successful techniques would facilitate the design of complex scientific applications that are composed of separately developed components, and provide the infrastructure required to make use of highly distributed data and computational resources. While our research on this topic is still at an early stage, we have already demonstrated an ability to compose parallel programs that have been written using different programming paradigms (e.g. High Performance Fortran [36] and message passing with MPI [61]). In this chapter, we give an overview of these activities in the CHAOS group at the University of Maryland. Our purpose is to give a flavor for the range of activities; detailed technical descriptions of each of the components of our research have been reported elsewhere. The central focus of our group for several years has been to develop a highly optimized runtime support
108
JOEL SALTZ ETAL.
library for irregular applications executed on distributed memory parallel machines. This runtime system is called CHAOS. In Section 2 we give an overview of this runtime library. We particularly focus on the major challenges associated with efficiently executing irregular applications on parallel machines, and techniques for optimizing their performance. We show how these techniques are incorporated into the widely distributed CHAOS runtime library. In Section 3, we describe effective techniques for incorporating the CHAOS runtime support into compilers for parallel machines. Over the last several years, our group has developed a number of successful prototype compilation systems demonstrating that irregular applications can be compiled for efficient execution on distributed memory parallel machines. We specifically describe two techniques: program slicing and interprocedural partial redundancy elimination. In Section 4, we describe an extension of CHAOS: CHAOS + +. The main limitation of CHAOS is that it primarily targets may-based applications. In CHAOS+ +, we have demonstrated how the essential ideas behind the CHAOS runtime system can be used even for distributed pointerbased codes with complex user-defined data structures. With the increasing availability of a variety of runtime systems and the emergence of a number of programming paradigms, it is becoming increasingly important to be able to interoperate between these systems and a tool that facilitates such paradigms. We have developed META-CHAOS, interoperability. This is described in Section 5. In Section 6 we present a review of related literature and in Section 7 we present a summary.
2.
CHAOS
In this section, we briefly describe the class of applications that CHAOS targets and outline the important features of the CHAOS runtime support library that we have developed. The compilation techniques for incorporating this runtime library as part of compilers for distributed memory parallel machines are presented in Section 3. 2.1
Overview
We have addressed a particular class of scientific application programs, called irregular programs, which require special runtime and compiler support for parallelization. Examples of irregular applications are found in unstructured computational fluid dynamic solvers, molecular dynamics codes (CHARMM, AMBER, GROMOS, etc.), diagonal or polynomial
PROGRAMMING IRREGULAR APPLICATIONS
109
preconditioned iterative linear solvers, and particle-in-cell (PIC) codes. Each of these applications simulates the interactions among objects in a physical system. The class of irregular applications can be further divided into two subclasses: static and adaptive. Static irregular applications are those in which each object in the system interacts with only a fixed, predetermined set of objects. Although the properties of the objects in the system (e.g. positions in space or velocities) will change during the course of computation, the set of objects that every object interacts with remains the same. For example, some computational fluid dynamics applications model systems, such as wings or fuselages of aircraft, using unstructured meshes. Each mesh point represents a location within the physical space and the values at each mesh point correspond to properties of the location (e.g. tension or pressure). Every mesh point communicates values with only neighboring mesh points. Although the values at mesh points change, the set of neighboring mesh points for every mesh point remains constant. In adaptive irregular programs, such as molecular dynamics codes, interactions between entities (e.g. atoms, molecules, etc.) change during computation (due to movement of the entities). However, in many adaptive applications the set of interacting objects for each object changes slowly over time. For instance, the molecules in a molecular dynamic simulation move very little in a single time step, and consequently, the set of molecules that interact with a given molecule remains static for many time steps in the simulation. On the other hand, some applications exhibit a high degree of adaptivity. One such example is a direct simulation Monte Carlo (DSMC) code for simulating the movements and collisions of molecules in a flow of gas on a spatial flow field domain overlaid by a Cartesian mesh. Only molecules in the same cell can possibly collide with each other during any time step. However, as molecules collide they move between mesh cells, and consequently, the set of molecules in a particular cell changes at almost every time step. A common characteristic of all these irregular applications is the use of indirect indexing of data arrays to represent the relationships among elements. This means that the data arrays are indexed through the values in other arrays, which are called indirecfion arrays. In other words, the data access patterns are determined by the values of elements in the indirection arrays. Figure 1 illustrates a typical irregular loop. The data access pattern is determined by indirection arrays ia and ib. The use of indirect indexing leads to difficulties in allowing a compiler to determine the data access patterns in a program (i.e. the indices of the data arrays being accessed), since the values of elements in the indirection arrays are not available until runtime. The analysis and optimization techniques
110
L1: L2:
JOEL SALTZ H A L .
real x (max-nodes ) , y (max-nodes ) i n t e g e r ia(max-edges). ib(max-edges)
! data arrays ! indirection arrays
d o n = 1. n-step d o i = 1 . sizeof-indirection-arrays x(ia(i)) = x(ia(i)) + y(ib(i)) end d o end do
! outer loop ! inner loop
FIG.1. An example with an Irregular Loop.
exploited by current compilers require that the indices to data arrays must be symbolically analyzable at compile time. Consequently, most current compilers can only recognize indices that are either constants or affine functions of loop induction variables. However, the use of indirection arrays prohibits existing compilers from identifying array data access patterns at compile time. The inability to determine the array access patterns prevents existing compilers from generating efficient code on distributed memory parallel systems. On distributed memory machines, the memory is partitioned into separately addressable spaces. Each processor can only access its local memory, and all data residing in the memory of other processors must be transferred to local memory before being referenced. Furthermore, data arrays of applications running on distributed memory systems are usually partitioned over local processor memories so that portions of the arrays can be accessed by multiple processors simultaneously. As a result, processors sometimes must reference data elements that reside in the memory of a remote processor. Therefore, compilers must either be able to determine the array access patterns at compile time and transfer remote data elements to local memory before referencing them, or generate code to efficiently identify remote array references at runtime and transfer the values when they are needed. Since the array access patterns for irregular applications cannot be determined at compile time, a two-phase inspectorlexecutor runtime strategy [45,59] has been developed to parallelize irregular applications. The inspectorlexecutor strategy works as follows. During program execution the inspector examines the data references made by a processor and calculates what off-processor data needs to be fetched and where that data will be stored once it is received. The executor then uses the information from the inspector to fetch remote data elements and perform the actual computation. This approach can be directly applied to static irregular applications to achieve high performance. However, additional optimizations are required to achieve good performance when adaptive irregular applications are parallelized using the inspector/executor approach.
PROGRAMMING IRREGULAR APPLICATIONS
L1:
L2:
do n = 1, n s t e p s d o i = 1 . sireof_indirection_arrays x(ia(i)) = x(ia(i)) end do
S:
.t
! outer loop ! inner loop
y(ia(i)) *y(ib(i))
! under c e r t a i n c o n d i t i o n s ! i n d i r e c t i o n a r r a y may Chdflile
if ( r e q u i r e d ) then regenerate i c ( : )
L3: do i = 1 . s i z e o f - i c x(ic(i)) = x(ic(i)) e n d do end do
111
! inner loop
+
y(ic(i))
FIG. 2. A code that adapts occasionally.
Since interactions are specified by indirection arrays, the adaptivity of irregular programs is represented by the frequency of modifications to indirection arrays. Figure 2 illustrates the properties of loops found in molecular dynamics codes and unstructured fluid dynamics codes. In the example, multiple loops access the same data arrays, but with different access patterns. In loop L2 the data arrays x and y are indirectly accessed using arrays i a and i b. In loop L3 the same data arrays are indirectly accessed using indirection array i c . The data access pattern in loop L2 remains static, whereas the data access pattern in loop L3 changes whenever the indirection array i c is modified. The adaptivity of the loop is controlled by the conditional statement S.
2.2 Runtime Preprocessing The library is designed to ease the implementation of computational problems on parallel architecture machines by relieving users of low-level machine specific issues. The CHAOS runtime library has been developed to efficiently handle irregular programs. Solving such irregular problems on distributed memory machines using CHAOS runtime support involves six major phases (Fig. 3). The first four phases concern mapping data and computations onto processors. The next two steps concern analyzing data access patterns in loops and generating optimized communication calls. Detailed descriptions of these phases can be found in [54]. In static irregular programs, Phase F is typically executed many times, while phases A through E are executed only once. In some adaptive programs where data access patterns change periodically but reasonable load balance is maintained, phase E must be repeated whenever the data access patterns change. In highly adaptive programs, the data arrays may need to be repartitioned in order to maintain load balance. In such applications, all the phases are repeated.
112
JOEL S A L E ETAL.
Phase A:
Data partitioning
Phase B: Phase C :
Data remapping Iteration partitioning
Assign elements of data arrays to processors Redistribute data array elements Allocate iterations to processors
Phase D:
Iteration remapping
Redistribute indirection array elements
Phase E:
Inspector
Translate indices; generate schedules
Phase F:
Executor
Use schedules for data transportation Perform computation
FIG.3. Solving irregular problems.
2.2. I
Schedule Generation
A communication schedule is used to fetch off-processor elements into a local buffer before the computation phase, and to scatter these elements back to their home processors after the computational phase is completed. Communication schedules determine the number of communication startups and the volume of communication. Therefore, it is inlportant to optimize the schedule generation. The basic idea of the inspector/executor concept is to hoist preprocessing outside the loop as much as possible so that it need not be repeated unnecessarily. In adaptive codes where the data access pattern occasionally changes, the inspector is not a one-time preprocessing cost. Every time an indirection array changes, the schedules associated with it must be regenerated. For example, in Fig. 4, if the indirection array i c is modified, the schedules inc sched c and sched-c must be regenerated. Generating inc-sched-c involves inspecting sched-ab to determine which off-processor elements are duplicated in that schedule. Thus, it must be certain that communication schedule generators are efficient while maintaining the necessary flexibility. In CHAOS, the schedule-generation process is carried out in two distinct phases. The index analysis phase examines the data access patterns to determine which references are off-processor, removes duplicate offprocessor references by only keeping distinct references in hash tables, assigns local buffers for off-processor references, and translates global indices to local indices. o The schedule generation phase generates communication schedules based on the information stored in hash tables.
0
The communication schedule for processor p stores the following information:
(1) send list-a list of arrays that specifies the local elements of a processor p required by all processors;
PROGRAMMING IRREGULAR APPLICATIONS L1: do n = 1 . risteps c d I I g a t h e r i y ( b e g i n - b u f f i , y. h ~ h e d - ~ i t ) ) cal 1 zero-out bufferix(begiri-biiffJ. o f t p - x ) 12: d o 1 = 1 . local_sireof_indir-array:
x(local-ia(i))
=
113
! outer loop ! f e t c h o f t - p r o c data ! i n i t i a l i z e buffer ! inner loop
x(local-ia(i ) ) + y(local-ia(i 1 ) * y(local-ib(i))
end d o
S:
L3:
i f ( r e q u i r e d ) then
m o d i f y part-ic(: CHAOS-clear-mask ( h a s h t a b l e . itamp-c 1 local-ic(:) = p a r t - i c ( : 1 stamp-c = CHAOS-enter-hash (local-ic 1 i nc-sched-c = CHAOS-i ncrernprit a l - s c h e d u l e ( s t a m p - c ) sched-ac = CHAOS-schedule(stamp-a. stamp-c) endif
! i c i s modified ! clear i c
c a l l gather(y(begin-buffPi. y . inc_sched-c) c a l l r e r o - o u t - b u f f e r ( x ( b e g i n - b u f f Z i , otfp-xP) d o i = 1, l o c a l - s i z e o f - i c x ( 1 ocal-i c ( i 1 ) = x ( 1 ocal-i c ( i 1 ) + y ( 1 ocal-ic e n d do
! incremental gather ! i n i t i a l i z e buffer ! inner loop
ca17 s c d t t e r - a d d i x ( b e g i n - h i i f f
i . x . sched-ac)
(
i
! e n t e r new i c
! i n c r e i n e n t a l sched ! sched f o r i a , ic
)
! scatter addition
end d o
FIG.4. Schedule generation for an adaptive program.
(2) permutation list-an array that specifies the data placement order of off-processor elements in the local buffer of processor p ; ( 3 ) send size-an array that specifies the sizes of out-going messages of processor p to all processors; (4) fetch size-an array that specifies the sizes of in-coming messages to processor p from all processors. The principal advantage of such a two-step process is that some of the index analysis can be reused in adaptive applications. In the index analysis phase, hash tables are used to store global to local translation and to remove duplicate off-processorreferences. Each entry keeps the following information: (1) global index-the global index hashed in; (2) translated address-the processor and offset where the element is stored; this information is accessed from the translation table; (3) local index-the local buffer address assigned to hold a copy of the element, if it is off-processor; (4) stamp-an integer used to identify which indirection array entered the element into the hash table. The same global index entry might be hashed in by many different indirection arrays; a bit in the stamp is marked for each such entry.
114
JOEL SALTZ ETAL.
Stamps are very useful when implementing adaptive irregular programs, especially for those programs with several index arrays in which most of them are static. In the index analysis phase, each index array hashed into the hash table is assigned a unique stamp that marks all its entries in the table. Communication schedules are generated based on the combination of stamps. If any one of the index arrays changes, only the entries pertaining to the index array, i.e. those entries with the stamp assigned for the index array, have to be removed from hash table. Once the new index array is hashed into the hash table, a new schedule can be generated without rehashing other index arrays. Figure 4 illustrates how CHAOS primitives (in pseudocode) are used to parallelize the adaptive problem. The conditional statement S may modify the indirection array ic. Whenever this occurs, the communication schedules that involve prefetching references of ic must be modified. Since the values of ic in the hash table are no longer valid, the entries with stamp stamp-c are cleared by calling CHAOS-clear-mask( ). New values of i c are then entered into the hash table by CHAOS-enter-hash( ). After all indirection arrays have been hashed in, communication schedules can be built for any combination of indirection arrays by calling CHAOS schedule( ) or CHAOS-incremental schedule ( ) with an appropriate combination of stamps. An example of schedule generation for two processors with sample values of indirection arrays i a , i b , and i c is shown in Fig. 5. The global references due to indirection array ia are stored in hash table H with stamp a,i b with stamp b and ic with stamp c. The indirection arrays might have some common references. Hence, a hashed global reference might have more than one stamp. The gather schedule sched-ab for the loop L2 in Fig. 4 is built using the union of references with time stamps a or b. The scatter operation for loop L2 can be combined with the scatter operation for the loop L3. The gather schedule inc-sched-c for loop L3 is built with those references that have time stamp c alone because references with time stamps a or b as well as with c can be fetched by using the schedule sched-ab. The scatter schedule for loops L2 and L3 is built using the union of references with time stamps a and c. PARTI, the runtime library that preceded CHAOS, also had support for building incremental and merged schedules [ 191. However, in PARTI, such schedules were built using specialized functions for these purposes. The CHAOS library restructures the schedule generation process and by using a global hash table provides a uniform interface for building all types of schedules. Such a uniform interface is easier to use for both users and compilers that automatically embed CHAOS schedule generation calls.
115
PROGRAMMING IRREGULAR APPLICATIONS
Initial
dlstrlbutlon o f data a r r a y s Processor 0
ProceLsor
112131415
1
161 7 1 H j Y I
10
Inserting indirection a r r a y s i n t o t h e hash t a b l e < P r o c e s s o r 0 ) i n d i r e c t i o n a r r a y la = 7 . 2 . 9 . 1 . . 3
addr
~
irldir-cxtion a r r a y
ib
=
-
4
addr
4
1.5.7.8.2
h
/
indirection a r r a y
ic
-/ It).
?.
13.
Q .
3
FIG. 5 . Schedule generation with hash
table.
2.2.2 Light- weigh t Schedules We have developed a variety of techniques that have been incorporated into CHAOS and efficiently support a class of applications (e.g. particle codes such as Direct Simulation Monte Carlo) which manifest access patterns that change from iteration to iteration [34]. When data access patterns are not repeated every iteration, performance becomes very sensitive to scheduling overheads. The important observation here is to recognize that in many such codes, the crucial communication intensive loops are actually implementing a generalized reduction in which it is not necessary to control the order in whch data elements are stored. In such applications a significant optimization in schedule generation can be achieved by recognizing that the semantics of set operations imply that
116
JOEL S A L E ETAL.
elements can be stored in sets in any order. This information can be used to build much cheaper light-weight communication schedules. During schedule-generation, processors do not have to exchange the addresses of all the elements they will be accessing with other processors; they only need to exchange information about the number of elements they will be appending to each set. This greatly reduces the communication costs in schedule generation. A light-weight schedule for processor p stores the following information: list of arrays that specifies the local elements of (1) send list-a processor p required by all processors; (2) send size-an array that specifies the outgoing message size of processor p to all processors; ( 3 ) fetch size-an array that specifies the incoming message size of processor p from all processors. Thus, lightweight schedules are similar to the previously described schedules except that they do not carry information of data placement order in the receiving processor. While the cost of building a light-weight schedule is less than that of regular schedules, a light-weight schedule still provides the same communication optimizations of aggregating and vectorizing messages [19].
2.3 Summary of the Runtime System The CHAOS system can be used 0 0 0
to provide irregular compiler runtime support; directly by programmers who wish to port irregular codes; as benchmarks used in comparisons with new irregular runtime support efforts.
In the course of developing and optimizing CHAOS, we have collaborated with applications scientists to produce CHAOS-based parallel implementations of a number of full application codes, including the molecular dynamics code CHARMM [33], an unstructured multigrid Euler solver [17,44], a Direct Simulation Monte Carlo code [47], a flame simulation code [46], and the PARKA high-performance knowledge-based system. The applications demonstrate that CHAOS can be applied on various types of programs. They range from programs with static access patterns (Euler solver), to adaptive applications (CHARMM) and highly adaptive codes (DSMC). Furthermore, CHAOS can efficiently handle programs with time-varying computational costs (flame simulation code), and even codes with loop carried dependencies (PARKA and sparse triangular solvers).
PROGRAMMING IRREGULAR APPLICATIONS
117
The high performance achieved by these applications is facilitated by a set of data and computation decomposition primitives in CHAOS. These primitives provide users with a convenient tool to partition data and computations over processors appropriately based on the properties of applications such that communication overheads can be minimized and good load balance can be achieved. The parallelized CHAOS-based application codes have frequently been among the best performing codes m their application area. For instance, we collaborated with Bernard Brook,; at the National Institutes of Health to develop a highly optimized parallel version of the molecular dynamics code CHARMM [33]. While Brooks w,xs at Harvard, he was one of CHARMM's original developers. Before he began to collaborate with our group, Brooks had developed his own parallel CHARMM implementation. In the course of our collaboration with Brooks, we were able to make use of the flexibility inherent in CHAOS to jointly develop and implement a new scheme for parallelizing molecular dynamics codes. The new scheme is a hierarchical decomposition method for which CHAOS proved to be well suited. 128 The performance obtained depended on the target architecture-on processors of the Intel iPSC/860, we achieved speed-ups of roughly 100 compared to Brooks' original sequential code using a standard benchmark case (myoglobin with water) that is frequently used to characterize the performance of molecular dynamics codes. These performance improvements are of considerable practical importance because molecular dynamics codes can require hours to days to run on workstations.
3. Compilation Methods In the previous section, we had mentioned how our runtime library can be used either by application programmers or by compilers. In this section, we give an overview of our research in developing techniques by which the compilers can automatically insert calls to runtime libraries. In [MI, we demonstrated a simple compiler that was able to make effective use of the inspector/e:tecutor framework. A serious limitation of this work was that the methods could only be used when programmers limited themselves to a constrained set of programming idioms. Over the past few years, one of the principal goals of our research on irregular problems has been to develop compilation methods for irregular problems that are able to make effective u'se of irregular problem runtime support. Our close collaboration with the Rice Fortran D [32]group has played a central role in the development of sophisticated prototype compilers that are now able to deal effectively with challenging irregular codes. We will highlight
118
JOEL SALTZ H A L .
two major compilation problems that we have addressed: A Program Slicing-based loop transformation technique and lnterprocedural Partial Redundancy Elimination.
3.1
Program Slicing-based Loop Transformations
Standard methods for compiling irregular accesses to distributed arrays generate a single inspector-executor pair [66]. The inspector analyzes the subscripts, a gathering communication occurs if off-processor data is being accessed, a single executor carries out the original computation, and finally a scattering communication occurs if off-processor data has been written. Many application codes contain computations with more complex access functions. Subscripted subscripts and subscripted guards can make the indexing of one distributed array depend on the values in another, so that a partial order is established on the distributed accesses. Loops with such multiple levels of indirection commonly appear in unstructured and adaptive applications codes associated with particle methods, molecular dynamics, sparse linear solvers, and in some unstructured mesh computational fluid dynamics solvers. We have presented various optimizations which are part of our runtime support system and also methods for handling loops with complex indirection patterns by transforming them into multiple loops each with a single level of indirection. We have implemented this method in the Fortran D compiler developed at Rice University. Our experiments demonstrate substantial speed improvements through message aggregation [ 18 1. Figure 6 shows a loop with a single level of indirection. In this example, assume that all the arrays are aligned and distributed together in blocks among the processors, and that the iterations of the i loop are likewise block partitioned. The resulting computation mapping is equivalent to that produced by the owner computes rule, a compiler heuristic that maps the computation of an assignment statement to the processor that owns the lefthand side reference. Data array y is indexed using the array i a , causing a single level of indirection. The compiler produces a single executable; a copy of this executable runs on each processor. Each processor’s copy of the program determines what processor it is running on, and uses this information to determine where its
do i = 1. n x(i) = y(ia(i)) end do
=
z(i)
FIG.6. Simple irregular loop.
119
PROGRAMMING IRREGULAR APPLICATIONS
data and iterations fit into the global computation. Let my$eZems represent the number of iterations of the loop in Fig. 6 assigned to a processor, and the number of elements from arrays x , y , z and ia mapped to a processor. We obtain the following code (omitting some details of communication and translation for clarity): d o i = 1 , myBelems index$y(i ) = i a ( i 1 enddo
// i n s p e c t o r
. . . f e t c h y e l e m e n t s t o l o c a l memory. modify indexBy so t h a t i t r e f e r s t o l o c a l l y stored copies o f t h e e l e m e n t s o f y accesstad i n t h e l o o p . . . d o i = l , myBelems x ( i )=y(index$y(i) enddo
/I executor
1+z(i)
Many application codes contain computations with more complex access functions. Subscripted subscripts and subscripted guards can make the indexing of one distributed array depend on the values in another, so that a partial order is established on the distributed accesses. Loops with such multiple levels of indirection commonly appear in unstructured and adaptive applications codes associated with particle methods, molecular dynamics, sparse linear solvers, rind in some unstructured mesh computational fluid dynamics solvers. Figure 7 depicts examples of these kinds of loops. Consider the loop in Fig. 6, but assume now that while all the other arrays and the loop iterations are block-distributed so that x ( i 1, ia ( i 1 , and y ( i 1 are all on the same processor as iteration i, array z has been cyclicdistributed so that z ( i ) usually lies on a different processor. In addition to the irregular potential off-processor references to y , we now have regular off-processor references to z. Let my$elems represent the number of elements from each array and iterations from the original loop locally mapped, let n$procs represent the number of processors, and let rny$id represent the processor identification = 1. n i f (im:(i) ) t h e n x ( i d ( i ) ) = ... endif enddo
do i
do i
=
1, n
x(ia(ib(i))) enddo
A
=
_..
B
do i = 1, n do j = i a ( i ) , i a ( i + 1 ) x ( i a ( j ) ) = ... enddo enddo
C
FIG.7. Dependence between distributed array accesses
120
JOEL SALT2 ETAL.
(ranging from 0 to n$procs- 1). We obtain the following code (omitting some details of communication and translation for clarity): d o i = l , my8elems lI inspector indexdz ( i 1 =my$ id + i*n $ p rocs index$y(i) = i a ( i )
enddo
. . . f e t c h y a n d z elements t o l o c a l memory, modify indexdy a n d indexdy t o r e f e r t o l o c a l l y s t o r e d co p i es of t h e elements o f y accessed i n t h e l o o p . . . d o i = l , my$elems x(i )
=
y(index$y(i 1)
// e xe c utor + z(index$z(i ) )
enddo
Because the subscripting for y accesses only local elements of the distributed array ia , its inspector requires no communication and can be combined with the inspector for Z. However, if an inspector needs to make a potentially non-local reference (either because of a misaligned reference or multiple levels of indirection), the single inspector-executor scheme breaks down. The inspector must itself be split into an inspector-executor pair. Given a chain of n distributed array references, each in order depending on the previous, we must produce n + 1 loops: one initial inspector, n - 1 inspectors that also serve as the executor for the previous inspector, and one executor to produce the final result(s). The transformation required to eliminate non-local references in loops can be divided into two distinct parts. e The first part of the transformation process breaks up a loop whose references have multiple inspection levels into multiple loops whose references have no more than one inspection level. Each nonlocal reference is then a distributed array indexed by a local index array. e The second part of the transformation completes the inspectorexecutor pairs for each of the loops or code segments generated. For the completion of the inspector-executor pairs we insert runtime library calls for collective communication and for global to local address translation.
In [18], we present the algorithms required to perform the first part of the transformation; insertion of collective communication is presented in [30]. The essence of the technique for the first part of the transformation can be described as follows: we take complicated subscript computations, replicate them outside their original loops, and save the sequence of subscripts in a local index array. The subscript computations must be copied, transplanted back into the original program, and the values saved without disturbing other computations. The transplant descriptor is a data structure which contains the information needed to replicate the value of an expression at another point in the program. The replicated computation is built by program slicing. In the literature, a slice is a subset of a program's statements determined to affect the value of a variable at a particular program point [65]. Our method builds each slice as a derivative program fragment, tailored to compute the values of interest and ready to be transplanted elsewhere in the program. We have implemented our loop transformation algorithm as a part of the Fortran D compiler. We have successfully parallelized a number of kernels derived from various irregular applications. The structure of these applications is such that they cannot be parallelized using existing compilation techniques without a severe degradation in performance. The only automatic method that can be used to parallelize these kernels, other than the technique described in this paper, is runtime resolution [58], but that causes poor performance because each off-processor reference is communicated separately.
3.2 Interprocedural Partial Redundancy Elimination
An important optimization required for irregular applications is placement of communication preprocessing and communication statements. The key idea underlying these schemes is to do the placement so that redundancies are reduced or eliminated. These schemes are based upon a classical data flow framework called partial redundancy elimination (PRE). PRE encompasses traditional optimizations like loop invariant code motion and redundant computation elimination. Our interest is in applying the PRE framework for optimizing placement of communication preprocessing statements and collective communication statements. The first step in this direction was to extend the existing PRE framework interprocedurally. For applying this transformation across procedure boundaries, we need a full program representation. We have chosen a concise full program representation, which will allow efficient data flow analysis, while maintaining sufficient precision to allow useful transformations and to ensure safety and correctness of transformations. In Fig. 8, we show an example program (which involves irregular accesses to data). The program representation FPR for this program is shown in Fig. 9. We now briefly show how partial redundancy elimination is used for optimizing placement of communication preprocessing calls and collective communication routines. The details of our techniques are available elsewhere [2,3,4]. We use the example presented in Fig. 8 to show the
          Program Example
          Real X(nnodes), Y(nnodes)
          Real Z(nedges), W(nedges)
          Integer IA(nedges), IB(nedges)
    C     Input data ...
          do 10 i = 1, 20
            Call Proc_A(X, Y, Z, IA, IB)
            if (nt .gt. 0) then
              Call Proc_B(X, W, IA)
            endif
            do 50 j = 1, nedges
              IB(j) = .. IB(j) ..
    50      continue
    10    continue
          end

          Subroutine Proc_A(A, B, C, D, E)
          do 20 i = 1, nedges
            C(i) = C(i) + A(D(i))
    20    continue
          do 30 i = 1, nedges
            C(i) = C(i) + B(E(i))
    30    continue
          do 35 i = 1, nnodes
            B(i) = ...
    35    continue
          end

          Subroutine Proc_B(X, W, IA)
          do 40 i = 1, nedges
            W(i) = W(i) + X(IA(i))
    40    continue
          do 45 i = 1, nnodes
            X(i) = ...
    45    continue
          end

FIG. 8. An irregular code.
(Graphical figure: procedure entry nodes and procedure return nodes.)

FIG. 9. FPR for the example program.
communication preprocessing inserted by initial intraprocedural analysis, and the interprocedural optimizations that can be done. Initial intraprocedural analysis inserts one communication preprocessing call and one gather (collective communication routine) for each of the three data parallel loops in the program; the resulting code is shown on the left in Fig. 10. We have omitted several parameters from both the communication preprocessing routines and the collective communication routines to keep the examples simple. The communication preprocessing routine Irreg_Sched takes in the indirection array and information about the distribution of the data arrays.
Result of intraprocedural compilation (left):

          Program Example
          Real X(nnodes), Y(nnodes)
          Real Z(nedges), W(nedges)
          Integer IA(nedges), IB(nedges)
    C     Input data ...
          do 10 i = 1, 20
            Call Proc_A(X, Y, Z, IA, IB)
            if (nt .gt. 0) then
              Call Proc_B(X, W, IA)
            endif
            do 50 j = 1, nedges_local
              IB(j) = .. IB(j) ..
    50      continue
    10    continue
          end

          Subroutine Proc_A(A, B, C, D, E)
          Sched1 = Irreg_Sched(D)
          Call Gather(A, Sched1)
          do 20 i = 1, nedges_local
            C(i) = C(i) + A(D(i))
    20    continue
          Sched2 = Irreg_Sched(E)
          Call Gather(B, Sched2)
          do 30 i = 1, nedges_local
            C(i) = C(i) + B(E(i))
    30    continue
          do 35 i = 1, nnodes_local
            B(i) = ...
    35    continue
          end

          Subroutine Proc_B(X, W, IA)
          Sched3 = Irreg_Sched(IA)
          Call Gather(X, Sched3)
          do 40 i = 1, nedges_local
            W(i) = W(i) + X(IA(i))
    40    continue
          do 45 i = 1, nnodes_local
            X(i) = ...
    45    continue
          end

Code after interprocedural optimizations (right):

          Program Example
          Real X(nnodes), Y(nnodes)
          Real Z(nedges), W(nedges)
          Integer IA(nedges), IB(nedges)
    C     Input data ...
          Sched1 = Irreg_Sched(IA)
          do 10 i = 1, 20
            Call Proc_A(X, Y, Z, IA, IB)
            if (nt .gt. 0) then
              Call Proc_B(X, W, IA)
            endif
            do 50 j = 1, nedges_local
              IB(j) = .. IB(j) ..
    50      continue
    10    continue
          end

          Subroutine Proc_A(A, B, C, D, E)
          Call Gather(A, Sched1)
          do 20 i = 1, nedges_local
            C(i) = C(i) + A(D(i))
    20    continue
          Sched2 = Irreg_Sched(E)
          Call Gather(B, Sched2)
          do 30 i = 1, nedges_local
            C(i) = C(i) + B(E(i))
    30    continue
          do 35 i = 1, nnodes_local
            B(i) = ...
    35    continue
          end

          Subroutine Proc_B(X, W, IA)
          do 40 i = 1, nedges_local
            W(i) = W(i) + X(IA(i))
    40    continue
          do 45 i = 1, nnodes_local
            X(i) = ...
    45    continue
          end

FIG. 10. Result of intraprocedural compilation (left), and code after interprocedural optimizations (right).
In Fig. 10, we also show the program after interprocedural optimization of the communication preprocessing routines and gather routines. We refer to the loop in the main program (which encloses the calls to the routines Proc_A and Proc_B) as the time step loop. Initially, interprocedural partial redundancy elimination is applied to the communication preprocessing statements. Since the array IA is never modified inside the time step loop in the main procedure, the schedules Sched1 and Sched3 are loop invariant and can be hoisted outside the loop. Further, it can be deduced that the computations of Sched1 and Sched3 are equivalent, so only Sched1 needs to be computed, and the gather routine in Proc_B can use Sched1 instead of Sched3. For simplicity, Sched1 is declared to be a global variable, so that it does not need to be passed as a parameter at different call sites. After the placement of communication preprocessing statements is determined, we apply the IPRE analysis to the communication routines. The gather for array IA in routine Proc_B is redundant because of the gather of array D in routine Proc_A. Note that performing IPRE on communication preprocessing statements before applying IPRE on communication statements is critical, since it is important to know that Sched3, which drives the gather in Proc_B, can be replaced by Sched1. We developed an initial prototype implementation of our schemes as an extension to the existing Fortran D compilation system. We experimented with two codes: an Euler solver on an unstructured grid and a molecular dynamics template. Our experimental results show that optimized interprocedural placement of communication and communication preprocessing can improve performance by 30-50% in codes which require runtime preprocessing.
4. Runtime Support for Pointer-based Codes: CHAOS++
Unfortunately, many existing runtime systems for parallelizing applications with complex data access patterns on distributed memory parallel machines fail to handle pointers. Pointers are frequently utilized by many applications, including image processing, geographic information systems, and data mining, to synthesize complex composite data types and build dynamic complex data structures. CHAOS++ is a runtime library for object-oriented applications with dynamic communication patterns. It subsumes CHAOS, which was developed to efficiently support applications with irregular patterns of access to distributed arrays. In addition to providing support for distributed arrays through the features of the underlying CHAOS library, CHAOS++ also provides support for distributed pointer-based data structures, and allows flexible and efficient data exchange
of complex data objects among processors [16]. CHAOS++ is motivated by the way pointers are often used in many real parallel applications. In these applications, hierarchies of data types are defined, such that ones at higher levels serve as containers for those at lower levels. Pointers are often used by container objects to point to the objects they contain. Objects that are solely contained within a container object are referred to as sub-objects. A sub-object is effectively part of its container object, although it does not necessarily occupy memory locations within those of its container object. Objects of data types at the top of the hierarchy (i.e., objects of the outermost container class) can further be connected through pointers, forming complex pointer-based data structures. Such data structures are dynamic: their elements are often created and/or deleted during program execution, and accessed through pointer dereferences. Access patterns to such data structures cannot be determined until runtime, so runtime optimization techniques are required. As an example, Fig. 11 shows the declaration of a set of C++ classes, which can be used to describe how pixels of an image are clustered into regions, and how regions containing pointers to adjacent regions form a map. The Region class is implemented as a container class for the Pixel class, so that a Pixel is a sub-object of a Region. Since different regions may consist of different numbers of pixels, the Region class uses a pointer to an array of its constituent pixels. A set of regions interconnected with pointers then forms a graph, defined by the class Region_Map. Figure 12 gives an example of such a graph. When a graph is partitioned among multiple processors, the runtime system must be able to traverse pointers to support remote data accesses. CHAOS++ is implemented as a C++ class library. The design of the library is architecture-independent and assumes no special support from C++ compilers. CHAOS++ currently uses message passing as its transport
    class Pixel {                // a single pixel of an image
      int x, y;                  // x, y coordinates
    };

    class Region {               // a region consisting of pixels
      int num_pixels;            // number of pixels
      Pixel *pixels;             // an array of pixels
      int num_neighbors;         // number of adjacent regions
      Region **neighbors;        // list of pointers to adjacent regions
    };

    class Region_Map {
      Region *region;            // pointer to some Region in the graph
    };

FIG. 11. Pointer-based data structures containing complex objects.
(Graphical figure: Region objects connected by pointers to adjacent Regions, each Region with a pointer to its array of Pixels.)

FIG. 12. A graph of Region objects.
layer and is implemented on several distributed memory machines and on networks of workstations.

4.1 Mobile Objects
CHAOS++ defines an abstract data type, called Mobject, for mobile objects. These are objects that may be transferred from one processor to another, so they must know how to marshal and unmarshal their contents. In general, the object model that CHAOS++ supports is one in which an object is owned by one processor, but other processors may possess shadow copies of an object, as will be discussed in Section 4.2. This implies that a distributed array of objects is treated by CHAOS++ as multiple objects, so that it can be distributed across multiple processors. The Mobject class is designed as a base class for all objects that may migrate between processors, and/or will be accessed by processors other than the ones they are currently assigned to. Mobject contains two pure virtual member functions, pack and unpack, which must be supplied by any derived objects so that CHAOS++ can move or copy a Mobject between processors. An implication of requiring the user to provide pack and unpack functions for all Mobjects is that CHAOS++ does not allow distributed arrays of C++ base types (e.g. double, int, etc.), because C++ does not allow a user to define member functions for base types. One way for an application user to implement such a distributed array using CHAOS++ is
to define a class derived from Mobject consisting solely of a member with the base type, and then provide the pack and unpack functions for that class. In the applications we have investigated so far, this is not a major problem, because all the distributed arrays have been arrays of complex structures. For an object that occupies contiguous memory, the pack and unpack functions consist of a simple memory copy between the object data and the message buffer. For a more complex object that contains pointers to sub-objects, the pack and unpack provided by the user must support deep copying. pack can be implemented by deriving the classes for all sub-objects from Mobject, and having the pack function for an object recursively call the pack function of each of its sub-objects. On the receiving processor side, the unpack function must perform the inverse operation (i.e., recursively unpack all sub-objects, and set pointer members to sub-objects properly).
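To make the deep-copying requirement concrete, the following sketch shows how a user class with sub-objects might derive from Mobject, using the class names of Fig. 11. The buffer-based pack/unpack signatures assumed here are an illustration only; the actual CHAOS++ declarations may differ.

    #include <cstddef>
    #include <cstring>

    // Assumed shape of the CHAOS++ base class (hypothetical signatures).
    class Mobject {
     public:
      virtual ~Mobject() {}
      virtual std::size_t pack(char* buf) const = 0;   // marshal, return bytes written
      virtual std::size_t unpack(const char* buf) = 0; // unmarshal, return bytes read
    };

    class Pixel : public Mobject {
     public:
      int x, y;
      std::size_t pack(char* buf) const override {
        std::memcpy(buf, &x, sizeof x);
        std::memcpy(buf + sizeof x, &y, sizeof y);
        return sizeof x + sizeof y;        // contiguous object: plain memory copy
      }
      std::size_t unpack(const char* buf) override {
        std::memcpy(&x, buf, sizeof x);
        std::memcpy(&y, buf + sizeof x, sizeof y);
        return sizeof x + sizeof y;
      }
    };

    class Region : public Mobject {
     public:
      int num_pixels = 0;
      Pixel* pixels = nullptr;             // sub-objects reached through a pointer
      std::size_t pack(char* buf) const override {
        std::memcpy(buf, &num_pixels, sizeof num_pixels);
        std::size_t off = sizeof num_pixels;
        for (int i = 0; i < num_pixels; ++i)
          off += pixels[i].pack(buf + off);    // deep copy: recursively pack sub-objects
        return off;
      }
      std::size_t unpack(const char* buf) override {
        std::memcpy(&num_pixels, buf, sizeof num_pixels);
        std::size_t off = sizeof num_pixels;
        delete[] pixels;
        pixels = new Pixel[num_pixels];        // re-create sub-objects, then fix pointers
        for (int i = 0; i < num_pixels; ++i)
          off += pixels[i].unpack(buf + off);
        return off;
      }
      ~Region() override { delete[] pixels; }
    };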
4.2 Globally Addressable Objects
In pointer-based data structures, elements (objects) may be added and removed dynamically. No static global names or indices are associated with the elements, and accesses to those elements are done via pointer dereferences. It is therefore not feasible for the runtime system to rely on the existence of global indices, as would be the case for distributed arrays. Furthermore, partitioning a pointer-based data structure may assign two elements connected via pointers to two different processors. This raises the need for global pointers. As supported by such languages as Split-C [41], CC++ [23,35], and pC++ [11,67], a global pointer may point to an object owned by another processor, and effectively consists of a processor identifier and a local pointer that is only valid on the named processor. In CHAOS++, these problems are addressed by introducing an abstract data type, called globally addressable objects, which we now discuss in detail. One obvious mechanism for managing global pointers is to define a C++ class for global pointers and overload the dereference operator (*), so that whenever a global pointer is dereferenced, the necessary interprocessor communication is automatically generated. This approach, however, does not allow collective communication, which is an important technique for achieving high performance in a loosely synchronous execution model. Furthermore, dereferencing a global pointer requires a conversion between a reference to a remote object and a reference to a local buffer. This imposes additional overhead with every dereference of a global pointer. It is more desirable to perform the conversion only when the binding between the global pointer and the local buffer changes.
Instead of defining a class for global pointers, CHAOS++ defines an abstract C++ base class for globally addressable objects, or Gobjects. A Gobject is an object with ownership assigned to one processor, but with copies allowed to reside on other processors. These copies are referred to as ghost objects; each processor other than the one assigned ownership of the Gobject may have a local copy of the Gobject as a ghost object. Figure 13 shows a graph which is partitioned between two processors. The dashed circles represent ghost Gobjects. Each Gobject has a member function that determines whether it is the real object or a ghost object. A ghost object caches the contents of its remote counterpart, but the decision about when to update a ghost object from a real object is determined by the application. The contents of ghost objects are updated by explicit calls to CHAOS++ data exchange routines. This description implies that all Gobjects must also be CHAOS++ Mobjects, to support transfer of data between real and ghost objects that are owned by different processors. In the object model supported by CHAOS++, a pointer-based data structure is viewed as a collection of Gobjects interconnected by pointers. Partitioning a pointer-based data structure thus breaks down the whole data structure into a set of connected components, each of which is surrounded
FIG. 13. Partitioning a graph between two processors: (a) shows the graph to be partitioned along the dotted vertical line; (b) shows the two components as the result of the partition, with one layer of ghost objects.
by one or more layers of ghost objects. In the partitioned data structure, pointers between two Gobjects residing on the same processor are directly represented as C++ pointers. Pointers to Gobjects residing on other processors are represented as C++ pointers to local ghost object copies of the remote Gobjects. Since accesses to elements of a pointer-based data structure are done through pointers, the layers of ghost objects surrounding each connected component encapsulate all the possible remote accesses emerging from that connected component. Accesses to remote objects that are more than one 'link' away can be satisfied by creating ghost objects for remote objects that are pointed to by local ghost objects, and filled on demand. A mapping structure is constructed by CHAOS++ for each distributed pointer-based data structure on each processor, to manage the ghost objects residing on that processor. The mapping structure maintains the possible remote accesses from the local processor by creating a list of all the ghost objects. The mapping structure also records the processor number and the local address of the remote object that each ghost object represents. The mapping structure is used during the inspector phase of a collective communication operation for translating global references into processor and local address pairs to generate communication schedules. The CHAOS++ data exchange routines then use the schedules to transfer data between real Gobjects and ghost Gobjects in the executor phase.
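For illustration, here is a minimal sketch of what a Gobject-style base class and a graph node built on it might look like. The member names (is_ghost, owner) and the constructor arguments are assumptions, not the actual CHAOS++ interface, and the Mobject pack/unpack obligations discussed above are omitted for brevity.

    // Hypothetical sketch of a globally addressable object.  A Gobject is owned
    // by one processor; other processors may hold ghost copies that cache the
    // owner's contents until the application refreshes them.
    class Gobject {
     public:
      Gobject(int owner_proc, int my_proc)
          : owner_(owner_proc), local_proc_(my_proc) {}
      virtual ~Gobject() {}
      bool is_ghost() const { return owner_ != local_proc_; }  // ghost vs. real copy
      int owner() const { return owner_; }
     private:
      int owner_;        // processor that owns the real object
      int local_proc_;   // processor on which this copy resides
    };

    // A graph node: pointers to same-processor neighbors point at real objects;
    // pointers to off-processor neighbors point at local ghost copies.
    struct GraphNode : public Gobject {
      GraphNode(int owner_proc, int my_proc) : Gobject(owner_proc, my_proc) {}
      double value = 0.0;                  // payload cached in ghosts until refreshed
      GraphNode* neighbors[8] = {nullptr};
      int num_neighbors = 0;
    };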
4.3 Building Distributed Pointer-based Data Structures
A distributed pointer-based data structure is defined by its nodes and edges, as well as by how it is partitioned among the processors. Distributed pointer-based data structures are usually built in two steps: all processors first construct their local connected components, and then compose those components to form the final distributed data structure. In general, there are two possible scenarios: one in which each node in the structure has a globally unique identifier, and another in which no such identifier exists. In both cases, CHAOS++ provides primitives to assist in the construction of such structures, and to create their corresponding mapping structures.
4.3.1 Nodes With Unique Identifiers
In many applications, each node in a pointer-based data structure is associated with a globally unique identifier. In such cases, nodes can be named by their associated identifiers, edges can be specified by the identifiers of their two end points, and partitioning information can be described by pairs of processor numbers and node identifiers. One example
in which this is usually the case is an unstructured CFD code, in which a node table is used to record all the node information for the graph (including initial values, node identifiers and the assigned processor numbers), and an edge table is used to specify the connectivity of the graph. When node identifiers are available, CHAOS++ provides a hash table on each processor that stores, for each node of the local component of the data structure, the node identifier, its local address, and its assigned processor number, if known. Records in the table are hashed by node identifiers, so accesses through node identifiers are fast.
Figure 14 demonstrates how the CHAOS++ hash table assists in constructing a distributed data structure. Applications can store information about their distributed pointer-based data structures in any format. For simplicity, the application in this example uses replicated C++ arrays Node_Table and Edge_Table.

    class Graph_Node : Gobject { ... };

    chaosxx_hash_table htable;                  // declared CHAOS++ hash table
    Graph_Node *node, *from_node, *to_node;     // pointers to graph nodes

    // assume a replicated node table and edge table

    // go through the node table
    for (i = 0; i < ...; i++) {
      ...
    }

    ...
    // an edge from a local node
    if ((to_node = htable.get_pointer(k)) == NULL) {
      // a ghost node needs to be created
      to_node = new Graph_Node(...);            // create a ghost node
      // register the node with CHAOS++ hash table
      htable.set_pointer(k, to_node);
    }
    // add a pointer from *from_node to *to_node
    from_node->neighbor[from_node->number_of_neighbors++] = to_node;
    ...

    // create the mapping structure for CHAOS++
    TPMapping *map = htable.create_map();

FIG. 14. Constructing a distributed graph with a CHAOS++ hash table.

Figure 14 consists of three steps. In the first step, the program scans through the node table and registers node information in the hash table. Nodes assigned to the local processor are created and initialized in this step. Nodes that are not assigned to the local processor are marked by recording their assigned processor numbers. If a node is known to be remote, but the owner processor is not yet known, the constant CHAOSXX_REMOTE is used as the owner processor. CHAOS++ uses this information to bind local copies with remote copies in the final step of the program. Exact knowledge of the owner processors of remote nodes makes that process more efficient, since a single request suffices to locate a remote node. When the owner processor is not known to the runtime system, locating a remote node may require the local processor to send multiple requests.
In the second step, the program scans through the edge table and creates the specified pointers. Only edges that originate from a node assigned to the local processor are of interest. The CHAOS++ hash table is used to find the addresses of the end nodes, specified by the node identifiers stored in the edge table. Nodes that are assigned to the local processor are created in the first step, and their addresses can be retrieved from the hash table through their node identifiers. Nodes that are not assigned to the local processor are created as ghost objects and registered with the hash table upon their first appearance in the edge table. At the end of the second step, each processor has constructed its local component of the distributed data structure, containing both objects assigned to the local processor and ghost objects.
The final step of Fig. 14 constructs an appropriate mapping structure. The mapping structure is of type TPMapping, and records the association between ghost objects and real remote objects, using the information stored in the hash table. This is done via a collective communication, in which all processors exchange the node identifiers and the local addresses stored in their CHAOS++ hash tables.
4.3.2 Nodes Without Unique Identifiers

CHAOS++ provides another interface for applications with objects that do not have unique identifiers. Since there is no natural way to name the objects in distributed pointer-based data structures, the connectivity of the data structures in these applications is usually determined from application-dependent information. For example, the initial graph built by the image segmentation application described in Section 4.5 is defined by the input
image. In this case, the CHAOS++ library assumes that each processor running the target application is able to build its assigned component of the distributed data structure. Furthermore, each processor is assumed to have the information necessary to order its boundary objects in a way that is consistent with the ordering on the other processors. CHAOS++ primitives can then be used to associate the corresponding boundary objects, to compose the local components into a global data structure, and to generate an appropriate mapping structure. To be more specific, each processor i provides, for each other processor j, two lists of object pointers, local_ij and ghost_ij. The list local_ij consists of the objects that are owned by processor i but have ghost objects on processor j, and ghost_ij consists of the ghost objects residing on processor i that correspond to real objects on processor j. To compose the components between two processors correctly, object pointers in the corresponding lists must be listed in the same order. That is, object pointers in local_ij must match exactly with the object pointers in ghost_ji, one-to-one, and those in ghost_ij must match with those in local_ji, one-to-one. As an example, to compose the two components on processors 0 and 1 in Fig. 13, the processors would construct the following matching boundaries:

    processor 0:  local_01 = { D, E }       ghost_01 = { J', F', G' }
    processor 1:  local_10 = { J, F, G }    ghost_10 = { D', E' }

The ordering of the lists implies that objects D and E on processor 0 are associated with ghost objects D' and E', respectively, on processor 1, and that objects J, F, and G on processor 1 are associated with ghost objects J', F', and G', respectively, on processor 0. Given the information for the boundaries between every pair of processors, CHAOS++ can associate real objects with their corresponding ghost objects (i.e., compute the local addresses on each processor, for later communication) through collective communication, and store that information in the mapping structure.
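The ordering requirement can be sketched in a few lines of C++: each processor builds, for a given neighbor, its list of exported objects and its list of ghost objects, and sorts both by an application-level key that all processors agree on, so that the k-th exported object on one side corresponds to the k-th ghost on the other. The Node type and the key-based convention are illustrative assumptions, not part of the CHAOS++ interface.

    #include <algorithm>
    #include <vector>

    struct Node { long key; double value; };   // key: application-dependent, globally agreed

    // Order both boundary lists consistently; afterwards exported[k] on the
    // owning processor lines up with ghosts[k] on the neighboring processor,
    // which is exactly the one-to-one matching the composition step requires.
    void order_boundary(std::vector<Node*>& exported, std::vector<Node*>& ghosts) {
      auto by_key = [](const Node* a, const Node* b) { return a->key < b->key; };
      std::sort(exported.begin(), exported.end(), by_key);
      std::sort(ghosts.begin(), ghosts.end(), by_key);
    }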
4.4 Data Movement Routines
Data transfer between real and ghost Gobjects is carried out by the CHAOS++ data movement routines. CHAOS++ allows processors either to update ghost objects with data from their corresponding remote objects on other processors (as in a CHAOS gather operation), or to modify the contents of remote objects using the contents of ghost objects (as in a CHAOS scatter operation). The data movement routines use the pack and unpack functions of Mobjects to enable deep copying. The communication
schedules generated from the mapping structure, constructed using either of the two methods discussed in Section 4.3, ensure that neither polling nor interrupts are needed at the receiving processors, so that communication can be performed efficiently.
4.5 Performance
To evaluate the performance of CHAOS++, we present results for three applications. The applications are taken from three distinct classes: computational aerodynamics (scientific computation), geographic information systems (spatial database processing), and image processing. These experiments were conducted on an Intel iPSC/860 and an IBM SP1. The first application is a direct simulation Monte Carlo (DSMC) method. DSMC is a technique for computer modeling of a real gas by a large number of simulated particles. It includes movement and collision handling of simulated particles on a spatial flow field domain overlaid by a three-dimensional Cartesian mesh [57]. On a sample problem, the CHAOS++ version of the code was at most 15% slower than the comparable Fortran code that uses CHAOS for parallelization. The Fortran/CHAOS code has been shown to be a good implementation, so this result shows that CHAOS++ can provide performance competitive with other optimized libraries on scientific codes. The second application, Vegetation Index Measurement (VIM), computes a measure of the vegetation on the ground from a set of satellite sensor images. It has been developed as part of the on-going Grand Challenge project in Land Cover Dynamics at the University of Maryland [51]. In the VIM application, a user specifies a query region of interest on the ground and a set of satellite images to process. The query region is overlaid with a two-dimensional mesh, whose resolution is likely to be coarser or finer than that of the given images. For each mesh cell, the algorithm selects from the given images the set of data points that spatially intersect with the mesh cell, using a C++ class library that supports spatial operators [60], and computes a vegetation index. CHAOS++ has been linked to this spatial operator class library to implement a parallel version of VIM. Performance results on several queries showed that good speed-up was obtained relative to the sequential code, up to 32 processors, on both the iPSC/860 and SP1. The third application we provide results for is image segmentation. This application segments a given image into a hierarchy of components, based on the border contrast between adjacent components, and serves as a preprocessing phase for an appearance-based object recognition system developed at the University of Maryland. The hierarchy this preprocessing
generates is used by a high-level vision phase to heuristically combine components from various levels of the hierarchy into possible instances of objects. This application provides an interesting test for CHAOS++, because it uses both arrays and pointer-based graph data structures. Again, the parallelized version using CHAOS++ obtains good speed-ups on both the iPSC/860 and SP1. In this case, because the parallel implementation was based on sequential code that had not been highly optimized, the speed-ups were relative to the (optimized) parallel code running on one processor. Load balancing was the main bottleneck in speeding up the computation, because the data partitioning is not adaptive. Currently the input image is initially partitioned, and that partition decides how the graph components are distributed among processors. This results in a somewhat unbalanced workload among the processors, and also contributes to sub-linear speed-ups as the number of processors gets large.
4.6 CHAOS++: Summary
CHAOS++ is a portable object-oriented runtime library that supports SPMD execution of adaptive irregular applications that contain dynamic distributed data structures. In particular, CHAOS++ supports distributed pointer-based data structures, in addition to distributed arrays, consisting of arbitrarily complex data types. CHAOS++ translates global object references into local references, generates communication schedules, and carries out efficient data exchange. The library assumes no special compiler support, and does not rely on any architecture-dependent parallel system features, other than an underlying message-passing system. Integration with the CHAOS runtime library, for array-based adaptive irregular applications, has already been accomplished, and integration with the Multiblock Parti runtime library, for array-based multiple structured grid applications, is currently in progress. One of the major difficulties in using the current version of the library is the complexity of the user interface. A user is asked to derive classes from the Mobject base class, and provide implementations for the pack and unpack functions to support deep copies. Some of this could be automated by a compiler, perhaps with the help of annotations provided by the user. On the other hand, building a distributed graph requires some understanding of the way the runtime library works, and extra work from the user (for example, laying out the Gobjects on the boundaries of the subgraph owned by a processor in a consistent order, as described in Section 4.3.2). At this point in time, we have yet to find a more general interface for building distributed graphs. Furthermore, CHAOS++ relies heavily on C++ virtual function invocations, which are usually somewhat more expensive than
normal function calls. Compiler analysis and optimization that reduces the cost of virtual function invocations could significantly improve the performance of the CHAOS++ runtime library. CHAOS++ is targeted as a prototype library that will be used to provide part of the runtime support needed for High Performance Fortran and High Performance C/C++ compilers. We are also in the process of integrating CHAOS++ into the runtime software being developed by the Parallel Compiler Runtime Consortium. The goal of this consortium is to provide common runtime support for compilers of data parallel languages, through specification of interfaces for data structures and for routines for deriving and optimizing data movement among processors. Runtime support, such as that provided by CHAOS++, could then be used by any compiler that understands these interfaces, allowing the use of multiple runtime support packages (e.g. for coping with different array distributions) by a single compiler.
5. Interoperability Issues: Meta-Chaos
We have developed a prototype “meta-library” called Meta-Chaos that makes it possible to integrate multiple data parallel programs (written using different parallel programming paradigms) within a single application. Meta-Chaos also supports the integration of multiple data parallel libraries within a single program [21]. Potential applications for this work include developing applications coupling multiple scientific simulations, perhaps running at different sites, and integrating results from multiple sensor databases or from multiple medical databases. The ability to compose multiple separately developed parallel applications is likely to be of increasing importance in many application areas, such as multidisciplinary complex physical simulations and remote sensing image database applications. In a collaborative project with Dennis Gannon's group at Indiana, Meta-Chaos has been used to exchange data between data parallel programs written using High Performance Fortran [36], the CHAOS and Multiblock Parti [1] libraries, and the runtime library for pC++, a data parallel version of C++ from Indiana University [67]. Experimental results on an IBM SP2 show that Meta-Chaos is able to move data between libraries at an efficiency that is comparable to that achieved by the CHAOS library, with preprocessing overheads that range from one to four times that of the CHAOS library. We have used Meta-Chaos to implement a scheme that establishes mappings between data structures in different data-parallel programs and implements a user-specified consistency model. Mappings are established at
runtime and can be added and deleted while the programs being coupled are executing. Mappings, and the identity of the processors involved, do not have to be known at compile-time or even link-time. Programs can be made to interact at different granularities without requiring any recoding. A priori knowledge of consistency requirements allows buffering of data as well as concurrent execution of the coupled applications. Efficient data movement is achieved by pre-computing an optimized schedule. We have developed a prototype implementation and evaluated its performance using a set of synthetic benchmarks [56].
5.1 Meta-Chaos Mechanism Overview

There are at least three potential solutions to provide a mechanism for allowing data parallel libraries to interoperate. We assume that interactions between libraries will be relatively infrequent and restricted to simple coarse-grained operations, such as copying a large section of an array distributed by one library to a section of an array distributed by another library. Any solution should encourage the use of multiple specialized and optimized libraries in the computation portions of an application, to provide the best possible performance. The first approach is to identify the unique features provided by all existing data parallel libraries and implement those features in a single integrated runtime support library. Such an approach requires extensive redesign and implementation effort, but should allow for a clean and efficient integrated system. However, existing runtime libraries cover only a subset of potential application domains, and it would be difficult to reach a consensus on an exhaustive set of features to provide in an all-inclusive library. Another major problem with such an approach is extensibility. It seems clear that such a library would be rather difficult to extend to accommodate new features or support new application domains, since it would be quite complex and contain many functions. A second approach is to use a custom interface between each pair of data parallel libraries that must communicate. This approach would allow a data copy between two libraries to be expressed by a call to a specific function. However, if there are a large number of libraries that must interoperate, say n, this method requires someone to write n² communication functions. Therefore this approach also has the disadvantage of being difficult to extend. The third approach is to define a set of interface functions that every data parallel library must export, and build a so-called meta-library that uses those functions to allow all the libraries to interoperate. This approach is often called a framework-based solution, and is the one we have chosen to
implement in the Meta-Chaos runtime library. This approach gives the task of providing the required interface functions to the data parallel library developer (or a third party that wants to be able to exchange data with the library). The interface functions provide information that allows the meta-library to inquire about the location (processor and local address) of data distributed by a given data parallel library.
Suppose we have programs written using two different data parallel libraries, named libX and libY, and that data structure A is distributed by libX and data structure B is distributed by libY. The scenario presented in Fig. 15 consists of copying multiple elements of A into the same number of elements of B, with both A and B belonging to the same data parallel program. The scenario presented in Fig. 16, on the other hand, copies elements of A into elements of B, but A and B belong to different programs. In either scenario, Meta-Chaos is the glue that binds the two libraries, and performs the copy. The two examples show the main steps needed to copy data distributed using one library to data distributed using another library. More concretely, these steps are:

(1) Specify the elements to be copied (sent) from the first data structure, distributed by libX.
(2) Specify the elements which will be copied (received) into the second data structure, distributed by libY.
(3) Specify the correspondence (mapping) between the elements to be sent and the elements to be received.
(4) Build a communication schedule, by computing the locations (processors and local addresses) of the elements in the two distributed data structures.
(5) Perform the communication using the schedule produced in step 4.

    Program P1
      A distributed using LibX
      B distributed using LibY
      ...
      call LibX.Fct(A)
      ...
      A1 = some elements of A
      B1 = some elements of B
      MC_Copy(B1, A1)
      ...
      call LibY.Fct(B)
      ...

FIG. 15. Meta-Chaos for communicating between two libraries within the same program.

    Program P1                              Program P2
      A distributed using LibX                B distributed using LibY
      ...                                     ...
      call LibX.Fct(A)                        call LibY.Fct(B)
      ...                                     ...
      A1 = some elements of A                 B1 = some elements of B
      MC_Send(A1)        -- Meta-Chaos -->    ...
      ...                                     ...
      end                                     end

FIG. 16. Meta-Chaos for communicating between libraries in two different programs.
The goal of Meta-Chaos is to allow easy data parallel library interoperability. Meta-Chaos provides functions that support each of the five steps just described. In the following sections, we describe the mechanisms used by Meta-Chaos to specify the data elements involved in the communication (steps 1 and 2), the virtual linearization (step 3), and the schedule computation (step 4). Step 5 uses the schedule computed by step 4 to perform the data copy, and uses system-specific transport routines (e.g. send and receive on a distributed memory parallel machine).
5.2 Data Specification

We define a Region as a compact way to describe a group of elements in global terms for a given library. A Region is an instantiation of a Region type, which must be defined by each data parallel library. For example, High Performance Fortran (HPF) [36] and Multiblock Parti utilize arrays as their main distributed data structure, therefore the Region type for them is a regularly distributed array section. CHAOS employs irregularly accessed arrays as its main distributed data structure, through either irregular data distributions or accesses through indirection arrays. For CHAOS a Region type would be a set of global array indices. A Region type is dependent on the requirements of the data parallel library. The library builder must provide a Region constructor to create Regions and a destructor to destroy the Regions specified for that library. The library builder also implicitly defines a linearization of a Region. Depending
on the needs of the data parallel library, Regions are allowed to consist of collections of arbitrarily complex objects. However, throughout this paper, we will concentrate on Regions consisting of arrays of objects of basic, language-defined types (e.g. integer, real, etc.). As for CHAOS++, functions for allowing transport of complex objects between processors (to move them between local memory and communication buffers) would have to be provided by the library builder. Regions are gathered into an ordered group called a SetOfRegions. A mapping between source and destination data structures therefore specifies a SetOfRegions for both the source and the destination. Figure 17 shows a data move from distributed array A into distributed array B, with the SetOfRegions defined as shown. For this example, a Region for the array is a regular section, and the order within a section is row major. The SetOfRegions for A and B define the one-to-one mapping between elements for the data move.
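To make the Region, SetOfRegions and linearization ideas concrete, the following sketch models a regularly distributed array section (the Region type that HPF and Multiblock Parti would use, per the text) and enumerates a two-dimensional section in row-major order, one possible linearization. The struct layout and function name are illustrative assumptions rather than Meta-Chaos declarations.

    #include <utility>
    #include <vector>

    // A regular array section in one dimension: lo:hi:stride (inclusive bounds).
    struct Dim { int lo, hi, stride; };

    // Illustrative Region for array-based libraries: a regular section of a
    // (possibly multidimensional) global array.
    struct Region { std::vector<Dim> dims; };

    // A SetOfRegions is an ordered group of Regions; its linearization is the
    // concatenation of the linearizations of its members.
    using SetOfRegions = std::vector<Region>;

    // Row-major linearization of a 2-D Region: the k-th index pair produced here
    // is matched with the k-th element of the destination's linearization.
    std::vector<std::pair<int, int>> linearize2d(const Region& r) {
      std::vector<std::pair<int, int>> order;
      const Dim& d0 = r.dims[0];
      const Dim& d1 = r.dims[1];
      for (int i = d0.lo; i <= d0.hi; i += d0.stride)
        for (int j = d1.lo; j <= d1.hi; j += d1.stride)
          order.emplace_back(i, j);   // rightmost index varies fastest
      return order;
    }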
5.3 Communication Schedule Computation

The communication schedule describes the data motion to be performed. Meta-Chaos uses the SetOfRegions specified by the user to determine the elements to be moved, and where to move them. Meta-Chaos applies the (data parallel library-specific) linearization mechanism to the source SetOfRegions and to the destination SetOfRegions. The linearization mechanism generates a one-to-one mapping between each element of the source SetOfRegions and the destination SetOfRegions. The implementation of the schedule computation algorithm requires that a set of procedures be provided by both the source and destination data parallel libraries. These procedures are essentially a standard set of inquiry functions that allow Meta-Chaos to perform operations such as:

• dereferencing an object in a SetOfRegions to determine the owning processor and local address, and a position in the linearization;
• manipulating the Regions defined by the library to build a linearization;
• packing the objects of a source Region into a communication buffer, and unpacking objects from a communication buffer into a destination Region.
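A minimal sketch of the kind of inquiry interface a library developer might export follows. The class and member names are hypothetical illustrations (independent of the sketch in Section 5.2), not the actual Meta-Chaos interface.

    #include <cstddef>
    #include <vector>

    struct Region;                                 // opaque, library-defined
    using SetOfRegions = std::vector<Region*>;

    // Location of one object: owning processor, local address, and its position
    // in the linearization of the SetOfRegions.
    struct Location { int proc; void* local_addr; long linear_pos; };

    // Hypothetical per-library inquiry interface used by the meta-library.
    class LibraryInterface {
     public:
      virtual ~LibraryInterface() {}
      // length of the linearization of a SetOfRegions
      virtual long linearized_size(const SetOfRegions& s) = 0;
      // dereference the k-th object in the linearization
      virtual Location dereference(const SetOfRegions& s, long k) = 0;
      // move objects between the library's storage and contiguous message buffers
      virtual std::size_t pack(const SetOfRegions& s, long first, long count,
                               char* buffer) = 0;
      virtual std::size_t unpack(SetOfRegions& s, long first, long count,
                                 const char* buffer) = 0;
    };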
A major concern in designing Meta-Chaos was to require that relatively few procedures be provided by the data parallel library implementor, to ease the burden of integrating a new library into the Meta-Chaos framework. So far, implementations for several data parallel libraries have been completed, including the High Performance Fortran runtime library, the Maryland
    B' = MC_Copy(..., A, S_A, B, S_B, ...)

FIG. 17. Before and after a data move from distributed array A to B. (Only the call that performs the move is reproduced here; S_A and S_B are the SetOfRegions for A and B.)
CHAOS and Multiblock Parti libraries for various types of irregular computations, and the pC++ [11] runtime library, Tulip, from Indiana University. The pC++ implementation of the required functions was performed by the pC++ group at Indiana in a few days, using MPI as the underlying message-passing layer, which shows that providing the required interface is not too onerous. Meta-Chaos uses the information in the communication schedule in each processor of the source data parallel library to move data into contiguous communication buffers. Similarly, Meta-Chaos uses the information in the schedule to extract data from communication buffers into the memory of each processor of the destination data parallel library. The communication buffers are transferred between the source and destination processors using either the native message passing mechanism of the parallel machine (e.g. MPL on the IBM SP2), or using a standard message-passing library on a network of workstations (e.g. PVM [24] or MPI [61]). Messages are aggregated, so that at most one message is sent between each source and each destination processor. A set of messages crafted by hand to move data between the source and the destination data parallel libraries would require exactly the same number of messages as the set created by Meta-Chaos. Moreover, the sizes of the messages generated by Meta-Chaos are also the same as for the hand-optimized code. The only difference between the two sets of messages would be in the ordering of the individual objects in the buffers. This ordering depends on the order of the bijection between the source objects and the destination objects used by Meta-Chaos (the linearization), and the order chosen by the hand-crafted procedure. If they choose the same ordering of the objects, the messages generated by Meta-Chaos and the ones generated by the hand-optimized procedure would be identical. The overhead introduced by using Meta-Chaos instead of generating the message passing by hand is therefore only the computation of the communication schedule. Since the schedule can often be computed once and reused for multiple data transfers (e.g. for an iterative computation), the cost of creating the schedule can be amortized.
5.4 Meta-Chaos Applications Programmer Interface (API)
An applications programmer can use Meta-Chaos to copy objects from a source distributed data structure managed by one data parallel library to a destination distributed data structure managed by another data parallel library. The distributions of the two data structures across the processors of the parallel machine or network of workstations are maintained by the two data parallel libraries.
There are four steps that an applications programmer must perform to completely specify a data transfer using Meta-Chaos:
(1) specify the objects to copy from the source distributed data structure;
(2) specify the objects in the destination distributed data structure that will receive the objects sent from the source;
(3) compute the communication schedule to move data from the source to the destination distributed data structure;
(4) use the communication schedule to move data from the source to the destination distributed data structure.

The first two steps require the user to define the objects to be sent from the source distributed data structure and the objects to be received at the destination. This is done using Regions, as was described in Section 5.2. A Meta-Chaos routine is then used to gather multiple Regions into a SetOfRegions. The applications programmer must create two SetOfRegions, one for the source and one for the destination distributed data structure. The third step is to compute the communication schedule, both to send data from the source data structure and to receive data into the destination. Meta-Chaos provides the routine to compute the schedule for the user, given the source and destination SetOfRegions. The sender SetOfRegions is mapped to the receiver SetOfRegions using the linearization. The final step is to use the communication schedule to perform the data copy operation. Meta-Chaos provides functions for efficiently moving data using the schedule.
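As a rough illustration of how these four steps might be driven from application code, here is a compact C++ sketch. Every name in it (Region, SetOfRegions, build_schedule, copy_with_schedule) is a placeholder with a stub body, since the text does not give the actual Meta-Chaos entry points; the point is only the order of the calls and the reusability of the schedule.

    #include <cstdio>
    #include <vector>

    struct Region { /* library-specific description of a group of elements */ };
    using SetOfRegions = std::vector<Region>;
    struct Schedule { /* processor/address pairs for every element to move */ };

    // Placeholders for the meta-library calls of steps 3 and 4.
    Schedule build_schedule(const SetOfRegions& src, const SetOfRegions& dst) {
      std::puts("computing source and destination locations...");
      return Schedule{};
    }
    void copy_with_schedule(const Schedule&, void* src_data, void* dst_data) {
      std::puts("moving data with the precomputed schedule...");
    }

    int main() {
      void* A = nullptr;            // stands for a data structure distributed by libX
      void* B = nullptr;            // stands for a data structure distributed by libY

      SetOfRegions src(1);          // (1) elements to send from A
      SetOfRegions dst(1);          // (2) elements of B that receive them
                                    // (3) correspondence = order of the two lists
      Schedule s = build_schedule(src, dst);
      copy_with_schedule(s, A, B);  // (4) perform the copy; s can be reused
      return 0;
    }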
5.5 Performance
Experimental results using Meta-Chaos show that the framework-based approach can be implemented efficiently, with Meta-Chaos exhibiting low overheads, even compared to the communication mechanisms used in specialized and optimized data parallel libraries [21]. We have performed two classes of experiments to evaluate the feasibility of using Meta-Chaos for efficient interaction between multiple data parallel libraries. The first class of experiments looked at a set of application scenarios that quantify the overheads associated with using Meta-Chaos. The experiments compared the performance of Meta-Chaos with that of highly optimized and specialized data parallel libraries, in particular the Maryland CHAOS and Multiblock Parti libraries. Results from these experiments showed that the overhead incurred by Meta-Chaos was very small relative to the cost of using a single optimized library. The second class of experiments showed the benefits that Meta-Chaos can provide by allowing a sequential or parallel client program to exploit the services of a parallel server program
implemented in a data parallel language, in this case High Performance Fortran. The experimental results showed that both sequential and parallel client programs could benefit greatly from using a parallel server to offload expensive computations, using Meta-Chaos to do the communication between the programs. While it was necessary to perform significant amounts of computation in the parallel server to amortize the costs of both building the Meta-Chaos schedules and performing the communication, the experiments showed that the performance gains in such cases can be very high.
6. Related Work

6.1 Runtime Support for Irregular Problems
There have been a significant number of efforts to develop runtime support for various classes of unstructured, irregular and adaptive computations. In this paper, we have described some of the tools we have developed to target irregular and adaptive problems. A variety of different approaches have been taken in the development of tools that target such problems. Katherine Yelick at the University of California, Berkeley, has developed the Multipol [14,69] library of distributed data structures for programming irregular problems on large-scale distributed memory multiprocessors. Multipol includes parallel versions of data structures such as trees, sets, lists, graphs, and queues. The data structures address the trade-off between locality and load balance through a combination of replication, partitioning, and dynamic caching. To tolerate remote communication latencies, some of the operations are split into a separate initiation and completion phase, allowing for computation and communication overlap at the library interface level. A group at the University of Texas at Austin, under the leadership of Jim Browne, has developed a comprehensive data management infrastructure for implementation of parallel adaptive solution methods for sets of partial differential equations [48,49]. The data management infrastructure is founded on the observation that several different adaptive methods for solution of partial differential equations have a common set of requirements for dynamic data structures. The project has been structured by defining the solution process in terms of hierarchical layers of abstraction. The scalable dynamic distributed hierarchical array (SDDA) is the lowest level in the hierarchy of abstractions. The Directed Acyclic Grid Hierarchy (DAGH) and the Dynamic Distributed Finite Element Mesh (DDFEM) are programming abstractions built upon the SDDA. DAGH implements hierarchical
adaptive grids for finite difference methods. DDFEM implements hp-adaptive meshes for finite element methods. A clean separation of array semantics from the higher-level computational operations such as operations defined on grids or meshes is critical to providing a data management foundation which will support multiple adaptive methods. Stephan Taylor has developed a library called the Concurrent Graph Library to support distributed data structures that consist of graphs with nodes and directed edges. Concurrent Graph nodes correspond to partitions of an unstructured (or structured) problem, and edges correspond to communication channels. Graph nodes comprise a kind of virtual processor; the Concurrent Graph library associates nodes with processors, and is able to dynamically remap nodes during the course of a computation. The library maintains information on amounts of computation, communication, and idle time for each node, and uses this information to make load balancing decisions. Concurrent Graph library users write application-specific routines to carry out work associated with each node in the usual SPMD manner.
6.2 Runtime Support for Irregularly Coupled Regular Mesh Applications
There has also been work directed at tackling problems that are composed of multiple irregularly coupled regular meshes. In some cases, all meshes are defined when work begins on a problem. In other cases, meshes are adaptively generated during problem execution. Multiblock Parti was developed by several of the authors to efficiently handle problems which have a relatively small number of irregularly coupled or nested meshes [5,6]. Multiblock Parti allows the user to assign each mesh to a subset of the machine's processors. Multiblock Parti has been used by application programmers to port applications by hand; it has also been used as compiler runtime support in prototype compilers. Scott B. Baden, at the University of California, San Diego, has developed the KeLP [22] and LPARX [40] libraries for implementing adaptive mesh finite difference methods and particle methods. LPARX is a C++ class library for implementing portable scientific applications on distributed memory parallel computer architectures and workstation clusters. LPARX is intended for multi-level adaptive finite difference methods and particle methods. LPARX isolates parallel coordination activities from numerical computation, and existing serial numerical kernels can often be utilized in LPARX applications with little change. KeLP, the more recent effort, is a C++ class library for managing irregular multi-resolution data structures, arising in adaptive mesh refinement (AMR) applications, on distributed memory architectures. KeLP is the successor to LPARX. KeLP makes use of a
communication model based on the inspector-executor model. One of KeLP's strengths is the ability to support application-specific optimizations that enable the programmer to improve memory and thread locality. A++ and P++ (University of Colorado and Los Alamos National Laboratory) [50] are object-oriented array class libraries for structured grid applications. A++ is a serial library for use on serial machines or single processors of parallel distributed architectures. P++ is a corresponding parallel array class library for structured grid applications on distributed memory multiprocessor architectures. AMR++ [7] is built on top of P++ and is designed to simplify the development of serial and parallel adaptive mesh refinement applications. AMR++ is specifically targeted to structured adaptive problems which may make use of overlapping or moving grids.
6.3 Compiler Methods for Irregular Problems

6.3.1 Compiler Projects that Target Irregular and Adaptive Problems

Over the past decade, there have been a substantial number of projects dedicated to developing compiler support of various kinds for irregular and adaptive problems. Koelbel et al. [37,38,39] designed the Kali compiler, the first to support both regular and irregular data distributions. The important parallel constructs in a program written for Kali are the data distribution statement, the virtual processor array declaration and the forall statement. The virtual processor array allows for the parameterization of the program, thus making it portable to various numbers of physical processors. All statements inside a forall loop can be executed in parallel. Iteration partitioning is accomplished by the special on clause. For irregular computation, an inspector/executor [45] strategy is used.

The Syracuse Fortran 90D compiler [12] has been extended, by a group that includes some of this chapter's authors, to support irregular templates [53]. This work used a runtime method for reusing the results of communication preprocessing. A compiler analysis phase records the list of parameters of communication preprocessing functions whose results can potentially be reused. A time-stamp is introduced for each of these parameters. Then, for each statement in the entire program which can modify any of these parameters, additional statements are inserted to update the corresponding time-stamp. At runtime, before execution of any communication preprocessing statement, the time-stamps of the parameters are checked to determine whether any of the parameters have been modified since the last execution of the preprocessing statement. If not, the result of the previous execution is reused.
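As an illustration of the time-stamp mechanism (a sketch only, in Python; the names are hypothetical and do not correspond to the code actually generated by the Fortran 90D compiler), a communication preprocessing call can be guarded by comparing the current time-stamps of its input parameters against those recorded at its previous execution:

# Each parameter that can affect communication preprocessing carries a
# time-stamp that is incremented whenever the parameter is modified.
timestamps = {"indirection_array": 0, "distribution": 0}

def modify(param):
    # Inserted after every statement that may change `param`.
    timestamps[param] += 1

class ScheduleCache:
    def __init__(self, params):
        self.params = params
        self.seen = None        # time-stamps observed at the last preprocessing call
        self.schedule = None    # cached result of the preprocessing

    def get(self, build_schedule):
        current = tuple(timestamps[p] for p in self.params)
        if self.seen != current:
            # Some parameter changed since the last call: redo the preprocessing.
            self.schedule = build_schedule()
            self.seen = current
        return self.schedule    # otherwise the previous result is reused

cache = ScheduleCache(["indirection_array", "distribution"])
sched = cache.get(lambda: "schedule built from current indirection array")  # built
sched = cache.get(lambda: "schedule built from current indirection array")  # reused
modify("indirection_array")
sched = cache.get(lambda: "schedule rebuilt after modification")            # rebuilt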
This scheme proved to be surprisingly effective, and in many cases the measured overhead in a set of small parallel codes proved to be minimal. Where the infrastructure for interprocedural analysis exists, there are many advantages to using the compiler-based approach. First, if the parameters are renamed across procedure boundaries, interprocedural analysis is required for tracking the statements which can modify the parameters. Second, if procedures are called at different call sites with different sets of parameters, interprocedural analysis is required for inserting the variables which maintain these time-stamps and passing them across procedure boundaries. In addition, we do not see a way of extending time-stamping to perform a variety of additional optimizations that have proved to be possible when we use interprocedural partial redundancy elimination methods, such as deleting data structures which are no longer required, the placement of scatter operations and the use of incremental and coalesced routines.

Joint work carried out at Rice, Syracuse and Maryland led to the development of language constructs and compilation methods that allow programmers to direct a compiler to produce code that, at runtime, carries out user-specified data partitioning strategies [31,54]. This work involved the development of two compiler prototypes; one prototype was built using Rice's Parascope-based Fortran D compiler and the other used the Syracuse Fortran 90D compiler infrastructure.

Gerasoulis and his group at Rutgers University are pursuing the problem of parallelizing codes associated with irregular problems, such as sparse direct linear solvers or sparse triangular solvers, in which parallelization is inhibited by true dependences that are determined only at runtime. They have built a system called PYRROS which performs scheduling and code generation for task graphs [25,26,68]. The group is in the process of developing a compiler that is able to generate tasks that can be scheduled by PYRROS.

The PARADIGM [42,43] compiler is a source-to-source parallelizing compiler based upon Parafrase-2 [52], a parallelizing compiler for shared memory multiprocessors developed at the University of Illinois in the Center for Supercomputing Research and Development. PARADIGM currently accepts either a sequential Fortran 77 or High Performance Fortran (HPF) program and produces an optimized message-passing parallel program (in Fortran 77 with calls to the selected communication library and the PARADIGM runtime system).

In [64], Ujaldon et al. present new methods for the representation and distribution of sparse matrices on distributed-memory parallel machines. They are based on the specific requirements for sparse computations as they arise in many problem areas. They then introduce special syntax and semantics to specify these new elements within a data-parallel language.
This provides the compiler as well as the runtime system with important additional information which can be used to achieve a number of important optimizations, such as memory savings, faster global-to-local translation, significant reduction in preprocessing time and high access locality. [62] and [63] describe such optimizations in detail and compare the overall scheme with other existing approaches, such as the CHAOS runtime library.

Gerndt et al. developed the semi-automatic parallelization tool SUPERB [13,27,28] for parallelization of programs for distributed memory machines. The SUPERB tool has an interactive environment, and it transforms annotated Fortran programs into parallel codes. Initially, array-element-level communication statements are generated, after which aggressive message vectorization is performed using data dependency information. The compiler automatically generates array overlaps which are used to store off-processor data. Rectangular data distributions can be specified by the user to lay out the data. For parameter passing between procedures, interprocedural data-flow analysis is used.
6.3.2 Partial Redundancy Elimination and Irregular Program Optimization

Hanxleden [29] has developed Give-N-Take, a new communication placement framework. This framework extends partial redundancy elimination in several ways, including a notion of early and lazy problems, which is used for performing earliest possible placement of send calls and latest possible placement of receive calls. Allowing such asynchronous communication can reduce communication latencies (a schematic illustration of this placement style appears at the end of this subsection). The interprocedural partial redundancy elimination work presented in Section 3.2 builds on Give-N-Take by considering interprocedural optimizations and presenting several new optimizations.

Recently, Chakrabarti et al. have presented intraprocedural techniques for reducing communication overheads within a program [15]. This encompasses several optimizations, like redundant communication elimination and combining messages.

Another approach for parallelizing sparse codes is that followed by Bik and Wijshoff [10], who have implemented a restructuring compiler which automatically transforms programs operating on dense two-dimensional matrices into codes that operate on sparse storage schemes. During this transformation, characteristics of both the target machine and the nonzero structure of the arrays are accounted for, so that one original dense matrix program can be mapped to different implementations tailored for particular instances of the same problem. This method simplifies the task of the programmer, at the risk of inefficiencies that can result from not allowing the user to choose the most appropriate sparse structures.
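The earliest-send/latest-receive placement that Give-N-Take aims for can be pictured with a small message-passing fragment; the sketch below uses the mpi4py Python bindings purely for illustration (assuming mpi4py and an MPI installation are available) and is not output of the Give-N-Take framework itself.

# Run with: mpiexec -n 2 python placement_sketch.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    boundary = [1.0, 2.0, 3.0]                 # data the neighbor will eventually need
    req = comm.isend(boundary, dest=1, tag=0)  # send posted as early as possible
    local = sum(x * x for x in boundary)       # independent local work overlaps the message
    req.wait()
    print("rank 0 local result:", local)
elif rank == 1:
    local = sum(range(1000))                   # independent local work done first
    boundary = comm.recv(source=0, tag=0)      # receive placed as late as possible
    print("rank 1 combined result:", local + sum(boundary))

Pushing the send upward and the receive downward in this way lengthens the window in which communication and computation can overlap, which is the latency benefit the framework's placement analysis tries to obtain automatically.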
7.
Summary
The CHAOS procedures described in Section 2 provide a portable, compiler independent, runtime support library. The CHAOS runtime support library contains procedures that
(1) support static and dynamic distributed array partitioning;
(2) partition loop iterations and indirection arrays;
(3) remap arrays from one distribution to another;
(4) carry out index translation, buffer allocation and communication schedule generation.

While the developers of the CHAOS library have developed parallelized versions of standard data partitioners, our innovations are not in partitioning methods as such. We will not attempt here to survey the extensive literature on partitioners. For examples of work in this area, see [8,9,20,55].

In Section 3, we described a scheme that can be used by a compiler to transform codes with complex subscript functions to yield programs where data arrays are indexed only by a single indirection array. We have shown that the techniques can be utilized to generate parallel code for irregular problems. In Section 3, we also described interprocedural optimizations for compiling irregular applications on distributed memory machines. In such applications, runtime preprocessing is used to determine the communication required between the processors. We have developed and used interprocedural partial redundancy elimination for optimizing placement of communication preprocessing and communication statements. We have further presented several other optimizations which are useful in the compilation of irregular applications. These optimizations include placement of scatter operations, deletion of runtime data structures and placement of incremental schedules and coalesced schedules.

We presented, in Section 4, a portable object-oriented runtime library that supports SPMD execution of adaptive irregular applications that contain dynamic distributed data structures. In particular, CHAOS++ supports distributed pointer-based data structures, in addition to distributed arrays, consisting of arbitrarily complex data types. CHAOS++ translates global object references into local references, generates communication schedules, and carries out efficient data exchange. The library assumes no special compiler support, and does not rely on any architecture-dependent parallel system features, other than an underlying message passing system. Integration with the CHAOS runtime library, for array-based adaptive irregular applications, and integration with the Multiblock Parti runtime library, for array-based structured grid applications, are both currently in progress.
Finally, in Section 5, we describe an effort to extend our CHAOS and CHAOS++ library efforts to develop a prototype "meta-library" that makes it possible to integrate multiple data parallel programs (written using different parallel programming paradigms) within a single application. Meta-Chaos also supports the integration of multiple data parallel libraries within a single program [21]. Potential applications for this work include developing applications coupling multiple scientific simulations, perhaps running at different sites, and integrating results from multiple databases.

REFERENCES

1. Agrawal, G., Sussman, A., and Saltz, J. (1995). An integrated runtime and compile-time
approach for parallelizing structured and block structured applications. IEEE Transactions on Parallel and Distributed Systems, 6(7), 747-754.
2. Agrawal, G., and Saltz, J. (1994). Interprocedural communication optimizations for distributed memory compilation. Proceedings of the 7th Workshop on Languages and Compilers for Parallel Computing, pp. 283-299, August. Also available as University of Maryland Technical Report CS-TR-3264.
3. Agrawal, G., and Saltz, J. (1995). Interprocedural compilation of irregular applications for distributed memory machines. Proceedings Supercomputing '95. IEEE Computer Society Press, December. Also available as University of Maryland Technical Report CS-TR-3447.
4. Agrawal, G., Saltz, J., and Das, R. (1995). Interprocedural partial redundancy elimination and its application to distributed memory compilation. Proceedings of the SIGPLAN '95 Conference on Programming Language Design and Implementation, pp. 258-269. ACM Press, June. ACM SIGPLAN Notices, 30(6). Also available as University of Maryland Technical Report CS-TR-3446 and UMIACS-TR-95-42.
5. Agrawal, G., Sussman, A., and Saltz, J. (1993). Compiler and runtime support for structured and block structured applications. Proceedings Supercomputing '93, pp. 578-587. IEEE Computer Society Press, November.
6. Agrawal, G., Sussman, A., and Saltz, J. (1994). Efficient runtime support for parallelizing block structured applications. Proceedings of the Scalable High Performance Computing Conference (SHPCC-94), pp. 158-167. IEEE Computer Society Press, May.
7. Balsara, D., Lemke, M., and Quinlan, D. (1992). AMR++, a C++ object oriented class library for parallel adaptive mesh refinement fluid dynamics applications.
8. Barnard, S. T., Pothen, A., and Simon, H. D. (1993). A spectral algorithm for envelope reduction of sparse matrices. Proceedings of Supercomputing '93, pp. 493-502.
9. Berger, M. J., and Bokhari, S. H. (1987). A partitioning strategy for nonuniform problems on multiprocessors. IEEE Transactions on Computers, C-36(5), 570-580.
10. Bik, A. J. C., and Wijshoff, H. A. J. (1996). Automatic data structure selection and transformation for sparse matrix computations. IEEE Transactions on Parallel and Distributed Systems, 2(7), 109-126.
11. Bodin, F., Beckman, P., Gannon, D., Narayana, S., and Yang, S. X. (1993). Distributed pC++: Basic ideas for an object parallel language. Scientific Programming, 2(3).
12. Bozkus, Z., Choudhary, A., Fox, G., Haupt, T., Ranka, S., and Wu, M.-Y. (1994). Compiling Fortran 90D/HPF for distributed memory MIMD computers. Journal of Parallel and Distributed Computing, 21(1), 15-26.
13. Brezany, P., Gerndt, M., Sipkova, V., and Zima, H. P. (1992). SUPERB support for irregular scientific computations. Proceedings of the Scalable High Performance Computing Conference (SHPCC-92), pp. 314-321. IEEE Computer Society Press, April.
14. Chakrabarti, S., Deprit, E., Jones, J., Krishnamurthy, A., Im, E. J., Wen, C. P., and Yelick, K. (1995). Multipol: A distributed data structure library. Technical Report 95-879, UCB/CSD, July.
15. Chakrabarti, S., Gupta, M., and Choi, J.-D. (1996). Global communication analysis and optimization. Proceedings of the SIGPLAN '96 Conference on Programming Language Design and Implementation, pp. 68-78. ACM Press, May. ACM SIGPLAN Notices, 31(5).
16. Chang, C., Sussman, A., and Saltz, J. (1996). CHAOS++. In Parallel Programming Using C++ (G. V. Wilson and P. Lu, Eds), Scientific and Engineering Computation Series, Chapter 4, pp. 131-174. MIT Press, Cambridge, MA.
17. Das, R., Mavriplis, D. J., Saltz, J., Gupta, S., and Ponnusamy, R. (1994). The design and implementation of a parallel unstructured Euler solver using software primitives. AIAA Journal, 32(3), 489-496.
18. Das, R., Havlak, P., Saltz, J., and Kennedy, K. (1995). Index array flattening through program transformation. Proceedings Supercomputing '95. IEEE Computer Society Press, December.
19. Das, R., Uysal, M., Saltz, J., and Hwang, Y.-S. (1994). Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3), 462-479.
20. Diniz, P., Plimpton, S., Hendrickson, B., and Leland, R. (1995). Parallel algorithms for dynamically partitioning unstructured grids. Proceedings of the 7th SIAM Conference on Parallel Processing for Scientific Computing, pp. 615-620.
21. Edjlali, G., Sussman, A., and Saltz, J. (1996). Interoperability of Data Parallel Runtime Libraries with Meta-Chaos. Technical Report CS-TR-3633 and UMIACS-TR-96-30, University of Maryland, Department of Computer Science and UMIACS, May 1996.
22. Fink, S. J., Baden, S. B., and Kohn, S. R. (1996). Flexible communication mechanisms for dynamic structured applications. Proceedings of Irregular '96. IEEE Computer Society Press, August.
23. Foster, I. (1996). Compositional parallel programming languages. ACM Transactions on Programming Languages and Systems, 18(4), 454-476.
24. Geist, A., Beguelin, A., Dongarra, J., Jiang, W., Manchek, R., and Sunderam, V. (1993). PVM 3 user's guide and reference manual. Technical Report ORNL/TM-12187, Oak Ridge National Laboratory, May.
25. Gerasoulis, A., Jiao, J., and Yang, T. (1995). Experience with graph scheduling for mapping irregular scientific computation. Proceedings of the IPPS '95 First Workshop on Solving Irregular Problems on Distributed Memory Machines, April.
26. Gerasoulis, A., Jiao, J., and Yang, T. (1995). Scheduling of Structured and Unstructured Computation. American Mathematical Society.
27. Gerndt, M. (1992). Program analysis and transformation for message-passing programs. Proceedings of the Scalable High Performance Computing Conference (SHPCC-92), pp. 60-67. IEEE Computer Society Press, April.
28. Gerndt, M. (1990). Updating distributed variables in local computations. Concurrency: Practice and Experience, 2(3), 171-193.
29. Hanxleden, R. von, and Kennedy, K. (1994). Give-N-Take: a balanced code placement framework. Proceedings of the SIGPLAN '94 Conference on Programming Language Design and Implementation, pp. 107-120. ACM Press, June. ACM SIGPLAN Notices, 29(6).
30. Hanxleden, R. von, Kennedy, K., Koelbel, C., Das, R., and Saltz, J. (1992). Compiler analysis for irregular problems in Fortran D. Technical Report 92-22, ICASE, NASA Langley Research Center, June.
31. Hanxleden, R. von, Kennedy, K., and Saltz, J. (1994). Value-based distributions and alignments in Fortran D. Journal of Programming Languages, 2(3), 259-282.
32. Hiranandani, S., Kennedy, K., and Tseng, C.-W. (1992). Compiler support for machine-independent parallel programming in Fortran D. In Languages, Compilers and Runtime Environments for Distributed Memory Machines (J. Saltz and P. Mehrotra, Eds), pp. 139-176. Elsevier, Amsterdam.
33. Hwang, Y.-S., Das, R., Saltz, J. H., Hodoscek, M., and Brooks, B. R. (1995). Parallelizing molecular dynamics programs for distributed memory machines. IEEE Computational Science and Engineering, 2(2), 18-29. Also available as University of Maryland Technical Report CS-TR-3374 and UMIACS-TR-94-125.
34. Hwang, Y.-S., Moon, B., Sharma, S. D., Ponnusamy, R., Das, R., and Saltz, J. H. (1995). Runtime and language support for compiling adaptive irregular programs. Software - Practice and Experience, 25(6), 597-621.
35. Kesselman, C. (1996). CC++. In Parallel Programming Using C++, Scientific and Engineering Computation Series (G. V. Wilson and P. Lu, Eds), Chapter 3, pp. 91-130. MIT Press, Cambridge, MA.
36. Koelbel, C., Loveman, D., Schreiber, R., Steele, G. Jr., and Zosel, M. (1994). The High Performance Fortran Handbook. MIT Press, Cambridge, MA.
37. Koelbel, C., and Mehrotra, P. (1991). Compiling global name-space parallel loops for distributed execution. IEEE Transactions on Parallel and Distributed Systems, 2(4), 440-451.
38. Koelbel, C., Mehrotra, P., and Rosendale, J. van (1990). Supporting shared data structures on distributed memory architectures. Proceedings of the Second ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP), pp. 177-186. ACM Press, March.
39. Koelbel, C. (1991). Compile-time generation of regular communication patterns. In Proceedings Supercomputing '91. IEEE Computer Society Press, November.
40. Kohn, S. R., and Baden, S. B. (1995). The parallelization of an adaptive multigrid eigenvalue solver with LPARX. Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, pp. 552-557. SIAM, February.
41. Krishnamurthy, A., Culler, D. E., Dusseau, A., Goldstein, S. C., Lumetta, S., Eicken, T. von, and Yelick, K. (1993). Parallel programming in Split-C. Proceedings Supercomputing '93, pp. 262-273. IEEE Computer Society Press, November.
42. Lain, A., and Banerjee, P. (1995). Exploiting spatial regularity in irregular iterative applications. In Proceedings of the Ninth International Parallel Processing Symposium, pp. 820-826. IEEE Computer Society Press, April.
43. Lain, A., and Banerjee, P. (1996). Compiler support for hybrid irregular accesses on multicomputers. Proceedings of the 1996 International Conference on Supercomputing, pp. 1-10. ACM Press, May.
44. Mavriplis, D. J., Das, R., Saltz, J., and Vermeland, R. E. (1995). Implementation of a parallel unstructured Euler solver on shared- and distributed-memory architectures. Journal of Supercomputing, 8(4).
45. Mirchandaney, R., Saltz, J. H., Smith, R. M., Crowley, K., and Nicol, D. M. (1988). Principles of runtime support for parallel processors. Proceedings of the 1988 International Conference on Supercomputing, pp. 140-152. ACM Press, July.
46.
Moon, B., Patnaik, G., Bennett, R., Fyftt, D., Sussman, A., Douglas, C., Saltz, J., and Kailasanath, K. (1995). Runtime support and dynamic load balancing strategies for
structured adaptive applications. Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, pp. 575-580. SIAM, February.
47. Nance, R., Wilmoth, R., Moon, B., Hassan, H., and Saltz, J. (1994). Parallel DSMC solution of three-dimensional flow over a finite flat plate. Proceedings of the 6th AIAA/ASME Joint Thermophysics and Heat Transfer Conference, Colorado Springs, CO, June.
48. Parashar, M., and Browne, J. C. (1995). Distributed dynamic data-structures for parallel adaptive mesh-refinement. Proceedings of the International Conference for High Performance Computing, December.
49. Parashar, M., and Browne, J. C. (1996). Object-oriented programming abstractions for parallel adaptive mesh-refinement. Proceedings of Parallel Object-Oriented Methods and Applications (POOMA), February.
50. Parsons, R., and Quinlan, D. (1994). A++/P++ array classes for architecture independent finite difference computations. Proceedings of the Second Annual Object Oriented Numerics Conference, Sunriver, Oregon, April.
51. Parulekar, R., Davis, L., Chellappa, R., Saltz, J., Sussman, A., and Townshend, J. (1994). High performance computing for land cover dynamics. Proceedings of the International Joint Conference on Pattern Recognition, September.
52. Polychronopoulos, C. D., Girkar, M., Haghighat, M. R., Lee, C. L., Leung, B., and Schouten, D. (1989). Parafrase-2: An environment for parallelizing, partitioning, synchronizing, and scheduling programs on multiprocessors. Proceedings of the 1989 International Conference on Parallel Processing, pp. II-39-II-48. Pennsylvania State University Press, August.
53. Ponnusamy, R., Saltz, J., Choudhary, A., Hwang, Y.-S., and Fox, G. (1995). Runtime support and compilation methods for user-specified irregular data distributions. IEEE Transactions on Parallel and Distributed Systems, 6(8), 815-831, August.
54. Ponnusamy, R., Hwang, Y.-S., Das, R., Saltz, J. H., Choudhary, A., and Fox, G. (1995). Supporting irregular distributions using data-parallel languages. IEEE Parallel and Distributed Technology, 3(1), 12-24.
55. Pothen, A., Simon, H. D., Wang, L., and Barnard, S. T. (1992). Towards a fast implementation of spectral nested dissection. Proceedings Supercomputing '92, pp. 42-51. IEEE Computer Society Press, November.
56. Ranganathan, M., Acharya, A., Edjlali, G., Sussman, A., and Saltz, J. (1996). Runtime coupling of data-parallel programs. Proceedings of the 1996 International Conference on Supercomputing, pp. 229-236. ACM Press, May.
57. Rault, D. F. G., and Woronowicz, M. S. (1993). Spacecraft contamination investigation by direct simulation Monte Carlo - contamination on UARS/HALOE. Proceedings AIAA 31st Aerospace Sciences Meeting and Exhibit, Reno, Nevada, January.
58. Rogers, A., and Pingali, K. (1991). Compiling for distributed memory architectures. IEEE Transactions on Parallel and Distributed Systems, 5(3), 281-298.
59. Saltz, J. H., Mirchandaney, R., and Crowley, K. (1991). Runtime parallelization and scheduling of loops. IEEE Transactions on Computers, 40(5), 603-612.
60. Shock, C. T., Chang, C., Davis, L., Goward, S., Saltz, J., and Sussman, A. (1996). A high performance image database system for remotely sensed imagery. Proceedings of Euro-Par '96, Vol. 2, pp. 109-122. Springer, Berlin.
61. Snir, M., Otto, S. W., Huss-Lederman, S., Walker, D. W., and Dongarra, J. (1996). MPI: The Complete Reference. Scientific and Engineering Computation Series. MIT Press, Cambridge, MA.
62. Ujaldon, M., Sharma, S.
D., Saltz, J., and Zapata, E. L., (1996). Runtime techniques for parallelizing sparse matrix applications. Proceedings of the 1995 Workshop on Irregular Problems, September.
63. Ujaldon, M., Sharma, S. D., Zapata, E. L., and Saltz, J. (1996). Experimental evaluation of efficient sparse matrix distributions. Proceedings of the 1996 International Conference on Supercomputing, pp. 78-86. ACM Press, May.
64. Ujaldon, M., Zapata, E., Chapman, B. M., and Zima, H. P. (1995). New data-parallel language features for sparse matrix computations. Proceedings of the Ninth International Parallel Processing Symposium, pp. 742-749. IEEE Computer Society Press, April.
65. Weiser, M. (1984). Program slicing. IEEE Transactions on Software Engineering, 10, 352-357.
66. Wu, J., Das, R., Saltz, J., Berryman, H., and Hiranandani, S. (1995). Distributed memory compiler design for sparse problems. IEEE Transactions on Computers, 44(6), 737-753.
67. Yang, S. X., Gannon, D., Beckman, P., Gotwals, J., and Sundaresan, N. (1996). pC++. In Parallel Programming Using C++ (G. V. Wilson and P. Lu, Eds), Scientific and Engineering Computation Series, Chapter 13, pp. 507-546. MIT Press, Cambridge, MA.
68. Yang, T., and Gerasoulis, A. (1992). PYRROS: Static scheduling and code generation for message passing multiprocessors. Proceedings of the 1992 International Conference on Supercomputing, pp. 428-437. ACM Press, July.
69. Yelick, K., Wen, C. P., Chakrabarti, S., Deprit, E., Jones, J., and Krishnamurthy, A. (1995). Portable parallel irregular applications. In Proceedings of the Workshop on Parallel Symbolic Languages and Systems, October.
Optimization Via Evolutionary Processes

SRILATA RAMAN
Unified Design Systems Laboratory
Motorola Inc.
Austin, Texas, USA

AND

L. M. PATNAIK
Microprocessor Applications Laboratory
Indian Institute of Science
Bangalore, India

Abstract

Evolutionary processes have attracted considerable interest in recent years for solving a variety of optimization problems. This article presents a synthesizing overview of the underlying concepts behind evolutionary algorithms, a brief review of genetic algorithms, and motivation for hybridizing genetic algorithms with other methods. Operating concepts governing evolutionary strategies and differences between such strategies and genetic algorithms are highlighted. Genetic programming techniques and their application are discussed briefly. To demonstrate the applicability of these principles, representative examples are drawn from different disciplines.
1. Introduction
   1.1 Simulated Annealing
   1.2 Evolutionary Algorithms
2. Evolutionary Strategies (ESs) and Evolutionary Programming (EP)
   2.1 Shortcomings of the (m + n)-ES
   2.2 Methods for Acceleration of Convergence
3. Genetic Algorithms (GAs)
   3.1 Selection Strategies Used in GAs
   3.2 Fitness Representation
   3.3 Parameters of a Genetic Algorithm
   3.4 Classification of Genetic Algorithms
   3.5 Implicit Parallelism in Genetic Algorithms
   3.6 GAs in Constrained Optimization
4. Extensions to Genetic Algorithms
   4.1 Generating the Initial Population
   4.2 Use of Subpopulations in Place of a Single Population
   4.3 Parallelism in Genetic Algorithms
   4.4 Hybrid Genetic Algorithms (HGAs)
   4.5 Use of Intelligent Operators
   4.6 Avoidance of Premature Convergence
   4.7 Messy Genetic Algorithms
   4.8 Parameterized Uniform Crossover
   4.9 Scaling of Fitness Values
   4.10 Adaptive Mutation
   4.11 GAs in Multimodal Function Optimization
   4.12 Coevolution, Parasites and Symbiosis
   4.13 Differences Between Genetic Algorithms and Evolution Strategies
   4.14 Reasons for Failure of Genetic Algorithms
5. Other Popular Search Techniques
   5.1 Population-based Incremental Learning (PBIL)
   5.2 Genetic Programming (GP)
   5.3 The Ant System
6. Some Optimization Problems
   6.1 Partitioning Problems
   6.2 The Traveling Salesman Problem
   6.3 VLSI Design and Testing Problems
   6.4 Neural Network Weight Optimization
   6.5 The Quadrature Assignment Problem
   6.6 The Job Shop Scheduling Problem (JSP)
7. Comparison of Search Algorithms
8. Techniques to Speed up the Genetic Algorithm
9. Conclusions
References
1. Introduction
In the fundamental approach to finding an optimal solution, a cost function is used to represent the quality of the solution. The objective function to be optimized can be viewed as a multidimensional surface where the height of a point on the surface gives the value of the function at that point. In the case of a minimization problem, the wells represent high-quality solutions while the peaks represent low-quality solutions. In the case of a maximization problem, the higher the point in the topography, the better the solution. The search techniques can be classified into three basic categories.

(1) Classical or calculus-based. This uses a deterministic approach to
find the best solution. This method requires the knowledge of the gradient or higher-order derivatives. The techniques can be applied to well-behaved problems.
(2) Enumerative. In these methods, all possible solutions are generated and tested to find the optimal solution. This requires excessive computation in case of problems involving a large number of variables.

(3) Random. Guided random search methods are enumerative in nature; however, they use additional information to guide the search process. Simulated annealing and evolutionary algorithms are typical examples of this class of search methods.

Evolutionary methods have gained considerable popularity as general-purpose robust optimization and search techniques. The failure of traditional optimization techniques in searching complex, uncharted and vast payoff landscapes riddled with multimodality and complex constraints has generated interest in alternate approaches. The interest in heuristic search algorithms with underpinnings in natural and physical processes began as early as the 1970s. Simulated annealing is based on thermodynamic considerations, with annealing interpreted as an optimization procedure. Evolutionary methods draw inspiration from the natural search and selection processes leading to the survival of the fittest. Simulated annealing and evolutionary methods use a probabilistic search mechanism to locate the global optimum solution in a multimodal landscape.

After we discuss the principles underlying Simulated Annealing (SA) and Evolutionary Algorithms, we present a brief survey of Evolutionary Strategies (ESs) and Evolutionary Programming (EP). This is followed by a brief review of Genetic Algorithms (GAs). We then discuss various extensions to GAs such as parallel GAs, hybrid GAs, adaptive GAs, and deceptive GAs. We also highlight other popular search techniques such as Genetic Programming (GP) and the ant system, and demonstrate the applicability of these methods. Diverse applications such as those encountered in partitioning, the traveling salesman problem, VLSI design and testing, neural network weight optimization, the quadrature assignment problem, and the job shop scheduling problem are explained. Prior to concluding this chapter with our brief observations on challenges and future prospects of this exciting area, we highlight a comparison of the various search algorithms, and methods to speed up GAs.
1.1
Simulated Annealing
Annealing is the process of cooling a molten substance with the objective of condensing matter into a crystalline solid. Annealing can be regarded as an optimization process. The configuration of the system during annealing is defined by the set of atomic positions ri. A configuration of the system is
weighted by its Boltzmann probability factor, exp(-E(r_i)/kT), where E(r_i) is the energy of the configuration, k is the Boltzmann constant, and T is the temperature [45]. When a substance is subjected to annealing, it is maintained at each temperature for a time long enough to reach thermal equilibrium. The iterative improvement technique for combinatorial optimization has been compared to rapid quenching of molten metals. During rapid quenching of a molten substance, energy is rapidly extracted from the system by contact with a massive cold substrate. Rapid cooling results in metastable system states; in metallurgy, a glassy substance rather than a crystalline solid is obtained as a result of rapid cooling. The analogy between iterative improvement and rapid cooling of metals stems from the fact that iterative improvement accepts only those system configurations which decrease the cost function. In an annealing (slow cooling) process, a new system configuration that does not improve the cost function is accepted based on the Boltzmann probability factor of the configuration. This criterion for accepting a new system state is called the Metropolis criterion. The process of allowing a fluid to attain thermal equilibrium at a temperature is also known as the Metropolis process.
1.1.1
The Metropolis Procedure
The Metropolis procedure for a temperature T and starting state S is given below.

Procedure METROPOLIS(S, T)
begin
  repeat M times
  begin
    NewS := Perturb(S);
    delta_cost := C(NewS) - C(S);
    if (delta_cost < 0) or (random < exp(-delta_cost / T)) then
      S := NewS;
  end
endprocedure
The terminology of the Metropolis procedure is as follows:

- M. A large positive integer constant which represents the number of times the system configuration is modified in an attempt to improve the cost function.
- Perturb. Given a system configuration S, the function "Perturb" returns a modified system configuration NewS.
- C. A function which returns the cost of the given system configuration S.
- delta_cost. The change in cost function when the system configuration changes from S to NewS.
- random. A function which returns a random number in the range 0 to 1.
1.1.2 The Simulated Annealing (SA) Algorithm

The SA procedure is presented below. Simulated annealing essentially consists of repeating the Metropolis procedure for different temperatures. The temperature is gradually decreased at each iteration of the SA algorithm. The step S2 of the procedure SA is the "cooling step". The constant "alpha" used in the cooling step is less than unity; it is normally selected to be close to 1 so as to achieve the "slow cooling" effect.

Procedure SA
begin
  T := InitialTemperature;
  S := Initial system configuration;
  while (T > FinalTemperature) do
  begin
    S1: METROPOLIS(S, T);   /* thermal equilibrium at T */
    S2: T := T * alpha;     /* cool */
  endwhile
endprocedure
The various parameters, such as the integer constant M, the initial temperature, the final temperature, and the value of the real constant "alpha", are selected based on rules of thumb, experimental studies or a theoretical basis. For practical implementations, the termination condition is modified as follows: the procedure SA terminates if b successive calls to METROPOLIS fail to modify the cost function.
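A minimal runnable rendering of the procedure is sketched below in Python, with an arbitrary one-dimensional cost function standing in for a real problem; the parameter values are purely illustrative, and the termination test is the modified one based on b successive unproductive Metropolis calls.

import math, random

def cost(x):
    # Toy cost function with many local minima.
    return x * x + 10.0 * math.sin(3.0 * x)

def perturb(x):
    return x + random.uniform(-0.5, 0.5)

def metropolis(s, T, M=100):
    improved = False
    for _ in range(M):
        new_s = perturb(s)
        delta_cost = cost(new_s) - cost(s)
        if delta_cost < 0 or random.random() < math.exp(-delta_cost / T):
            if delta_cost < 0:
                improved = True
            s = new_s
    return s, improved

def simulated_annealing(T=10.0, alpha=0.95, b=5):
    s = random.uniform(-10.0, 10.0)      # initial system configuration
    failures = 0
    while failures < b:                  # stop after b unproductive Metropolis calls
        s, improved = metropolis(s, T)
        failures = 0 if improved else failures + 1
        T *= alpha                       # slow cooling
    return s

best = simulated_annealing()
print("solution:", best, "cost:", cost(best))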
1.1.3 Problems in the Original Formulation of SA

If the initial temperature is too low, the process gets quenched very soon and only a local optimum is found. If the initial temperature is too high, the process is very slow. Only a single solution is used for the search, and this increases the chance of the solution getting stuck at a local optimum. The changing of the temperature is based on an external procedure which is unrelated to the current quality of the solution; that is, the rate of change of temperature is independent of the solution quality. These problems can be rectified by using a population instead of a single solution. The annealing mechanism can also be coupled with the quality of the current solution by making the rate of change of temperature sensitive to the solution quality.
1.2
Evolutionary Algorithms
Many researchers have been inspired by nature's way of optimization using evolutionary techniques. In their quest, they have devised three broadly similar methods: genetic algorithms (GAs), evolutionary strategies (ES), and evolutionary programming (EP). All these methods are similar in the sense that they operate on a population of solutions. A population is a set of solutions. New solutions are created by randomly altering the existing solutions. A measure of performance is used to assess the "fitness" of each solution, and a selection mechanism is used to determine which solutions can be used as parents for the subsequent generation of solutions. These methods differ in their modeling of evolution and the search operators used.

Darwinian evolution is intrinsically a robust search and optimization mechanism. The biological systems that are evolved demonstrate complex behavior at every level (the cell, the organ, the individual, and the population). The evolutionary approach can be applied to problems where heuristic methods are not available or where heuristic methods generally lead to unsatisfactory results. Most widely accepted evolutionary theories are based on the Neo-Darwinian paradigm. These arguments assert that the history of life can be fully accounted for by the physical processes operating on and within the populations and species.

These methods differ in their emphasis on various types of search operators. Genetic algorithms emphasize models of genetic operators as observed in nature, such as crossover, inversion and mutation, and apply evolution at the level of chromosomes. Evolutionary strategies and evolutionary programming emphasize mutational transformations that maintain a behavioral linkage between each parent and its offspring at the level of the individual or species. Evolutionary strategies rely on deterministic selection, and evolutionary programming emphasizes the stochastic nature of selection by conducting a stochastic tournament among the parents and offspring. The probability that a particular trial solution will survive depends on the score it obtains in the competition.
2.
Evolutionary Strategies (ESs) and Evolutionary Programming (EP)
The evolutionary algorithm as applied to function optimization problems, and discussed in [9], is as follows: (1) Find the real-valued n-dimensional vector associated with the extremum of the function to be optimized. (2) An initial population of P parent vectors is selected at random. The distribution of the initial trials is typically uniform due to the nature of the randomization function.
(3) An offspring vector is created from a parent by adding a Gaussian random variable with zero mean and predetermined standard deviation. This is done for all the P parent vectors.

(4) Selection then determines which of these solutions are to be maintained by calculating the errors of all the vectors. The P vectors that possess the least error become the new parents.

(5) This is repeated until a stopping criterion is satisfied.

Each component is viewed as a behavioral trait, not as a gene. It is assumed that whatever genetic transformations occur are the result of the changes in the behavior of the individual. Evolutionary strategies rely on a strict deterministic selection, whereas evolutionary programming uses a probabilistic selection mechanism by conducting a stochastic tournament selection to determine the population for the subsequent generations. The probability of survival of a particular solution depends on its rank in the population. Thus, the selection in EP emphasizes global exploration. ES abstracts coding structures as analogues of individuals while EP abstracts the structures as analogues of distinct species. Thus ES may use recombination operators to obtain new individuals, but this is not used in EP as there is no sexual communication in the species. In ES, most often, multiple offspring are generated from each parent, as opposed to a single offspring in EP.

In the basic EP algorithm, the mutation operator is applied to each parent to get one offspring. The parents and offspring compete for selection. For every individual j, some c solutions are selected randomly from the mating pool (this includes both the parents and the offspring). Let q be the number of these solutions whose fitness is worse than that of j; q is recorded as the number of wins for j. This number is computed for all the solutions in the population. Solutions characterized by high wins are selected as the new parents of the next generation. This is stochastic tournament selection [15,46].

Let m parents generate n offspring during each generation. In (m + n)-ES, both the m parents and the n offspring compete for survival, and only the best m (the population size) survive. The parents are allowed to exist until some better children supersede them. This may cause some super-fit individuals to survive forever. In (m, n)-ES, only the n offspring compete, and the best m survive for the next generation.
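The difference between the two selection schemes is easy to see in code. The Python sketch below (an illustration only, with a fixed Gaussian step size rather than the self-adapted variances used in practice) minimizes a toy objective with both (m + n) and (m, n) selection.

import random

def sphere(v):
    # Toy objective to be minimized.
    return sum(x * x for x in v)

def mutate(parent, sigma=0.3):
    # Offspring = parent plus zero-mean Gaussian noise in every component.
    return [x + random.gauss(0.0, sigma) for x in parent]

def es_step(parents, offspring_per_parent, plus=True):
    offspring = [mutate(p) for p in parents for _ in range(offspring_per_parent)]
    # (m + n)-ES: parents and offspring compete; (m, n)-ES: only the offspring compete.
    pool = parents + offspring if plus else offspring
    pool.sort(key=sphere)
    return pool[:len(parents)]           # the best m survive

def evolve(m=5, offspring_per_parent=4, generations=50, plus=True):
    parents = [[random.uniform(-5, 5) for _ in range(3)] for _ in range(m)]
    for _ in range(generations):
        parents = es_step(parents, offspring_per_parent, plus)
    return min(sphere(p) for p in parents)

print("(m + n)-ES best:", evolve(plus=True))
print("(m, n)-ES best:", evolve(plus=False))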
2.1 Shortcomings of the (m + n)-ES

(1) For the class of problems where the optimum value changes with time, the algorithm tends to get stuck at an outdated good solution if the parameters cannot help the algorithm to jump to the new area [12].
(2) The same problem can be seen if the measurement of fitness or the adjustment of object variables is prone to noise.

(3) With m/n > P (the probability of a successful mutation), there is a deterministic selection advantage for those offspring which reduce some of their variances.

A formal description of the evolutionary process is given in [1]. We can immediately recognize two state spaces: a genotypic (or coding) state space G and a phenotypic (or behavioral) state space P. Two alphabets are defined: an input alphabet I of the environmental symbols and an output alphabet Z of behavioral responses. Evolution within a single generation can be explained as follows. Consider a population of genotypes G_i. Genetics plays a major role in the development of complex phenotypes. Cell development dependent on the local environment is called epigenesis. The process can be explained by the use of four mappings. The first mapping, epigenesis, incorporates the rules of growth under local conditions. It is represented by f_1: I x G -> P. The second mapping, selection, describes the process of selection, emigration and immigration within the populations. It is represented by f_2: P -> P. The third mapping, representation, describes the genotypic representation within the population. It is represented by f_3: P -> G. The fourth mapping, mutation, describes the random changes that occur in the genetic material of the population. It is represented by f_4: G -> G.
2.2 Methods for Acceleration of Convergence

One way to achieve quick convergence is to decrease the variance of the Gaussian mutation, especially as optimality is approached. In the initial stages, gross optimization occurs very fast, the rate being proportional to the slope of the objective function. As optimality is reached, the surface begins to flatten. The search must now be confined to a small area of the surface around the optimal region, and large variations in the population must be avoided.

Evolutionary algorithms have been applied to problems with many constraints [34]. In such problems, penalties may be used to penalize those solutions which do not satisfy certain given constraints. In addition to this, the number of constraints that are violated is taken as an additional entity that is to be minimized.
3. Genetic Algorithms (GAs)

Holland [31] designed a new class of search algorithms, called genetic algorithms, in which some of the principles of the natural evolution process
were incorporated. Genetic algorithms (GAs) are stochastic search algorithms based on the principles of biological evolution. Even though the mechanisms of the evolution process are not fully understood, some of its features have been observed. Evolution takes place on thread-like objects called chromosomes. All living beings are created through a process of evolution on these chromosomes. The traits and the features of the creatures are embedded into the chromosomes, and are passed down to posterity. Natural selection is the link between chromosomes and performance of the decoded structures, and the process of reproduction is the basis for evolution. Reproduction takes place through recombination and mutation. Recombination creates different chromosomes by combining the chromosome material of the two parents, and mutation causes the generated offspring to be different from those of the parents.

Genetic algorithms are theoretically and empirically proven to perform robust search in complex spaces. Many research papers and dissertations have established the applicability of GAs in function optimization, control systems, combinatorial optimization, neural networks, and a class of engineering applications. GAs are not limited by restrictive assumptions (such as continuity and existence of derivatives) about the search space. GAs are different from other optimization and search procedures in four ways:

(1) GAs work with a coding of the parameter set, not with the parameters themselves.
(2) GAs search using a set of points (called a population).
(3) GAs use a pay-off (objective) function.
(4) GAs use probabilistic transition rules.
In order to solve a problem using a GA, the following issues have to be considered:

- encoding or representation of a solution;
- generation of initial solutions;
- an evaluation function;
- a set of genetic operators;
- selection of GA parameters;
- termination conditions.
In the simple genetic algorithm (SGA), the main loop in the algorithm is executed once for each generation. In each generation, the algorithm calculates the fitness value of each individual in the population, selects fitter individuals for reproduction, and produces offspring using crossover and mutation operators. Selection, crossover, and mutation are the basic search operators of a GA. The time steps (iterations) for evolution in a GA are called generations. The genetic algorithm, or the simple genetic algorithm (SGA), proposed by Holland [31] is as follows:

Initialize population
Evaluate population
While termination condition is not reached
  Select solutions for the next generation
  Perform crossover and mutation
  Evaluate population
EndWhile
stop
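A minimal runnable rendering of the SGA loop is sketched below in Python, using a bit-string encoding and a toy one-max fitness function; the parameter values are illustrative only and do not come from the text.

import random

L, N, PC, PM, GENERATIONS = 20, 30, 0.8, 0.01, 60

def fitness(ind):
    # Toy "one-max" objective: number of 1-bits in the string.
    return sum(ind)

def select(pop):
    # Fitness-proportionate (roulette wheel) selection of one parent.
    total = sum(fitness(i) for i in pop)
    r = random.uniform(0, total)
    acc = 0.0
    for ind in pop:
        acc += fitness(ind)
        if acc >= r:
            return ind
    return pop[-1]

def crossover(a, b):
    # Single-point crossover applied with probability PC.
    if random.random() < PC:
        point = random.randrange(1, L)
        return a[:point] + b[point:], b[:point] + a[point:]
    return a[:], b[:]

def mutate(ind):
    # Bit-flip mutation applied independently to every position.
    return [bit ^ 1 if random.random() < PM else bit for bit in ind]

pop = [[random.randint(0, 1) for _ in range(L)] for _ in range(N)]
for _ in range(GENERATIONS):
    next_gen = []
    while len(next_gen) < N:
        c1, c2 = crossover(select(pop), select(pop))
        next_gen.extend([mutate(c1), mutate(c2)])
    pop = next_gen[:N]
print("best fitness after evolution:", max(fitness(i) for i in pop))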
3.1
Selection Strategies Used in GAs
The various selection strategies commonly used in GAs, described in [46], are as follows:

(1) Roulette wheel selection. A biased roulette wheel is used where the size of each slot is proportional to the percentage of the total fitness assigned to a particular string. The wheel is spun a number of times equal to the population size and the string pointed to by the roulette marker is selected each time.

(2) Stochastic remainder selection. In the above selection process, highly fit structures may not get selected due to the probabilistic nature of the selection process. But in this method, such strings always get selected. The expected number of copies for an individual is E, where
E = (F_i / F) x Pop. Size
where F_i is the fitness of the individual, F is the average fitness and Pop. Size is the size of the population. Each string is allocated ⌊E⌋ copies. The remainder of the mating pool is selected by either of the two methods described below.

(a) Stochastic remainder with replacement selection. Here the rest of the pool is selected using roulette wheel selection, with the fractional parts as the weights for the wheel.

(b) Stochastic remainder without replacement selection. Only the fractional parts are considered as probabilities and weighted coin tosses are performed to complete the pool.
(3) Stochastic universal selection. A weighted roulette wheel is used in this case as well. Along the boundary of the wheel, markers are placed at regular intervals corresponding to the average fitness of the population. Each time the wheel is spun, the number of markers within the slot of the individual determines the number of copies given to the individual.

(4) Tournament selection. A random group, of size G, is selected and the best in this group is selected for the pool. This is repeated until the pool is full. A group size of G = 2 has been found to give good results.

(5) Rank-based selection. All individuals are ranked according to their fitness with the best (highest ranked) individual first. A non-increasing function is used to assign copies to the individuals. Proportionate selection is then performed to fill the mating pool.
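Two of these strategies can be stated compactly in code; the Python sketch below (an illustration only, with fitness values supplied by a caller-provided function) implements tournament selection and stochastic universal selection.

import random

def tournament_select(pop, fitness, pool_size, G=2):
    # Repeatedly pick a random group of size G and keep its best member.
    pool = []
    while len(pool) < pool_size:
        group = random.sample(pop, G)
        pool.append(max(group, key=fitness))
    return pool

def stochastic_universal_select(pop, fitness, pool_size):
    # One spin of a weighted wheel with pool_size equally spaced markers.
    total = sum(fitness(i) for i in pop)
    spacing = total / pool_size
    start = random.uniform(0, spacing)
    markers = [start + k * spacing for k in range(pool_size)]
    pool, acc, idx = [], 0.0, 0
    for ind in pop:
        acc += fitness(ind)
        while idx < pool_size and markers[idx] <= acc:
            pool.append(ind)     # one copy per marker falling in this slot
            idx += 1
    return pool

# Toy usage: individuals are numbers and an individual's fitness is its value.
population = [1, 2, 3, 4, 5, 6]
print(tournament_select(population, fitness=lambda x: x, pool_size=6))
print(stochastic_universal_select(population, fitness=lambda x: x, pool_size=6))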
3.2
Fitness Representation
Assignment of fitness values to the individuals can be done in many ways, depending on the problem. Methods of assigning fitness values, in the context of genetic programming, are discussed in [15]. These methods are also applicable to genetic algorithms and are discussed below.

(1) Raw fitness (R_f). This is the fitness value stated in the natural terminology of the problem itself. If the raw fitness value is specified in terms of the error in the solution, raw fitness is to be minimized, otherwise it is to be maximized.

(2) Standardized fitness (S_f). It is defined such that it is always minimized. If R_f is the error, S_f = R_f, else S_f = R_max - R_f. Here, R_max is chosen such that the best value of S_f is zero.

(3) Adjusted fitness (A_f). This is computed in the form

A_f = 1 / (1 + S_f)

It always lies between 0 and 1. It is a non-linear increasing function which is maximized.

(4) Normalized fitness (N_f). This is equal to

N_f = A_f / T

where T = Σ A_f, the sum of the adjusted fitness values over the population. It also lies between 0 and 1 and is an increasing function. The total of all the fitness values is 1.
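These definitions translate directly into a few lines of code; the sketch below (assuming, for concreteness, that raw fitness is an error to be minimized) computes the standardized, adjusted and normalized fitness of a small population.

def standardized(raw, minimizing=True, r_max=None):
    # S_f = R_f when raw fitness is an error; otherwise S_f = R_max - R_f.
    return raw if minimizing else r_max - raw

def adjusted(s):
    # A_f = 1 / (1 + S_f): always in (0, 1], and maximized.
    return 1.0 / (1.0 + s)

def normalized(adjusted_values):
    # N_f = A_f / sum of all A_f; the normalized values total 1.
    total = sum(adjusted_values)
    return [a / total for a in adjusted_values]

raw_errors = [0.0, 1.0, 3.0, 9.0]            # raw fitness given as solution error
s = [standardized(r) for r in raw_errors]
a = [adjusted(x) for x in s]
n = normalized(a)
print(a)   # [1.0, 0.5, 0.25, 0.1]
print(n)   # proportions that sum to 1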
3.3 Parameters of a Genetic Algorithm

A GA can be formally described by six parameters [11].

(1) N, the population size.
(2) C, the crossover rate. In each population, N x C structures undergo crossover.
(3) M, the mutation rate. M x N x L mutations occur in every generation. Here, L is the length of the coding structure.
(4) G, the generation gap. This controls the percentage of population replaced in every generation. N x (1 - G) structures survive intact across generations.
(5) W, the scaling window. The objective function is specified by U(x) = F(x) - F_min, where F_min is the minimum value that the fitness function F(x) can take. W determines how F_min is updated. As generations proceed, F_min is updated to the minimum fitness value obtained in the last W generations.
(6) S, the selection strategy used to determine the mating pool.
Numerous books and articles have been published on genetic algorithms. The interested reader can refer to [24,31,36,43,46,47,48] for a further understanding of the underlying principles of genetic algorithms.
3.4 Classification of Genetic Algorithms

Genetic algorithms can be classified based on the following characteristics [3]:

(1) Selection method used.
(2) Dynamic vs static. In dynamic GAs, the actual fitness values are used for selection; but in static GAs, only the rank of the individuals is used.
(3) Extinctive vs preservative. The preservative approach guarantees non-zero probabilities of selection to every individual, while the extinctive approach ensures that some individuals are definitely not allowed to generate offspring.
(4) Left vs right extinctive. In the right extinctive method, the worst performing individuals are not allowed to live, but in the left extinctive approach, some of the best performing ones are prevented from reproducing in order to prevent premature convergence.
(5) Elitist vs pure. In pure GAs, members have a lifetime of one generation. Elitist GAs provide unlimited lifespans to very good individuals.
(6) Generational vs steady state (on the fly). In generational GAs, the set of parents is fixed until all the offspring have been generated,
unlike in steady state GAs where the offspring immediately replace the parent if the offspring perform better than the parent.
3.5
Implicit Parallelism in Genetic Algorithms
GAs work on the basis of the schema theorem [31]. According to this theorem, each member of the population is simultaneously an instance of many schemata. The schemata are the building blocks which are tested implicitly for their fitness values. As generations proceed, these high-fitness low-order schemata are combined to form high-order schemata. The property of implicitly searching all the schemata to which an individual belongs is called implicit parallelism in GAs. The number of schemata processed implicitly has been shown by Holland [31] to be of the order of k^3, where k is the population size. The population size is taken to be equal to c x 2^l, where l is the length of the encoding and c is a small integer. The implication of the above result is that by having a population of only n strings, the algorithm is able to search n^3 schemata. The result has been extended and generalized in [4] to a population of size k = 2^(βl), where β is a positive parameter (Holland's result is a special case when β = 1). It is shown that the lower bound on the number of schemata tested is a monotonically decreasing function of the population size and β. By assigning values to β, it is found that increasing β drastically reduces the order of the lower bound.

An analysis of genetic algorithms with respect to their convergence is given in [MI. Finite Markov chains are used to prove whether or not canonical GAs converge. Since the state of the GA depends only on the genes of the individuals, it is represented as a probability matrix as required by the analysis. It is proved here that a GA with crossover and mutation probabilities in the range [0,1] and using proportional selection does not converge to the global optimum. It is also proved that if the same algorithm maintains the best solution (an elitist strategy) over generations, it is guaranteed to converge.
3.6
GAs in Constrained Optimization
When GAs are applied to constrained optimization problems, it is seen that simple crossover and mutation often produce individuals that are invalid. To overcome this problem, three methods are often used.

(1) The GA operators are modified so that they produce only valid solutions. These modified operators must ensure exponentially increasing copies of good solutions for the GA to work according to the schema theorem. One drawback with this approach is that for different problems, the operators must be redefined.
(2) The normal GA operators are retained in the second method. Any invalid solutions produced are penalized using penalty functions (see the sketch after this list). These penalty functions must ensure that the GA does not converge to invalid solutions.
(3) In the third method, too, the normal GA operators are retained. Invalid solutions are repaired and converted to valid solutions before being evaluated by the algorithm.
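As an illustration of the second (penalty) method, the fragment below sketches one plausible penalized fitness function; the quadratic penalty term and the default weight are assumptions made for illustration rather than a formulation taken from the literature cited here.

def penalized_fitness(objective, constraints, weight=1000.0):
    # objective(x): raw fitness of candidate x (to be maximized).
    # constraints: list of functions g with the convention g(x) <= 0 when satisfied.
    # weight: assumed penalty coefficient; it must be large enough that invalid
    # solutions cannot win the selection competition against valid ones.
    def fitness(x):
        violation = sum(max(0.0, g(x)) ** 2 for g in constraints)
        return objective(x) - weight * violation
    return fitness

# Example: maximize x[0] + x[1] subject to x[0] + x[1] <= 10.
f = penalized_fitness(lambda x: x[0] + x[1], [lambda x: x[0] + x[1] - 10])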
4. Extensions to Genetic Algorithms

4.1 Generating the Initial Population [5]
Instead of the initial population being generated once at random, each member of the population is taken as the best of n randomly generated individuals, where n is a parameter defined by the user. This is a generalization of the usual method, which corresponds to n = 1. When the GA was used to optimize a 10-dimensional function, it was found that 14% of the function evaluations were spent determining the initial population.
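A minimal sketch of this seeding scheme is given below; the bit-string representation and the helper names are assumptions made only for illustration.

import random

def random_individual(length):
    return [random.randint(0, 1) for _ in range(length)]

def seeded_population(pop_size, length, fitness, n=5):
    # Each member is the best of n randomly generated candidates;
    # n = 1 reduces to the usual purely random initialization.
    return [max((random_individual(length) for _ in range(n)), key=fitness)
            for _ in range(pop_size)]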
4.2 Use of Subpopulations in Place of a Single Population [2]
Instead of having a single population that proceeds through generations, the initial population is divided into a number of subpopulations. Each of these subpopulations proceeds independently for some generations until some criterion is met. This duration is called an epoch. At the end of every epoch, some individuals, normally the best ones, are exchanged with the neighboring subpopulations. This is continued for some epochs, at the end of which the subpopulations are merged into a single population and the GA proceeds with a single population. The scheme lends itself to many variations (a sketch follows the list below).

(1) The criterion for the duration of an epoch can be just a fixed number of generations, or an epoch can end when the subpopulations tend to saturate.
(2) The number of individuals exchanged can be made a parameter of the algorithm.
(3) All the subpopulations can be combined in one generation, or they can be merged gradually over many generations.
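The sketch below shows the epoch-and-migration structure in its simplest form. The one-generation evolve step and the fitness function are placeholders to be supplied by the surrounding GA, and the default epoch length and number of migrants are assumed values, since the text leaves them as parameters.

import random

def island_ga(subpops, evolve, fitness, epochs=10, epoch_len=20, migrants=2):
    # subpops: list of subpopulations (each a list of individuals)
    # evolve:  runs one GA generation on a subpopulation (placeholder)
    # fitness: evaluates an individual (placeholder)
    for _ in range(epochs):
        # each subpopulation evolves independently for one epoch
        for _ in range(epoch_len):
            subpops = [evolve(sp) for sp in subpops]
        # migration: the best individuals move to the neighboring subpopulation
        for i, sp in enumerate(subpops):
            best = sorted(sp, key=fitness, reverse=True)[:migrants]
            neighbor = subpops[(i + 1) % len(subpops)]
            for b in best:
                neighbor[random.randrange(len(neighbor))] = b
    # finally merge into a single population and continue with an ordinary GA
    return [ind for sp in subpops for ind in sp]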
The main motivation for use of subpopulations can be explained using a biological metaphor. Isolated populations tend to evolve their own
distinctive characteristics depending on the local environment. It has been seen that when these individuals are put in a new environment, there is rapid evolution of the species: changes induced by the new environment improve an individual's traits and are rewarded. This is called punctuated equilibria and is explained in [28].
4.3 Parallelism in Genetic Algorithms
From the above, it is evident that the subpopulations can evolve independently and in parallel. The evaluation of the subpopulations can go on simultaneously, interaction being needed only between epochs. On the other hand, even with a single population, evaluation of the different individuals can take place simultaneously and independently of one another. After the complete population has been evaluated, the results are collected and the genetic operators are applied to obtain the next generation. One other method of implementing parallelism that has been proposed [19] is a master-slave parallel algorithm. Here, subpopulations evolve independently and, at the same time, a complete population (of size N) also evolves. If there are M subpopulations, each will have a size of N/M. At the end of a specific number of generations, all the subpopulations are put together. N/2 individuals are chosen from the complete population and N/2 members are selected from the subpopulations for the next generation. The different methods of parallelizing GAs are explained in [40].

• In the synchronous master-slave model, the master processor controls the mating and selection processes. The slave processors perform the evaluation in parallel. One of the main disadvantages of this method is that it depends heavily on the master processor to generate the new population; if the master processor fails, the algorithm cannot proceed even though all the other processors are functioning. The efficiency of this method also decreases if the evaluation time varies from individual to individual.
• In the semi-master-slave model, the synchronization constraints are relaxed. This model is able to overcome the disadvantages of the previous model.
• In the distributed asynchronous concurrent model, each processor performs mating and selection independently. The population information is stored in a common memory shared among all the processors.
• In the network model, each processor runs a GA independent of the others. Each processor has its own memory. The best members are occasionally broadcast to the neighboring populations.
4.4 Hybrid Genetic Algorithms (HGAs)

In HGAs, the genetic algorithm is supplemented with a local search mechanism such as simulated annealing [45] or tabu search [14]. The main motivation for such a hybridization is that GAs tend to improve the average fitness of the population rather than to find the global optimum. Thus GAs are able to find regions of good solutions very quickly but take a relatively long time to locate the global optimum. It has been found fruitful to let local searches find the global optimum within these good solution areas [14].

4.4.1 Tabu Search
Tabu search [14] is a local search mechanism which is often used to try to improve the solution obtained by the genetic algorithm. The tabu search algorithm works as follows. The search starts with a random solution string. Successive 2-opt exchanges are then made to improve the solution: two randomly chosen bits in the individual are exchanged. This is repeated to obtain many such randomly mutated copies of the original individual. All these copies are examined to determine their fitness, and the copy that improves the solution most is selected to replace the original solution. This is continued until some stopping criterion is reached, such as a limit on the number of copies generated or the finding of a satisfactory result. To escape from local minima, exchanges which result in a deterioration of the quality of the solution are also allowed. To prevent the search from being pulled back into a local minimum by successive improvements in its direction, a list of exchanges that must be disallowed is maintained as a tabu list.

In one implementation, the GA and the local search are run alternately. The results of one generation of the GA are improved and handed back to the GA for one more generation. The complete population can be improved by the local search algorithm, or only the best members of the GA can be given to the local search for optimization. Alternatively, the GA is run until it can proceed no further, after which the local search takes over completely and tries to improve the result. Instead of using only one local search algorithm, many algorithms can be used in tandem to prevent bias towards any one algorithm, as each has its own effective search area.
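A compact sketch of this bit-exchange tabu search follows. The move representation (a pair of positions), the tabu tenure and the stopping rule are illustrative assumptions; [14] does not fix them here.

import random
from collections import deque

def tabu_search(start, fitness, iterations=200, copies=20, tenure=10):
    # 2-opt style search over a bit string: each move exchanges two randomly
    # chosen positions; recently used moves are kept in a tabu list.
    current, best = list(start), list(start)
    tabu = deque(maxlen=tenure)
    for _ in range(iterations):
        candidates = []
        for _ in range(copies):
            i, j = random.sample(range(len(current)), 2)
            move = (min(i, j), max(i, j))
            neighbor = list(current)
            neighbor[i], neighbor[j] = neighbor[j], neighbor[i]
            candidates.append((fitness(neighbor), move, neighbor))
        # take the best non-tabu move, even if it worsens the current solution,
        # so that the search can climb out of local minima
        for value, move, neighbor in sorted(candidates, key=lambda c: c[0], reverse=True):
            if move not in tabu:
                current = neighbor
                tabu.append(move)
                break
        if fitness(current) > fitness(best):
            best = list(current)
    return best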
4.5 Use of Intelligent Operators
Rather than using the standard genetic operators, some problem specific information can be incorporated into them to improve the hill-climbing nature
of the genetic algorithm. In some problems, the individuals in the population must satisfy some conditions to be accepted as valid. The standard operators may not be directly applicable because random crossover and mutation points may result in the string becoming invalid. Thus, the operators must maintain valid members while preserving the building blocks. In the case of the traveling salesman problem, for an individual to be considered legal, it must contain all the cities once and only once. If standard crossover and mutation operators are used, such conditions might not always hold. The operators must thus preserve such a criterion across generations.
4.6 Avoidance of Premature Convergence
This has been a problem because early convergence of the population often results in the GA getting stuck in a local optimum. To prevent this, many options have been proposed [7]. Some of these approaches are given below.

• A preselection mechanism suggested by Cavicchio replaces the parents with their offspring.
• In De Jong's crowding scheme, the offspring replace the closest matching string from a randomly drawn subpopulation, the size of which is specified as a parameter called the crowding factor.
• A uniqueness operator was developed by Mauldin. This uses a censorship operator which allows an offspring to survive only if it is different from all the other members of the current population.
• A sharing function was used by Goldberg, where selection probabilities are determined taking into account the average fitness of similar individuals.
4.7 Messy Genetic Algorithms
In order to obtain better convergence properties, and to ensure good results even if the linkages between the building blocks are weak, Goldberg et al. [10] proposed the messy GA. Messy GAs (MGAs) use variable-length coding which may have too many or too few bits with respect to the problem being solved. In traditional GAs, strings contain all the genes, and thus each string is a solution to the problem. In messy GAs, not all the genes need be present in a string. In some cases, because too many bits are used to represent the string (overspecification), many unwanted bits may also be present in the individual. Thus, in MGAs, two structures need not have any genes in common, and variable-length operators are used, unlike the fixed-length crossover in standard GAs. The cut operator cuts a string with a probability equal to (l − 1) × P_c, where P_c is the probability of the cut operator and l is the length of the string. The splice operator joins strings with a probability P_s.
The evolution of the population takes place in two stages. The first, called the primordial phase, consists of a number of generations where fixed length individuals are improved through reproduction without any other operators. The population is halved periodically. The second, called the juxtapositional phase, uses cut, splice and other genetic operators on a fixed population size. In cases of underspecification, a method of competitive templates is used to fill the remainder of the structure with locally optimal structures. To remove any inconsistencies due to the variable length coding, two mechanisms have been proposed.
(1) Genetic thresholding allows two structures to compete only if they have more than some threshold of common genes. (2) Tie breaking is used to prevent parasitic bits from tagging along with low-order building blocks obstructing the formation of higher-order blocks.
4.8 Parameterized Uniform Crossover
Simple uniform crossover has been parameterized by Spears and De Jong [20]. The main reason for this is that, although uniform crossover produces offspring that are very different from either of the parents (due to the large disruptive power of the operator), its advantages are numerous. It is simple, having only one crossover form, and, more importantly, its disruptive power is independent of the length of the coding used for the individuals. Its explorative power is also very useful in the initial stages of the search, when the operator distributes the members over a large area of the state space. In simple uniform crossover, the probability of swapping bits is fixed at 0.5. This probability is parameterized by a variable P_0 to obtain an unbiased crossover operator which can easily be controlled by a single parameter. The importance of coding-length independence is highlighted when many additional fake bits are added to the encoding of a simple optimization problem [20]. The results showed that although the performance of two-point crossover worsened (due to the fake bits), there was no change in the performance of uniform crossover.
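A sketch of the operator is given below; with swap_prob = 0.5 it reduces to simple uniform crossover, and the parameter plays the role of the P_0 of Spears and De Jong (the function and argument names are illustrative).

import random

def parameterized_uniform_crossover(parent1, parent2, swap_prob=0.5):
    # Swap corresponding genes of two equal-length parents with probability
    # swap_prob; 0.5 corresponds to simple (unparameterized) uniform crossover.
    child1, child2 = list(parent1), list(parent2)
    for i in range(len(child1)):
        if random.random() < swap_prob:
            child1[i], child2[i] = child2[i], child1[i]
    return child1, child2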
4.9 Scaling of Fitness Values
Sometimes a few individuals are generated whose fitness values are much higher than those of the rest of the population. These superfit individuals are given many more copies in the next generation than the rest of the population. This results in their dominating the population and forcing
premature convergence. Scaling also becomes necessary if the objective function has negative values and proportional selection is used.
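One common way of doing this is linear scaling. The sketch below is a standard textbook form, given only as an illustration and not necessarily the scheme used in the works cited here: it shifts negative fitnesses to positive values and caps the ratio between the best and the average scaled fitness.

def scale_fitness(raw, cap=2.0):
    # Shift so that all values are strictly positive (handles negative fitness
    # under proportional selection), then linearly compress superfit individuals
    # so that the best gets at most `cap` times the average scaled fitness.
    shift = -min(raw) if min(raw) < 0 else 0.0
    scaled = [f + shift + 1e-9 for f in raw]
    avg, best = sum(scaled) / len(scaled), max(scaled)
    if best > cap * avg:
        a = (cap - 1.0) * avg / (best - avg)
        b = (1.0 - a) * avg          # chosen to keep the average roughly unchanged
        scaled = [max(0.0, a * f + b) for f in scaled]
    return scaled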
4.10 Adaptive Mutation
Application of the mutation operator as used in evolution strategies (ESs) and genetic algorithms (GAs) is discussed in [13]. In ESs, the probability of mutation (P_m) is encoded as part of the individual's string. During the process of generating offspring, P_m is affected by the recombination and mutation operators. The mutation scheme first mutates the mutation rates; these rates are then used to mutate the members of the population. In traditional GAs, P_m is usually specified as a parameter. If adaptive mutation is used, P_m is also encoded as part of the member's string, as is done in ESs.
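A sketch of this self-adaptive scheme on a bit string follows; the way the rate itself is perturbed (multiplication by a log-normal factor) and the bounds on it are ES-style assumptions for illustration rather than details taken from [13].

import math
import random

def self_adaptive_mutation(genes, pm, tau=0.2):
    # The individual carries its own mutation rate pm. The rate is mutated
    # first (log-normal perturbation, an assumed choice), and the new rate is
    # then used to flip the bits of the individual.
    new_pm = min(0.5, max(0.001, pm * math.exp(random.gauss(0.0, tau))))
    new_genes = [1 - g if random.random() < new_pm else g for g in genes]
    return new_genes, new_pm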
4.11 GAs in Multimodal Function Optimization
It is seen that GAs always converge to a single solution irrespective of the number of good solutions present in the function; this is termed genetic drift [36]. Obtaining many solutions instead of allowing the GA to converge to a single solution is discussed in [35]. Many schemes have been proposed to maintain the diversity of solutions so as to find more than one solution in a multimodal function optimization problem. The concept of a niche is often used: each peak in the fitness landscape is thought of as a subspace, or niche. The biological metaphor for a niche is as follows. Each niche is an independent environment which can sustain some individuals depending on the fertility of the environment. The number of individuals that a niche can support is called its carrying capacity. If there are too many individuals in a niche, the weaker among them tend to die; if there are too few individuals, they are able to exploit the resources of the niche. The carrying capacity of a particular niche, or peak, depends on its fitness relative to the other peaks. The idea is to populate each niche with a small number of individuals so that they can find the best solution in that niche. In this way, many solutions can be obtained rather than a single one. This entails maintaining the diversity of the population as generations proceed.

One method of preserving diversity is crowding, proposed by De Jong [36]. Here, premature convergence is reduced by minimizing the change in the population between generations. Once offspring have been generated, a certain number of individuals are selected randomly from the population and the offspring replace the most similar individuals in the selected group. Similarity of
individuals is decided on the basis of some similarity metric, which may be domain independent (e.g. the Hamming distance between the strings) or problem specific. The disadvantage of this method is that not much exploration is done by the members of the population. A method similar to De Jong's is proposed in [37]. In this method, domain-specific similarity metrics are used and, instead of the offspring replacing an individual from a randomly chosen group, replacement is limited to the parents. The child replaces the parent if it has a higher fitness than that parent. This method is termed deterministic crowding. Another method, called sharing, is discussed in [31]. In this method, the fitness of similar individuals is reduced in order to give preference to a solution that explores some new region of the space. A new method called dynamic niching is introduced in [35]; its main advantage is reduced computation time compared to sharing.

In addition to the methods stated above, some restrictions are also imposed during reproduction in order to maintain diversity. In one case, once an individual has been selected for crossover, its mate is selected only from those members in the same niche as the first member. In another method, called line breeding, a good solution is repeatedly mated with others in the same niche. In inbreeding, members mate with other members in the same niche; if, after many generations, the average fitness of the niche does not increase, cross-breeding with members of other niches is allowed. These mating restrictions, coupled with the methods to maintain diversity, have been used for optimization of the problems in [37] and are described in [35].

The concept of a gene-invariant genetic algorithm (GIGA) is presented in [38]. GIGAs are presented as a subclass of GAs in which the genetic structure of the population does not vary. If the population is represented by a two-dimensional array, where each row corresponds to one member, genetic invariance means that the multiset of values in any column does not change with time. Thus, although genes may be exchanged within a column as generations proceed, no new gene is introduced into the column. This invariance is maintained by ensuring that the children replace only their parents. The concept of a family is also introduced in [38]. A family is a set of offspring produced by a set of crossover operations on a pair of parents. The number of sets of offspring is the family size, and the best offspring replace the parents.
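A sketch of deterministic crowding as described above is given below. Hamming distance stands in for the similarity metric (the cited work uses domain-specific metrics), and the crossover-plus-mutation step is a placeholder supplied by the surrounding GA.

import random

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def deterministic_crowding(population, fitness, mate):
    # mate(p1, p2) -> (c1, c2): the usual crossover-plus-mutation step (placeholder).
    random.shuffle(population)
    next_gen = []
    for p1, p2 in zip(population[0::2], population[1::2]):
        c1, c2 = mate(p1, p2)
        # pair each child with the more similar parent, then let the child
        # replace that parent only if it is fitter
        if hamming(c1, p1) + hamming(c2, p2) <= hamming(c1, p2) + hamming(c2, p1):
            pairs = [(p1, c1), (p2, c2)]
        else:
            pairs = [(p1, c2), (p2, c1)]
        next_gen.extend(c if fitness(c) > fitness(p) else p for p, c in pairs)
    return next_gen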
4.12 Coevolution, Parasites and Symbiosis
The concept of coevolution is explored in [42]. In a scenario where coevolution and symbiosis are used, two gene pools which evolve separately
are maintained. This is analogous to the biological situation where a host and a parasite evolve together. The population representing the solutions to the problem forms the hosts, while the fitness cases used to evaluate the solutions form the parasites. The populations are assumed to be distributed on a two-dimensional toroidal grid. The host and parasite populations interact with each other through the fitness function. The hosts are given fitness values depending on the number of test cases which they are able to satisfy. The parasites, on the other hand, are scored according to the number of solutions that are not able to satisfy the particular fitness case. This method has some inherent advantages. If a part of the host population gets stuck in a local optimum, the members of the parasitic population evolve towards it, thus reducing the fitness of that part of the population and moving it out of the local optimum. It is also seen that after many generations, due to the evolution of the parasites, only those test cases that are not satisfied by many solutions remain in the parasitic pool. This effectively reduces the number of test cases that must be applied in order to find the fitness of the solutions.
4.12.1 Symbiosis in GAs
Similar to the above model, a model based on symbiosis is developed in [41]. In this model, a number of species cooperate in ways that are beneficial to all of them. In such an environment, a complex optimization problem is split into many subproblems, and parallel GAs then solve the subproblems simultaneously. Each population represents the solution to one subproblem, and the fitness of the individuals in each population is based on their interactions with members of the other populations.
4.13 Differences between Genetic Algorithms and Evolution Strategies
The following are some significant differences between GAs and ESs [13].

(1) GAs operate on fixed bit strings, later mapping them onto object values. GAs work on encoded representations of the actual problem and use functions to map them onto actual points in order to obtain their fitness values. ESs work directly on real-valued vectors.
(2) Since ESs work completely in the phenotypic domain, they utilize much more problem-specific knowledge than GAs.
(3) ESs can only be applied to function optimization problems, whereas GAs cover a much wider range of applications.
(4) In GAs with proportional selection, reproduction rates are assigned dynamically to each individual based on its fitness value, so even the worst individual may have some chance of reproducing. In ESs, reproduction rates are assigned statically, without regard to the fitness values.
(5) In GAs, mutation is mainly used as a secondary operator whose main purpose is to regenerate lost genetic material, but in ESs mutation is the main operator and is implemented as a self-adapting hill-climbing operator.
(6) In ESs, the rate of mutation is controlled by a Gaussian random variable and is adjusted depending on the distribution of the fitness values of the individuals. This is known as collective self-learning of parameters, which is present in ESs but not found in GAs.
These differences are mainly due to the difference in the representation schemes used for the two methods.
4.14 Reasons for Failure of Genetic Algorithms
Failure of GAs has been attributed to three main reasons [23].

(1) Deceptive problems. The GA's search mechanism is based on the schema theorem: the GA finds solutions by combining several high-fitness low-order schemata to obtain higher-order schemata. In some problems, however, the optimal solution does not contain the high-fitness low-order schemata. In such state spaces, the GA is led away from the global optimum and gets stuck in a local optimum. Such problems are called deceptive problems.
(2) Sampling error. In some cases, even though a particular member may have a good fitness value, it may not be high compared to those of the other members. This may cause the member to die because of the selective pressure, even though it has above-average fitness; that is, because the other members have fitness values greater than that of this member, no copies may be given to it during the selection process.
(3) Disruption of the schema. This happens if the crossover operator has not been properly designed. The operator quickly disrupts good low-order schemata and prevents the formation of good solutions. In such cases, crossover is not able to guide the search to form high-order schemata even though the problem is not deceptive.
5. Other Popular Search Techniques
5.1 Population-based Incremental Learning (PBIL)

In GAs, the population can be represented by a probability vector. This vector has the same length as each of the members of the population. In a fully generational GA with fitness-proportional selection and a general pairwise recombination operator, the probability vector gives the probability of value j appearing in position i of a solution vector. As is obvious from the above, the same vector can represent radically different populations. Another property of this representation is that the population is unlikely to improve over the generations, as the bits of an individual solution are treated independently of one another.

The probability vector has been made use of in [2] to create a population-based search technique called PBIL. In PBIL, a probability vector is used to describe the population. For a binary-encoded string, the vector specifies the probability of each bit taking the value 1. In such a representation, maximum diversity is obtained when the probabilities are 0.5. Unlike GAs, operations are defined on the vector rather than on the population. Similar to a competitive learning network, the values in the vector are gradually shifted towards the vector representing a high-fitness solution. The algorithm works as follows:

(1) Initially the probability vector is initialized to 0.5.
(2) From the vector, a population of representative solutions is generated.
(3) The solutions are evaluated using a fitness function as required by the problem.
(4) Mutation is performed to see if the solutions can be improved.
(5) The vector is then moved towards the best of these solutions, by changing the probabilities of the probability vector to resemble the highest-evaluating solution.
(6) The above process is repeated until a satisfactory solution is obtained or some stopping criterion is met.

The formula used to change the probabilities is given by
PROB_i = (PROB_i × (1.0 − L.R.)) + (L.R. × VECTOR_i)

where PROB_i is the probability of generating a 1 at position i, VECTOR_i is the ith position of the solution vector towards which the probability vector is moved, and L.R. is the learning rate. The PBIL algorithm requires four parameters: the population size, the learning rate, the mutation probability and the mutation shift (the magnitude of the effect of mutation on the vector).
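The complete loop then looks roughly as follows. The four parameters appear exactly as listed above; whether mutation is applied to the sampled solutions or to the vector itself varies between descriptions, and applying it to the vector, as done here, is an assumption of this sketch.

import random

def pbil(fitness, length, pop_size=50, lr=0.1, mut_prob=0.02,
         mut_shift=0.05, generations=200):
    prob = [0.5] * length                      # maximum diversity to start with
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        # sample a population of representative solutions from the vector
        pop = [[1 if random.random() < p else 0 for p in prob]
               for _ in range(pop_size)]
        leader = max(pop, key=fitness)
        if fitness(leader) > best_fit:
            best, best_fit = leader, fitness(leader)
        # move the vector towards the highest-evaluating solution
        prob = [p * (1.0 - lr) + lr * v for p, v in zip(prob, leader)]
        # mutate the probability vector itself by small random shifts
        prob = [min(1.0, max(0.0, p + random.uniform(-mut_shift, mut_shift)))
                if random.random() < mut_prob else p
                for p in prob]
    return best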
Some extensions to the basic PBIL algorithm are also discussed in [2]. In one case, the vector is moved in the direction of the best M solutions, where M < N, the population size. This can be realized in several ways: the vector can be moved equally towards each of the best solutions, or it can be moved only in those positions where all or most of the best solution instances agree. In another case, the vector is moved based on the relative evaluations of the best M solutions; the solutions are ranked on solution quality, as only the rank of the best solutions is needed. The probability vector can also be moved away from the lowest-evaluating solutions, that is, in the direction opposite to a vector representative of poor solutions.
5.2 Genetic Programming (GP)
In genetic programming [15], the genetic algorithm is applied to a population of programs in order to find the one that gives the best solution to the given problem. The fitness value associated with a program may either be a measure of how well it solves the problem (a maximization function) or the error in the solution it produces (a minimization function). The programs are represented as hierarchical trees to which the genetic operators are applied. A program may be Boolean-, integer-, real-, complex-, vector- or symbolic-valued, as required by the problem. The operators are applied to parts of the program (subroutines or subprograms) as in normal GAs. When this process is repeated over many generations, the programs produced will be of increasing fitness, due to the very nature of neo-Darwinian evolution.

The members of the population are organized as hierarchical symbolic expressions (S-expressions). The nodes of the trees are obtained from a function set and from a terminal set of symbols which form the arguments to the functions. The function set includes Boolean, arithmetic and conditional operators, iteration and recursion functions, etc. When applying the operators, care is taken that all the trees produced are valid S-expressions. Invalid trees result when there is a mismatch in the number or the type of operands to a function. For example, the square-root function must take a real- or integer-valued variable, not a Boolean one, and the function computing x^y must have two arguments. Nodes from the function set always form the internal nodes of the tree (as they always have operands) and nodes from the terminal set always form the leaf nodes. Each S-expression tree is, therefore, a collection of functions along with
their arguments; that is, each tree represents a program which, when executed, solves a given problem.

FIG. 1. Tree representation of the Exclusive-OR function.
Example. A simple example of a program to compute the Exclusive-OR function represented in the form of a tree is as follows. In Fig. 1, the function set is the set {AND, NOT, OR} and the argument or terminal set is {X, Y}. The same tree is expressed in the form of a LISP program as

(OR (AND X (NOT Y)) (AND (NOT X) Y))
Thus a program is a set of functions along with a set of arguments. Since these functions include comparison operations and iterations, any program can be represented as a tree for use in the GP algorithm.

Many methods have been suggested for generating the initial population of trees. In one method, the trees are built so that all the leaves lie at the same specified level. In another, the only restriction is that the trees must not exceed a maximum specified depth. Sometimes a combination of these methods is used to obtain the initial population.

The main operators used in GP are selection and crossover. In crossover, the crossover points are selected randomly in the two selected parents and the subtrees at these points are exchanged. The following are the implications of using such an operator:

(1) If leaf nodes are selected, the process becomes equivalent to one-point mutation.
(2) If the root of one parent is selected, this parent becomes a subtree of the second parent to form one of the offspring. The part of the second parent removed by the crossover operator becomes the other offspring.
(3) If the roots of both parents are selected as the crossover points, no crossover takes place.
(4) Importantly, unlike the crossover operator in GAs, even if both parents are identical, the offspring produced may be very different from both of them. This helps to a great extent in preventing premature convergence due to superfit individuals.

Some secondary operators used in genetic programming are as follows.

(1) Mutation. A randomly generated subtree is inserted at a randomly selected point.
(2) Permutation. This operator is a generalization of the inversion operator used in GAs: it randomly permutes the arguments of a randomly chosen point, unlike the inversion operator, where a gene sequence is inverted.
(3) Editing. This operator evaluates subtrees and replaces them by their results. Example: X AND X is replaced by X.
(4) Encapsulation. Potentially useful subtrees are identified and named so that they can be referenced and used later.
The main parameters that must be chosen in genetic programming before starting the algorithm are as follows:

(1) The terminal set. The set from which arguments are given to the function nodes in the tree.
(2) The function set. The set of functions used to determine the internal nodes of the tree.
(3) The fitness evaluation technique. In some cases, where a continuous function is not available to compute the fitness, fitness cases are used. Fitness cases represent the value of the function at specific points, and are used if the function values are not available at arbitrary points. Unless these points are taken from the entire range of the function, they will not be representative of it and may result in poor-quality solutions.
(4) The numeric and qualitative parameters. These include the parameters used in GAs. The numeric parameters include the population size, the maximum number of generations allowed before the algorithm is terminated, the probability of selection and the probabilities of the secondary operators used. The qualitative parameters include the selection method, the type of fitness function and the use of the elitist strategy. Other parameters not present in GAs but used in GP include the maximum depth allowed in the trees obtained after applying the crossover operator, the maximum allowed depth of the initially generated random trees and the method used to generate the initial population.
(5) The termination criteria. As in GAs, the termination criteria include achieving a sufficiently good result, reaching some maximum number of generations, and reaching a stage where the algorithm is unable to improve the solution further (saturation).
GP as applied to three classes of problems is discussed in [15].

• Optimal control problems. In this class, the problem is represented by a set of state variables. Control variables have to be chosen such that the system goes to a specific state with an optimal cost. In one example, the algorithm finds a program to bring a cart to rest at a target spot in the least time. The cart is modeled as moving on a one-dimensional frictionless track. The program must give the optimum acceleration in the correct direction to stop the cart.
• Robotic planning. The GP algorithm has to find a program which correctly moves an artificial ant on a two-dimensional toroidal grid to find all the food pieces scattered on the grid. The Santa Fe Trail problem is used as an example to test the algorithm. In this problem, 89 food pieces are scattered on the grid. The GP algorithm has to generate a program which, when used by the ant, leads it to all the pieces of food. The food pieces are not available on adjacent squares and there are many breaks in the trail which the ant must successfully cross. The permitted operations of the ant include moving forward, turning left or right and sensing food. The fitness value given to the program is the number of food pieces it successfully finds. GP was also run on the Los Altos Trail. This trail contains more than a hundred food pieces distributed on a larger grid; it is much more complex than the Santa Fe Trail and includes many more irregularities. The GP algorithm was able to find a program that successfully solves this trail; compared to the program used to solve the Santa Fe Trail, this program is more complex.
• Symbolic regression. The algorithm must find a program that represents an expression correctly fitting the sample data. The algorithm not only has to find the constants in the expression, but also the expression itself. The difference between the sampled values and the generated expression's values is taken as the measure of the fitness of the program. The test function chosen is a polynomial one. The function set includes trigonometric and logarithmic functions which are not necessary for the particular problem. After finding many closely, but not exactly, fitting functions, the algorithm finds the correct function. Some examples of simple regression problems to which GP has been applied are given below.
1. Trigonometric identities. GP has to obtain trigonometric identities by finding an expression equivalent to the one given. Example: consider the identity

sin(a + b) = sin(a)cos(b) + cos(a)sin(b)

Given sin(a + b), GP finds a program that evaluates to the right-hand side of the above identity. The test expression used is cos 2x. The GP algorithm finds two programs which evaluate to 1 − 2 sin²x and sin(π/2 − 2x) respectively, both being equal to cos 2x. The algorithm thus finds two trigonometric identities. The fitness function used in this problem consists of fitness cases. The main issue in the problem is the use of correct representative points for the fitness cases: the points must be distributed uniformly in the range [0, π] for the fitness function to properly represent the objective function cos 2x.
2. Symbolic integration. Given an expression, the algorithm has to find a program (a set of functions) that evaluates to the integral of the given expression.
3. Symbolic differentiation. The algorithm has to find a program which is the derivative of the given expression.
4. Solution of differential equations. Given a differential equation whose solution is in the form of a function, GP has to find a program that represents the solution.
In all the problems considered, LISP has been used while generating the programs in the population. Genetic programming is also applied to a class of problems where a complex behavior develops as time progresses. An example of this is an ant colony [8]. The majority of the ants spend their time in collecting food for the colony. As more and more food is collected, the ants are able to distinguish between those places where abundant food is available and those places where there is no food. This collective behavior, explained in [8], is as follows. Ants, which are nearly blind, are able to find the shortest route between two places. It has been found that frequently used paths are established as pheromone trails. The ant lays varying quantities of a substance called pheromone along the path it travels. If a randomly moving ant encounters a pheromone trail, it follows the trail with a high probability and in the process reinforces the trail with its own pheromone. This leads to more and more ants using the trail; this has been termed as autocatalytic behavior. The probability of choosing a path increases with the number of ants that have already used the path. Since shorter paths mean less time for
the ant to go to the destination and return, all the ants will eventually choose the shortest path available.

GP has been used to model a colony of ants [15]. The behavior of the ants is represented in the form of a program, and GP has to find a program such that, by following it, the ants successfully find all the food. The fitness used is a measure of the distribution of the food: the correct program for the ants is the one which collects all the food in one place by the time the algorithm terminates. Thus, at the completion of a program, the more the food is scattered, the lower the fitness value of the program.

GP has also been tried in the area of game theory. The algorithm finds a program to play a game using the minimax strategy. A two-player zero-sum game is used to test the program. The fitness evaluation consists of adding the gains of the moves generated by the program for all possible moves of the opponent.

In traditional GP, random nodes are selected as crossover points. Though this maintains diversity, it is seen that building blocks become distributed over the entire tree due to the repeated crossover operations. Two new operators which preserve the context in which subtrees are located, by restricting the crossover points, are discussed in [39]. The context of a subtree is defined by its position in the tree, that is, the unique path from the root to the subtree. One new crossover type introduced is strong context preserving crossover (SCPC). Here, the crossover points are selected such that the subtrees chosen have exactly the same position in both trees. This type of crossover is found to be too restrictive and does not allow exploration of the entire state space; another disadvantage is that good building blocks are not spread to other parts of the tree. SCPC is useful in those problems for which the solution trees contain some repeated code. Another crossover, weak context preserving crossover (WCPC), is also discussed in [39]. In this type, once two crossover points (nodes in the tree) have been selected, the subtrees to be exchanged are determined as follows: in one parent, the subtree at the crossover node becomes the subtree to be exchanged; in the second parent, however, a random subtree of the crossover node is selected to be exchanged. This results in an asymmetric exchange, as opposed to the symmetric one in SCPC.

One of the problems tested using these crossover operators [39] is the food foraging problem described in [15]. It is seen that SCPC, along with regular crossover, produces better results than standard GP. The results also show that the solution trees obtained by the new method are much smaller than those of the standard GP algorithm. It is also seen that a mix of SCPC and regular crossover outperforms the case when WCPC alone is used.
5.3 The Ant System

This is an optimization technique taken from nature which follows the behavior of ants described earlier. The system is based on the way ants cooperate to find food. The algorithm consists of a number of ants (agents) which form the population; this is similar to the strings in GAs that form the population of potential solutions. The problem is represented in the form of a complete graph and the goal is to find a route that satisfies some criteria and minimizes some objective function. Initially, all the agents complete some tour that satisfies all the required criteria. Once the agents have completed the tour, the relative merits of the tours, which reflect the quality of the solutions, determine how much pheromone is laid on the paths of each tour. Once this is over, the agents again try to find a tour starting from their current position. This process of finding tours is repeated until some stopping criterion is met. As time progresses, knowledge about good routes is accumulated in the form of large quantities of pheromone on these routes. The agents make use of this knowledge by making the probability of selecting the next move a function of the quantity of pheromone found on the paths originating from the current position; the greater the quantity of pheromone, the higher the probability of selecting that particular path. In this way, pheromone builds up on those paths that form good solutions, thus leading to the optimization of the function.
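A minimal sketch of one cycle of such a system on a weighted graph is given below; the evaporation rate and the rule for how much pheromone each tour deposits (an amount inversely proportional to the tour cost) are assumptions in the spirit of [8] rather than its exact formulas.

import random

def choose_next(current, unvisited, pheromone):
    # Probability of a move is proportional to the pheromone on the edge.
    weights = [pheromone.get((current, j), 1e-6) for j in unvisited]
    r, acc = random.uniform(0.0, sum(weights)), 0.0
    for j, w in zip(unvisited, weights):
        acc += w
        if acc >= r:
            return j
    return unvisited[-1]

def ant_system_iteration(nodes, pheromone, tour_cost, n_ants=10, rho=0.5, q=1.0):
    # One cycle: every ant builds a complete tour, then pheromone evaporates
    # (factor rho) and each ant reinforces its own tour according to its quality.
    tours = []
    for _ in range(n_ants):
        start = random.choice(nodes)
        tour, unvisited = [start], [n for n in nodes if n != start]
        while unvisited:
            nxt = choose_next(tour[-1], unvisited, pheromone)
            unvisited.remove(nxt)
            tour.append(nxt)
        tours.append(tour)
    for edge in pheromone:
        pheromone[edge] *= (1.0 - rho)
    for tour in tours:
        deposit = q / tour_cost(tour)
        for a, b in zip(tour, tour[1:] + tour[:1]):
            pheromone[(a, b)] = pheromone.get((a, b), 0.0) + deposit
            pheromone[(b, a)] = pheromone.get((b, a), 0.0) + deposit
    return tours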
6. Some Optimization Problems

6.1 Partitioning Problems

The problem of optimizing the allocation of tasks in multicomputers using hybrid genetic algorithms has been discussed in [17]. Here, the given problem is partitioned into subproblems and allocated to different processors in a multiprocessor system in order to minimize an objective function. The allocation is based on a loosely coupled synchronous parallel model where the computation and the communication phases do not overlap. Instead of allowing the search to proceed blindly, some problem-specific knowledge is incorporated into the search algorithm in the form of hill-climbing operators. The algorithm divides the search process into three stages: a clustering stage which forms the basic pattern of the division of tasks based on interprocessor communication, a calculation balancing stage where the emphasis is on the computational load to increase the fitness, and finally a boundary adjustment stage where hill climbing is performed. At the end of the first two stages, a nearly optimal solution is obtained where each cluster
represents the tasks allocated to a single processor. In the third stage, since the population is near convergence, the power of crossover diminishes due to the similarity of the individuals. Mutation is then used to try to improve the solutions by swapping some small tasks between the processors. This is essentially fine-tuning of the solution and is accomplished with the help of hill climbing by the individuals. Elitist ranking followed by random selection has been used as the selection strategy. The individuals are ranked between 1.2 (best) and 0.8 (worst), with the others in between. Those individuals with a rank above 1.0 are given a single copy in the mating pool. The fractional parts of the ranks of all the individuals are then used as probabilities to fill the remainder of the mating pool. Two-point crossover, the standard mutation and the standard inversion operators are used to obtain the offspring. In the hill-climbing operation, an element at the boundary of a cluster on an overloaded processor is moved to another processor in the system provided it causes an improvement in the objective function. The experiments lead to the conclusion that without the hill-climbing stage, the quality of the result deteriorates and the algorithm becomes almost a hundred times slower.

A similar method of solving the K-partition problem is discussed in [6]. A parallel GA has been used on a hypercube model, where the subpopulations proceed in isolation, occasionally exchanging their solutions with their neighbors. The objective is to partition the graph of elements, each having some area, into K partitions such that the total area of each partition is below a certain value and the number of interconnections among the partitions is minimized. A fixed number of generations is used to mark an epoch. At the end of every epoch, each processor copies a subset of its n individuals to its neighbors. This results in each processor having more individuals than the subpopulation size; the processors then select the n members required for the next epoch from this pool. The number of individuals exchanged is defined as a parameter, as is the number of generations in every epoch. One-point crossover and the standard mutation operators have been used as the genetic operators.
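The elitist ranking scheme described above can be sketched as follows. The linear spacing of the ranks between 1.2 and 0.8 follows the text; the handling of ties and the way the remainder of the pool is filled are assumptions of this illustration.

import random

def elitist_rank_selection(population, fitness):
    # Rank individuals from 1.2 (best) down to 0.8 (worst). Everyone ranked
    # above 1.0 gets one guaranteed copy; the fractional part of each rank is
    # then used as the probability of an extra copy until the pool is full.
    if len(population) < 2:
        return list(population)
    ordered = sorted(population, key=fitness, reverse=True)
    n = len(ordered)
    ranks = [1.2 - 0.4 * i / (n - 1) for i in range(n)]
    pool = [ind for ind, r in zip(ordered, ranks) if r > 1.0]
    fractions = [r - int(r) for r in ranks]
    while len(pool) < n:
        i = random.randrange(n)
        if random.random() < fractions[i]:
            pool.append(ordered[i])
    return pool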
6.2 The Traveling Salesman Problem

The advantages of hybrid GAs over standard GAs are examined in [14], where the algorithms are tested on the traveling salesman problem using two local search techniques, simulated annealing (SA) and tabu search, along with the standard GA. The basic algorithm consists of the following steps:

(1) Get the initial population of N different tours.
(2) Run SA on the population to get some local solutions (solutions which are the best among all their neighboring solutions).
(3) Run tabu search on the population to get some local solutions.
(4) Run the GA for one generation.
(5) Repeat the above steps until the termination criteria are met.

To prevent bias towards a single local search, the two local search techniques are used together in the algorithm. The members of the population are represented by an array of city names such that each city is connected to its neighboring ones. A heuristic greedy crossover operator is used: to generate an offspring, a randomly chosen city forms the starting point of the offspring's tour. The distances of the cities connected to this one, in each of the parents, are examined, and the offspring's tour is then extended by taking the shorter of these distances. If this would create a cycle, the next edge of the offspring's tour is chosen randomly.

The main issues in this hybrid algorithm are the tabu list conditions and the tabu list size. Several tabu conditions, like imposing a restriction that one city must be visited before another, or a fixed position for a city in the tour, have been proposed. It has been found that the tabu list has to be small in the case of highly restrictive tabu conditions. If its size is too small, cycling would result in the solutions being pulled back into the local optima; too large a size would move the solutions away from the global optimum during the later stages of the search. The experiments also demonstrate that the quality of the solution obtained by simulated annealing depends strongly on the cooling schedule used. If the schedule is carefully designed, SA finds better solutions than the GA or tabu search, but finding the optimum schedule is computationally expensive. Tabu search has been found to converge to solutions, though suboptimal ones, faster than both the GA and SA. Much of the effectiveness of tabu search depends on the heuristically determined tabu conditions.

The hybrid GA outperforms all the other methods used individually, and when both local search techniques are used the performance improves considerably. The experiments have shown that GA + tabu + SA found the optimum route in the 100-city TSP every time it was run [14]; when run alone, none of the algorithms was able to find the global optimum even once. The known optimum of this TSP is 21 247 miles [30]. The programs were executed on a Sun4/75 computer using the C programming language. Though GA + SA + tabu finds the result in fewer generations, more time is spent in each generation refining the solution, thus increasing the time taken by the algorithm to converge to the optimum solution.
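The greedy crossover described above can be sketched as follows; the tour representation (a list of city identifiers) and the distance function are assumed for illustration.

import random

def greedy_crossover(parent1, parent2, dist):
    # Start from a random city and repeatedly follow the shorter of the two
    # parental edges leaving the current city; if that city has already been
    # visited (which would close a cycle), pick a random unvisited city instead.
    def successor(parent, city):
        i = parent.index(city)
        return parent[(i + 1) % len(parent)]

    current = random.choice(parent1)
    tour, remaining = [current], set(parent1) - {current}
    while remaining:
        a, b = successor(parent1, current), successor(parent2, current)
        nxt = a if dist(current, a) <= dist(current, b) else b
        if nxt not in remaining:
            nxt = random.choice(list(remaining))
        remaining.remove(nxt)
        tour.append(nxt)
        current = nxt
    return tour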
The TSP has also been solved using the ant system, and the results have been compared with other heuristic techniques like tabu search and simulated annealing [8]. The TSP is represented as a complete graph. In the algorithm used, called ant cycle, at any given town the next town is chosen depending on the distance between the two towns and the amount of pheromone trail on the edge connecting them. Once the tour is completed, a substance called trail is laid on all the edges visited by the ant. The ant is forced to make legal tours only with the help of a tabu list associated with each ant. This list contains all the cities visited by the ant so far, preventing the ant from visiting them again in the current tour. The number of ants used is equal to the number of cities in the tour, each ant initially being placed in one city. Once the tour is complete, the tabu list is emptied and the process is repeated. A balance is achieved between a greedy heuristic, which says that close towns should be visited with high probability, and the quantity of pheromone found on the edges of the graph connecting the cities. In two variants of the algorithm, the pheromone trail is laid as soon as the ant makes a transition. In one of them, ant density, the amount of trail is a fixed quantity, while in the other, ant quantity, the amount of pheromone is inversely proportional to the distance between the two cities.

The algorithm is controlled by four parameters:

(1) the importance given to the pheromone trail when selecting the next city in the tour;
(2) the importance given to the heuristic when deciding the next city in the tour;
(3) the persistence of the trail (how long the trail lasts);
(4) the quantity of pheromone deposited on the edges of the graph.

These parameters are used in computing the probabilities from which the next city that the ant should visit is chosen. The algorithms were tested on the Oliver30 problem [26]. Ant cycle has been found to give better results than ant quantity and ant density. This is explained by the fact that, since in both ant quantity and ant density pheromone is laid as soon as a transition is made, both algorithms use local information to determine the quantity of the pheromone, as opposed to the global information used by ant cycle, where the pheromone is laid only when the tour has been completed. The performance of the algorithm has been compared with special-purpose heuristics designed for the TSP and also with general-purpose ones like tabu search (TS) and simulated annealing (SA). The results show that the ant system performs better than the other algorithms.
6.3 VLSI Design and Testing Problems
The genetic algorithm is used in VLSI test case generation [21]. The GA finds the optimum set of test vectors that locate all the faults in the circuit.
Faulty circuits are distinguished from fault-free ones by the different responses they produce to the same inputs. The input to the circuit, in the form of 0s and 1s, is directly used as the coding for the population; that is, each bit in an individual's string represents the value of one of the inputs to the circuit. Since the inputs can take only two values, zero or one, the individual's string is defined over a binary alphabet. Faults are detected by simulating the correct and faulty responses of the circuit to a random input test vector. Once a fault is detected, it is removed from the list. This process of detecting and subsequently removing faults is repeated until a sufficient percentage of the faults has been detected. The simple genetic algorithm (SGA) and the adaptive genetic algorithm (AGA) were used to solve the problem and compare the results. Scaling of fitness values, proportional selection and parameterized uniform crossover (uniform crossover parameter P_0 = 0.05) were used in both algorithms. It was observed that the AGA outperformed the SGA on all circuits, on some large circuits requiring only half the generations to find the result. The AGA's performance is compared with Lisanke's approach [16], which generates pseudo-random vectors without any correlation between successive vectors. The results clearly show the better performance of the AGA compared to Lisanke's method.

The problem of GA-based routing has been addressed in [19]. Different models of GAs are suggested to solve the problem. The main idea stressed is the use of intelligent, problem-specific genetic operators. The solutions are represented as graphs, and operators that take advantage of this representation are developed. Different mutation and crossover schemes are proposed to solve the problem; among them, one is selected probabilistically at runtime. A deterministic solution refinement scheme is also used after the termination of the GA to try to improve the result.

GAs have been applied to the design of multi-chip modules (MCMs) [22]. The entire design process is split into three stages and, at each stage, GAs are used to find the optimum solution. During the partitioning stage, the components must be assigned to various chips, with all the chips being finally placed on the same MCM. Each chip has its own constraints which must be satisfied. During placement, the chips must be allocated to slots on the chip-layer substrate of the MCM so as to reduce the wiring length and obtain an even heat dissipation. In layer assignment, the connections between the components must be optimally distributed over a minimal number of layers in the MCM. A standard GA with a non-linear, increasing objective function is used in the design process.
The objective function involves two constants, A and B, and a function f(x) of the variables to be optimized; the lower the value of f(x), the better the result and, consequently, the higher the value of the objective function. The algorithm is tested on some benchmark circuits and the results have been compared with those of simulated annealing (SA). Genetic algorithms have demonstrated their superiority in solving partitioning problems [49]. A novel adaptive genetic algorithm-based partitioning scheme for MCMs integrates four performance constraints simultaneously: pin count, area, heat dissipation, and timing [50,51]. A similar partitioning algorithm based on evolutionary programming has also been proposed [50]. Experimental studies demonstrate the superiority of these methods over the deterministic Fiduccia–Mattheyses (FM) algorithm and the simulated annealing technique. The adaptive algorithms yield improved convergence properties. The placement results of SA and the GA are found to be comparable in all cases [22]. In layer assignment, the results of the genetic algorithm are compared with those of SA and a deterministic algorithm [22]. It is found that for large circuits SA performs poorly compared to the other algorithms; the results are roughly identical for small circuits.
6.4 Neural Network Weight Optimization
Training of large neural networks using GAs has been reported in [27]. Three major implementation differences exist between GAs that can optimize the weights of large artificial neural networks (requiring codings of more than 300 bits) and those that cannot. In those GAs that can optimize the weights:

(1) the encoding is real-valued rather than binary;
(2) a much higher level of mutation is used;
(3) a smaller population is present, the implication of which is a reduction in the exploration of multiple solutions for the same network.
The GA used in [27] is a variant of GENITOR [25] which uses one-at-a-time recombination and ranking. Hill climbing is also used in the algorithm. The algorithm is tested on two examples and the results are compared with those of the back propagation algorithm. The back propagation algorithm is a gradient descent method that uses the derivative of the objective function to descend along the steepest slope of the error surface to get to the minimum. For a neural network that adds two-bit numbers, the genetic hill-climber converges to a solution in 90% of the runs. Search times are roughly comparable with, but not superior to, back propagation with momentum.
Another example used to test the GA is a large signal detection network. The network identifies a signal pulse in one of several channels that span a frequency range. The problem is complicated by the following facts:

(1) a valid signal causes fake signals to appear in surrounding channels;
(2) more than one valid signal may exist simultaneously across multiple channels.

Three hundred training examples and several thousand test examples were used. The results are comparable to those of back propagation. Mutation is used as a hill-climbing operator: if, after mutation, a better solution is obtained, the change is recorded; otherwise mutation is continued. As generations proceed, the population is shrunk until only one member is left. After this, mutation remains the only operator, since crossover cannot be used on a single solution. With this method, it is seen that though the speed of the algorithm consistently improves, the rate of successful convergence decreases.

Training of neural networks by GAs is also reported in [21]. Here three examples are used to test the algorithm:
(1) A neural network to realize the exclusive-or function. It has 5 neurons and 9 weights. (2) A neural network to output a 4-bit parity. It has 4 inputs, 1 output, 9 neurons, 25 weights and 16 input patterns. The output is 1 for an odd number of ones in the input. (3) A neural network for encoding and decoding. It has 10 inputs, 10 outputs, 25 neurons, 115 weights and 10 input patterns. The input is encoded and decoded such that the output of the network is the same as the input. The results show that the better performance of the adaptive GA (AGA) becomes more noticeable as the problem size increases. It is also seen that the AGA does not get stuck even once in a local optimum. Training of neural networks using genetic programming (GP) is explained in [15]. This class of problems is different from other problems solved by GP in the sense that the solution trees generated have to possess a certain structure that corresponds to a neural network. Since any network cannot be classified as a neural network, the operators have to always maintain legality of the programs that are generated. The GP algorithm not only optimizes the weights, but also finds an optimal architecture for the network. The first step in finding the solution is to model the network as a tree. Some simple rules, which when applied recursively can be used to construct a tree which represents a neural network are described. The operators are designed to preserve the characteristics of the generated trees.
OPTIMIZATION VIA EVOLUTIONARY PROCESSES
191
6.5 The Quadrature Assignment Problem The quadrature assignment problem (QAP) has been solved using evolutionary strategies [18]. A QAP of order n is the problem that arises when trying to assign n facilities to n locations. The QAP is modeled as an integer programming problem. It has been shown that the QAP is NP-hard. The problem is represented using two matrices: 0 0
D specifies the distance between the locations. F specifies the flow of material, information, etc. between the locations.
The principal diagonal of both the matrices is 0. The method used here is (1, n)-ES: n children are created by copying the parent and then randomly swapping integer values on the string via mutation. Recombination is not employed. The parent is not allowed to compete for the next generation. This is reported to be better than when the parent also competes for survival. The number of swaps during mutation is randomly chosen to be one or two. The best child obtained becomes the parent for the next generation. If the child is not better than its parent, a counter is incremented; else it is reset to 0. When the counter reaches a predetermined value, some non-standard operator is applied in order to shift the focus of search to a new region of the state space and to escape from the local minimum. The ant system applied to the QAP is discussed in [8]. The algorithm is run on some standard problems described in [29] and the results are compared with those of other algorithms. It is seen that the ant system along with nondeterministic hill-climbing is able to find the best solution to all the test problems. The QAP has also been solved using an evolutionary approach in [32,33I. In both the approaches, local search methods are used to try and improve the results after every generation. The algorithms are implemented on parallel systems.
6.6 The Job Shop Scheduling Problem (JSP) In the JSP, n jobs must be processed by m machines. Each job must be processed on a specific set of machmes in a particular order and each machine takes a given amount of processing time. The JSP is to find a sequence of jobs in each machine so as to minimize an objective function. The objective function takes into account the total elapsed time to complete the job, the total idle time of the machmes and the specified due date of completion of the job. Scheduling in a production plant is essentially the job of finding an optimal way of interleaving a set of process plans (a process plan consists of a set of instructions to process a job) so as to share the resources. Given a job, there may be a large number of valid process plans. Thus, the optimizing algorithm must not find optimal process plans and optimal schedules in isolation of each other as some optimal process plans may cause bottlenecks in a schedule, leading to a sub-optimal schedule.
192
SRILATA RAMAN AND L. M. PATNAIK
The coevolution model is used to solve the JSP [41]. In this model, each population represents a feasible process plan for a particular job. The fitness of the individuals is calculated based on the resources shared between them. Thus, an optimal schedule is also found in this process without actually having to include a separate stage to find the optimal schedule. Another population of arbitrators whose main job is to resolve conflicts among population members with respect to the resources shared among them is maintained. The more the conflicts resolved by an arbitrator, the higher is its fitness. Each arbitrator consists of a table which specifies which population must be given precedence if a conflict occurs. The members of each population are spread over a twodimensional toroidal grid. A local selection method allows competition only among members in the neighborhood. Good results are reported for problems with up to ten jobs [41].
7. Comparison of Search Algorithms The salient features of the different algorithms mentioned in this chapter for optimization, are presented in Tables I and II. TABLEI STRLJ~~URES USEDIN THE SEARCH PROCESS Algorithm
Structure
GA GP
Population of fixed length strings Population of a hierarchical compositionof functions A single point in the state space A real-valued vector A vector of weights A domain specific structure
Hill-climbing ES NN SA
TAFJLE I1 OPERATORS THAT MODIFY THE STRUCTURE
Algorithm
Operations
GA
Selection, crossover, mutation Selection, crossover Gradient information Gaussian mutation Error measure or delta rule Domain-specificmethod
GP Hill-climbing ES NN SA
OPTIMIZATION VIA EVOLUTIONARY PROCESSES
193
8. Techniques to Speed up the Genetic Algorithm Since GAS are computationally intensive, even small changes in the algorithm leading to substantial savings in computation time are desirable. Some of the more common methods for speed-up are listed below:
(3) (4)
(5)
(9)
Recalculation of fitness of individuals not affected by mutation or crossover can be avoided. If the evaluation function has trigonometric or logarithmic functions, a look-up table can be used to get the values rather than using a generating series (such as the Taylor series.) For small state spaces, evaluation can be a look-up process. Complex calculations can be simplified and approximated if very accurate answers are not required. If the algorithm is being timed, unnecessary output (graphics or printer output) can be removed from the program. Programs can be compiled and optimized for speed. Finding the correct selection procedure saves time (rank-based procedures require sorting of the individuals in the population). Since in GA problems most of the computation time is spent in evaluating the individuals, even small improvements in the evaluation function greatly speed up the algorithm. Repeated access of secondary storage for every generation must be avoided, especially in a multiprogrammed environment.
9.
Conclusions
Though evolutionary concepts have yielded attractive results in terms of solving several optimization problems, there are many open issues to be addressed. Notable among them are: (i) the choice of control parameters; (ii) the characterization of the search landscape amenable to optimization; (iii) the exact roles of crossover and mutation; (iv) convergence properties. In recent years, such computing paradigms are emerging as independent disciplines, but they demand considerable work in the practical and theoretical domains before they are accepted as effective alternatives to several other optimization techniques. This article is aimed at providing a synthesizing overview of the several issues involved in the design of efficient algorithms based on evolutionary principles. The examples discussed in the chapter unfold the promise such techniques offer. It is hoped that the number and diversity of the applications will expand in future.
194
SRILATA RAMAN AND L. M. PATNAIK
Future developments in this significant area, among other things will be directed more towards the design of hybrid systems which have an association of evolutionary techniques and other optimization algorithms. A typical example is a combination of genetic algorithms and neural networks or expert systems. The underlying principle behind such hybrid algorithms have been highlighted in this chapter. “Best things come from others”, this optimization hopefully lies behind the further success of this significant area. REFERENCES 1. Atmar, W. (1994). Notes on the simulation of evolution. fEEE Transactions on Neural
Networks 5(1), 130-147. 2. Baluja, S. (1994). Population Based Incremental Learning: A Method for Integrating Genetic Search Based Function Optimization and Competitive Learning. Technical Report CMU-(2-94-163, Camegie Mellon University, Pittsburgh, June. 3. Back, T., and Hoffmeister, F. (1991). Extended selection mechanisms in genetic algorithms. Proceedings of the 4th International Conference on GAS,Morgan Kaufmann, San Mateo, CA. pp. 92-99. 4. Bertoni, A., and Dorigo, M. (1993). Implicit parallelism in genetic algorithms. Artifzcial Intelligence, 61, 307-314. 5. Bramlette, M. F. (1991). Initialization, mutation and selection methods in GAS for function optimization. Proceedings of the 4th International Conference on GAS, Morgan Kaufrnann, San Mateo, CA, pp. 100-107. 6. Cohoon, J. P., Martin, W. N., and Richards, D. S. (1991). A multipopulation genetic algorithm for solving the K-partition problem on hypercubes. Proceedings of the 4th International Conference on GAS,Morgan Kaufmann, San Mateo, CA, pp. 244-248. 7. Davidor, Y. (1991). A naturally occurring niche and species phenomenon: the model and first results. Proceedings of the 4th International Conference on GAS,Morgan Kaufmann, San Mateo, CA, pp. 257-263. 8. Dorigo, M., Maniezzo, V., and Colorni, A. (1996). Ant system: optimization by a colony of co-operating agents. IEEE Transactions on Systems, Man and Cybernetics, 26(1), 29-41. 9. Fogel, D. B. (1994). An introduction to simulated evolutionary optimization. IEEE Transactions on Neural Nehvorks, 5(1), 3-14. 10. Goldberg, D. E., Deb, K., and Korb, B. (1991). Don’t worry, be messy. Proceedings of the 4th International Conference on GAS, Morgan Kaufmann, San Mateo, CA, pp. 24-30. 11. Grefenstette, J. J. (1986). Optimization of control parameters for GAS.IEEE Transactions on Systems, Man and Cybernetics, 16(1), 122-128. 12. Back, T., Hoffrneister, F., and Schwefel, H.-P. (1991). A survey of evolution strategies. Proceedings of the Fourth International Conference on GAS, Morgan Kaufmann, San Mateo, CA, pp. 2-9. 13. Hoffmeister, F., and Back, T. (1992). Genetic Algorithms and Evolution Strategies: Similarities and Differences. Technical Report No. SYS-1/92, University of Dortmund, February. 14. Kido, T., Takagi, K., and Nakanani, M. (1994). Analysis and comparisons of GA, SA, TABU search and evolutionary combination algorithms. Informatica, 18(4), 399-410.
OPTIMIZATION VIA EVOLUTIONARY PROCESSES
195
15. Koza, J. R. (1993). Genetic Programming On the Programming of Computers by Means of Natural Selection, MIT Press, Cambridge, MA. 16. Lisanke, B. F., De Gaus, A,, and Gregory, D. (1987). Testability driven random testpattern generator. IEEE Transactions on CAD, CAD6, 1082- 1087. 17. Mansour, N., and Fox, G. C. (1991). A hybrid genetic algorithm for task allocation in multicomputers. Proceedings of the 4th International Conference on GAS, Morgan Kaufmann, San Mateo, CA, pp. 466-473. 18. Nissen, V. (1994). Solving the quadrature assignment problem with clues from nature. IEEE Transactions on Neural Networks, 5(1), 66-72. 19. Prahalada Rao, B. P. (1994). Evolutionary Approaches to VLSI Channel Routing. Ph.D. Dissertation, Indian Institute of Science, Bangalore. 20. Spears, W. M., and De Jong, K. A. (1991). On the virtues of parameterized uniform crossover. Proceedings of the 4th International Conference on GAS, Morgan Kaufrnann, San Mateo, CA, 230-236. 21. Srinivas, M. (1993). Genetic Algorithms: Novel Models and Fitness Based Adaptive Disruption Strategies. Ph.D. Dissertation, Indian Institute of Science, Bangalore. 22. Vemuri, R. (1994). Genetic Algorithms for Partitioning, Placement and Layer Assignment for Multi Chip Modules. Ph.D. Dissertation, University of Cincinnatti. 23. Vose, M. D., and Liepins, G. E. (1991). Schema disruption. Proceedings of the 4th International Conference on GAS,Morgan Kaufmann, San Mateo, CA, pp. 237-242. 24. Ribeiro Filho, J.’L., Trelevan, P. C., and Alippi, C. (1994). Genetic algorithm programming environments. IEEE Computer, June, 28-43. 25. Whitley, D., and Kauth, J. (1988). GENITOR: a different genetic algorithm. Proceedings of the 1988 Rocky Mountain Conference on Artificial Intelligence, pp. 118- 130. 26. Whitley, D., Starkweather, T., and Fuquay, D. (1989). Scheduling problems and traveling salesman: the genetic edge recombination operator. Proceedings of the 3rd International Conference on GAS,Morgan Kaufmann, pp. 133- 140. 27. Whitley, D., Dominic, S., and Das, R. (1991). Genetic reinforcement learning with multilayer neural networks. Proceedings of the 4th International Conference on GAS, Morgan Kaufmann, San Mateo, CA, pp. 562-569. 28. Cohoon, J. P., Hedge, S. U., Martin, W. N., and Richards, D. (1988). Distributed Genetic Algorithms for the Floor Plan Design Problem. Technical Report TR-88-12, School of Engineering and Applied Science, Computer Science Department, University of Virginia. 29. Nugent, C. E., Vollmann, T. E., and Ruml, J. (1968). An experimental comparison of techniques for the assignment of facilities to locations. Operations Research, 16, 150- 173. 30. Smith, J. M. (1982). Evolution and The Theory of Games, Cambridge University Press, Cambridge. 31. Holland, J . H. (1975). Adaptation in Natural and Artificial Systems. Ph.D Thesis, University of Michigan Press, Ann Arbor, MI. 32. Brown, D. E., Hurtley, C. L., and Spillane, R. (1989). A parallel genetic heuristic for the quadratic assignment problem. Proceedings of the 3rd International Conference on Genetic Algorithms, Morgan Kaufmann, pp. 406-415. 33. Muhlenbein, H. (1989). Parallel genetic algorithms, population genetics and combinatonal optimization. Proceedings of the 3rd International Conference on Genetic Algorithms, Morgan Kaufmann, pp. 416-421. 34. Kunsawe, F. A. (1991). A variant of evolution strategies for vector optimization. In Parallel Problem Solving from Nature, (H. P. Schwefel and R. Manner, Eds), pp. 193-197. 35. Miller, B. L., and Shaw, M. J. (1995). 
Genetic algorithms with dynamic niche sharing for multirnodal function optimisation. IlliGAL Report No. 95010, University of Illinois, December.
196
SRILATA RAMAN AND L. M. PATNAIK
36. De Jong, K. A. (1975). Analysis of the Behavior of a Class of Genetic Adaptive Systems. Ph.D. Dissertation, University of Michigan, Ann Arbor, Michigan. 37. Mahfoud, S. W. (1992). Crowding and preselection revisited. In Parallel Problem Solving from Nature-2, (B. Manner and B. Manderick, Eds), Elsevier, Amsterdam, pp. 27-36. 38. Culberson, J. (1992). Genetic Invariance: A New Paradigm for Genetic Algorithm Design. Technical Report TR92-02, University of Alberta, Canada, June 92. 39. Dhaeseleer, P. (1994). Context preserving crossover in genetic programming. Proceedings of the First IEEE Conference on Evolutionary Computation, IEEE Press, pp. 256-261. 40. Grefenstette, J. (198 1). Parallel Adaptive Algorithms for Function Optimisation. Technical Report CS-81-19, Vanderbilt University, Computer Science Department. 41. Husbands, P., and Mill, F. (1991). Simulated coevolution as the mechanism for emergent planning and scheduling. Proceedings of the 4th International Conference on Genetic Algorithms, (R. Belaw and L. Booker, Eds), Morgan Kaufmann, San Mateo, CA, pp. 264 - 270. 42. Hillis, W. D. (1990) Co-evolving parasites improve simulated evolution as an optimisation procedure. Physica, D.42, 228-234. 43. Davis, L. (1991) Handbook of Genetic Algorithms, Von Nostrand Reinhold, New York. 44. Rudolph, G. (1994). Convergence analysis of canonical genetic algorithms. IEEE Transactions on Neural Nefworks, S(1). 96- 101. 45. Kirkpatrick, S., Gelatt, C. D., and Vecchi, M. P. (1983). Optimization by simulated annealing. Science, 220, 671 -680. 46. Goldberg, D. E. (1989). Genetic Algorithms in Search. Optimization, and Machine Learning, Addison-Wesley, Reading, MA. 47. Srinivas, M., and Patnaik, L. M. (1994). Genetic algorithms: a survey, lEEE Computer, June, 17-26. 48. Grefenstette, J. J. (1984). Genesis: a system for using genetic search procedures. Proceedings of the Conference on Intelligent Systems and Machines, pp. 161- 165. 49. Raman, S., and Patnaik, L. M. (1995). An overview of techniques for partitioning multichip modules. International Journal of High Speed Electronics and Systems, 6 (4), 539-553. 50. Raman, S., and Patnalk, L. M. (1996). Performance-driven MCM partitioning through an adaptive genetic algorithm, IEEE Transactions on VLSl Systems, 4(4), 434-444. 51. Majhi, A. K., Patnaik, L. M., and Raman, S. (1995). A genetic algorithm-based circuit partitioner for MCMs. Microprocessing and Microprogramming, The Euromicro Journal, 41.83-96.
Software Reliability and Readiness Assessment Based on the Non-homogeneous Poisson Process AMRIT L. GOEL AND KUNE-ZANG YANG Electrical and Computer Engineering Syracuse University, Syracuse, NY
Abstract This chapter addresses the inten-elated issues of software reliability and readiness assessment based on open and closed software problem reports (SPRs). It describes a systematic methodology consisting of the following three steps: use of the Laplace trend statistic to dcterrnine reliability growth or decay, fitting nonhomogeneous Poisson process (NHPP) models and reliability or readiness assessment. The mathematical framework prrtlnent to the Laplace statistic and the NHPP models is discussed at length. SPR data from commercial and military systems are used throughout the chapter for il!ustration and explanation.
1. Introduction and Background . . . . . . . . . . . . . 1.1 Software Reliability . . . . . . , . . . . . . . . . 1.2 Readiness Assessment . . . . . . . . . . . . . . 1.3 Chapter Organization . . , . . . . . . . . . . . . 1.4 Chapter Objective and Reading Suggestions . . . 2. Software Reliability and Readiness Assessment . . , . 2.1 Background . . . . . . . . . . . . . . . . . . . . 2.2 Basic Reliability Concepts and Notations . . . . 2.3 Software Reliability Models . . . . . . . . . . . 2.4 Readiness Assessment . . . , . . . . . . . . . . 3. NHPP and its Properties . . . . . . . , . . . . . . . 3.1 Definitions . . . , . . . . . . . . . . . . . . . . 3.2 Distribution and Conditional Distribution of N ( f ) 3.3 Some Important Properties . . . . , . . . , . . . 3.4 Software Reliability Models Based on the NHPP 4. Trend Testing for Software Failure Data . . . , . . . . 4.1 Background . . . , . . . . . . . . . . . . . . , 4.2 Trend Testing . . . . . . . . . . . . . . . . . . 4.3 Laplace Trend Test . . . , , . . . , . . . . . . .
197 ADVANCES IN COMPUTERS. VOL 45
. .. ..
.
. . .
. . . ,
. ,
. . . . . . ,
. . . . . . . . . . . . . . . . . , . . , ..., . ... .. . . . . .. . . . . ... . .... . . . . . . . . . . . . . . . . .... ....
..
.
,
.
. . . ... ., . ... .. . . . . .. . , . . . . .
... , . .
,
. . .
. .
. ,
. .
. . . . . , . . . . . . , . . . . , . . .... ... .
, . . . . . . . . . . . , . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . .
.. . . . . . . . .
., .. . . . .
.
.
. . . . . . . . . . . . . . .
. . . . . . . .. . . . . . . . .. .. . . . . . ...
i98 i99 ?00
20 1 20 1 202 202 202 205 214 2 14 214 215 216 217 220 220 22 I 222
Copyright 0 1991 hy AcndemiL h e + I td All nghts of irproduction in sny form re\cr?ed
198
AMRlT L. GOEL AND KUNE-ZANG YANG
5. Parameter Estimation for NHPP Models Using Laplace Trend Statistic . . . . . . 5.1 Models and Their Mean Value Functions . . . . . . . . . . . . . . . . . . . 5.2 Parameter Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3 Characteristic Points of NHPP Models . . . . . . . . . . . . . . . . . . . . . 5.4 LaplaceTrendStatistic for Estimating Characteristic Points . . . . . . . . . . 6. Software Reliability Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.1 SSPR Data and Laplace Statistic . . . . . . . . . . . . . . . . . . . . . . . . 6.2 Model Fitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.3 Reliability Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 . Readiness Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.1 Four Cases of Readiness Assessment . . . . . . . . . . . . . . . . . . . . . . 7.2 Illustrative Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.3 Example Based on Real Data . . . . . . . . . . . . . . . . . . . . . . . . . . 8. Readiness Analysis of a Commercial System to . . . . . . . . . . . . . . . . . . . 8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.2 Month 14 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.3 Month 17 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.4 Month 21 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9. Readiness Analysis for an Air Force System . . . . . . . . . . . . . . . . . . . . . 9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.2 Data Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.3 Assessments at Months 70,75, 80 and 86 . . . . . . . . . . . . . . . . . . . 9.4 Summary of Assessments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10. Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
225 225 226 228 233 234 236 236 239 241 241 242 243 244 244 245 248 251 254 254 254 257 261 263 264
1. Introduction and Background The field of software engineering recognizes its technical limitations and inability to produce correct computer programs that meet the needs of the user and are delivered on time . Software developers and managers are concerned with assessing. throughout the life cycle from requirements analysis to system testing. whether the system will be completed within budget. have the desired quality and be ready for release as planned . Yet. in spite of the tremendous advances over the past twenty-five years in the theory and practice of software development. “many systems are delivered late. over budget and full of errors” [Ze193]. It is the purpose of reliability and readiness assessment. especially during testing stages. to minimize the likelihood of delivering software with an unacceptable level of errors . Successful software development requires a balance between the competing demands of quality. cost and schedule . Balancing these three forces has been termed the software free body problem as shown in Fig. 1 [McK95]. In the last thirty years. much work has been done to understand. model. monitor and control these three prime movers of a software system .
SOFTWARE RELIABILITY AND READINESS
199
Quality
I
cost
Schedule FIG.
1 . Software free body problem.
A whole field of software metrics [Fen91, Bas851 has emerged to deal with many of the issues raised in this context. In fact, software metrics are now routinely used by major software developers to assess and control their products and processes. Cost-related measures are monitored to ensure that the project will be within budget. Regression and other models are employed to estimate cost as a function of size, functionality, development environment, etc. [Goe96]. Metrics related to requirements and design stability, manpower changes, and development progress are used to track schedule. Finally, quality is assessed via complexity, testing coverage and fault profile metrics. A commonly used measure of software quality, especially for readiness or release assessment, is its current reliability. This measure is also used for a variety of other purposes during software development as discussed in [IM90]. Software reliability and its role in readiness assessment are the main topics of coverage in this chapter.
1.1
Software Reliability
A common procedure for determining software reliability is to fit an appropriate stochastic model to the available failure data and, based on this model, determine the current system reliability. Future reliability values are then obtained by analytically evaluating the fitted model as a function of time. A commonly used model that has been found to be useful for this purpose is based on the non-homogeneous Poisson process (NHPP).It was originally proposed by Goel and Okumoto in 1979 [G079b]. Since that time, it has been employed in a variety of environments [Y0083, Ohb84,
200
AMRlT L. GOEL AND KUNE-ZANGYANG
M083, KMS911. In addition to its simplicity, it has very good theoretical properties and practical interpretation. The original model was based on the exponential mean value function, and since then several modifications to the exponential form have been proposed by other authors. Two popular modifications are the delayed S-shaped and inflection S-shaped mean value functions [Y0083, Ohb841. Many investigators have found that at least one of these three mean value functions can be used to describe the failure process in most situations [Kan95]. In some cases, all three are applicable but one may give better results than the others. One major difficulty in fitting these reliability models to failure data is the estimation of model parameters. In general, the estimation equations have to be solved numerically and the results tend to be very sensitive to the initial values chosen for the numerical procedure. In this chapter, we address this problem by first studying the characteristic points of the theoretical mean value functions and their derivatives. Then we derive relationships between the model parameters and the characteristic points. Finally, we use these relationships and the data-derived Laplace trend test to develop guidelines for determining good initial values for the model parameters. This step-bystep procedure provides a systematic, objective and analytically sound approach for software reliability modeling.
1.2
Readiness Assessment
The purpose of software testing is to develop confidence in its correctness. As testing progresses, problems are detected (opened) and corrected (closed) in accordance with established organizational procedures. It is a common practice to keep track of the data about the cumulative number opened and the cumulative number closed. An important decision based on these data sets is to determine whether the software system is ready for operational testing. This determination, of course, is based on the criterion used for readiness. For example, the AFOTEC Software Maturity Evaluation Guide [Air901 provides details of the data needs and assessment approach to be used for Air Force systems. The key criterion for readiness is that causes of all severity 1 and 2 (based on five severity levels, 1 being the most critical) failures be fixed prior to the scheduled operational test and evaluation (OT&E) start date. If the current data indicates that this criterion is not met, an assessment is made to determine the projected time to readiness, which is the time required for resolution of the unresolved problems. The closure or resolution rate used in [Air901 is an average value computed from the total problems resolved up to the present. An equivalent problem in commercial applications is to determine readiness for beta test, readiness for release or readiness for first customer
SOFTWARE RELIABILITY AND READINESS
201
ship. Several studies over the past fifteen years [OG80, OC901 have addressed this problem for both defense and commercial systems. Most of these use a decision rule in conjunction with a software reliability model to predict software readiness. Some other approaches are based on minimizing a predefined cost function [OG80, OC901. In practice, assessment of software readiness is a difficult process which involves, in addition to closure criterion, consideration of factors such as test rate, test completeness, and requirements stability [ASO]. In this chapter, we describe a recent approach [GY95, GHMY961 which uses data on total problems opened, total problems closed and problems remaining open to make readiness assessment. These assessments are made for four cases which are based on two closure rates and two assumptions about future faults to be detected.
1.3 Chapter Organization This chapter is divided into three parts following this introduction. The first part (Sections 2-5) is devoted to the key analytical concepts, models and related results required for software reliability and readiness assessment. Section 2 is a summary of the main reliability and readiness assessment models. Section 3 presents details of the non-homogeneous Poisson process (NHPP), its properties, and its use in software reliability modeling. Section 4 deals with trend testing for software failure data and Section 5 uses the Laplace trend statistic to derive parameter estimates for the NHPP models. The second part (Sections 6 and 7) describes the step-by-step methodology for software reliability evaluation (Section 6 ) and readiness assessment (Section 7). The third part (Sections 8 and 9) details the use of the methodology for reliability and readiness evaluation of a large commercial system (Section 8) and an Air Force software system (Section 9).
1.4
Chapter Objective and Reading Suggestions
The main purpose of this chapter is to present a systematic step-by-step approach for modeling software reliability and evaluating software readiness. An important step in this methodology is an assessment of reliability decay or growth judged by the Laplace trend statistic. This statistic can be used to determine when to start modeling, to choose the “best” NHPP model and to efficiently and accurately determine the model parameters. Readers who want to get a good understanding of the theoretical underpinning of the methodology would need the material in Part 1 (Sections 2-5). Readers who are primarily interested in the step-by-step methodology
202
AMRlT L. GOEL AND KUNE-ZANGYANG
can go directly to Part 2 (Sections 6 and 7) and then read Part 3 for a detailed readiness analysis of two systems. Finally, Part 3 (Sections 8 and 9) should suffice for readers who only want to see the application of the methodology.
2. Software Reliability and Readiness Assessment 2.1
Background
During the past 25 years, a large number of analytical models have been developed to describe the software failure phenomenon. Such models are commonly used for monitoring and predicting the reliability of software as it proceeds along various development stages. The usual procedure is to first study the development environment and past software failure data. An attempt is then made to use this information in selecting a model that seems to be most compatible with the modeled environment. Model parameters are then estimated from the available data using statistical estimation techniques. Very often, several models are considered and the one that provides the best fit, in some specified statistical sense, is chosen. Once a fitted model is obtained, it is used for prediction of quantities such as current reliability, number of errors yet to be found, time to a specified quality goal, etc. In some cases, results from several models are combined to obtain predicted values. A readiness assessment model is basically a decision rule which, in conjunction with a software reliability model, provides a framework for determining whether software is ready for release. The decision is based on a predetermined criterion such as current reliability, number of remaining errors or total cost. In this section we provide a brief summary of the relevant material about software reliability and readiness assessment. First, we describe some basic reliability concepts and notations. Then we summarize various software reliability and availability models. Finally, we give an overview of the various approaches that have been proposed for assessing software readiness, or the so-called release time.
2.2
Basic Reliability Concepts and Notations
Before 1970 most reliability research had been centered on hardware. However, due to the increasing cost of software development and maintenance, focus shifted toward software. Although software and hardware are quite different in nature (for example, software does not degrade physically),
SOFTWARE RELlABlLlN AND READINESS
203
many basic concepts that originated for the study of hardware reliability are also used in the software context. These are summarized below.
2.2.1 Reliability and Mean Time to Failure (MlTF) Let X be a random variable representing the time to failure of a component, and let f ( x ) and F ( x ) be its probability density function (pdf) and cumulative distribution function (cdf ), respectively. Then the probability that the component does not fail for x time units, denoted by R ( x ) , is simply
R ( x ) = 1 - F ( x ) , xao.
(1)
Furthermore, the mean time to failure E ( X ) can be expressed as
Both R ( x ) and E ( X ) are extensively used measures for describing the software failure phenomenon.
2.2.2 Failure Rate (Hazard Rate; Force of Mortality) Another measure which plays an important role in describing the failure characteristics of a component is the (instantaneous) failure rate (hazard rate; force of mortality), denoted by h(x), which is defined as h(x) = lim
P {x <X
Ax-0
x + Ax I X > x } Ax
-
f (x) 1- F(x)
(3)
Given the failure rate h ( x ) , the pdf f ( x ) and the reliability function R ( x ) can be uniquely determined as follows:
and
2.2.3 Expected Number and Rate of Occurrence of failures (ROCOF) Let N ( t ) be the random variable representing the number of failures which occur during (0, t ] . The expected number of failures is denoted by
204
AMRIT L. GOEL AND KUNE-ZANG YANG
m ( t ) , i.e., m ( t ) = E { N ( t ) ] . The rate of occurrence of failures (ROCOF), denoted A ( t ) , is defined as the derivate of m ( t ) ,that is,
In an NHPP context, i.e., N ( t ) is described by an NHPP, m ( t ) is also referred to as the mean value function, while A ( ? ) is also referred to as the intensityfunction.
2.2.4 Software Error, Fault, and Failure The following definitions are commonly used in software engineering literature.
Error. Human action which results in software containing a fault. Fault. A manifestation of an error in software; a fault, if encountered, may cause failure. Failure. An unacceptable result produced when a fault is encountered. Even though these three terms have distinct meanings, they are often used interchangeably in the literature.
2.2.5 Software Reliability Because software often undergoes changes in code or in the operational conditions, it is essential to state the conditions under which the reliability of a software system is defined. One widely accepted definition of software reliability which captures this need is given below.
Software reliability. The probability that the software will not cause failure of a system to perform a required task or mission for a specified time in a specified environment. Since reliability predictions are usually made based on the failure data collected during testing of the software, one must be aware of the possible changes of the operating conditions after the software is released. Nevertheless, as it is implicit in most existing software reliability models, we shall make it a standard assumption that the software is to be operated in a manner similar to that in which the reliability predictions are made. Other quantitative measures such as the number of remaining faults in the software are also commonly used for assessment of software quality. Like the reliability measure, they are also used for determining the readiness of
SOFTWARE RELIABILIW AND READINESS
205
the software. When the reliability measure is not obtainable, these measures provide alternatives to reliability estimation.
2.3
Software Reliability Models
A large number of software reliability models have been proposed over the past 25 years. However, most of these can be grouped according to a few judiciously chosen classes, see e.g. [Goe85], [Mi186], [MI0871 and [RB82]. In this paper, we classify software reliability models as shown in Fig. 2. The first classification into static and dynamic models reflects whether the reliability estimation is independent of time or has a time-based prediction capability. In the former case, reliability estimation is for a fixed point while in the latter, predictions into the future are made based on a stochastic model for the fault discovery history. The former models are useful only for estimation while the latter can be used both for estimation and prediction. The static models can be classified into three categories: viz fault seeding, input domain or complexity metrics. The fault seeding models use fault seeding followed by testing to track the number of seeded and indigenous faults found. Using statistical methods, an estimation can be obtained for the number of indigenous faults and hence for the static reliability. In input domain models, a relative frequency view is used to obtain a current estimate of reliability. The complexity metric models employ a statistical model such as a regression equation or principal component regression to estimate the number of faults or reliability as a function of relevant complexity metrics. Additional details of this class of models can be found in [Goe85], [Tra85] and [GY96]. The stochastic models can be further divided into four classes as shown in Fig. 2, viz, Markov/semi-Markov, non-homogeneous Poisson process (NHPP), order statistics and Bayesian. The Markov/semi-Markov group can be further classified into de-eutrophication, imperfect debugging and availability models. Since NHPP is the main focus of this chapter, it is discussed in detail in Section 3. The rest are discussed below.
2.3.7 Markovl semi-Markov Process Models Models of this class can be described by Markov or semi-Markov processes, and hence the name of the class.
Jelinski- Moranda (JM) De-eutrophication Model and Variations. The Jelinski-Moranda (JM) de-eutrophication model [JM72] is one of the
Software Reliability Models
Static Models
Fault Seeding
Input Data Domain
Stochastic Models
Complexity Metrics
Markov/ Semi-Markov hocess
NHPP
De-Eutrophication
FIG. 2. Classification of Software Reliability Models.
Order Statistics
Availability
Bayesian
SOFTWARE RELIABILITY AND READINESS
207
earliest and the most influential models. The basic assumptions of the JM model are: (1) The initial fault content of the tested software is an unknown fixed constant. (2) The failure rate is proportional to the current fault content of the tested software, and remains constant between failures. (3) All remaining software faults contribute the same amount to the failure rate. (4) A detected fault is corrected and removed immediately. ( 5 ) No new fault is introduced after a fault is corrected. Denoting the initial fault content by N o , and the proportionality constant by Cp, we have N o - ( i - 1) faults remaining after i- 1 faults are removed. Hence, the failure rate between the ( i - 1)th and the ith failures is h(i)=+[No-(i-l)],
i = 1 , 2 ,..., No.
(6)
Then the pdf of X i , which represents the time between the ( i - 1)th and ith failures, is given by
The parameters N o and @ can be estimated using the maximum likelihood method. After the parameters are determined, various software quality measures such as the number of remaining faults and reliability can be computed. Worth noting is that the number of faults detected and removed during (0, f], denoted by N ( t ) , is binomially distributed as
and m(t>
E{N(t)) = N,[l - e-@'].
(91
Moranda [Mor75 ] modified the de-eutrophication process by assuming the failure rate to decrease geometrically instead of decreasing in constant steps. The failure rate of this model has the form h ( i )= k'-'D, where k and D are unknown constants. He further extended the model to include a third parameter 8; that is,
h ( i ) = k'+'D+ 8. The addition of 8 makes the new process a superposition of a geometric de-eutrophication process, which describe the bum-in phase, and a Poisson process that describes the steady state.
208
AMRlT
L. GOEL AND KUNE-ZANG YANG
Many other variations of the de-eutrophication model have been proposed in the past. They basically follow the same assumptions except about the failure rate function. We list these models along with their references in Table I. We note that the models for which the failure rate is constant between two successive failures have exponential time-between-failure distributions and that they can be described by Markov processes. On the other hand, models with time-dependent failure rate do not have exponential time-between-failure distributions; they therefore can be described by semi-Markov processes. One model with general time-dependent failure rate, which does not appear in Table I, was formulated by Shanthhmar [Sha81]. This model assumes that, after n faults are removed, the failure rate of the software is given by W t ,n )= @ ( t ) ( N - n),
(10)
where @ ( t ) is a time-dependent proportionality factor which can be interpreted as the per-fault failure rate. If we denote by N ( t ) the number of faults detected and removed during (0, f ] , then it can be shown, using the forward Kolmogorov’s differential equations, that 0 s i < No,
where F ( t ) = e-IbB(x)dx is the per-fault failure distribution. And it follows that m(t>= E { N ( ~ )=JN,[I -e-lbs(x)”x].
(12)
Shanthhmar’s model reduces to the JM model if @ ( t )is a constant. However, it is not to be taken as a generalization of the linear or parabola DE models shown in Table I. The failure rate of Shanthhmar’s model is defined as a function of t , which is the time elapsed since the beginning of the testing of software, while the failure rate of linear or parabola DE TABLEI A SUMMARY OF THE De-EUTROPHICATION (DE) PROCESS MODELSAND VARIATIONS Model
Failure rate
Reference
h ( i )= @ [No- (i - 113 h ( i )= k ‘ - l D h ( i ) = k ’ - l D+ 0
Jelinski & Moranda [JM72] Moranda [Mor75] Moranda [Mor75]
~
JM Geometric Geometric and Poisson Power Exponential Linear Parabola
h ( i ) = @ [ N o - (i- l ) ] ” h ( i ) = @[e-8‘”r’-L+ 1 1 - 11 h ( x , ) = @ “ , - (i-1)Ixr h ( x , ) = @ [ N , - ( I - l)][-nx,?+bx,+c]
Xie & Bergman [XBSS] Xie [Xie91] Schick & Wolverton [SW73] Schick& Wolverton [SW78]
SOFTWARE RELIABILITY AND READINESS
209
models is defined based on xi,which is the time elapsed since the occurrence time of the (i - 1)th failure.
Imperfect Debugging Models. The JM model and its variations assume that faults are removed with certainty when detected. However, in practice that is not always the case. Quite often software faults are considered as removed while they are actually not removed during the testing process. Or fault removal may lead to insertion of new faults. Several researchers have proposed models that allow the fault correction to be imperfect. As a relaxation to assumption 5 of the JM model, Goel and Okumoto [G079a] proposed an imperfect debugging model in which each detected fault is removed with probability p , l or remains in the software with probability q = 1 - p . With assumptions (1)- (4)of the JM model intact, the fault-removal process was then formulated as a continuous-time Markov chain (or, more specifically, a pure death process), in which the state is the fault content and the transition probabilities p i j from state i to state j , i , j = 0,1,2, ..., N o , are given by
11,
ifi=j=O; p , i f j = i - 1; Pij = q, i f j = i; 10, otherwise. Many performance measures can be derived based on this model. Particularly, it was shown that the reliability function after (i - 1) faults were removed can be approximated by
qx)- " g - p ( ' ~
')I
@I
, i = 1,2, ....
(14)
There are more general Markovian imperfect debugging models. Using the fault content as the state, Kremer [Kre83] presented a birth-death process model which allows the fault content to increase by 1 from inappropriate debugging. The transition probabilities are given by 1, p, p .11. = . q, r, 0,
ifi=j=O; i f j = i - 1; i f j = i; ifj=i+l otherwise.
Or we rather say a new fault is to be introduced to the software with probability 1 - p while an existing fault is removed with certainty, since after debugging the fault is not quite the same as it was before.
210
AMRlT L. GOEL AND KUNE-ZANG YANG
where p + q + r = l . Sumita and Shanthikumar [SS86] further presented a multiple-failure Markov process model. This model allows, by using a transition manix, multiple fault introduction and fault removal, i.e., transitions from state i to state j where 1 i - j 1 > 1 are also allowed. For this model most software quality measures require use of complicated numerical procedures.
Software Availability Models. Trivedi and Shooman [TS75] presented a Markov model for estimating and predicting software reliabihty and availability. In this model the system states are divided into distinct up and down states according to the number of faults remaining in the software and whether the software is operating or not. First, assume that the software is in an up state at time t = 0. When a failure occurs, the system is shut down and enters a down state. The fault which caused the failure is then detected and removed before the system begins to operate again. The operating time and the repair time are both assumed to be random variables described by the failure rate 4( N o- i) and repair rate p ( N o- i ) , where No is the total number of faults in the software and i is the number of failures that have occurred. The reliability of the software when operating at time t with i failure occurrences is ~ , ( x=) e - 9 (No - ')*. (16) By defining software availability A ( t ) as the probability that the software is operating at time t, it can be shown that NO
where PNo-k(t)is the probability that the software is in the kth up state at time t , and ck, and dkiare given by
0, 1,
if; = 0; i f j = 1;
( J! '),
otherwise.
and
dkj=
SOFTWARE RELIABILITY AND READINESS
21 1
Okumoto [Oku79] extended Trivedi's work to incorporate imperfect debugging. Many others also take hardware failures into account; see, for example, Goel and Soenjoto [GS81], Sumita and Masuda [SM86], Goyal and Lavenberg [GL87], and Othera et al. [0'90]. Availability models are widely used in hardware reliability in order to obtain cost-effective replacement policies. However, as quoted in Xie [Xie91], the application of these models is limited for a software system since the up and down states of software are not obvious, e.g., software can still be used while it is being repaired.
2.3.2 Order Statistics Models These models also originated from the study of hardware reliability. Cozzolino [Coz68] presented a model which he called the initial defect model for a repairable hardware system. This model is based on the following assumptions: (1) Each new system has an unknown Poisson distributed number, N o , of initial defects. (2) Each defect independently has a constant failure rate 4, i.e., the perdefect failure density is given by f( x ) = @e-@ x . (3) When a failure occurs, the defect causing it will be discovered and repaired perfectly, i.e., the defect will never reappear. (4) The time to repair is negligible. With these assumptions, it was shown that this model yields an NHPP with intensity function
A(r)
=
E { N o ] 4e-d'.
(20)
To use the model for software, one needs only to translate the hardware defects to software faults. The initial defect model is closely related to the JM model, and the exponential NHPP model ([G079b]) which is to be discussed later in Section 3. It is easy to see that if N o is an unknown constant the model results in the JM model. Furthermore, the model is equivalent to Shanthikumar's model given in 2.3.1, if the defect failure rate is assumed to be a function of t , i.e., if 4 is replaced by @ ( t ) . There are other variations which have been proposed for modeling software reliability. Nagel et al. [NS82] [NSS84] proposed an order statistic model with an infinite initial number of faults and a different failure rate for each fault; that is, denoted by q5 ,, the failure rate of fault i is given by @ , =a/?', 1 s i < w ,
O < B < 1.
(21)
212
AMRlT L. GOEL AND KUNE-ZANG YANG
Adams [Ada841 reported a per-fault failure rate observed by R. W. Phillips of IBM, which is given by
@ i = a i - p , 1 a i < - , 1
2.3.3 Bayesian Models So far the models we have reviewed assume that the model parameters are unknown constants, which are usually estimated using the maximum likelihood method. In the Bayesian models, the model parameters are assumed to be random variables with some prior distributions. Based on the failures observed, the posterior distributions of the model parameters are derived using Bayes theorem. These posterior distributions, together with other model assumptions, are then used to predict various reliability measures. The best-known model of this class is the Littlewood and Verrall (LV) model [LV73], which we describe below. Earlier in this section we had seen that the JM and other similar models assume that the failure rate decreases in some deterministic way as the faults are removed. We have pointed out that, in reality, attempts to remove faults from software may in fact introduce additional faults; and some imperfect debugging models have been proposed to deal with this problem. In [LV73] Littlewood and Verrall took a different approach. Let X ibe the random variable representing the time between the (i - 1)th and the ith failures; and let h ( i ) be the failure rate associated with X I . In recognition of the problem mentioned above, they argued that a decrease in the failure rate, i.e., h ( i ) < h(i - l), should be transformed to a probability statement. They assumed that h ( i ) is a random variable with the desired property P [ h ( i )< I ) 3 P ( h(i - 1)< I ) ,
for all 1, i.
(23)
Let f ( x , ) denote the pdf of Xi.It was assumed that f ( x , I h (i)
=
1) = le -'*.
(24) For mathematical tractability, the pdf of h ( i ) , denoted by g i ( l ) , is assumed to be a gamma distribution with shape parameter a and scale parameter q(i),i.e.,
SOFTWARE RELIABILITY AND READINESS
213
It can be shown that, as a function of additional parameters a and v(i),the pdf of X i is
f@;;a , V ( i ) >= l 0 - f ( x ;I h(i) = l ) g i ( l ) dl
To preserve (23), ~ ( iis)assumed to be a monotonic increasing function; it is also assumed to be known although it may differ from software to software. Now, the failure behavior can be described by only one parameter, a. To apply Bayes theorem, a is also assumed to be a random variable with some prior distribution, denoted by p,(a). Suppose the observed times between failures are given by x I ,x,, ...,x,. The posterior distribution of a , denoted by p1( a ) ,based on the failures observed can be obtained by Bayes, theorem as p,(a)=pl(alx17x2, ---,xn) -
1,-
f(x17X2,
...*xrlIa)p,(a)
,
(27)
-..,xnIa)p0(a)da
( ~ i , ~ 2 ,
where f ( x 1 , x 2 ..., , x,I a ) =n:=, f ( x i ; a , v ( i ) )is the joint pdf of X I , X,, ..., X,. By assuming a uniform prior distribution, it was shown that
where
Combining (26) and (28), one can obtain the posterior distribution of Xias
f(G IltIi))= ] o - f ( x i ; a , W ) p * ( a )d a
Letting i = n + 1 in (30), we can obtain the pdf of time between the nth and the ( n + 1)th failures, which can be used to predict reliability. It was shown that the reliability would improve if q ( i )increases more rapidly with i than a linear function of i. Further discussion and methods regarding the choice
214
AMRlT L. GOEL AND KUNE-ZANG YANG
and estimation of q ( i )are given in [LV73]. Some additional discussion of this model can be found in [Lit79, LitSOa, Lit80bl. There are many other models which are based on Bayes’ theorem. Many of them extend existing non-Bayesian models by assuming the model parameters to be random variables with some prior distributions, and use Bayes’ theorem to obtain the posterior distributions of the parameters and to predict reliability. For example, a Bayesian formulation of the JM model can be found in Langberg and Singpurwalla [LS85], Jewel1 [Jew85], and Littlewood and Sofer [LS87]. Kyparisis and Singpurwalla [KS84] also presented a Bayesian formulation of the Duane power law NHPP model. Some other Bayesian models can be found in Thompson and Chelson [TC80], Liu [Liu87], and Becker and Camarinopoulos [BC90].
2.4 Readiness Assessment Readiness assessment is the process of determining whether the software system is ready for release to the next phase. This activity is variously called determination of optimum release time, operational readiness, first customer ship, etc., depending on the objective of the assessment. It is a difficult problem and involves trade-offs between continuous testing to increase reliability and release as soon as possible to decrease testing cost. A number of studies over the past fifteen years have addressed this problem. Most of the published approaches [Oku79, OC90, FS771 optimize an analytical objective function such as reliability, expected number of remaining errors or cost, under specified constraints. Some less analytical approaches [Air901 use a criterion such as that all open priority 1 and 2 problems be closed before software is considered ready for operational testing. Almost all of the published literature uses open problem data for decision-making. One of the earlier papers [FS77] developed release policies that would meet a specified reliability objective. In [Oku79], this was extended to include cost as the criterion under an NHPP failure model. [Y ‘841 extended this model to incorporate a penalty cost for delay, and [Y085a] considered cost minimization subject to a specified reliability goal. [KK83] developed a release policy that ensures a specified failure rate. A good survey of release models is given in [OC90].
3.
NHPP and its Properties
3.1
Definitions
Non-homogeneous Poisson processes (NHPP) have been widely used in the study of both hardware and software reliability. To give the definition of
215
SOFTWARE RELIABILITY AND READINESS
a non-homogeneous Poisson process, we first introduce the concept of a function being o ( h ) .
Definition 1 The function f is said to be o ( h )if
Now, the definition of the NHPP is as follows.
Definition 2 A counting process { N ( t ) ,t > O ] is said to be a nonhomogeneous Poisson process with intensity function A ( t ) , t > O , if the following axioms hold. Axiom 1: N ( 0 ) = 0; Axiom 2: { N ( t ) , t 2 01 has independent increments, i.e., for 0 t , < t , < f 3 , N ( t 2 ) - N ( l , ) and N ( t , ) - N(t,) are mutually independent; Axiom 3: P { N ( t + h ) - N ( t ) = 1 ] = A(t)h + o ( h ) ; Axiom 4: P { N ( t + h ) - N ( t ) 2 2 ) = o ( h ) . An NHPP is often described by its intensity function, A ( r ) , or its mean value function, m ( t ) ,which is defined as m ( t )=
I'
A(x)dx,
(32)
It can be shown (see, for example, [Kos93]) that Axioms 2, 3 and 4 jointly imply N ( t + x) - N ( t ) is Poisson distributed with mean m(t + x) - m ( t ) , for t 2 O and x>O, i.e., k = 0, 1, .... ( 3 3 ) Conversely, (33) in conjunction with Axiom 2 implies Axioms 3 and 4. Therefore, (33) and Axioms 1 and 2 jointly give an alternative definition for the NHPP.
3.2 Distribution and Conditional Distribution of N ( t ) For simplicity we use poim(.; m ) to represent the Poisson density function with mean rn, i.e., poim( k; m )= mk/k!e -'".The distribution of N ( t ) following (33) is P { N ( r ) = k ] = P [ N ( t ) - N ( 0 ) = k ] = poim(k; m(r)).
(34)
However, to make prediction of N ( t + x) at time t , we would like to know the conditional distribution P ( N ( t + x ) = k I N ( t ) = y ) and its mean. By
216
AMRlT L. GOEL AND KUNE-ZANG YANG
Axiom 2 we have P { N ( t + X ) = k I N ( t ) = y ] = P ( N ( t + x )- N ( t ) = k - y I N ( t ) = y } =P
f o r k = y , y + l , ..., and
[N ( t + X ) - N (t )= k -y ]
=poim(k-y,m(t+x)- m(t)),
(35)
E{N(t+x)IN(t)=y= } E"N(t+x)-N(t)]+ [N(t)-N(O)l ) N ( t ) = y } = E ( N (t
+ X ) - N (t ) I N ( t ) = y } + E { N ( t ) - N ( 0 ) I N ( t ) = y }
=m(t+x)-m(t)+y.
3.3
(36)
Some Important Properties
The NHPP is generalized from the (homogeneous) Poisson process in which A ( t ) = A, A > 0, is a constant. Many properties of the Poisson process can be applied to the NHPP. Here we list some important ones. These theorems can be proved based on the relevant results in [Ros93]. Theorem I (Additive property) If ( N , ( t ) ,t * O } and I N 2 ( [ ) ,t 3 O ) are two independent NHPPs with mean functions m , ( t ) and m 2 ( t ) ,respectively, then [ N ( t ) , t 3 0 } , N ( t ) = N , ( t )+ N 2 (t ) , is NHPP with mean function m ( t )= m l ( t )+ m,(t).
The additive property is useful when separate models are used for different severity levels. Theorem 2 (Decomposition property) Suppose that each event of an NHPP with mean m ( t ) is to be classified into either a type-1 or a type-2 event with probabilities p and 1 - p , respectively. Let N j ( t )be the number of type-i events that occur by time I , i = 1, 2. Then { N , ( t ) , t 3 0 } and { N 2 ( t ) ,t 5 O } are independent NHPPs with respective means p m ( t ) and (1 - P").
The decomposition property is useful for incorporating imperfect debugging in the reliability model. Theorem 3 (Order statistics interpretation) Given N ( t ) = n, the n occurrence times T I ,T2,...,T,, have the same distribution as the order statistics from n identical and independent random variables with distribution F ( x )= m ( x ) / m ( t )over the interval [0,t ] .
Theorem 3 can be used for goodness of fit test for reliability modeling, e.g., see [Oku79],or used to relate the NHPP reliability models to the order statistics models.
SOFTWARE RELIABILITY AND READINESS
3.4
217
Software Reliability Models Based on the NHPP
Originally used in the study of hardware reliability, the NHPP-based software reliability models assume that the cumulative number of failures of the system up to time t, N ( t ) , can be described by an NHPP. The most influential model proposed for software reliability estimation and prediction based on NHPP is the Goel-Okumoto exponential ( E D ) NHPP model [G079b]. In this model it is assumed that a software system is subject to failures at random times caused by faults present in the system. Letting N ( t ) be the cumulative number of failures observed by time t, they proposed a model with the following assumptions: (1) The counting process [ N ( t ) ,t z O ) is an NHPP with mean value function m(r), where m(=)= a. (2) Each failure is caused by one fault. (3) All detected faults are removed immediately and no new faults are introduced. (4) For a small At, the expected number of software failures detected during ( t , f + A t ) is proportional to the expected number of undetected faults; that is,
m ( t + Ar) - m ( t ) = h ( a - m ( t ) ) A t + o(Ar),
(37)
where o(At)/At-+O and b is a constant of proportionality. Based on these assumptions and the initial condition m(t ) = 0, it was shown that m ( t ) = u(1 - e-b').
(38)
And the reliability of the software at time t is given by
R ( x I t ) = P ( no failures in ( t , t + x ) ) = P (N ( t
+ x ) - N ( t ) = 0}
The model has physically interpretable parameters. Since the parameter a is the expected number of faults which will eventually be detected, it therefore can be interpreted as the expected initial number of faults. And the parameter b can be treated as the per-fault failure rate. We note that the mean value function of the EXP NHPP model has the same form as of the Jelenski-Moranda (JM) model with a corresponding to N o and 6
218
AMRlT L. GOEL AND KUNE-ZANG YANG
corresponding to 4. However, the EXP MIPP model differs from the JM model in two aspects: (1) In the EXP NHPP model, the initial number of faults N o is treated as a random variable with E I N O ]= a , while in the JM model N o is a fixed number. In the JM model, the times between failures are assumed to be (2) independent of each other, while in the EXP NHPP model the time between failures k - 1 and k depends on the time to failure k - 1. (3) In the EXP NHPP model, the number of faults detected in a time interval ( t , t + XI, N ( t + x) - N ( t ) , is independent of the number of faults detected by time c, N ( t ) , while in the JM model N ( t + x ) - N ( t ) depends on N ( t ) . Originated from different assumptions and motivation, the EXP NHPP model is mathematically equivalent to Cozzolino's initial defect model described in Section 2.3.2. Indeed, by the order statistics interpretation given in Theorem 3, a model described by an NHPP with bounded mean value function m ( t ) can always be realized as a model like Cozzolino's with the pdf of time to failure of each defect being
The EXP NHPP model has a very simple form and very good theoretical and practical interpretation. Since its introduction, many software reliability models based on NHPP have been proposed with different mean value functions; for example, see Yamada et al. [YOO831. They proposed the delayed S-shaped (DSS) model in which the mean value function is given by m(t)=~[l-(l+bt)e-~'];
(41)
and Ohba [Ohb84] presented the inflection S-shaped model (ISS) with mean value function
where c is called the inflection parameter. There are generalizations to the EXP NHPP model too. Goel [Goe82] proposed a generalized exponential model with mean value function r n ( t ) = a ( l -e-b'c), a,b,c>O.
(43)
219
SOFTWARE RELIABILITY AND READINESS
Yamada [Y '861 generalized the EXP NHPP model by replacing the time t by a testing-effort function. The mean value function of the model is assumed to be
m ( t )= a11 - e-bW(r)],
(44)
where W ( t )is the testing-effort spent by time t. Some models which were originally proposed for other applications have been used for software reliability. They include, for example, the power law model ([DuaM, Cro74 I), the modified Duane model (Littlewood [Lit84]), the logistic growth curve model (see e.g. [YOSSb]), and the Gompertz growth curve model (see e.g. [Y085b]). One earlier model proposed for software was presented by Schneidewind [Sch75]. Most models mentioned above are based on calendar time. Models have also been proposed for failures observed as a function of software execution time, rather than calendar time; see, e g , the Musa basic execution time model [Mus71] and the logarithmic Poisson execution time model (Musa and Okumoto [M083]). One advantage enjoyed by the NHPP models over many other reliability models is that the NHPP is closed under superposition (see the additive property in Section 3.3). Therefore it can be used to model TABLEI1
A SUMMARY OF NHPP MODELS Model
Mean value function
Reference ~
Exponential (EXF') Generalized exponential Delayed S-shaped (DSS) Inflection S-shaped (ISS) Power law Modified Duane
Logistic growth curve
Gompertz growth curve Basic execution time Logarithmic Poisson execution time
m ( t ) = a ( l -e-h') m ( t ) = a ( I -e-*') m ( t ) = o [ l - ( 1 +bf)e-h'l U ( I -e-b') m(t) = 1 + ce-"' m ( t )= atfl m(t) =
m(r) =
1
-(&)I
m(a) =
Ohba [Ohb84] Duane[Dua64] & Crow [Cro74] Littlewood [Lit841
Yamada & Osalu [YOSSb]
1+
m ( t ) = ah", m(s)= a(l
~
Goel & Okumoto [G079b] Goel [G079a] Yamadaetal [YO0831
O< c i1 -e-BcK")
In ("'
+
a
Yamada & Osaki [Y085b] Musa [Mus71]
Musa & Okumoto [M083]
220
AMRIT L. GOEL AND KUNE-ZANG YANG
systems with many types of faults or many characteristic modules; see, e.g., Ohba [Ohb84], Yamada et a1 [Y’85] and Kareer et a1 [K’90]. In addition, unlike other models which are based on observed times between failures, the NHPP models can be based on the cumulative failures or fault counts. Table I1 lists the mean value functions of some of the aforementioned models. All parameters given in the table are greater than 0 unless indicated. The calendar time is represented by t while the execution time is represented by z. Note that, although the Musa basic execution time model and the EXP model are mathematically isomorphic, they are based on different measures of time and have differently interpretable parameters. 4. Trend Testing for Software Failure Data
4.1
Background
Software fault data constitute a particular type of stochastic process, called a point process. The rate at which these “points” occur is called the rate of occurrence. As a software system undergoes testing and changes necessitated by it, an analysis of the trend in this rate can provide useful information about its development status. In theory, the trend could indicate all kinds of patterns, such as stationary, non-stationary, cyclic, etc. The main interest, however, is in determining whether the trend is constant, increasing or decreasing. While other factors, such as rate of testing, play an important role in interpreting the observed pattern, in general, a constant, increasing or decreasing trend indicates stable, deteriorating and improving failure rate, respectively. Mathematically, a stochastic point process exhibits a trend if the following conditions are satisfied. For every i 2 1, i < j , x > 0 and independent random variable Xiand Xi, Fx,( X I < Fx,(x)
or F x , ( x ) > Fx,(x),
where F,(x) represents the distribution function of X . Sometimes the independence of X, and Xi is too restrictive an assumption. Then an alternative is to consider that successive failures follow a non-homogeneous Poisson process. Let N ( t ) be the cumulative number of failures observed by time t , then its mean value function is denoted as m(t) = E [N ( t ) ] and the failure intensity as A ( t ) = dm(t)/dt.
SOFTWARE RELIABILITY AND READINESS
4.2
22 1
Trend Testing
There are many trend testing techniques available in the literature. These can be divided into two broad categories, viz, graphical and statistical. Some of these are briefly described below. However, only the Laplace test is discussed in detail since it is the most appropriate one for the software failure occurrence process. Additional details can be found in [AF84], [CL66] [GYP92], [A+93], [Gau921 and [KL941.
4.2.1
Graphical Techniques
There are several graphical procedures which can be used to determine if software reliability is improving (growing) or deteriorating (decaying). Such plots are especially useful for seeking salient features in data and for assessing the adequacy of assumptions underlying the mathematical models being considered for fitting. One commonly used graphical technique is to plot cumulative failures versus cumulative time on linear paper. Such a plot will be concave downwards if the reliability is improving and concave upward if it is deteriorating. This is because in the former case, times between failures are stochastically increasing while in the latter they are stochastically decreasing with time. Another plot is the so-called Duane plot which consists in plotting t / N ( t ) against t on log-log paper, where t is cumulative time and N ( t ) is number of failures up to t. This plot is linear if the failure process is NHPP with a rate A p t y p, A > O , t 2 0 . Other plots use different specialized plotting papers to determine specific failure behavior and can be found in the references mentioned above.
Alternate
Null 0 0 0
Homogeneous Poisson Process (H) Renewal (R) General Stationary Sequence (S)
0 0
Monotonic Trend (M) Non-monotonic Trend (N)
222
AMRlT L. GOEL AND KUNE-ZANG YANG
Amongst the tests which have been proposed for testing the Homogeneous Poisson Process (H) hypothesis versus a monotonic trend, the Laplace test is probably the most well-studied test and has been discussed in several publications such as [CL66], [Gau921, [ISMS91I, [GYP921, and [KL94]. Software failure data are generally obtained as times between failures or failure counts [Goe85]. In the former case, inter-failure times are observed for a period up to T and the Laplace test is based on calculating the Laplace factor for the period [0, TI. In the latter case, data is obtained about the number of failures in each of the failure intervals. Then the Laplace factor is obtained from these failure counts. Based on the material available in the literature and reported experiences with its use for analyzing software failure data, the Laplace test is discussed here in detail. In particular, it has been found that, while other tests give similar results in their ability to detect reliability trend variations, the Laplace test is superior from a theoretical point of view.
4.3
Laplace Trend Test
Two cases are considered in this section, viz, inter-failure time data and failure count data. In each case, the Laplace factor is defined, its distribution under the null hypothesis is discussed and the relevant statistical properties of the test are presented.
4.3.1
Trend Test for Inter-failure Time Data
In this case, software is tested over time [O, TI. Let T,, T2,..., T, be the random variables that represent times to the first, second, and nth failure, respectively. Then T, - T n - ,is the time between the ( n - 1)th and the nth failures. Let S,=C:=, T,, i.e., S, is the sum of times to the first failure, second failure, etc., up to the nth failure. Then the Laplace factor u(T) for this case is defined as follows [CL66, Gau921
nT
s, - u(T) =
L
(nT’/ 12)1’2
If S, is small, it indicates reliability growth and if S, is large, it indicates reliability decay. If the failure process is HPP, the T,s are distributed as independent, ordered random variables with uniform distribution on [0, TI. Using the central limit theorem, u(T) can be approximated by a standard normal distribution. Thus, this test compares the mean failure time to the
SOFTWARE RELIABILITY AND READINESS
223
middle of [O,T]. The u ( T ) statistic can be used to test for the following hypotheses (see [Gau92] for details): (1) no-trend ( H a )vs reliability growth (HI); (2) no-trend ( H , ) vs reliability decay (HI); (3) no-trend (H,) vs any-trend ( H , ) . The practical use of the Laplace test in the context of reliability growth is as follows (see [KL94]): 0 0 0
negative values of u ( T ) indicate reliability growth positive values of u ( T ) indicate reliability decay values of u ( T )around zero indicate stable reliability.
Using a 5% level of significance ( a = 0.05) H a is rejected against the alternative of reliability growth if u ( T )< -1.645, against reliability decay if u ( T )> 1.645 and against any trend if I u ( T )I > 1.96.
4.3.2
Trend Test for Failure Count Data
In this case, numbers of failures are observed in specific time intervals. Let n(1) be the number observed in time unit 1, n ( 2 ) in time unit 2, etc. up to n ( k ) in time unit k. Then, the Laplace factor is given as [AF84, KMS911: k
( k - 1)
Using arguments very similar to those of Section 3.3.1, u ( k ) can be interpreted just as u ( T ) in terms of indication of stable reliability, reliability growth and reliability decay.
4.3.3 Optimality of Laplace Test Gaudoin [Gau92] has summarized the results of his Ph.D. thesis, which studied the statistical optimality of the Laplace test for several software reliability models. The main points relevant to the use of Laplace test in this chapter are described below and details can be found in the above reference. For purposes of evaluating the Laplace test, Gaudoin considered the following three criteria for determining the optimality of a test (see [Roh76]).
224
AMRlT L. GOEL AND KUNE-ZANG YANG
0
Uniformly most powerful (UMP) and uniformly most powerful unbiased (UMPB)-these criteria maximize the power for all alternatives.
0
Locally most powerful (LMP)-this neighborhood of the alternative.
maximizes the power in a
Laplace Test in NHPP Models 0
0
Goel-Okumoto exponential model. This model belongs to a two parameter exponential family. For inter-failure time and for failure count data, the Laplace test is UMPB for testing reliability growth, decay or stability in the Goel-Okumoto model. According to [Gau92] “the Laplace test is the best trend test in every operational situation for the Goel-Okumoto model. This result is very satisfactory, and explains the interest in the test”. Musa-Okumoto Logarithmic Poisson Model. “. .. the model is very restrictive compared to others. We cannot find UMP or UMPB tests. However, the Laplace test has local optimal properties. In practice, this can be good enough because we make a test only when we cannot decide by an eyeball analysis if the reliability has grown or not.. .. “The nature of the intensity function determines whether we can deduce optimal.. . tests”. For an exponential type intensity function, a variant of Laplace test provides the best deterministic test.
Laplace Test in Other Classical Models 0
Littlewood-Verrall Bayesian model. This model is so complicated, even though it is reasonable and justified, that it is quite impossible to say anything about the Laplace test for it.
0
Jelinski-Moranda de-eutrophication model. “There does not exist a UMP test for 8 = 0 versus 8 > 0 or 8 < 0 .... An LMP test ... is the Laplace test. There is a similar test for 8 = 0 vs 0 < 0 but not for 8 = 0 vs 8 # 0. Here again we find the Laplace test. This is remarkable but not surprising, because the Jelinski-Moranda model looks like a discrete version of the Musa-Okumoto model.”
To summarize the above results from Gaudoin, it is clear that the Laplace test is the optimal choice for trend testing in software failure data. It is for this reason that the Laplace trend test is used in this chapter to analyse the failure data for software reliability and readiness assessment.
SOFTWARE RELIABILITY AND READINESS
225
5. Parameter Estimation for NHPP Models Using Laplace Trend Statistic In this section we discuss the problem of parameter estimation for the three commonly used NHPP models, viz, EXP, DSS and ISS.
5.1
Models and Their Mean Value Functions
The three models considered here, their mean value functions m(t), and intensity functions A ( t ) are described below. 0
Exponential model (EXP). NHPP model with exponential mean value function has two parameters a and b, where a is related to the number of faults and b to the fault detection rate. For this model (45)
m ( t ) = a ( l -e-b') and dm (0 A ( t ) = -= abe-b'. dt 0
Delayed S-shaped model (DSS). NHPP model with delayed S-Shaped mean value function also has two parameters a and b. Its m(t) and A ( t ) function are given by m(t) = a ( 1 - (1 + bt)e-b')
(47)
and
0
Injection S-shaped model (ISS). The NHPP model with inflection Sshaped mean value function has three parameters a , b, and c, where c is the inflection parameter. Its mean value function and intensity function are given by
and dm A ( t ) = __ dt
= ab(1
+ c)
e+' (1 + ce-b')2
where c = (1 - r ) / r . Here, r represents the inflection rate which
226
AMRlT L. GOEL AND KUNE-ZANG YANG
indicates the ratio of the number of detectable faults to the total number of faults in software.
5.2 Parameter Estimation Let N ( t ) be the random variable representing the number of failures observed by time r. Let y, be the number of observed failures by time t,, i = 1,2, . .., n. Then PfN(t,) = y 1 , N ( f 2 )= y 2 , ...,N(t, )=y,J
=n
Yr - Y, I ~
~m(t,>-m(t,-Jl
1=1
-
.e
(4,) - mV,
(Y~-Yl-lY
Given observed values y,, y2,.. ., ynr the log likelihood meters is given as
Expressions for estimating the parameters of the three NHPP models can be obtained from equation (51) by first substituting the corresponding expression for m(t,)s and then taking the derivatives with respect to the model parameters. Such expressions for the three models are given below. EXP model. Taking the derivatives of L with respect to a and b and setting them equal to zero, we obtain the equations
a=-
Yn
1-e
-br,,
'
and
Next, on substituting for a from equation (52) in equation (53), we get
The estimate of b can be obtained by numerically solving equation (54) and then a can be obtained from equation (52).
SOFTWARE RELIABILITY AND READINESS
0
227
DSS model. Taking the derivatives of L with respect to a and h and setting them equal to zero, we obtain the equations:
and
The estimate of b is given as one of the solutions of equation (57), which is obtained from equations ( 5 5 ) and (56). As before a can be obtained by substituting for b in equation (55):
0
ISS model. We first assume that c is a known fixed constant. Then, taking the derivatives of L with respect to a and b and setting them equal to zero, we obtain
and -br.
1
C
The estimate of b is given as one of the solutions of equation (60), which is obtained from equations (58) and (59). As before, a can be estimated from equation (58):
The case when c is also to be estimated will be discussed later in this chapter.
228
AMRlT L. GOEL AND KUNE-ZANG YANG
5.2.1 Difficulties in Getting Estimates We have found that equations (54), (57) and (60) corresponding to the three NHPP models may have multiple solutions. Some of the solutions could be misleading. Further, in ISS, the parameter c may not be known a priori. If it is the case, parameter estimation will have to be performed for each value of c. This problem, however, does not arise for the EXP model if the data y,, y2, ...,y, are collected at equi-interval time periods. In this case, equation (54) has either zero or one root. If it has no root, then the EXP model is not applicable for the observed data. The necessary conditions for the applicability of the EXP model have been derived and will be discussed later. For the DSS and ISS models, we have developed a heuristic approach to overcome the estimation problem. The approach uses the Laplace statistic and some characteristic points of the observed data to determine the initial values for the model parameters. Starting from these initial values, a rootfinding procedure can usually yield a good solution. In particular, for the ISS model, the proposed approach is also used for determining the value of parameter c. This eliminates the current need for a trial and error approach to parameter estimation.
5.3 Characteristic Points of NHPP Models In this section we first identify those time points at which the intensity function of an NHPP model attains certain specific characteristics. Then we derive equations that relate these characteristic points to the corresponding model parameters. We then show how these equations are used for estimating model parameters. The first characteristic point, denoted as K,, is that value of t at which dd(t)/dt is positive and maximal. This corresponds to the point of maximal reliabiIity decay. The second characteristic point, denoted as K,,is where d ( t ) is maximal. Normally, it would be the point where the system exhibits a change from reliability decay to reliability growth. Finally, the third characteristic point, denoted as K,, is where dA(t)/dt is negative and maximal. This corresponds to the point of maximal reliability growth. In the following subsections, we discuss the significance and use of these characteristic points for the EXP, DSS and ISS models.
5.3.1 EXP Model For this model, the A ( t ) function is monotonically decreasing. Hence, K,, does not exist. Also, K2and K, both equal zero. In other words, for the EXP model, no characteristic points can be identified. However, the following
SOFTWARE RELIABILITY AND READINESS
229
theorem can be used to determine if this model is applicable to observed failure data. (For proof, see [YG94].) Theorem 4 For the EXP model, if the observed data is collected at equi-interval time periods then equation (54) has at most one root. The necessary condition for the existence of a root is
l
n + l
2 where
r=l
Y n
If the necessary condition is not satisfied, then equation (54) has no root, and EXP is not applicable. Note that V can be written as V = C:= iw,, where w ,= ( yI - y,-,)/y, = zz/ y,) and w, = 1. So V can be viewed as a weighted center of n data points. If the weighted center V is located in the first half of the data points then equation (54) has one root. Otherwise it has no root.
c:=,
5.3.2 DSS Model Recall that for this model
A ( t ) = a(1 - e -b').
(62)
Taking the derivative of L(t) and setting it equal to zero, we obtain
dt
The solution for the above is t = l / b . Therefore, from the definition of K,, we get
K,= l/b or b = l/K2.
(63)
Next, by taking the derivative of dL(t)/dt, and equating it to zero, we get a solution t = 2/b. On substituting this value in the above equation, we
230
AMRlT L. GOEL AND KUNE-ZANG YANG
note that this solution represents a negative and minimal value. Therefore we get
K, = 2 / b or b = 2/K3. For this model, K ,does not exist.
5.3.3 ISS Model We show below that for this model
(c) K2= b
K, =
ln(c) - ln(2 - &) b
Recall that for the ISS model, A(t)
= ab(1
+C)
e- br (1 + c e - b r ) 2'
Taking the first derivative of A ( r ) and setting it equal to zero we obtain -ab2(1
+ c)e-br
I - ce-" ( 1 + ce-br)3.
The solution for the above is r = l n ( c ) / b . So from the definition of K 2 , we get hn(c)
K2 = b
Taking the second derivative of A ( t ) and setting it to zero, we obtain
SOFTWARE RELIABILITY AND READINESS
231
Solving the above, we obtain
and
From the definitions of K , and K , , we get
and
The parameters b and c can be determined by any two of equations (66), (67) and (68). Solving equations (66) and (67) we obtain
b=
ln(2 + K2
J5)
- KI
and
c = exp
(
)
K , h (2 + &>
K2 - K1
Solving equations (66) and (68), we obtain
and c = exp
(
)
K , In (2 + &) K3
- K2
5.3.4 Parameter Estimation Based on Characteristic Points The advantage of characterizing DSS and ISS by K , , K , and K , is that the characteristic points can be estimated by plotting the observed failure data in
232
AMRlT L. GOEL AND KUNE-ZANG YANG
the non-cumulative way. Though these estimates are not based on statistical considerations, we can use them to develop guidelines for determining the initial values of the model parameters which are needed to solve the maximum likelihood equations. In the following we describe how these characteristic points can be used for this purpose.
EXP Model. Though none of K , , K, and K , can be identified for the EXP model, we can view V as an alternative characteristic point and use it to determine, from Theorem 1, whether to apply the model or not. When the model is applicable, simply set the initial value of parameter b to a positive value. A simple foot-finding routine such as the bisection method should always find the correct root, since only one root exists. DSS Model. The initial estimation of the parameter b is based on equation (63) or equation (64).The rules are described below. (1) If K2 is observable from the data, use b = l f K , as the initial value for solving equation (57). (2) If K , is not observable, and K, is observable, set b = 2 / K 3 . Otherwise, set b to a large value.
ISS Model. The initial estimation of parameters b and c is based on equations (69-72). The basic rules are described below. (1) If K , and K2 are observable, use equation (70) to fix c, and use equation (69) as the initial value of b for solving equation (60). (2) If K , is not observable, and K, and K , are observable, then use equation (72) to fix c, and use equation (71)as the initial value of b for solving equation (60). The following relations between characteristic points and r can be derived from equations (70) and (72). These relations can help in choosing values for c when two of K , , K , and K3 are known., They can also used to improve the results of (1) and (2) by choosing different values of c around the initial estimate. In the following cases, when we see that any of K , , K,, or K , are less than 0, it implies that they can not be observed from the data. Case 1: K3 > K , > K , Case 2: K 3 > K ,
5
5
0 >K,*
Recall that c = (1 - r ) / r .
O*O
1 ~
1 ~
3+&’
SOFTWARE RELIABILITY AND READINESS
Case 3: K, >, 0 > K 2 > K,* 0.5 < r <
Case 5: K 3 = K2 = K , EXP model.
5.4
233
1
___
3-45.
r = 1. In this case the ISS model becomes the
Laplace Trend Statistic for Estimating Characteristic Points
While the characteristic points can be observed directly, we have found that the Laplace trend statistic can be used in computer implementation for estimating these characteristic points. We have also found that this statistic is better than estimates of the first and second derivatives from the observed data for this purpose, because it has less random variations. Suppose the observed data are collected at n equi-interval time periods. Then, as mentioned earlier, the Laplace trend factor of the data is given by k
u(k)=
/z
Yk
2
,
k = 2, ..., n.
(73)
A plot of u(k) values versus k is useful in assessing whether the failure data reveal reliability growth or reliability decay based on the following results: (1) If u ( k ) is increasing and >O, it implies global and local reliability decay (RD). (2) If u(k) is decreasing and >O, it implies global RD and local reliability growth (RG). (3) If u ( k ) is decreasing and < 0, it implies global and local RG. (4) If u ( k ) is increasing and < 0, it implies global RG and local RD. The characteristic points can be estimated from u(k) via the following steps. Step 1-Compute u ( k ) Compute u(k) from equation (73) €or k = 2 , ..., n. Set u(0) = u(1) = u(n+ 1)=0.
234
AMRlT L. GOEL AND KUNE-ZANG YANG
Step 2-Compute Set
K , before K , and K,
I-*
otherwise
where S = { q 1 2 s q~ n , u ( q ) > O , u ( q - 1)s u ( q ) and u ( q + u(q)l. Here S can be considered as the set containing all the time periods at which u ( k ) has a positive relative maximum. Step 3-Find Set
K,
- u(qma, - 1) 3 u ( i )- u ( i - ) Here qmaxis the time period at which u(qmaX) foralll
Step 4-Find Set
K,
Here qmlnis the time period at which u(q,,,) for all K2< i < n, and K , < qmm< n.
-
u(q,,, - 1) s u ( i ) - u ( i - 1)
6. Software Reliability Evaluation Failure data during testing is commonly used to assess software reliability. As mentioned earlier in this chapter, many analytical models have been proposed which, under a variety of assumptions, provide one or more measures of reliability. In this section we explain how the results of Sections 3 , 4 and 5 are employed to develop a three-step approach for reliability assessment. The three steps discussed in this section are use of the Laplace trend statistic, model fitting and reliability evaluation. These are illustrated by analyzing software failure data from an army system. The data consists of number of faults per month over a 30-month period of testing (Table 111). A plot of the cumulative fault curve is shown in Fig. 3.
235
SOFTWARE RELIABILITY AND READINESS
TABLE111
SSPR MONTHLY FAULTDATA Time
No. of faults
Time
No. of faults
1 2
20 40 80 90 95
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
740 780 850 920 920 lo00 1150 1260 1460 1560 1640 1680 1700 1710 1720
3
4 5
100 105 110 130
6
7 8
9 10 11 12 13 14 15
150 180 280 490 650 700
#Faults x lo3 1.80
~
1.70 1.40’ 1 .so 1.40
11-
1.30
1.20 1.10
-
~
c
1.00 1 0.90
-~ --
0.80 0.70 0.60 0.50 0.40
..
-
1
I-
‘
1-
--
I I--.
-
0.30 -
-~
0.20 L
-
0.10
c
0.00
r
c
-
0
I
~-
5.00
1000
~
1500
-
I
_1
-
I
+ I
I
-
-
-F,
1
-
-
2000
.~
1
-
2500
3000
FIG 3 A plot of the SSPR cumulative fault data.
Time
236
AMRIT L. GOEL AND KUNE-ZANG YANG
6.1 SSPR Data and Laplace Statistic The Laplace trend statistic u(k) for this data was computed on a monthly basis (see Section 4.3.2) and is shown in Fig. 4. First we note that the u(k) plot exhibits software reliability growth and hence an applicable model can be used to assess reliability. Next, we note that there are three relative maxima in Fig. 4. They are (3,2.74), (14,22.91) and (25,22.5). Using the result in Step 2, Section 5.4, we get an estimate of K, as
Kz =
2.74 x 3 + 22.91 x 14 + 22.54 x 25 2.74 + 22.91 + 22.54
= 18.52 (74) To estimate K, and K,, we note that the conditions for Step 3 and 4 are satisfied for qmax= 13 and qmin= 29, so that we have K,= 12.50 and K3 = 28.50.
6.2 Model Fitting Now, we use the computed values of the characteristic points ( K l ,K, and K 3 ) to determine the NHPP model(s) and the corresponding initial parameter estimates as detailed in Section 5.3. Trend
0.00
5.00
10.00
15.00
20.00
25.00
FIG. 4. SSPR Laplace trend statistic.
30.00
237
SOFTWARE RELIABILITY AND READINESS
EXP Model The value of V for the SSPR data
6.2.1
and n+l
30+1
2
2
-=--
- 15.50
The necessary condition for the EXP model is given in equation (61) and since V = 18.03> 15.50, the EXP model is not applicable for the SSPR data. For illustration purposes, suppose we did not use this test and tried to fit an EXP model anyway. The estimates of a and b from the defining equations (equations (52) and (54)) tend to be 00 and 0, respectively. The plot of the force fitted model is shown in Fig. 5. Obviously, it is an inappropriate model.
6.2.2
DSS Model
Next, we use an initial value of b from equation (63) as 1/K, = 0.054. With this initial value, equations (55) - (57) yield the final estimates as
0.00
5.00
1000
15.00
20.00
25.00
30.00
Time
FIG. 5. Plot of SSPR data and the force fitted EXP NHPP model.
238
AMRlT L. GOEL AND KUNE-ZANG YANG
u = 4312.3 and b = 0.046. A plot of the fitted model is shown in Fig. 6. It should be pointed out that without using the initial value b = 0.054, the rootfinding equation yielded a value of b = 0, not a correct value. Thus we see that the use of the Laplace statistic and the characteristic points avoided an inappropriate model fit and provided a correct estimate of model parameters.
6.2.3 ISS Model First, we estimate r from equation (72) as 1
1 r=-l i c
K2
In (2 + &)
= 0.017.
K2 - K1
Next, the initial value of b is obtained from equation (71) as b=
In(2+J3)
= 0.219.
K2-Ki
Using b = 0.219, equations (60) and (58) yielded parameter estimates b = 0.217 and a = 1869.2. For a better fit, we tried different values of r #Faults x
lo3 actual .. ... 'fiss.'.
0.00
5.00
FIG.
10.00
15.00
20.00
25.00
30.00
Time
6. Plot of SSPR data and the fitted DSS NHPP model.
239
SOFTWARE RELIABILITY AND READINESS
around 0.017. We note that in this case, since K, > K, > K , > 0, the value of r must not exceed 1/(3 + 6). We found r = 0.019 yielded the best result, at which the ML estimates of Q and b based on our proposed method are
a = 1879.09 and b=0.211. Figure 7 shows the fitted ISS model for r = 0.019. We observe that without using the proposed method, one might find other roots for b at b = 0.108 and b = 0.039. The estimation results for the DSS and ISS models are summarized in Table IV.
6.3
Reliability Evaluation
The fitted NHPP model is used next to compute measure as mean time to next failure, number of remaining errors and software reliability. The analytical expressions for these measures were given in Sections 2 and 3 of this chapter, and are used in this section for the SSPR data. A summary of the fitted models for the SSPR data is given in Table IV. The reliability values at t = 3 0 based on the DSS and ISS (r=0.019) models are listed in Table V and shown graphically in Fig. 8. #Faults x
1.40-
lo3
~
0.90
~
4
-
0.80 I
I
~--
~
0.20,
I
I
,-
~I
~.
I
~.
'
I
0.00
5.00
10.00
15.00
20.00
25.00
30.00
FIG.7. Plot of SSPR data and the fitted ISS NHPP model
Time
( r = 0.019).
TABLEIV RESULTS OF PARAMETER EsTIMATION FOR
SSPR DATA ISS
B
EXP
DSS
r = 0.017
r = 0.019
N/A
4312.28 0.046 2592.29 330222.0 5393.1
1869.17 0.217 149.17 177739.4 5521.4
1879.09 0.211 159.09 174403.5 5521.9
6 N SSE
lk
TABLEV RELIABILITY MEASURES AT 1 = 30
DSS model ~
_
_
_
_
_
R(0.002 130) R(0.004 130) R(0.006 1 30) R(0.008 1 30) R(O.O1O 130) R(0.020 I 30) R(0.040 I 30)
ISS model ( r = 0.019)
~
0.872 0.760 0.662 0.577 0.503 0.253 0.064
0.940 0.884 0.831 0.78 1 0.734 0.540 0.292
X
FIG. 8. Reliability functions (R(x I t)) at t = 30.
SOFTWARE RELIABILITY AND READINESS
7.
241
Readiness Assessment
An important objective of tracking problem data is to determine when software would be ready for release. This requires knowledge of open problems, problem closure rate and future failure phenomenon. Based on this, it is possible to calculate when the unresolved problems will reach an acceptable level. Four cases are considered in this chapter as discussed below. The assessments are illustrated via two case studies in Sections 8 and 9. 7.1
Four Cases of Readiness Assessment
The AFOTEC Software Maturity Evaluation Guide [Air901 recommends readiness assessment based on faults remaining open aqd some other relevant factors. The methodology discussed here extends this assessment by providing additional information using trend analyses and reliability modeling. Two types of assessments are proposed, viz, assume no new failures and assume additional failures according to a fitted software reliability model. Further, two estimated closure rates are used for each type. 0
0
0
0
0
Assume No New Failures The closure rate can be taken to be either an Average Closure Rate (ACR) based on closure data to date or a value calculated from a model fitted to closure data, i.e., a Model Closure Rate (MCR). The first one is used in the AFOTEC maturity guide. The second one can be obtained by fitting an appropriate model to total closed data. An appropriate NHPP or some other model for this data can be selected by studying the Laplace trend plot for the total closed data. Assume Additional Failures This involves first estimating the expected number of new failures. It is assumed that new failures will continue to occur according to a fitted software reliability model. Next, the future closure curve is obtained. This can be done by taking an average closure rate up to this point or by fitting a stochastic model to the total closed curve, in the same way as for the “no new failures” case. Thus, in this chapter the following four cases are considered. Case I . No new failures are considered and the problem closure rate is the average closure rate (ACR) to date. Case 2. No new failures are considered and the problem closure rate is derived from a model, i.e., model-based closure rate (MCR). Case 3. New failures occur according to a software reliability model and closures are at ACR.
242 0
AMRlT L. GOEL AND KUNE-ZANG YANG
Case 4 . New failures occur according to a software reliability model and closures are at MCR.
Any appropriate readiness criterion can be used for these four cases. The criterion used in the illustrative examples is that all weighted failures remaining open be resolved (different weights are assigned to each of the five severity level faults [Air90]). For cases 1 and 2 this situation can easily occur since no new failures are being considered. However, for cases 3 and 4 this could happen even when software contains yet undetected faults.
7.2
Illustrative Example
A simple example is used to illustrate the computations for the four cases discussed above. Assume that current month number is 10, total weighted failures discovered to date are 207, and the total resolved is 180. Hence, the weighted failures remaining unresolved in month 10 is 27 and the average closure rate (ACR) is 180/10 = 18 per month. Also, assume that a stochastic model fitted to the resolved data yields a model closure rate (MCR) of 21 per month. Finally, assume that a software reliability model estimates 15 weighted new failures in month 1 1 , 7 in month 1 2 , 2 in month 13, 1 in month 14 and 1 in month 15. Based on these numbers, the readiness assessment for the four cases is done as follows. Recall that for illustrative purposes only, the readiness criterion used is zero remaining unresolved weighted failures. 0
0
0
0
Case I . Time to close 27 problems at ACR of 18 is 27/18 = 1.5 months. Case 2. Time to close 27 problems at MCR of 21 is 27/21 = 1.29 months. Case 3. Calculations are done monthly for ACR of 18 as follows. Month 11 27 (open) + 15(new) - 18(closed) = 24 open Month 12 24(open) + 7(new) - 18(closed) = 13 open Month 13 13(open) + 2 (new) - 18 (closed) Note that the number remaining open will become zero sometime during month 13. Thus, under our simplified assumptions and the readiness criterion of zero remaining unresolved problems, the software will be ready for release during month 13. Case 4 . Calculations are done monthly for MCR of 21 as follows: Month 11 27(open) + 15(new) - 21 (closed) = 21 open Month 12 21 (open + 7 (new) - 21 (closed) = 7 open Month 13 7(open) + 2(new) - 21 (closed) In this case also the number remaining open will become zero sometime during month number 13.
243
SOFTWARE RELIABILITY AND READINESS
Note that in both cases 3 and 4, the number remaining open has gone to zero even though the software reliability model estimates additional undetected faults remain in the software.
7.3 Example Based on Real Data We provide another example to further demonstrate the calculations for the above four cases using actual failure data. For this example, the failures are weighted according to the method used by AFOTEC and are called change points. Figure 9 shows the time to closure of open change points for cases 1 and 2 while Fig. 10 shows these times for cases 3 and 4. These figures are explained below. In this case, at the end of month 75, there are 3083 change points remaining open. The average closure rate (ACR) and the model closure rate (MCR) at month 75 are 339.6 and 254.0, respectively. Using these values, time to close all open change points for case 1 is 9.1 months (3083/339.6) and for case 2 it is 12.1 months (3083/254). The curves labeled Average CR and Model CR show how the number of change points remaining open goes to zero at months 84.1 (75 + 9.1) and 87.1 (75 + 12.1), respectively
040
,
I
000,
-020
I
-
0 20
I
-
70.00
-
-
7500
FIG
-1
8000
I ~
8500
I--
-
9000
-
-
-
9500
Time
9 Readmess analycis not accountmg for new failures
244
AMRlT L. GOEL AND KUNE-ZANG YANG
Change Points x 1O3
L A 75.00
L-l--i-i-__~ 80.00
FIG.
85.00
90.00
95.00
1
100.00
Time
10. Readiness analysis accounting for new failures.
For this data, an NHPP-ISS model provided the best fit. Based on the model, the estimated number of undetected weighted failures at month 75 is 3084. This is the value from which the bottom curve starts. If testing were to proceed beyond month 75, the number of unresolved weighted failures would decrease in accordance with this curve, which goes down to about 260 at month 100. This is so because weighted failures are being detected as testing proceeds leaving a smaller unresolved number. The top two curves start at 6167 (3083 open plus 3084 yet to detect). The closures occur at either the ACR or the MCR rate according to the two curves (broken lines). All open failures are closed by month 91.3(75 + 16.3 months to close) for ACR and by month 98 (75 + 23 months to close) for MCR.
8. Readiness Analysis of a Commercial System t O 8.1
Introduction
In this section, the methodology described in Section 7 is used to perform the readiness analysis of a large commercial software system (data set to) which has been undergoing testing in recent months. The purpose of this analysis is to demonstrate the application of the methodology.
SOFTWARE RELIABILITY AND READINESS
245
Data set to is based on failure data taken from the development of a commercial embedded controller consisting of approximately 2.5 million lines of code, about 80% of which is commercial off-the-shelf code. The controller was developed over a 22-month period. The original plan was to deliver the tO controller for initial operational testing in the final hardware in Program Month 15 and for final operational testing in Program Month 18. For this application, the failure data were first examined in Program Month 14, one month prior to the original preliminary delivery date. Further analyses are then done in Program Month 17 and in Program Month 21. Failure data for this analysis is provided in terms of change points, a weighted sum of the number of cumulative problems for each priority where priority 1 has a weight of 30; priority 2 has a weight of 15; priority 3 has a weight of 5; priority 4 has a weight of 3; and priority 5 has a weight of 1 [Air901. Also, the analysis is presented as if decisions would be made based on that analysis; the analysis results are compared to what actually occurred for this software system.
8.2 M o n t h 14 Analysis Looking at the cumulative number of problems opened each month, we see in Figure 11 at month 14 that there is an increasing number of faults detected, i.e., the curve shows an increasing failure rate at month 14. A total of 9446 change points have been detected so far. The first step in the proposed methodology is to look at the Laplace Trend Statistic (LTS) shown in Fig. 12 for the number of failures detected by month 14. The LTS curve clearly shows a globally increasing failure rate during months 1 through 14, i.e., reliability decay. This indicates that the system is not ready for reliability growth modeling. However, for illustrative purposes only, a model fit was attempted which led to an ISS-NHPP model with a = 23793, b = 0.0159 and r = 0.079. According to this model, 14 347 (23 793 - 9446) change points are yet to be detected and the failure rate at month 14 is 1015 change points per month. The proposed methodology was similarly used to analyse the cumulative number of problems closed through month 14. The curve of the cumulative faults closed data in units of change points indicated a non-constant closure rate. The Laplace trend statistic curve for this data corroborates the result from viewing the cumulative faults closed curve that closure rate is not constant over the period of months 1 through 14 and hence a stochastic model should be fitted to the cumulative faults closed data. Per the methodology, the LTS curve was used for initial estimates of model parameters. From this fitted model it was determined that the closure rate at month 14 is 1074 change points per month.
2 46
AMRlT L. GOEL AND KUNE-ZANG YANG
000
2.00
Time 400
600
800
1000
1200
1400
Rc. 11. Cumulative open data to month
14.
-
Trend 36.00
trend
~~
0.00
2.00
4.00
6.00
8.00
10.00
12.00
Time
14.00
FIG. 12. Trend test for open data to month 14.
2 47
SOFTWARE RELIABILITY AND READINESS
As the third step in the proposed methodology, given the fact that this software was to be delivered for preliminary operational testing in month 15, we analysed the number of remaining problems. Using the estimates from the reliability model developed from the cumulative faults opened data and the stochastic model developed for the cumulative faults closed data, we were able to determine the following:
(1) If undetected faults are not accounted, it is expected to take approximately 1.9 months to close the current problems remaining open based on an average closure rate (ACR) through month 14 of 595 change points per month. In other words, all open problems should be closed by month 15.9 (see Fig. 13, Average CR Curve). (2) If the closure rate associated with the stochastic model for the cumulative faults closed curve (1074 change points per month) was used, the problems currently remaining open would be closed within 1.0 month, i.e., by month 15.0 (see Fig. 13, Model CR curve). ( 3 ) Based on the increasing failure rate shown by the trend test in Fig. 12, the NHPP model for the cumulative failures opened data was used to account for the problems expected to be detected over the next several months. Taking these into account and using MCR of 1074, it was estimated that it would take at least 8.6 months (to Change Points x 1
lo3 Remaining -CR Average CR
so
.. Model
140
130 120
~
110 100-
090
~
~
~
0 80 0 70 060 0 50 040 0 30 0 20 0 10
0000.00
5.00
10.00
15 00
Time
FIG. 13. Readiness analysis at month 14 not accounting for new faults.
2 48
AMRlT L. GOEL AND KUNE-ZANG YANG
month 22.6) to find and close all of the currently opened problems plus all of the problems expected to be found during that time period. Assuming the average closure rate, the time to close the problems currently open plus the undetected problems was 24 months. Figure 14 shows the expected undetected problems based on the NHPP model fitted to the cumulative problems opened data. There would still be undetected problems in the software equivalent to 6500 change points, a large number. This fact, coupled with an increasing failure rate shown by the Laplace trend statistic for the problems detected data at month 14 and the large number of remaining problems to be detected as predicted by the NHPP model, led to a decision that software was not yet ready. Further, software probably would not be ready for final operational testing until month 22.
In reality, based on the number of remaining problems, it was judged that software would not be ready at month 15. As a result, software development was continued.
8.3 Month 17 Analysis The updated actual plan was to deliver the software for preliminary operational testing in month 18. An analysis of the problem data in month 17 Change Points x
lo3
20.00
30.00
40.00
50.00
Time
60.00
FIG. 14. Readiness analysis at month 14 accounting for new faults.
2 49
SOFTWARE RELIABILITY AND READINESS
was performed to see if the software would be ready for preliminary operational testing in month 18. Analyses using the proposed methodology were performed for the data through month 17. First, the cumulative problems detected data were examined (Figure 15). This curve shows an apparent decreasing failure rate starting in month 15 and carrying through month 17. This is borne out further by examining the Laplace trend test curve for cumulative problems opened in Fig. 16, where the trend statistic is decreasing during months 15-17. This indicates a possible decreasing failure rate for the software. Utilizing the trend test statistics the suite of NHPP models used for the prior analysis was fitted to the cumulative problems detected through month 17. The best model was found to be the Inflection-S NHPP with parameters a = 12 020, b = 0.0286, and r = 0.058. The model predicted remaining problems equivalent to 1437 change points (down from the 14 347 change points predicted in month 14) and an estimated failure rate at month 17 of 364.3 change points per month. A similar analysis was performed on the cumulative number of problems closed through month 17. Again the cumulative problems closed curve showed a non-constant closure rate. This was also indicated by the Laplace trend statistic curve. The stochastic model for the cumulative problem closed
1
I
i
I
7 _ _ _ ~1
0.00
500
~~
10.00
-
15.00
FIG. 15. Cumulative open data to month 17.
'
Time
250
Trend
3400 32 00
AMRlT L. GOEL AND KUNE-ZANG YANG
I
300028 00
I
2600240022001 2000~ 1800, 16001400 1200L
1000800
600 400
200 OM3
Change Points x
lo3
13.00 12.00 11.00
10.00 9.00 8.00
7.00 6.00
5.00 4.00
3.00 2.00 1.00
0.00
FIG.17. Open data and fitted NHPP models at month 21
SOFTWARE RELIABILIW AND READINESS
251
curve based on the non-constant closure rate estimated a closure rate at month 17 of 584.4 change points per month. The analysis per Step 3 of the proposed methodology revealed that: (1) Assuming no further testing (and no further problems detected) it would take about 0.3 months to close the problems currently remaining open. This was based on both assuming the average closure rate (Average CR) through month 17 of 611.6 change points per month and the 584.4 change points per month closure rate (Model CR) from the stochastic model for cumulative faults closed. (2) Assuming faults are detected according to the “best fit” NHPP model for cumulative problems detected, it was estimated it would take 0.62 months to close all currently open and expected new problems through that period of time based on the average closure rate (average CR). The corresponding value based on the stochastic model closure rate (model CR) was 0.75 months. The closure curve met the projected undetected curve at month 17.68 in the first case (average CR) and at month 17.75 for the second case (model CR). The undetected problems curve indicated that at month 18 there would be undetected problems remaining in the software with a weighted equivalent of approximately 1120 change points. This would decrease to 850 change points by month 19. The above analysis indicated that the software would be ready for delivery for operational testing in month 19 at the earliest, if 850 open change points was a realistic criterion for release and delivery. Given the decreasing failure rate shown by the trend test for cumulative problems detected coupled with the fact that only one month was estimated to be needed to achieve zero remaining problems, the software would have been judged to be ready for at least preliminary operational testing in month 18. It probably would not have been found ready for final operational testing at that time because there were still 186 change points worth of open problems at month 17, which would be too great for a delivery for final operational testing. In actuality, this was the decision that was made-the software was delivered for preliminary operational testing while further testing was performed to ensure that as many problems were found and corrected as possible (i.e., some additional “hardening” was deemed necessary).
8.4
Month 21 Analysis
A final analysis was performed at month 21. This would have given the software four months of hardening from month 17, which hopefully would have been sufficient.
252
AMRlT L. GOEL AND KUNE-ZANG YANG
Again Steps 1 and 2 of the proposed methodology were performed. Only the results of fitting the suite of NHPP models to the cumulative problems detected data using information from the Trend Test Statistics are shown in Fig. 17. The “best fit” NHPP model for the cumulative problems detected curve was found to be Inflection-S NHPP with parameters: a = 11 717 change points, b=0.2927, and r=0.058. The number of remaining faults was estimated to be 420.1 change points (down from 1437 change points in month 17) and failure rate at month 21 was estimated to be 118.7 change points per month. The cumulative faults detected curve clearly showed a decreasing failure rate in months 15-21, which was borne out by the Laplace trend test curve for cumulative problems detected. In fact, the trend value was negative, decreasing, and less than zero in months 19-21, providing clear evidence that there was a decreasing failure rate at month 2 1. The curves for the cumulative number of problems closed through month 21 and the Laplace trend test for this data showed a non-constant closure rate. Fitting a stochastic model to the cumulative problems closed data, it was estimated that the closure rate at month 21 was 162.1 change points per month. Looking at the number of remaining problems, analysis using the methodology found that: (1) Assuming no further testing and no further problems detected, it would take about 0.54 months to close the problems currently remaining open (assuming the “model” closure rate); using the average closure rate through month 21 of 533.8 change points per month the remaining 87 change points would be closed in 0.16 month (see Fig. 18). (2) Assuming the “best model fit” for detected problems, it would only require about 1.4 months to find and fix all of these new detected problems plus the currently open problems currently remaining open. Assuming the “model” closure rate, only 0.2 months was predicted if the average closure rate through month 21 for this was used. These closures by month 22.4 and 21.2, respectively, are shown graphically in Fig. 19. This also shows that at month 21.2 using the average closure rate there would be undetected problems with a weighted equivalence of approximately 400 change points; at month 22.4 assuming the model closure rate there would be the equivalent of 290 change points undetected in the software. The number of equivalent undetected change points reduces in either case to 180 by month 24.
2 53
SOFTWARE RELIABILITY AND READINESS
Change Points x
1.50
1-
1.40 1.30 1.20
lo3
1
1-
1.10
l.O0 0 90
OXO 070 0.60
t
k
1
0 50 -
0.400.30 0 20 -
O.1°
I
~
0.00
I
I
5.00
10.00
I
15.00
I
20.00
1ime
FIG. 18. Readiness analysis at month 21 not accounting for new faults.
Change Points
*
~t-
500.00 480.00 -I
21.00
22.00
I
23.00
-~
Undetected ................ .......... .. Model CR
........
I-
24.00
I
-
Time
FIG. 19. Readiness analysis at month 21 accounting for new faults.
254
AMRlT L. GOEL AND KUNE-ZANG YANG
From the above analysis one would conclude that given the small number of remaining problems in month 21 (87 change points), the clearly decreasing failure rate for detected problems, the fact that only 1.4 months were needed to close all currently open and expected problems, and the fact that the software only had undetected problems with an equivalent of 320 change points at month 22, the software was of adequate quality in terms of these criteria to be released for final operational testing in month 22. In reality, based on the number of remaining problems by priority and other factors not part of this analysis, the decision was made to release the software for final operational testing in month 22.
9. Readiness Analysis for an Air Force System
9.1 Introduction
Development data from an Air Force software system are analysed in this section using the methodology of Section 7. The data consist of weighted originated failures and weighted closed failures. The weights are 30, 15, 5, 3 and 1 for severity levels 1, 2, 3, 4 and 5, respectively. The time period of the data is 86 months. The step-by-step procedure and the analyses are similar to those described for the system in Section 8. For this reason, only the more important details are included here. A brief description of the data is given in Section 9.2. Guided by the trend statistic curve, analyses and readiness assessments are then done at months 70, 75, 80 and 86 in Section 9.3. A summary of the assessments is presented in Section 9.4.
9.2 Data Description
A graph of the cumulative weighted originated failures, cumulative weighted closed failures and weighted failures remaining open is shown in Fig. 20. These values are called change points and thus the data are cumulative open change points (OCP), cumulative closed change points (CCP) and remaining open change points (ROCP). A cursory study of the OCP and CCP plots in Fig. 20 indicates very little failure activity for the first 25 months. Then there is an almost constant rate of increase up to month 60. This is followed by a convex curve for OCP and an almost straight line for CCP. The ROCP curve seems to be increasing up to month 50 and then remains constant up to month 70. Finally, it shows a decreasing trend up to month 86. A better understanding of their behavior can be gained from the Laplace trend statistics curves in Figures 21 and 22 for OCP and CCP, respectively.
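As a minimal sketch of how the curves in Fig. 20 can be assembled from raw problem reports, the following converts monthly failure counts by severity into change points using the weights given in Section 9.1 and accumulates them into OCP, CCP and ROCP; the monthly counts and helper names are illustrative, not the program's actual data.

SEVERITY_WEIGHTS = {1: 30, 2: 15, 3: 5, 4: 3, 5: 1}

def to_change_points(counts_by_severity):
    """Weighted equivalent (change points) of one month's failure counts."""
    return sum(SEVERITY_WEIGHTS[sev] * n for sev, n in counts_by_severity.items())

def cumulative(values):
    total, out = 0, []
    for v in values:
        total += v
        out.append(total)
    return out

# Two illustrative months of originated and closed failures by severity level
originated = [{1: 0, 2: 1, 3: 4, 4: 2, 5: 5}, {1: 1, 2: 2, 3: 6, 4: 3, 5: 2}]
closed = [{1: 0, 2: 0, 3: 3, 4: 1, 5: 4}, {1: 1, 2: 1, 3: 5, 4: 2, 5: 3}]

ocp = cumulative(to_change_points(m) for m in originated)   # cumulative open change points
ccp = cumulative(to_change_points(m) for m in closed)       # cumulative closed change points
rocp = [o - c for o, c in zip(ocp, ccp)]                    # remaining open change points
print(ocp, ccp, rocp)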
FIG. 20. Accumulated software changes.
Figure 21 indicates a slight reliability decay and then some growth during the first twenty months. It is followed by stable reliability indication up to month 27, and reliability growth to month 40. Then there are indications of local reliability growth and decay. Starting with month 60, there is strong indication of continuing reliability growth up to the present, viz. month 86. Figure 22 seems to follow a pattern similar to that of Fig. 21. In practice, analysts track the failure phenomenon and management tries to keep up with the failure curve. In other words, as more change points are originated, management tries to ensure that more are closed. As mentioned earlier, readiness assessment is a difficult problem. In addition to the open and closed curves, it may require consideration of test rate, test completeness and requirements stability. Since these items are generally not available, the following assessments are based purely on the behavior of the OCP and CCP plots. Re-examining these plots in light of observations made above, it would seem that readiness assessment could have started with month 60. However, by month 70, there is strong indication of sustained reliability growth. In the following, the results of assessments at months 70, 75, 80 and 86 are briefly summarized.
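The Laplace trend statistic behind Figs. 21 and 22 is not restated here; the sketch below uses a standard form of the trend factor for grouped (monthly) failure counts, under which negative and decreasing values indicate reliability growth and positive values indicate decay. The monthly counts are illustrative.

import math

def laplace_trend(monthly_counts):
    """Laplace trend factor for grouped failure-count data up to the last month given."""
    k = len(monthly_counts)
    n_total = sum(monthly_counts)
    if k < 2 or n_total == 0:
        raise ValueError("need at least two months and a non-zero failure count")
    weighted = sum((i - 1) * n for i, n in enumerate(monthly_counts, start=1))
    numerator = weighted - (k - 1) / 2.0 * n_total
    denominator = math.sqrt((k * k - 1) / 12.0 * n_total)
    return numerator / denominator

# Illustrative monthly change-point counts: a rise followed by a decline
counts = [5, 8, 20, 35, 60, 55, 40, 30, 20, 12]
trend_curve = [round(laplace_trend(counts[:k]), 2) for k in range(2, len(counts) + 1)]
print(trend_curve)   # turns downward once the monthly counts start to fall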
FIG. 21. Trend test for open data.
FIG. 22. Trend test for closed data.
9.3 Assessments at Months 70, 75, 80 and 86
In each case, the Laplace trend statistic curves were studied for total change points, originated and closed. These were used as guides for determining the NHPP model choice and initial parameter estimates, as detailed earlier in this paper. After fitting the appropriate models, the best one was selected. The fitted models were then used to estimate the future failure curve and the model closure rate (MCR). The average closure rate (ACR) was computed from the change points remaining open data. The above values were then used to assess readiness. In the analysis given below, the system would be considered ready for release when the problems remaining open become zero. The details of similar computations were discussed earlier in Section 7 and were illustrated in Section 8. As mentioned above, these details are not repeated here to avoid redundancy. The resulting analyses can be summarized graphically in four figures for each analysis month. The first two figures in each case show the fitted NHPP models for the open and closed data, the third shows the problem closure months for cases 1 and 2, and the fourth the problem closure months for cases 3 and 4. The figures for each of the analysis months were studied and the results analysed for readiness assessment. Such plots for months 80 and 86 only are shown in Figs. 23-26 and 27-30, respectively.
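A minimal sketch of the closure-month calculation behind the four cases, under the simplifying assumption that the change points the fitted model expects to be detected in future months can be treated as a single total added to the currently open change points; the numerical check uses the month-21 values from Section 8, and the function name is illustrative.

def closure_month(assess_month, open_cp, closure_rate, future_detected_cp=0.0):
    """Month by which the remaining open change points reach zero.

    Cases 1 and 2 set future_detected_cp = 0 (no new detections assumed);
    cases 3 and 4 add the change points the fitted NHPP model expects to
    be detected before closure catches up.  Cases 1 and 3 use the average
    closure rate (ACR); cases 2 and 4 use the model closure rate (MCR).
    """
    return assess_month + (open_cp + future_detected_cp) / closure_rate

# Month-21 check against Section 8: 87 change points open,
# ACR = 533.8 and MCR = 162.1 change points per month.
print(round(closure_month(21, 87, 533.8), 2))   # ~21.16, i.e. about 0.16 months
print(round(closure_month(21, 87, 162.1), 2))   # ~21.54, i.e. about 0.54 months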
FIG. 23. Open data and fitted model at month 80.
FIG. 24. Closed data and fitted model at month 80.
FIG. 25. Readiness analysis at month 80 not accounting for new faults.
FIG. 26. Readiness analysis at month 80 accounting for new faults.
FIG. 27. Open data and fitted model at month 86.
FIG. 28. Closed data and fitted model at month 86.
FIG. 29. Readiness analysis at month 86 not accounting for new faults.
FIG. 30. Readiness analysis at month 86 accounting for new faults.
9.4 Summary of Assessments
Table VI summarizes the results of the various analyses at months 70, 75, 80 and 86. It gives the failure closure month (the month by which all remaining open failures are closed) for each assessment month and for each of the four cases. The corresponding values of ACR and MCR are given in parentheses. Thus for case 1 at month 70, the average failure closure rate is 332 per month and all currently open failures should be resolved by month 77.4. For case 4 at month 80, the model-based closure rate is 238 per month and the current unresolved failures plus the failures still to be detected should be resolved by month 98.3. A graphical representation of these results is shown in Fig. 31. Some observations from Table VI are summarized below.

TABLE VI
SUMMARY OF ANALYSIS: failure closure month, with closure rate per month in parentheses (assessment month 86)
Case 1   90.4 (349)
Case 2   94.1 (191)
Case 3   92.3 (349)
Case 4   98.8 (191)

Case 1. This represents the situation when no new detected failures are assumed and the average closure rate (ACR) is used to close the remaining open problems. For this data set, the ACR is almost constant. The change in the month needed to reach zero remaining open problems at each assessment month is due to the additional new failures detected since the previous assessment month.
Case 2. The model closure rate in this case is decreasing for each successive assessment month. It
would take longer to resolve the open faults than for Case 1 for each respective assessment month. Case 3. Compared to Case 1 (which also assumes an average closure rate) this case explicitly accounts for the extra time required to resolve the failures to be detected in future months. This is a more realistic situation than Case 1 would represent.
FIG. 31. Graphical representation of readiness assessments at months 70, 75, 80 and 86.
Case 4. Just as in Case 2, the closure rate is decreasing for each successive assessment month. Hence it would take longer to resolve the problems remaining open than in Case 3 for each respective assessment month.
10. Concluding Remarks
This chapter has provided a detailed description of a systematic approach for evaluating software reliability and readiness for release. Such assessment is needed, especially during the later stages of testing, to minimize the likelihood of delivering software with an unacceptable level of errors. The basic approach in use today consists of fitting a stochastic model to the failure data and using it to estimate measures such as mean time to next failure, number of remaining errors and reliability. These measures are then employed to determine readiness for release. This chapter presented a comprehensive coverage of the current approaches. Then it described a new methodology that explicitly incorporates both open and closed problem report data during reliability and readiness assessment. The three-step methodology provides a systematic and objective approach for addressing these two important issues. A summary of the underlying assumptions, limitations and benefits of the methodology is given below.
Assumptions. One of the key assumptions is that the software failure phenomenon can be modeled via a software reliability model. Although such models are increasingly being used in research and practical applications, use of this methodology explicitly requires that an appropriate model be identified during analyses. This chapter has exclusively used the NHPP models. Having established the failure model, the readiness assessment can be done for a specified set of assumptions. This chapter has illustrated how assessments were done for four distinct cases.
Limitations. The methodology is quite general. Its main assumptions are quite reasonable and seem to be consistent with current commercial and government practice. Practical use of the methodology, however, would require a good understanding of the underlying theoretical framework and tool support to perform the necessary analyses.
Benefits. The proposed approach provides an objective, analytical framework for performing readiness assessment. The four cases considered can all be used for a given application and the resulting assessments compared to select the most appropriate one. Another important benefit of the proposed approach is that it can be easily adapted to be consistent with the approaches in current use.
ACKNOWLEDGMENTS
The material in this chapter has evolved over several years. It includes the results from several funded research projects and reflects the insights gained during discussions with many colleagues. In particular, we would like to thank R. Paul (DoD), J. McKissick (Consultant), B. Hermann (AFOTEC), R. McCanne (Air Force), and A. Sukert (Xerox). We would also like to acknowledge the funding from the Air Force, NASA and the Army over many years of research into software reliability and related topics.
Computer-Supported Cooperative Work and Groupware
JONATHAN GRUDIN
Information and Computer Science Department, University of California, Irvine, California
STEVEN E. POLTROCK
Information and Support Services, The Boeing Company, Seattle, Washington
Abstract
This chapter surveys Computer-Supported Cooperative Work (CSCW) research and groupware development. In the 1980s, as computer and software vendor companies focused attention on supporting networked groups, they came to share interests with researchers and developers oriented toward management information systems, social sciences and other disciplines. CSCW can be seen as a forum attracting diverse people who have partially overlapping interests and a willingness to overcome the difficulties of multidisciplinary interaction. In this chapter, we discuss the different assumptions and priorities that underlie small-system and large-system work. We note differences in emphasis that are found in North America, Europe and Asia. We then provide an overview of the current state of research and development by technology area, examining in turn technologies that focus primarily on supporting human communication, on providing shared information spaces, and on coordinating the flow of work. We discuss challenges to designing and deploying groupware, taking particular note of the behavioral challenges that often prove to be thornier than technical challenges. Approaches to addressing these challenges are described, followed by our summary of some trends and future issues. The first part of this chapter extends work presented in Grudin (1994a).
1. The CSCW Forum
2. Research and Development Contexts
   2.1 Research that Spans the Boundaries
   2.2 The Challenge of Being Multidisciplinary
3. From Small-group Applications to Organizational Systems
   3.1 A Contrast: Large Systems and Small-group Applications
   3.2 Project and Large-group Support
4. CSCW in North America, Europe and Asia
   4.1 A Contrast: CSCW in North America and Europe
   4.2 Asia
5. Groupware Typologies
   5.1 Categorization by Group Activity
   5.2 Features that Support Communication, Collaboration, Coordination
   5.3 Categorization by Groupware Technology
6. Communication Technologies
   6.1 Electronic Mail
   6.2 Real-time Conferencing
   6.3 Multicast Video and Audio
7. Shared-information-space Technologies
   7.1 Real-time Shared Spaces
   7.2 Asynchronous Shared Spaces
8. Coordination Technologies
   8.1 Calendars and Scheduling
   8.2 Workflow Management
9. Challenges to Groupware Development and Use
   9.1 Technical Challenges
   9.2 Social and Organizational Challenges
10. New Approaches
11. Future Directions
References
1. The CSCW Forum
In 1984, Irene Greif at MIT and Paul Cashman at Digital organized a workshop, inviting people from various disciplines who shared an interest in how people work and how technology could support them. They coined the term "computer-supported cooperative work" to describe this common interest. Since then, thousands of researchers and developers have adopted the term and attended CSCW conferences. Some writers describe CSCW as an emerging field or discipline (Bannon and Schmidt, 1991, present a nice case for doing so), but today it more resembles a forum, an undisciplined marketplace of ideas, observations, issues, and technologies. Differences in interests and priorities are as notable as the shared interests. People come to CSCW, as to a forum, from different places. It is useful, perhaps essential, to know where each is from and why they have come. Not everyone speaks the same language or makes the same assumptions, so we often must work out a means of communicating. If we think of CSCW as an emerging field or common enterprise, we may be frustrated by this mosaic of different pieces, the frequent misunderstandings, and the lack of intellectual coherence. But when understood and respected, the differences form the core of richer, shared understandings.
Groupware is coming into prominence following decades of being “promising” but not successful technologies. The growth of the Internet and World Wide Web and the wide deployment of Lotus Notes are key demonstrations of our readiness for group support technologies. Understanding the initial lack of success is important in navigating through the present. In the 1960s, mainframe transaction processing systems had succeeded in addressing tasks such as filling seats on airplane flights and printing payroll checks. From the late 1970s through the 1980s, minicomputers promised to support groups and organizations in more sophisticated, interactive ways. “Office automation” was the term used to describe these group support systems. Their general lack of success was due less to technical challenges than to insufficient understanding of system requirements, as summarized in Grudin (1988), Bullen and Bennett (1990), Markus and Connolly (1990). The technology was built but did not meet group needs. More knowledge was needed about how people work in groups and how technology affects their work. Some engineers, notably Douglas Engelbart, had stressed the coevolution of technology and organizations all along. (Greif, 1988, includes four early works of Engelbart in her collection of influential research papers.) Some information systems specialists understood this to be central to deploying large systems. But recognition of the interdependence of technology and social organization was largely absent from discourse among the designers and developers in the vendor companies most engaged in developing group support applications. CSCW started as an effort by technologists to learn from economists, social psychologists, anthropologists, organizational theorists, educators, and anyone else who can shed light on group activity. CSCW also encompasses system-builders who share experiences and inform others of technical possibilities and constraints. Applications include desktop conferencing and videoconferencing systems, electronic mail and its refinements and extensions, collaborative authoring tools, systems built to provide shared information spaces including electronic meeting rooms, workflow management systems, and virtual worlds. Not strongly represented in CSCW collections, but logically related, are computer-assisted design/ computer-assisted manufacturing (CAD/CAM), computer-assisted software engineering (CASE), concurrent engineering, distance learning, and telemedicine. Why call it “computer-supported cooperative work”? Some have noted that “cooperation” is often more a goal than a reality. “Groupware” or “workgroup computing” are terms that shift the focus from the work being supported to the technology, and suggest small organizational units. ‘‘Workflow management systems”, a more recent coinage, describes technologies that support group processes in a particular organizational context.
The next section identifies historical shifts, demographic patterns, and geographic distinctions that underlie contributions to CSCW.
2. Research and Development Contexts Each ring in Fig. 1 represents one focus of computer systems development and the principal “customer” or “user” of the resulting technology, primarily from a North American perspective. Until recently most activity was in the outermost and innermost rings. The former represents major systems and applications, primarily mainframe and large minicomputer systems designed to serve organizational goals such as transaction processing, order and inventory control, and computer-integrated manufacturing. The innermost ring represents applications designed primarily for the individual users of PCs and workstations: word processors, debuggers, spreadsheets, games, and so forth. The two rings between these represent projects (or other large
FIG. 1. US research and development contexts for CSCW and groupware.
groups) and small groups. Large group support includes electronic meeting rooms and workflow management systems, which are most useful for groups of half a dozen or more. In contrast, a major focus of small group support, computer-mediated communication (CMC), includes applications that often work best with fewer than four or five users, such as desktop conferencing and desktop videoconferencing. Technologies in each of the middle rings are called "groupware." However, CSCW gatherings, especially in the United States and Asia, have focused primarily on small-group support. In contrast, trade-oriented groupware conferences have focused more on project-level support and "workflow management", and European work has had more of an organizational focus. On the left of Fig. 1 are software development contexts that dominate development of systems and applications of different scope. Most software systems that support an entire organization (the outermost ring) are unique to the organization. Some may be contracted out, but historically, internal or in-house development has produced an extensive body of software. In contrast, in the innermost ring, single-user applications are the province of commercial off-the-shelf product developers, who address the large shrink-wrapped software market and who do little customization for individual purchasers. The two central rings represent groupware development: (i) contracting, initially government contracting, has stimulated considerable project-level software support; (ii) small-group support is a new focus for commercial product developers, and for telecommunications companies that have focused on technologies such as video that create demand for high-bandwidth communication. The emergence of CSCW in the 1980s included both but is most strongly tied to the second, the shift of vendor company attention to small networked groups. On the right of Fig. 1 are major research areas associated with the development and use of systems linked to each development context, and dates by which they were firmly established. A literature associated with systems in organizations arrived in the mid-1960s with "third generation" computer systems built with integrated circuits. It has been called data processing (DP), management information systems (MIS), information systems (IS), and information technology (IT). In an excellent survey of this field, Friedman (1989) summarizes, "There is very little on the subject up to the mid-1960s. Then the volume of literature on (computers and) the organization of work explodes. Issues of personnel selection, division of labour, monitoring, control and productivity all subsequently receive considerable attention." The complexity of managing large government software contracts provided incentive to apply technology to the management of large projects (the next ring). In the 1970s, the field of Software Engineering (SE), as well
as Office Automation (OA), emerged. Software engineering is of course a specific kind of project, but technology-rich development environments are a natural setting for efforts at computer support for large groups. Although OA did not survive as a field, many of the same issues are again being considered under workflow management. (Greif, 1988, contains several influential papers from the OA literature; Medina-Mora et al., 1992, Abbott and Sarin, 1994, Dourish et al., 1996, and Agostini et al., 1997 are recent workflow management papers.) The innermost ring emerged next, with the emergence of PCs rapidly followed by the formation in 1983 of the Association for Computing Machinery Special Interest Group in Computer-Human Interaction (ACM SIGCHI) as a research forum dedicated to single-user applications and interfaces. The most recent is small-group support and CSCW. The 1984 workshop mentioned above was followed by conferences in 1986 and annually since 1988, with European conferences in odd years. (Many of the conference proceedings are available from ACM or Kluwer; they remain archival sources, along with the journal Computer Supported Cooperative Work.) CSCW conferences draw researchers from the IS, SE, and the former OA communities, but the North American conferences draw primarily from the computer and software companies that support the predominantly single-user human-computer interaction (CHI) community. Differences in emphasis in Europe and Asia are discussed in Section 4. Although many papers reflect the expanded focus of vendor companies to include small-group applications, it has not proved possible to market groupware to the millions of small groups in the way that word processors, spreadsheets, and games were marketed to individuals. The organizational settings of group activity are too salient to be ignored and too complex to be easily addressed. CSCW is not easily compartmentalized.
2.1 Research that Spans the Boundaries
Figure 1 represents general tendencies. For example, organizations do not develop all software internally; they also contract out software development and increasingly acquire commercial software as well. For our purposes, the most important caveat is that CSCW is not wholly restricted to one “ring”-CSCW represents a merging of people, issues, approaches and languages. By spanning boundaries, CSCW and groupware create an exciting potential for cross-fertilization and for doing work with broad implications. Indirect as well as direct effects are studied: the use, in group and organizational settings, of applications that were developed for individual users; the ways in which software, developed to support groups,
affects individuals and is adapted to different organizational contexts; systems developed to support organizational goals as they act through individuals, groups, and projects. Individual, group, project and organizational activity are fundamentally intertwined. Figure 1 is one partitioning of the system development world, and can obscure issues that transcend the divisions.
2.2 The Challenge of Being Multidisciplinary
Whether we view the shared and the disparate interests as a melting pot or a mixed salad, making sense of them is a lively process. Opportunities to learn and to inform generate enthusiasm, which is needed to overcome inevitable obstacles. It is not always apparent why others’ perspectives and priorities differ. It takes patience to understand conflicts and to find mutually advantageous modes of operation. It is exciting to find a new source of information and a new potential audience, but it is frustrating when the other group is ignorant of work that you assume to be basic shared knowledge. The groups participating in CSCW are not always aware of the extent to which they rely on different conferences, journals, and books. Consider the “Tower of Babel” problem-participants from different areas use the same terms in subtly different ways. Basic terms such as “system”, “application”, “task”, “implementation”, and even “user” differ across these communities (for details see Grudin, 1993). For example, in the field of HCI, “user” generally refers to a person sitting at a display, entering information and commands, and using the output. In the IS field, “user” often refers to a user of the output, a person who might not touch a keyboard. To deal with the ambiguity, IS coined the term “end user” to describe a person at a terminal or keyboard, a term not needed or used by most in HCI. To software engineers developing tools, “user” typically means the tool user, a software developer. Small wonder that bringing these people together leads to confused conversations and misunderstood articles! CSCW is logically broader than it is in practice. Many topics are omitted from the conferences and anthologies, either because the topics are covered in other conferences and journals, because their foci are of less interest to the core CSCW constituency, or because the writing is misunderstood. The most comprehensive collection of readings in Groupware and ComputerSupported Cooperative Work (Baeker, 1993), with over 70 papers, accurately represents the literature, but contains nothing on computermediated education and distance learning, project-level software engineering support, workflow management, computer-integrated manufacturing, and other topics.
3. From Small-group Applications to Organizational Systems
3.1 A Contrast: Large Systems and Small-group Applications
The design of individual productivity applications such as word processors stressed functionality and human-computer interfaces. Interface design focused on perceptual and cognitive aspects of learning and use. Developers succeeded with minimal attention to the workplaces in which single-user applications were used. As product developers extended their view to computer support for groups, many confronted social issues in customer settings-group dynamics-for the first time. With groupware, social, motivational, and political aspects of workplaces become crucial (Grudin, 1994b). Organizational systems-mainframes and large minicomputers-have been around for decades, and the importance of social dynamics is familiar to IS researchers and developers, who have incentives to share their knowledge with product developers: networked PCs, workstations, and software products are increasingly important components of organizational information systems. Also, as the large systems and applications that have been the focus of IS study decline in cost, they are used by smaller organizational units, providing additional shared focus. The small-group application and IS communities have differences, as well. For example, most small-group support emphasizes communication. Small groups are generally formed to bring together people who have a need to communicate. Communication is also the priority for the telecommunications industry. In contrast, organizational systems focus more on coordination, because coordinating the efforts of disparate groups is a major problem at the organizational level (Malone and Crowston, 1995). Similarly, members of small groups usually share key goals. As a result, product developers anticipate relatively little friction or discord among users and assume a “cooperative” approach to technology use. This is directly reflected in the second “C” of CSCW. In contrast, researchers and developers focusing on organizational systems must attend to the conflicting goals that are generally present in organizations (e.g., Kling, 1991; Kyng, 1991). Some in the IS community have argued for changing the meaning of the second “C” or for dropping it altogether. Another contrast is that product developers are more concerned with the human-computer interface, whereas the developers of organizational systems and their customers are more focused on functionality. Product developers compete in discretionary markets where useful functionality is
quickly adopted by others, at which point the human-computer interface provides an important edge. In contrast, internal developers of information systems must accurately gauge the functionality needed in the workplace, and often cannot justify the cost of fine-tuning the interface for their relatively fixed user population. Out of such differences in priorities come misunderstanding and confusion. Speakers from the IS field berate small-group application developers for focusing on "cooperation" and ignoring conflict, or criticize research that focuses on the thin surface layer of the human-computer interface. On the other side, those working to resolve technical problems question the value of research into organizational politics that is distant from their concerns. CSCW includes social scientists and technologists, but this is often not the real source of conflict. In large information system environments, decades of experience have surfaced non-technological problems, whereas in small-systems environments, technological hurdles still predominate. For example, Scandinavians working on tools and techniques for collaborative design are often associated with the "social science" perspective, despite being computer scientists who do not practice social science. They came to realize the importance of social effects in the course of developing large systems. Conversely, many behavioral and social scientists who are hired into industry research labs evolve to be "technologists". Until we understand the origins of our differences we will not succeed in addressing them.
3.2 Project and Large-group Support
Small groups and large organizations represent extreme points. Our intervening category, large group support, lies between them in terms of group purpose, cohesion, conflict, and so forth. Technologies such as meeting support and workflow management deal with the same issues in less sharply contrasting ways. Workflow management is discussed in Section 8.2. In this section we outline the history of meeting support systems. Once expensive and directed at high-level decision-making, these are now inexpensive and flexible enough to support a variety of meeting types. Their evolution and role in CSCW illustrates several points made earlier. Electronic meeting rooms were originally a central component of group decision support systems (GDSS). Unlike most groupware applications, they did not emerge from product development environments, nor did papers on GDSS appear in HCI conferences. Until recently, there were no commercial electronic meeting room products. GDSS research and development began over 20 years ago in the IS field, in management schools. Consider the “D’ in GDSS. Decision-making was emphasized because management-as-decision-making was the dominant perspective in
schools of business and management (King et al., 1992). In addition, expensive early systems could best be justified in organizations (and in management school curricula) by focusing on high-level decision-making. In the mid-1980s, the first CSCW conferences drew GDSS researchers from the IS field. Conflicting use of terminology went unrecognized. The IS community construed GDSS broadly to include all technology that contributes to decision-making, including electronic mail and other common applications. Some in the IS field considered GDSS to be a synonym for CSCW. Upon encountering the term GDSS, many from the HCI field assumed it referred only to electronic meeting support, the one technology feature unfamiliar to them. As the cost of the technology fell, GDSS use was no longer restricted to high-level "decision-makers". It could be used to support meetings of various kinds. In addition, management trends lessened the emphasis on high-level decision-making. As rungs are removed from many organizational ladders, responsibility for decisions often shifts to the groups that will implement them. The "D" has been dropped to form group support systems (GSS). The reduced cost, together with improved technology and a better understanding of the process of effective use (Grudin, 1994b), led to successful commercial electronic meeting room products around 1990. GSS is support for projects or large groups; meeting support is not as useful with fewer than five or six participants. The small-group application developers who play a central role in CSCW have different priorities than the GSS system developers, and few GSS papers appear in CSCW conferences. GSS researchers, observing that small-systems researchers are unfamiliar with their literature, have become less involved in CSCW. They participate in conferences with an IS orientation, initiated a newsletter that rarely mentions CSCW, and spawned their own journals. They have, however, adopted the "groupware" label, as has the workflow management community, another group focused on large-group support. Thus, the term "groupware" is found in both GSS and CSCW literatures, used to describe overlapping but different technologies. The divide is only partial; some information systems research is presented at CSCW meetings, and both groups can benefit from interaction. But the fragile nature of participation in CSCW is apparent.
4. CSCW in North America, Europe and Asia
4.1 A Contrast: CSCW in North America and Europe
American and European approaches to CSCW overlap, but also differ markedly. This partially reflects the distinctions outlined in Section 3.
Major American computer and software vendor companies have more direct and indirect influence than their counterparts in Europe. In addition to direct corporate support in the US, students are hired as interns, Ph.D.s are hired into research labs and development organizations, and in recent years many corporate researchers have been hired into respectable academic positions. In Europe, governments provide more student support and sponsor research through public universities and research laboratories. The focus has been on large-scale systems, in particular systems that are developed or deployed in organizations that are not primarily computer or software developers. North American researchers and developers are more likely to focus on experimental, observational, and sociological data; others exhibit a technology-driven eagerness to build systems and then seek ways to use them. These approaches can be considered empirical: experiments by social psychologists looking at group activity among teams of students (e.g., Olson et al., 1992), anthropological descriptions of activity in schools and businesses (e.g., Suchman, 1983), descriptions of groupware that address interesting technical issues whether or not the technology is widely used (e.g., Conklin & Begeman, 1988). European contributions to CSCW are often motivated by philosophical, social, economic or political theory. They may be explicitly grounded in the writings of Wittgenstein, Heidegger, Elias, Marx, Vygotsky or others. (See, for example, contributions in Bjerknes et al., 1987; Floyd et al., 1992.) Other contributions are also theory-based but more formal, like other branches of European computer science or informatics. Typical projects include broad formulations of system requirements and implementations of platforms to support a suite of applications that in concert are to provide organizational support (e.g., Trevor ef al., 1995). The distinct European CSCW also reflects cultural norms in European countries, such as greater national homogeneity, co-determination laws, stronger trade unions, and more extensive social welfare. At the risk of oversimplifying, greater cultural homogeneity can lead to a systems development focus on skill augmentation (in contrast to automation) that is justified on economic as well as humanitarian grounds: In a welfare state, workers losing jobs to automation must be indirectly supported anyway. The Scandinavian participatory or collaborative design approach reflects these priorities (Kyng, 1991). Work in England bridges these cultures, with one happy consequence being an infusion of insightful ethnographic (anthropological) research into technology use in organizations. Several US technology companies have active research laboratories in England. The most notable fusion of approaches is at Rank Xerox’s prolific Cambridge Research Center,
including their collaborations with academic researchers in the UK. These include sociological analysis of group activity in settings ranging from the London Underground control room (Heath and Luff, 1993) to a printing shop (Bowers et al., 1995), and the construction and use of video communication systems (e.g., Dourish and Bellotti, 1992; Dourish and Bly, 1992). Recently Dourish (1995a, 1995b) has used some of the insights from social analyses to describe requirements for future systems development. CSCW in Europe has been supported by an enormous variety of grants. Major European Community projects funded by the European Strategic Programme for Research and Development in Information Technology (ESPRIT) and Research and Development in Advanced Communications Technology in Europe (RACE) explicitly brought together researchers and developers from different countries. These also required academic and industry partners. Some projects involve tightly coupled work, others consist of more independent efforts at each site. These projects have been exercises in cooperative work whose content is CSCW research and development. Another effort to build cooperation among researchers and developers in the European Community countries was the CO-TECH project, carried out under the Cooperation in Science and Technology (COST) framework. This provided funding for organizing and attending meetings, not for research itself, and succeeded in building a greater sense of community. In addition, many European governments directly fund research in this area through government research laboratories and specific government projects. For example, the German GMD is conducting a major effort to develop an infrastructure to support the division of the country's capital between Bonn and Berlin. The very strong research component of this project is arguably the most thoughtful and productive single effort in CSCW (numerous papers and videotapes have been published, including Haake & Wilson, 1992; Streitz et al., 1994; Pankoke-Babatz, 1994; Klockner et al., 1995). NSF has been an important supporter of US CSCW projects, but it has been less influential than European funding agencies in shaping the research agenda. The CSCW'92 conference illustrated these differences. European presentations included two based on multinational ESPRIT projects and none from computer companies. The ESPRIT presentations described a working "model for automatic distributed implementation of multi-user applications" (Graham and Urnes, 1992) and a description of the requirements for supporting the Great Belt bridge/tunnel project in Denmark (Gronbak et al., 1992). European papers included two based explicitly on speech act theory (Medina-Mora et al., 1992) and activity theory (Kuutti and Arvonen, 1992). In contrast, several US companies were represented, along with five US and two Japanese contributions from telecommunications
companies. In general, the papers reflected US interest in small-group applications and European emphasis on organizational systems. British contributions included several focused on ethnography as well as some focused on innovative technologies. These conferences have done well to overcome these differences as long as they have. Philosophically oriented European submissions often strike empirically oriented American reviewers as lacking content; American contributions strike European reviewers as unmotivated or shallow. Again, differences in terminology block mutual understanding, as when a European CSCW researcher criticizes an American group's understanding of "task analysis". (The latter used the term to describe a cognitive task analysis based on experimental interface testing, a standard practice in HCI. To the European, "task analysis" meant an organizational task analysis based on mapping the flow of information from person to person. He found it nonsensical to apply the term in an experimental setting.) Cultural differences in the role of research meetings exacerbate the split. In Europe, conferences are often gatherings of professionals to interact and share current results; most of those who attend also present. In the US, a conference is more often organized for a larger audience, with greater emphasis on polished results. The difference leads to misunderstandings over submission requirements, and the composition of the conferences appears to be concentrating upon ethnographic case studies and technical implementation studies.
4.2 Asia
Thus far, the principal Asian impact on CSCW and groupware research in the West has come from a growing number of Japanese contributions (e.g., Ishii and Kobayashi, 1992; Okada et al., 1994; Inoue et al., 1995). In Japan, government and industry cooperation in technology development includes support for CSCW. The Information Processing Society of Japan (IPSJ) has for some years had a special interest group devoted to CSCW and Groupware (translated as "SIG-GW"). Asian contributions to CSCW have come primarily from computer and software companies, with most major electronics companies supporting research in the area, and from telecommunications companies, including NTT and ATR. In this respect Japanese participation matches the non-academic profile of US participation.

There are differences in emphasis. Language-specific technologies, such as the World Wide Web, can be initially less appealing in Asia than content-independent communication technologies, such as mobile computing. Somewhat slow to embrace the WWW, IPSJ has started a SIG for mobile computing. Beyond Japan, the Internet and WWW also raise information control issues for non-democracies.
Japanese researchers have long been interested in technological support for group process. The "software factory" concept and interest in process programming were examples in software engineering. Today there is active interest in workflow management.

Having spent time in Japan, we are often asked about the impact of cultural differences on technology use. There are undoubtedly such effects, but it is easy to oversimplify. For example, it is often suggested that Japanese enthusiasm for collaboration and consensus will increase groupware acceptance. Closer examination reveals a more complicated reality. Ishii (1990) notes that in Japan, the importance of showing consensus in meetings often leads to real decision-making occurring in private discussions, eliminating a role for meeting support software. More generally, the preference in Japan for personal contact and direct interaction could actually increase the resistance to technological mediation (Hiroshi Ishii and Gen Suzuki, personal communications). In addition, many social and work practices in Japan are intricately detailed, and efficiency is not the only goal; new technology will inevitably disrupt some of this. Thus, one should avoid predicting too quickly the success of a groupware technology in a different culture. Cultural issues are as complex as they are important.
5. Groupware Typologies

5.1 Categorization by Group Activity
Many typologies or categorizations of groupware have been proposed. Figure 2 presents a variant of the widely used space and time categorization of DeSanctis and Gallupe (1987), refined by Johansen (1989). Representative applications illustrate the different cells. Activity can be carried out in a single place (top row), in several places that are known to the participants, as in electronic mail exchanges, for example (middle row), or in numerous places not all of which are known to participants, as in a message posted to a netnews group (bottom row). Activity can be carried out "in real time"; that is, in one unbroken interval, as in a meeting (left column). Alternatively it can be carried out at different times that are highly predictable or constrained, as when you send mail to a colleague expecting it to be read within a day or so (middle column). Or it can be carried out at different times that are unpredictable, as in an open-ended collaborative writing project (right column).

Activities may not always match Fig. 2 precisely; for example, one collaborative writing project could take place in a single session, but another could involve an unpredictable, large set of people assembling a major piece of documentation.
  Place \ Time            Same time               Different but            Different and
                                                  predictable times        unpredictable times

  Same place              Meeting facilitation    Work shifts              Team rooms

  Different but           Teleconferencing,       Electronic mail          Collaborative writing
  predictable places      video conferencing,
                          desktop conferencing

  Different and           Interactive multicast   Computer boards          Workflow
  unpredictable places    seminars

FIG. 2. A 3 x 3 map of groupware options.
Some cells have enjoyed more computer support than others; for example, interactive multicast seminars are only starting to appear as "same time, unpredictable place" activity.

This typology is easy to learn, it facilitates communication, and it is widely used, especially by groupware developers, but not without risk: Fig. 2 obscures an organizational perspective. Most real work activity does not fall into one or another of these categories. As we go about our work, we generally engage in some face-to-face meetings and some distributed and asynchronous communication. Most work involves both communication and coordination. Narrow tasks interact with broader work activities, and even the broadest concerns overlap and impact one another. Technology designed to support activity in one cell can fail by negatively impacting activity in another. For example, a stand-alone meeting support system that provides no access to existing databases or other on-line materials may be useless in some situations. Noting the interdependencies among activities, Robert Johansen calls for "any time, any place" support. A typology hobbles groupware developers if it focuses our attention too narrowly. At the same time, it serves legitimate purposes; for example, it helps identify applications that pose common technical challenges, such as those dealing with concurrent activity.

A second typology distinguishes between the kinds of collaborative tasks supported by the technology. Computer supported cooperative work typically involves communication between participants, collaboration or cooperation in
a shared information space, and coordination of the collective contributions. The technology features that support these tasks are the essence of groupware, whether these features are found in a groupware product or integrated into products from other domains, such as office systems.
5.2 Features that Support Communication, Collaboration, Coordination

5.2.1 Features that Support Communication
Groupware communication features enable people (not processes) to communicate with one another. The communication may be real-time, like a telephone call. Real-time groupware features are found in video conferencing, voice conferencing, and text-based chat sessions. Alternatively, the communication may be an asynchronous electronic mail message, which may still contain video, voice, text, and other media.
5.2.2 Features that Support Information-sharing and Collaboration

Collaborative work generally involves creation of some artifact representing the outcome. Shared-information-space features provide virtual places where people create and manipulate information. These features often include a shared repository to store and retrieve information. Like the communication features, these may be real-time or asynchronous. Real-time features are found in multi-user whiteboards and application-sharing in desktop conferencing systems, brainstorming tools in meeting facilitation systems, and multi-user virtual worlds. Asynchronous features include information management, document management, multi-user hypertext systems, and threaded discussions. Information retrieval features such as hypertext links, navigational views, and full-text search support retrieval from shared information spaces.
5.2.3 Features that Support Coordination

Coordination features facilitate interactions between or among participants. Virtually any collaborative activity requires some degree of coordination, and most groupware products include some sort of coordination features. For example, real-time communication features such as video conferencing are necessarily coupled with coordination features for establishing communication channels between or among the participants. Real-time shared-information-space features such as application sharing
require the same coordination features and also incorporate mechanisms for passing application control from one user to another. Coordination features are essential when interacting asynchronously in shared information spaces. Access control features limit who can participate in a shared space. Library features in document management systems include checking out documents for revision and maintenance of document versions. These features coordinate interactions at a relatively fine-grained level, and aim to do it as unobtrusively as possible. Some technologies support coordination at a more macroscopic level, facilitating management of the overall flow of work. These technologies include calendar and scheduling, project management, and workflow management systems.
5.3 Categorization by Groupware Technology

Just as collaborative work involves some combination of communication, coordination, and information manipulation, groupware products and research prototypes generally combine features from these three categories. Groupware technology achieves its diversity through innovative features for supporting each type of collaborative activity and through innovative combinations of features. Often, however, features from one category dominate, and these dominant features can serve to categorize the groupware products and prototypes. For example, electronic mail and video conferencing products predominantly serve interpersonal communication; document management products predominantly provide a shared information space; and workflow management systems predominantly coordinate the flow of work.

Groupware technologies that most effectively support collaborative work are the hardest to categorize because they support all aspects of the work. Consider Lotus Notes, for example. Its primary feature is an object store providing shared information spaces. It also supports communication through state-of-the-art electronic mail and through integration with video conferencing. Many of its features support automated information routing and tracking, capabilities typically found in workflow management systems. Although Lotus Notes contains all three categories of features, it would be categorized as a shared-information-space technology because those features predominate. Sections 6-8 describe technologies from each of these three categories, identifying where these technologies use features from the other categories.
6. Communication Technologies

As noted above, technologies can support both real-time and asynchronous communication. The real-time technologies provide a
communication channel for video, voice, or text. The asynchronous technologies transmit video, voice, text, or other media as electronic mail messages that are stored for the recipients.
6.1 Electronic Mail
Electronic mail or email is the most successful, best-known groupware technology. It is also a key element of well-known groupware products such as Lotus Notes, Microsoft Exchange, Novell Groupwise XTD, and ICL Teamware. The success of an application such as group meeting scheduling may require it to be tightly integrated with email.

Email's popularity may derive in part from its ease of use. Users readily understand the capabilities of email systems because the functionality and user interfaces are strongly based on the familiar metaphor of postal systems, including concepts such as mail, mailboxes, attachments, return receipts, and carbon copies. Flexible email systems also allow the equally familiar informality of conversation.

After decades of use and widespread acceptance, electronic mail is a relatively mature groupware technology. It continues to evolve, however, to meet the evolving capabilities of computers and users' changing expectations. Improvements in email include intelligent agents that use message structure, standard message representations, a greater range of content, and more reliable, scalable architectures. Because of its maturity, other categories of groupware rely on electronic mail to deliver messages. Each of these points is addressed in more detail below.

Email is inherently structured. Messages consist of a series of field labels (To, From, Subject, etc.) and field values, ending with a body field containing the content of the message. An important step in the evolution of email was to provide a capability for creating additional fields. The Information Lens (Malone et al., 1989) demonstrated how these fields, combined with agent technology, could help users process and handle their mail. Today many groupware products, including most email systems, contain tools for constructing such agents, and improved human-computer interfaces that make them more usable. Borenstein (1992) proposed a significant further step in which programs (similar to Java) are embedded within email messages and executed by the recipient.
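The field-plus-body structure and agent rules described above are easy to see in code. The following minimal Python sketch is purely illustrative: the custom field name, addresses, and folder names are invented for the example and are not drawn from any particular product.

    from email.message import EmailMessage

    # Build a structured message: standard fields plus one added custom field.
    msg = EmailMessage()
    msg["To"] = "team@example.com"
    msg["From"] = "alice@example.com"
    msg["Subject"] = "Design review on Friday"
    msg["X-Message-Type"] = "meeting-announcement"   # hypothetical additional field
    msg.set_content("Please review the attached design before the meeting.")

    # A simple agent-style rule that acts on the structured fields,
    # much as Information Lens rules filed or prioritized incoming mail.
    def route(message):
        if message.get("X-Message-Type") == "meeting-announcement":
            return "calendar-folder"
        return "inbox"

    print(route(msg))   # -> calendar-folder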
For years, messages could not be sent between different vendors' email systems due to incompatible protocols for representing their structure and content. In response to this problem, the International Standards Organization (ISO) developed the X.400 standard. Concurrently, the protocol used on the Internet, SMTP/MIME, emerged as a de facto standard. Today many email systems continue to use proprietary protocols for communication between clients and servers or between servers, but nearly all systems support one or both of these standards so that messages can be sent to other email systems.

Until recently, email systems used either time-sharing architectures with poor performance and usability, or file server architectures with poor reliability and scalability. The current generation of email systems (characterized by Lotus Notes, Microsoft Exchange, and Novell Groupwise XTD, among others) has adopted client-server architectures. These systems can serve as universal "in-boxes" for email, voice mail, fax, and video messages. Experience with the Pandora Multimedia System, a research prototype developed at Olivetti Research Laboratories, showed that video mail can be a popular feature (Olivetti, 1992).

Although the principal purpose of email is communication among people, its structure, reliability, and universality have encouraged its use as a means of delivering messages between processes and people or among processes. In this way, email supports coordination as well as communication. For example, many Lotus Notes applications, workflow management products, and calendar systems use email to alert a person of events or of tasks to be performed. Some workflow management systems use email as the mechanism for routing and presenting work to users (Abbott and Sarin, 1994; Medina-Mora et al., 1992). Goldberg et al. (1992) used email as the mechanism for establishing real-time desktop conferencing sessions.
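As a concrete illustration of email as a coordination channel between processes and people, the sketch below shows how a workflow or calendar process might deliver a task alert through an ordinary mail server. It is a minimal sketch only; the server name, addresses, and field contents are placeholder assumptions.

    import smtplib
    from email.message import EmailMessage

    def send_task_alert(assignee, task, due_date, smtp_host="mail.example.com"):
        # Compose an alert exactly as a human correspondent would receive it.
        msg = EmailMessage()
        msg["To"] = assignee
        msg["From"] = "workflow-system@example.com"
        msg["Subject"] = f"Task due {due_date}: {task}"
        msg.set_content(f"The task '{task}' assigned to you is due on {due_date}.")

        # Hand the message to the organization's mail server for delivery.
        with smtplib.SMTP(smtp_host) as server:
            server.send_message(msg)

    # Example: a workflow engine alerting a reviewer (commented out because
    # it requires a reachable SMTP server).
    # send_task_alert("bob@example.com", "Review chapter draft", "1997-06-30")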
6.2 Real-time Conferencing
Viewed from a computing perspective, the ubiquitous telephone combines simple, inexpensive client hardware with a powerful network and server infrastructure. Emerging computer-based communication technology may soon replace the telephone in many settings by offering greater capability and flexibility at lower cost. The current generation of personal computers has audio capabilities surpassing those of the telephone handset, supports live video, and can assume some of the processing performed centrally by telephone companies. Both intranets and the Internet can replace the telephone infrastructure as the network for voice communication. Existing software supports voice communication between any two computers connected to the Internet at no cost. Real-time video communication is also possible over phone lines, ISDN lines, and Ethernet.

Video conferencing technology has been available for decades, but only recently became available on personal computers. Large companies have more than 20 years of experience with video conferencing suites that support communication across geographically distributed sites. These suites typically feature large display screens showing a view of the speaker, all participants at another site, or the speaker's presentation materials. The costs
of these expensive video conferencing technologies were justified by the value of frequent formal meetings on large distributed projects.

Today's desktop video conferencing systems enable people to see small, low-resolution pictures of one another while conversing. A video camera mounted on or near the display transmits a video (and audio) signal, which appears in windows on other participants' displays. Advances in camera technology, compression algorithms, and network technology are rapidly improving the performance and driving down the cost of video conferencing. Performance has not reached television quality; most systems can maintain a maximum of about 12 to 15 frames per second. None the less, the market and the number of vendors for this technology are expanding rapidly; Perey (1996) lists 40 vendors of desktop video conferencing systems.

Widespread adoption of real-time conferencing technologies will require improvements in image and voice quality, and in usability. Quality depends on the processing capacity and multimedia features of the clients, which are steadily improving, and the infrastructure, especially network bandwidth, where the outcome of the race to meet burgeoning demand is less predictable. Improving usability will require innovative design based on careful analyses of the ways these technologies are used. Usability improvements are a major focus at universities and corporate research and development centers.

Adoption of desktop video conferencing appears to be strongly influenced by past experiences with video conferencing suites established to support formal meetings. Many companies install desktop video conferencing technology in meeting rooms as an inexpensive way to acquire or expand this capability. They may use speaker phones while viewing a video image of meeting participants at other sites. Results have been mixed. The technology suffers from the low video resolution and display rate and the participants' inability to control the cameras. People value seeing a speaker partly because they can observe nuances of facial expression and body posture that are not visible with current desktop technologies. In contrast, video has proved effective in communicating about physical objects. For example, defects encountered during assembly of airplane parts could readily be described to parts suppliers (Boeing, 1992).

If everyone had video conferencing technology on their desktop, how would it be used? Some expect it to replace the telephone as the instrument of choice for informal communication, and many research projects have investigated ways of encouraging informal video conferences. Coordination has proved to be a significant challenge in establishing communication sessions and in taking turns within one. Normal social cues for turn taking are impaired by the absence of visual cues and because the audio in most conferencing systems is half duplex, allowing only one speaker at a time to be heard (Short et al., 1976).
One goal is a simple, easy method for establishing contact. Bellcore's Cruiser system (Root, 1988) simulated walking by an office, taking a quick glance inside, then deciding whether to stay and talk. In one version (Fish et al., 1992), users could open 3-second audio and high-quality video connections to one person or a sequence of people. During a 3-second connection, either person could choose to extend it. Calls were generally short, used only for scheduling and status reporting, and often perceived as intrusive. The designers had expected the system to simulate face-to-face conversations, but it was used more like a telephone. This "glance" method of establishing a call has been adopted by other researchers (e.g., Mantei et al., 1991).

To address the frequent complaint of privacy invasion, researchers at SunSoft (Tang et al., 1994; Tang and Rua, 1994) changed the temporal dynamics of the glance. The recipient first hears an auditory signal, then a video image of the caller emerges slowly into view. Either party can accept the call by enabling the video; otherwise, the image fades away. Most calls are not accepted, presumably because the recipient is not present. The caller can then leave a note on the screen, send an email message, or consult the recipient's online calendar.

Small distributed teams would especially benefit if informal video conferences were effective and easy to establish. Researchers at Xerox PARC and Rank Xerox EuroPARC have investigated ways of supporting small teams. EuroPARC installed computer-controlled video throughout their Cambridge, England, laboratory and developed tools to support teamwork (Gaver et al., 1992). One tool, Portholes (Dourish and Bly, 1992), provides awareness of activity in offices both at EuroPARC and at Xerox PARC in the US. Portholes displays an array of miniature images captured periodically from cameras in a specified set of offices. The small images provide little detail, but do indicate whether team members are present, absent, or conferring with someone else. This visual awareness is comparable to that of a physically collocated team.

Problems frequently reported with desktop video conferences are: (1) difficulty of making eye contact; (2) insufficient resolution to recognize important visual cues; and (3) lack of appeal of static "talking heads". Considerable effort has been directed at these problems. Hydra (Sellen, 1992) consists of a set of small units, each containing a camera, microphone, monitor, and speaker. Up to four people at different locations could meet using Hydra as though seated around a table. At each location, three Hydra units are distributed around a table to represent the other three participants. When a meeting participant turns to look at the person on one monitor, everyone can see and interpret this shift of attention.
The miniature units of Hydra, with camera close to monitor, created an impression of eye contact. The MAJIC system enables eye contact with life-size images of participants (Okada et al., 1994; Okada et al., 1995; Ichikawa et al., 1995). Not a desktop system, MAJIC's key feature is a large screen that is transparent from one side but reflective on the other. The display image is projected on the reflective side, and a camera captures the participant's image from the other side. It is easy to establish eye contact and recognize nonverbal cues such as gestures or changes in body position.

In an interesting, innovative project, Inoue et al. (1995) examined the way television producers vary camera shots, in an effort to automatically produce a more interesting mix of images in video conferences.

Some researchers have questioned the value of video in interpersonal communication. Summarizing the results of many researchers, Whittaker (1995) noted that speech is the critical medium for interpersonal communications, and video can do little more than transmit social cues and affective information. Video adds value when used to show physical objects, not speakers and audiences. Heath et al. (1995) similarly conclude that "the principal concern in media space research with supporting (mediated) face-to-face communication has inadvertently undermined its ability to reliably support collaborative work" (p. 84). They observe that "where individuals do, for example, try to write a paper together using the media space, or provide advice on the use of new software, the inability to see and share objects and shift one's views of each other causes frustration and difficulty for those involved" (p. 86).
6.3 Multicast Video and Audio

Live television offers a familiar mechanism for communication at the same time to different, unpredictable places. Producers of a live television show hope for a large audience, but they do not know who is watching or where they are located. Television can serve both entertainment and educational purposes, but today's technology supports little or no opportunity for viewer feedback or participation. Groupware offers a similar capability, but with the potential for two-way communication. The Multicast Backbone (MBONE) on the Internet (Macedonia and Brutzman, 1994) distributes live audio and video presentations. Many special interest groups within the Internet community have conducted online conferences using MBONE coupled with a shared white board program to display presentation materials.

Isaacs et al. (1994, 1995) at SunSoft developed and evaluated a system called Forum that uses advanced MBONE technology to broadcast audio, video, and slides to a live audience. The speaker uses Forum to present and
annotate slides, identify and conduct polls of the audience, and call on audience members. Audience members view a video image of the speaker, respond to polls, and request permission to speak in one window. In a second window audience members view the slides, and in a third window they can view a list of all audience members and exchange private messages.

In a controlled study of face-to-face and distributed presentations, Isaacs et al. (1995) found that more people attended Forum presentations, but they paid less attention than face-to-face audiences, simultaneously reading their mail or talking to co-workers. Audiences strongly preferred attending Forum presentations over face-to-face presentations, but the speakers, not surprisingly, preferred the interactivity and feedback of the face-to-face presentations.

Today, distributed meeting technology is at an early stage of development and in limited use. MBONE conferences are held frequently on the Internet, using freely available technology, but participation requires high-speed Internet access, appropriate hardware, and expertise in network technology. MBONE technology is rarely used within companies because of its potential impact on network performance. As this technology matures and network performance increases, distributed meeting technology is likely to be in widespread use for meetings both within and between enterprises.
7. Shared-information-space Technologies

Information artifacts are created as the product of work, and in support of workplace activity. They typically play a central role in collaboration. Workgroups create these artifacts collaboratively, and some artifacts such as project schedules facilitate coordination among participants. Shared information spaces frame such collaboration. Some shared information spaces are created through tools for real-time concurrent interaction with the information. The tools typically engender an experience of direct collaboration and communication among the participants. Other shared information spaces are places to store and retrieve information in support of asynchronous collaboration. Some technologies, such as MUDs and MOOs, integrate these capabilities.
7.1 Real-time Shared Spaces
Real-time shared information spaces enable people to work together synchronously with awareness of other participants and their activities. Multi-user whiteboards and other multi-user applications enable teams to draw or type concurrently in a shared space. Meeting facilitation systems
provide shared spaces for capturing and manipulating the contributions of all meeting participants. MUDs, MOOs, and virtual worlds create the experience of interacting with people in an artificial environment.
7.1.1 Shared Whiteboards and Application Sharing

Shared whiteboards and application sharing are two features of desktop conferencing technologies, often packaged with video conferencing products. Video conferencing features emphasize communication support, whereas the desktop conferencing features enable collaborative interaction with information artifacts.

Shared whiteboards are simply multi-user graphics editors. In general, all users can draw, type, or telepoint simultaneously on the same virtual whiteboard, can import images from other applications, and can store images generated in advance for a "group slide show". These objects often serve as conversational props (Brinck and Gomez, 1992). The analogy with a physical whiteboard is even more obvious in products such as LiveBoard (Elrod et al., 1992) and Smart2000 (Martin, 1995), which include display screens the size of wall-mounted whiteboards. Input devices include cordless pens and touch-sensitive screens. Tivoli (Moran et al., 1995), the editor included with the LiveBoard, allows independent manipulation of graphic objects, a capability that most current products do not support.

Shared whiteboards are simple examples of the larger class of multi-user applications. A more advanced example is Aspects, released by GroupLogic in 1989, which included full-featured multi-user text, draw, and paint editors for the Macintosh. Despite its advanced capabilities, Aspects did not achieve market success. Its developers could not keep pace with the demand for platform independence and features that matched the latest versions of single-user text, draw, and paint editors.

Application-sharing technologies allow a group to work together using a single-user application running on one of their computers. The software transmits the application's windows to all users and integrates all users' inputs into a single input stream. Examples include HP's SharedX, X/TeleScreen, Smart2000, Fujitsu's DeskTopConferencing (DTC), and Microsoft's NetMeeting.

Video conferencing and multi-user applications usually run in distinct windows that compete for display space. The video cannot provide information about gestures or direction of gaze that would communicate which objects people are attending to within the shared application. ClearBoard (Ishii and Kobayashi, 1992; Ishii et al., 1992) solves this problem by integrating the video image of participants and the shared
information space. The conceptual model for ClearBoard was working on opposite sides of a clear sheet of glass. ClearBoard overlays a video image with a multi-user application to achieve the same effect, reversing the image to achieve the same left-right orientation.
Desktop Conferencing Architectures. Architecturally, desktop conferencing systems differ as to whether the application is centralized or replicated (Greenberg et al., 1995). Both architectures feature a conference agent, the core of the conferencing product, running on all participating computers. However, the method by which a conference agent establishes and manages communication in a desktop conferencing session differs across the architectures.

The centralized architecture depicted in Fig. 3 is the foundation for shared-application technologies. The conference agent intervenes in the communication between a single-user application and the computer's window system. The application's outputs are captured by the conference agent and transmitted to the conference agents on all participating computers. These agents convey the output to the window systems, which present it to the users. A user at any computer may interact with the application's objects using keyboard and mouse. The conference agent integrates these inputs and
FIG. 3. Diagram of the centralized architecture underlying most shared-application technologies. The application runs on one computer and inputs and outputs for all conference participants are controlled and integrated by a conference agent.
delivers a coherent input stream to the application. To achieve a coherent input stream, the conference agent generally enforces a floor control policy, accepting inputs from only one user at a time. Examples of floor control policies are: (1) accept input from only one person until that person passes control to another designated participant; (2) accept input from anyone when no inputs have occurred within a specified interval; or (3) accept inputs from anyone at any time.

Shared whiteboards can also be implemented using this centralized architecture. An early example was Wscrawl (Wilson, 1995), a public domain group sketching program that runs on the X Window system on UNIX machines.
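A minimal sketch of how a conference agent might enforce the second floor control policy above is given below. It is illustrative only; the class name, participant names, and the idle timeout value are assumptions, not taken from any particular product.

    import time

    class FloorControl:
        """Grants the floor to one participant at a time; the floor is
        implicitly released after a period of inactivity (policy 2)."""

        def __init__(self, idle_timeout=5.0):
            self.holder = None          # participant currently holding the floor
            self.last_input = 0.0       # time of the holder's most recent input
            self.idle_timeout = idle_timeout

        def accept(self, participant):
            now = time.monotonic()
            idle = (now - self.last_input) > self.idle_timeout
            if self.holder is None or self.holder == participant or idle:
                self.holder = participant
                self.last_input = now
                return True             # input is forwarded to the application
            return False                # input is discarded

    # Example: two users competing for the floor.
    floor = FloorControl(idle_timeout=5.0)
    print(floor.accept("alice"))   # True  - alice takes the floor
    print(floor.accept("bob"))     # False - alice is still active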
Appraisal. The principal advantage of the centralized architecture is that conferences can be held using any application. A workgroup can use their favorite word processor, spreadsheet, graphics program, or an application for specialized work such as a CAD system. Furthermore, the conference can include different types of computers as long as they have a common window system. Using SharedX, for example, a desktop conference can include IBM-compatible PCs, Macs, and UNIX machines running X Windows. Work in progress today should allow conferences on machines running different window systems, with translation between equivalent features on different window systems. The principal disadvantage of this architecture is that the application accepts only one input stream and does not distinguish among users. In addition, systems implemented using this architecture exhibit performance decrements as the number of participants increases.

The replicated architecture shown in Fig. 4 is the foundation for most shared whiteboards and other multi-user applications. The same application runs on each computer, and the conference agent tries to ensure that all copies of the application remain synchronized. The conference agents do not transmit application output to other computers. Instead, they ensure that all users' inputs are distributed simultaneously to all copies of the application.
FIG. 4. Diagram of the replicated architecture underlying most shared whiteboards. The application runs on every participant's computer and the conference agents ensure that all copies of the application have the same data.
As before, the conference agent enforces floor control policies. But with this architecture the policy may also permit simultaneous interactions with application objects. Aspects, for example, allowed users to edit text or graphics simultaneously, but not the same paragraph or the same graphic object, thus avoiding collisions.

The primary advantage of the replicated architecture is, of course, that everyone can create and edit information simultaneously. In practice, people rarely create or edit information at the same time, but this technology allows them to when the task demands it. Replicated architectures are also generally capable of high performance even with large numbers of participants. The principal disadvantage of this architecture is that the applications must be implemented within its framework. Few applications are developed to support multiple concurrent input streams from different users. Developers of these applications are handicapped by the existing application development environments, which evolved to support development of single-user applications.

Desktop conferencing products are sometimes used in meeting rooms because they enable people to work together more effectively (Wolf et al., 1995). Vendors of desktop conferencing technology emphasize the financial advantages of working together from different locations. Little evidence exists, however, that companies have reduced travel and saved money as a consequence of adopting desktop conferencing products. Evidence does exist that these technologies can change the way people perform their work. Mitchell et al. (1995) observed young students writing collaboratively using Aspects. Over time, the students shifted from parallel writing, to use of a single recording scribe, to synchronous editing. Olson et al. (1992) observed design teams using a similar editor and found that teams produced higher-quality designs and stayed more focused on key issues.

Certainly there are offsetting disadvantages of desktop conferencing technologies. All participants must be available at the same time, which is especially difficult across time zones. Furthermore, a meeting conducted using desktop conferencing does not feel the same as a face-to-face meeting
and does not follow the same pattern of social interaction. The impact of desktop conferencing on team building is unknown, with suggestions that it is inadequate. Consequently, many companies support distributed teams with a mixture of face-to-face and desktop conferencing meetings.
7.1.2 Meeting Facilitation

As noted earlier, meeting facilitation technology has different origins than other categories of groupware. University management science departments have long studied business meetings and sought ways to improve them. Their research has led to development of technologies, including hardware, software, and techniques for improving meetings. These technologies are often called group decision support systems (GDSS) or simply group support systems (GSS).

The principal US academic centers for research and development of meeting facilitation technology have been the University of Minnesota and the University of Arizona. Both universities established meeting facilities where business meetings can be facilitated and observed, and the technologies they developed served as the nucleus of commercial products. A meeting facility includes a computer for each meeting participant, one or more large display screens, and software that facilitates meeting processes.

Researchers at the University of Minnesota developed Software-Aided Meeting Manager (SAMM) as an integrated suite of tools intended to support meeting processes such as issue identification, brainstorming, voting, and agenda management (Dickson et al., 1989). This technology builds on a research program defined by DeSanctis and Gallupe (1987) that integrates behavioral science, group process theory, and adaptive structuration theory.

Jay Nunamaker and his colleagues at the University of Arizona developed similar meeting facilitation prototypes, which Ventana Corporation integrated into a commercial product called Groupsystems (Nunamaker et al., 1991) and IBM marketed as TeamFocus (McGoff and Ambrose, 1991). The activities supported by Groupsystems include exploration and idea generation, idea organization and categorization, prioritizing and voting, and policy development and evaluation. Several different tools may support each of these activities. As a meeting evolves, a human facilitator selects tools to support the current processes. The value of these systems is most evident when meeting participants generate ideas, because all participants can enter ideas concurrently. With many more ideas generated, organizing them becomes the next challenge. Chen et al. (1994) developed an automatic concept classification tool that creates a tentative list of the important ideas and topics, which participants can examine and revise or augment.
Support for face-to-face meetings remains an active area of CSCW research for technology developers as well as social scientists. For example, Streitz et al. (1994) developed a system called DOLPHIN that includes a large, interactive electronic whiteboard and individual workstations for meeting participants. The design of DOLPHIN was based on observational studies of editorial board meetings where an electronic newspaper was planned and created. Using DOLPHIN, board members can create and share informal information such as freehand drawings or handwritten scribbles, and formally structured information such as hypermedia documents. Mark et al. (1995) found that groups organized more deeply elaborated networks of ideas using DOLPHIN.

People who have never used meeting facilitation systems are often skeptical about their value. They point to the importance of social dynamics, face-to-face discussions, and nonverbal communication in meetings, apparently absent in anonymous typed interaction. Advocates of meeting facilitation systems have ready responses. First, people still talk to one another in a facilitated meeting; they use computers only in support of specific tasks, such as brainstorming. Second, these systems have fared well in some controlled experiments and field studies.

Post (1992) conducted a field study of IBM's TeamFocus in a large American corporation. The study included 64 meetings on a variety of topics, averaging over 10 participants. Prior to the meetings, Post and his team conducted interviews to determine the time and resources normally required to achieve the objectives of the meetings. They measured the actual time and resources when the work was performed in facilitated meetings. Typically, meetings at this corporation served to coordinate work done outside the meetings. By performing the work in the facilitated meetings, total flow time was reduced by a dramatic 91%. Including the costs of equipment, facilities, and trained facilitators, they predicted that the technology would provide a one-year return on investment of 170%. Nevertheless, the corporation did not adopt meeting facilitation technology, and other companies have also been slow to adopt it.
7.1.3 MUDs, MOOs, and Virtual Worlds

Multiuser Dungeons (MUDs) and their object-oriented extensions (MOOs) are multi-user, text-based virtual worlds. (The term "dungeon" has become a bit of an embarrassment, so the "D" is often rechristened "dimensions" or some other word.) Most MUDs provide game environments similar to Adventure or Zork except that (1) they have no score or notion of winning; (2) they are extensible; and (3) participants
can communicate (Curtis, 1992). Social interaction is a key feature of MUDs; in fact, more than 300 MUDs lack any game features, simply providing environments for communication and for building new areas or objects for general enjoyment (Curtis and Nichols, 1994). MUDs are being adapted to support work-related communication.

MUDs maintain information about users, objects, and interconnected rooms. The MUD users interact with this database, moving from room to room, manipulating objects, and communicating with other users. The interconnected rooms form a virtual world described in text. Users type simple commands such as "Go north" to move from one room to another. When a user enters a room the MUD displays its description, including any objects or other people in the room. Users in the same room can talk to one another and interact with the objects. A MOO includes object-oriented tools for extending the MUD by building new objects and rooms.

The MOO virtual world can serve as a workplace and support work-oriented communication. For example, Bruckman and Resnick (1993) report experiences with MediaMOO, in which the virtual world corresponds to the physical world of the MIT Media Lab. MediaMOO provides a place for media researchers from around the world to socialize, talk about their research projects, interact with the virtual world, and create new objects and places. Towell and Towell (1995) created a virtual conference center within BioMOO, where professionals in biological sciences hold scientific seminars. People from around the world participated in discussions within the virtual conference center.

The heart of a MOO is a shared information space that supports communication. Curtis and Nichols (1994) describe extensions including windows-based user interfaces, shared tool access, audio, and video. When a user "looks at" a map found in a MOO, a window could open that shows the map. An example of a shared tool was reported by Masinter and Ostrom (1993), who created MOO objects that access Gopher servers, enabling collaborative Internet searching. Audio and video are achieved through integration with the multicast capabilities described in Section 6.3. Nichols et al. (1995) and Curtis et al. (1995) describe the technical implementation of these extensions.

The emergence of the Virtual Reality Modeling Language (VRML) standard has allowed evolution from text-based MUDs and MOOs to graphical, three-dimensional virtual worlds. In these worlds, participants are represented by graphical "avatars". Damer et al. (1996) evaluated five prototypes that provide multi-user graphical virtual realities. Participants communicate through text-based chat windows, as in MUDs. Greenhalgh and Benford (1995) developed and tested a virtual reality teleconferencing system called MASSIVE that enables a group to interact using audio, graphical, and
textual media. Bowers et al. (1996) studied social interactions during a MASSIVE virtual meeting and identified problems in turn taking and participation that must be addressed for this technology to be widely accepted.
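To make the room-and-command model described above concrete, here is a minimal, purely illustrative Python sketch of a text-based MUD core; the room descriptions, user names, and commands are invented for the example and do not correspond to any particular MUD server.

    class Room:
        def __init__(self, description):
            self.description = description
            self.exits = {}         # direction -> Room
            self.occupants = set()  # users currently in the room

    # Build a tiny two-room world.
    lobby = Room("A small lobby with a notice board.")
    lab = Room("A cluttered laboratory full of prototypes.")
    lobby.exits["north"] = lab
    lab.exits["south"] = lobby

    def go(user, current_room, direction):
        """Move a user between rooms and return the destination's description."""
        destination = current_room.exits.get(direction)
        if destination is None:
            return current_room, "You cannot go that way."
        current_room.occupants.discard(user)
        destination.occupants.add(user)
        return destination, destination.description

    room, text = go("alice", lobby, "north")
    print(text)   # -> A cluttered laboratory full of prototypes.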
7.2 Asynchronous Shared Spaces

Collaborative work does not always require real-time communication or simultaneous interaction. Often people structure their work so they can contribute independently to a shared product. They need a well-organized, shared information repository where they can place their contributions, and retrieval tools to find information created by others. This section describes three technologies for storing and organizing information. Asynchronous computer conferencing tools organize information around ad hoc topics. Document management systems are specialized for supporting the creation and maintenance of electronic documents. Information management tools provide flexible frameworks for diverse information structures.
7.2.1 Threaded Discussions or Asynchronous Computer Conferencing

Asynchronous computer conferencing is among the oldest forms of groupware and continues to be widely used under such labels as bulletin boards, threaded discussions, news groups, and public folders. These technologies provide shared information spaces which are typically organized around interest areas. The Internet news group comp.groupware serves as an example. Anyone can post a message to comp.groupware about any topic, but social policies dictate that the message should be either a question or new information about groupware. Other people may post responses, and still others may respond to these responses.

Computer conferencing technology maintains databases organized as collections of tree structures. The starting message is the head of a tree and responses to it are branches. Conferencing clients typically display the tree structure so that users can follow the thread of a discussion.

The topic-and-response tree structure inherent in computer conferencing is widely used in groupware systems. The first version of Lotus Notes was a computer conferencing system with support for both wide and local area networks, and Notes databases still support the conferencing organizational model. Other groupware products that support asynchronous computer conferencing include Netscape's CollabraShare and Attachmate's OpenMind. Such products are being integrated with Web technology so that users can participate in discussions through their Web browser.
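The topic-and-response tree structure described above can be sketched in a few lines of code. This is a minimal illustration only; the field names and the display convention (indentation by depth) are assumptions, not drawn from any particular conferencing product.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Message:
        author: str
        subject: str
        body: str
        replies: List["Message"] = field(default_factory=list)

    def show_thread(msg, depth=0):
        """Display a discussion thread with indentation reflecting the tree."""
        print("  " * depth + f"{msg.subject} ({msg.author})")
        for reply in msg.replies:
            show_thread(reply, depth + 1)

    # A starting message is the head of a tree; responses are branches.
    root = Message("pat", "Does groupware need a critical mass?", "...")
    root.replies.append(Message("lee", "Re: critical mass", "..."))
    root.replies[0].replies.append(Message("pat", "Re: Re: critical mass", "..."))
    show_thread(root)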
7.2.2 Document Management

Documents have a central role in many collaborative work activities. Academic papers are often co-authored. In business settings one person may write a document but others may review, edit, and approve it. Teams writing large documents generally divide or "shred" documents into sections that are assigned to different authors who work in parallel, communicating with one another as necessary. Each section, and the document as a whole, may be reviewed, revised, and approved.

A document's contribution to collaborative work may continue long after its production. An engineering document describing a physical system design can inform the teams responsible for planning its manufacture and support. These teams may even reuse parts of the original engineering document. In essence, a document represents an external memory that can enable long-term collaboration among people who may never meet or know of one another.

These two collaborative activities, document creation and document reuse, call for somewhat different capabilities. Document creation requires support for coordinating contributions, and document reuse requires support for finding relevant information. Document management systems support both activities.

Document management systems complement and are integrated with word processors like Microsoft Word, publishing systems like Frame Builder, and other media editors. Instead of storing and retrieving documents in a file on a local disk or file server, documents are stored on and retrieved from a document management server. The basic elements of a document management system, as shown in Fig. 5, are a repository for the document objects, a database of meta-information about the objects, and a set of services.

The essential document management services are access control, concurrency control, and version control. Access control determines who can create, modify, and read documents. Concurrency control, preventing different authors from changing the same document at the same time, is generally accomplished by "checking out" the document to the first person who requests write access. Other users can read or copy the document but cannot edit it. Version control determines whether a modified document replaces the original or is saved as a new version and how long old versions are retained.

Document management systems rarely maintain information about the semantics or structure of the documents they manage. Whether text, graphics, video, or a CAD drawing, to the system it is a blob of unknown content. The semantic information, essential for managing and finding documents, is
included in the document meta-information. This database includes the author, date, version number, check-out status, and access permissions. It may also include user-supplied keywords, application-specific fields, position within a hierarchy of folders, and relationships to other documents. A user can, for example, search for all documents written by a certain author between two specified dates. Unfortunately, this powerful search capability requires that authors enter the requisite meta-information, and resistance to this can be the greatest obstacle to the successful use of document management systems.

When a workgroup creates a new document, the document management system must support an iterative, interleaved series of basic tasks such as planning, drafting, reviewing, revising, and approving (Sharples et al., 1993). Often different people perform different tasks, and sometimes many people perform the same or different tasks in parallel. Although two people cannot edit a document at the same time, one person could check it out, then use a desktop conferencing system to edit it collaboratively. Although workgroups generally divide a document and assign parts to different authors, few document management systems (one exception is Documentum) support this strategy by capturing the sequential relationship between document sections.
FIG. 5. The basic elements of a document management system include client tools that communicate with a server to obtain data and services.
The document management client shown in Fig. 5 typically provides search tools to support information reuse. By filling in a form, users can submit database queries of the meta-information combined with searches of the document text. For example, a user could easily request all documents by a specific author containing a particular word or phrase. Researchers are exploring more powerful information retrieval methods. Lucas and Schneider (1994) describe a document management system called Workscape that represents documents as two-dimensional objects in a three-dimensional space. Users can group documents by stacking them, just as office workers typically do with paper documents. Rao et al. (1994) scanned paper documents into a document management environment, then deployed information retrieval methods to help users quickly find information.

The World Wide Web offers an ideal environment for document management services. Few web servers, with the notable exception of Hyper-G (or HyperWave), provide these services yet, but vendors are integrating web technology and document management systems. Hyper-G is a web server with integrated access control and sophisticated information retrieval capabilities, including the ability to navigate through a three-dimensional representation of the document space (Andrews et al., 1995; Maurer, 1996).
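A minimal sketch of the check-out, version control, and meta-information query services described in this section is given below. The field names and the in-memory store are illustrative assumptions; a real document management system would use a repository server and database rather than Python objects.

    from dataclasses import dataclass, field
    from datetime import date
    from typing import Optional, List

    @dataclass
    class DocumentRecord:
        doc_id: str
        author: str
        created: date
        version: int = 1
        checked_out_by: Optional[str] = None    # concurrency control
        keywords: List[str] = field(default_factory=list)

    class DocumentStore:
        def __init__(self):
            self.records = {}

        def check_out(self, doc_id, user):
            rec = self.records[doc_id]
            if rec.checked_out_by is not None:
                raise RuntimeError(f"{doc_id} already checked out by {rec.checked_out_by}")
            rec.checked_out_by = user

        def check_in(self, doc_id, user):
            rec = self.records[doc_id]
            if rec.checked_out_by == user:
                rec.version += 1          # version control: record a new version
                rec.checked_out_by = None

        def search(self, author=None, after=None, before=None):
            """Query the meta-information, e.g. all documents by an author between two dates."""
            return [r for r in self.records.values()
                    if (author is None or r.author == author)
                    and (after is None or r.created >= after)
                    and (before is None or r.created <= before)]

    # Example use with invented document identifiers.
    store = DocumentStore()
    store.records["spec-001"] = DocumentRecord("spec-001", "alice", date(1996, 5, 1))
    store.check_out("spec-001", "bob")
    store.check_in("spec-001", "bob")
    print([r.doc_id for r in store.search(author="alice")])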
7.2.3 Information Management

Information management technologies such as Lotus Notes combine features of a document management system with structured objects. Most document management systems treat documents as uninterpretable; information management systems, in contrast, manage the structure of the document objects. Lotus Notes represents documents as a collection of named fields and their values. Some fields may contain text, graphics, video, audio, or other media. Other fields contain predefined keywords, dates and times, or other structured data that either the computer or a person can interpret. The combination of structured and unstructured fields constitutes a semistructured document.

Malone et al. (1987) established the power of semistructured documents as a foundation for collaborative work. A research prototype called Oval (Malone et al., 1992) demonstrated that semistructured documents can contribute to radically tailorable tools for collaborative work. Oval could be customized to behave similarly to gIBIS (Conklin and Begeman, 1988), The Coordinator, Lotus Notes, or Information Lens (Malone et al., 1987). The current version of Lotus Notes integrates the basic features of Oval to create a rapid application development environment for workgroup applications.

A simple example illustrates the power of semistructured documents in an information management environment. An application for tracking action
items contains structured fields holding the name of the person responsible for the action item, its due date, and its title. Unstructured fields, potentially containing text, graphics, video, or audio, hold the purpose of the action item and a report about its outcome. Notes can interpret the structured fields, sending email to alert the responsible person of an impending due date. Views of action items show them ordered by due date and categorized by responsible person. The unstructured fields, intended for human interpretation, are not processed by Notes.

Hypertext provides an alternative way of organizing information elements. SEPIA (Haake and Wilson, 1992) is a hypertext authoring system that links nodes within activity spaces. These spaces are designed to support the tasks of content generation and structuring, planning, arguing, and writing under a rhetorical perspective. An interesting feature of SEPIA is its support for multiple modes of collaboration. A graphical browser reveals to authors working within the same composite node which component node each person has checked out. Aware of working in the same space, they have the option of entering a tightly coupled collaborative mode by launching a desktop conferencing tool.
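Returning to the action-item application described above, the following minimal sketch shows how structured fields support machine interpretation (here, a view ordered by due date) while unstructured fields are simply carried along for people to read. The field names and sample data are illustrative assumptions, not the Lotus Notes schema.

    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class ActionItem:
        # Structured fields the system can interpret...
        responsible: str
        due_date: date
        title: str
        # ...and unstructured fields intended for human interpretation.
        purpose: str = ""
        outcome_report: str = ""

    def due_date_view(items):
        """A view: action items ordered by due date."""
        return sorted(items, key=lambda item: item.due_date)

    items = [
        ActionItem("alice", date(1996, 7, 1), "Draft user guide", purpose="..."),
        ActionItem("bob", date(1996, 6, 15), "Review architecture", purpose="..."),
    ]
    for item in due_date_view(items):
        print(item.due_date, item.responsible, item.title)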
Shift Work. In many work settings, tasks continue from one shift to another, sometimes around the clock. These tasks are coordinated by systematically recording and passing information from one shift to another. Monitoring satellites, global financial markets, and hospital patients are examples of continuous activities. The Virtual Notebook System or VNS (Fowler et al., 1994) is one of many tools that support shift work in medical settings, providing an online repository for the information traditionally recorded in patients' charts.

When Lotus Notes was announced in 1989, the presentation included an example of shift work from the airplane manufacturing industry. Airplane assembly continues around the clock, and the demonstration tracked assembly status and problems to provide a smooth transition from one shift to the next.

Shift work settings are somewhat unusual; most collaborative work performed at different times is also performed in different places. Both VNS and Lotus Notes are primarily used to support collaborative work performed in different places. They illustrate that technologies that support work at different times can often also be deployed for shift work or for work at different locations.
Team Rooms. Many companies establish team rooms or visibility rooms that workgroups use as shared information spaces. Teams post information on the walls of these rooms about their plans, accomplishments,
and work in progress. When team meetings are held, this information can be referenced. Between meetings, individuals or subgroups create and modify the shared information. Workgroups can use groupware to construct a virtual team room where information is maintained and referenced. Work is performed in different physical places but the same virtual place. For example, Cole and Johnson (1996) described a TeamRoom developed using Lotus Notes to support collaboration among physically distributed executives.
8. Coordination Technologies

Virtually all groupware technologies include some coordination features to facilitate interactions among participants. For example, as noted earlier, real-time video conferencing and shared whiteboard products include coordination features for establishing and maintaining communication channels. Some technologies are principally intended to coordinate group activity. Calendar and scheduling technologies help find convenient times for group meetings and schedule resources for those meetings. Workflow management systems route information from one person to another in accordance with a business process model. Both workflow management and project management technologies help plan how work will be coordinated and resources allocated.
8.1 Calendars and Scheduling
Calendar and scheduling products often serve as personal information management systems while helping teams coordinate their work. Individual users are supported by personal calendars, action item lists, contact lists, and other features. Coordination is supported by group calendars, meeting reminders, on-line rolodexes, and especially by scheduling functions that aid in searching the calendars of multiple users to find convenient times for meetings and schedule resources such as meeting rooms. Integration with email can facilitate the invitation process.

Support for meeting scheduling has been an active research area for over a decade; in fact, it has been adopted by the distributed artificial intelligence community as a demonstration problem on which to test approaches. Nevertheless, scheduling features in commercial products went unused for many years due to the lack of a "critical mass" of use in most environments; too many people found paper calendars more convenient (Grudin, 1988; 1994b). Calendar applications have matured, sporting better interfaces, a range of individual-support features, and email integration.
Users and technical infrastructures have also matured, leading to widespread use of scheduling in some environments (Grudin and Palen, 1995).
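The core scheduling computation these products perform, searching several users' calendars for a time when everyone is free, can be sketched in a few lines. The hour-level granularity, the 9-to-5 working day, and the data layout below are assumptions made purely for illustration; commercial schedulers work against real calendar stores.

```python
def free_slots(busy_by_person, day_hours=range(9, 17), duration=1):
    """Return start hours at which every person is free for `duration` hours.

    busy_by_person maps each name to a set of busy hours, e.g. {10, 14}.
    """
    starts = []
    for start in day_hours:
        needed = set(range(start, start + duration))
        within_day = needed <= set(day_hours)
        everyone_free = all(not (needed & busy)
                            for busy in busy_by_person.values())
        if within_day and everyone_free:
            starts.append(start)
    return starts

busy = {"alice": {9, 10, 13}, "bob": {10, 11}, "carol": {16}}
print(free_slots(busy, duration=2))   # [14] with these example calendars
```

The hard part in practice is not this intersection but obtaining the calendars at all, which is why critical mass of use matters so much.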
8.2 Workflow Management

Workflow management systems provide tools for coordinating work by managing the task sequence and the flow of information and responsibility. Workflow management technologies were first created to support imaging applications such as insurance forms processing. To improve efficiency and accountability, insurance companies installed technology to scan paper forms and process the form images. Workflow applications were developed to route information from one person to another when each task was completed. Building on their experience with custom applications, imaging system vendors developed tools for constructing process models and managing the flow of work. They hoped to adapt workflow management to a wide range of work settings, including many where scanning and image processing have no role. They began marketing their tools as technology for managing business processes.

Concurrently, US industry became vitally concerned with improving business process efficiency. Drucker (1991) wrote, "The greatest single challenge facing managers in the developed countries of the world is to raise the productivity of knowledge and service workers." To meet this challenge, corporations initiated business process re-engineering initiatives. The basic steps of business process re-engineering are: (1) collect data from current processes; (2) understand and model current processes; (3) have the process participants re-design their processes; (4) implement the re-designed processes; (5) go to step 1. The aim is to divide business processes into activities that add value to a business's products or services, such as designing, machining, testing, or installing a part, and activities that do not add value, such as transporting and storing materials or searching for information. Workflow management systems help reduce non-value-added knowledge work by minimizing the time spent deciding what to do, searching for information, and tracking work progress.

Work process modeling is an essential step in business process re-engineering, and workflow management systems offer tools for creating, analyzing, and revising these models. Once a detailed model of a re-engineered business process has been constructed, a workflow management
system might help ensure that the process is followed, show the status of work in progress, and provide metrics of its performance. At this level of analysis, workflow management systems appear to be an ideal tool set to support business process re-engineering. However, workflow management systems require a more detailed model than the typical corporate business process model. In a large corporation the business process models describe organizational missions, objectives, and responsibilities, and the large-scale flow of information between organizations. Workflow management requires specification of tasks (e.g., approve purchase request), task sequence (e.g., draft, review, approve), roles (e.g., project manager), people (e.g., Linda Smith), tools (e.g., electronic form), data (e.g., item, amount, and signature), and dependencies (e.g., amount < $5000). A business process model serves at best as only a foundation for the far more detailed workflow model.

Different users of production workflow management systems have different responsibilities and stand to realize different benefits. Programmers or system definers use the system to construct detailed models of the workflow tasks and roles, to create forms and views, and to integrate data and other tools. For them, the benefit is a set of tools that simplify construction of customized workflow models and an engine that interprets these models. Work performers (or "end users") select tasks from a worklist, then perform the tasks with supplied data and tools. They benefit from access to a well-defined process, to information about task status, and to the tools and data. Managers define and modify the business processes, then monitor the results. They benefit from process metrics and analysis, adherence to policies and procedures, and more efficient resource utilization. Finally, a customer of the business process may not interact directly with the system, but benefits from shorter flow time and availability of status information about their work.

Other types of products have integrated lightweight or ad hoc workflow features that allow end users to describe and initiate a simple workflow or routing model. Novell's Groupwise XTD, for example, includes a workflow modeling tool and execution engine intended for end users. Lotus Notes 4 includes templates that enable end users to specify a routing and approval sequence for a document. These ad hoc workflow systems also track workflow status, and are gradually incorporating more of the capabilities of the production systems.

Until recently, standards to support interoperability among workflow management systems did not exist. The Workflow Management Coalition initiated this important step in the maturation of the field, developing the reference architecture shown in Fig. 6. Six components are interconnected via five interfaces. The plan is to define a standard for each interface.

FIG. 6. The Workflow Management Coalition reference architecture.

The center of the architecture is the Workflow Enactment Engine, which interprets and executes the process models created by the Process Definition Tools. New work is initiated and delivered to the responsible users through the Worklist Tool. When a task is selected from the Worklist, the Workflow Enactment Engine may invoke other applications to assist users. An enterprise with multiple workflow systems will require an interface to Other Workflow Engines. Microsoft recently proposed MAPI-WF as the standard for this interface.

The Process Definition Tools component is of special interest because its user models the workgroup processes. Until recently, system definers described workflow models as a list of preconditions and tasks, but most systems today offer a graphical editor for defining the process flow. Most systems adopt an Input Process Output (IPO) model, but an exception is the Action Workflow model (Medina-Mora et al., 1992). IPO models originated in process analysis and computer programming flowcharts. Their principal advantage is that they are conceptually easy to understand. A disadvantage is that they encourage an oversimplified, unidirectional, sequential view of business processes. The waterfall model would be a natural outcome of using IPO models to describe software engineering practices. An example of an IPO modeling method is the Information Control Net (ICN) developed by Ellis (1979). The syntactic elements of ICN and a simple ICN model are depicted in Fig. 7.

FIG. 7. The syntactic elements of Information Control Nets (Or-split, And-split, Or-join, And-join) and a simple example of a workflow model (proposal reception, process creation, process analysis, director and sub-director approval, answer to customer, archive) composed using these elements.

The Action Workflow model is more difficult for a novice to interpret. Business processes are represented as cycles of communication acts between a customer and a performer. In the simplest cycle the customer requests a deliverable, the performer agrees to produce it, later the performer reports its completion, and finally the customer agrees that the deliverable meets its requirements. An Action Workflow system supports communication about the work among all participants. Of course, each of
the four basic communication acts can require additional communication, represented as additional cycles. An example of a very simple Action Workflow model is shown in Fig. 8.

FIG. 8. A simple workflow model using the Action Workflow modeling approach.

Research in workflow management could benefit significantly from establishment of the standards for the five interfaces shown in Fig. 6. Today, research in this field requires building a complete system or establishing a special relationship with one vendor. With standards in place, a researcher could focus on improving one component of the workflow architecture and use commercial products for the other components.

Abbott and Sarin (1994) defined several issues requiring further research. A key issue is support for and management of exception handling. Exceptions to business processes are common, and the inability to handle them is a major source of failure in this area. For example, when and how should a system reassign work that has been assigned to a person who is sick, on vacation, or simply too busy with other tasks? How should the workflow system support negotiating a new deadline for overdue work? As workflow systems are adopted by knowledge workers who maintain on-line calendars, these two systems could be integrated very naturally, with the workflow system entering tasks and meetings in calendars and consulting
calendars for information about a worker’s task queue. Knowledge workers may also access libraries of process descriptions instead of creating new models for each new task. As their models grow in complexity, knowledge workers will require tools that provide change management for process models.
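The level of detail a workflow model requires, beyond what a corporate business process model provides, can be suggested with a small sketch of a purchase-approval process that encodes tasks, the roles that perform them, and a routing dependency such as the amount < $5000 rule mentioned earlier. The structure is a generic illustration, not the process-definition format of any particular workflow product.

```python
# A toy purchase-approval workflow: each task names the role that performs it
# and a routing function that chooses the next task from the case data.
WORKFLOW = {
    "draft":   {"role": "requester",
                "next": lambda case: "review"},
    "review":  {"role": "project manager",
                "next": lambda case: ("approve" if case["amount"] < 5000
                                      else "director approval")},
    "director approval": {"role": "director",
                          "next": lambda case: "approve"},
    "approve": {"role": "purchasing",
                "next": lambda case: None},
}

def enact(case, start="draft"):
    """Walk the model, emitting a worklist entry for each task."""
    task = start
    while task is not None:
        print(f"worklist: '{task}' assigned to role '{WORKFLOW[task]['role']}'")
        task = WORKFLOW[task]["next"](case)

enact({"item": "laptop", "amount": 3200})    # routed past the director
enact({"item": "server", "amount": 12000})   # requires director approval
```

Even this toy model must name roles, tasks, and data dependencies explicitly, which is precisely the detail that high-level business process models leave out and that exception handling then disrupts.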
9. Challenges to Groupware Development and Use

9.1 Technical Challenges

Of course, groupware development faces many technical challenges. Everything from better compression algorithms to faster processors contributes. A survey lies outside this review. We will restrict ourselves to providing a few examples of technical problems that are in part driven by the particular nature of groupware.

Integration of media is a continuing trend that is unfinished. Many groupware successes come from integrating technologies that previously existed in isolation. Lotus Notes integrated email and information sharing; modern meeting schedulers integrate calendars with email; other examples include videomail and the more general use of attachments and embedded media.

Interoperability is a key to supporting group use in environments where not everyone has the same platform. Much groupware requires most or all
group members to use it. For example, in organizations where people use incompatible calendars, scheduling features go unused. More generally, technical standards are particularly important with groupware applications, which often must work in concert with other software to be useful. It is overwhelmingly difficult to develop a co-authoring tool if it entails building a new full-function word processor, but if a standard interface to existing word processors has been defined, an opportunity exists. Insufficient flexibility has been a major problem for groupware. A technical approach to providing greater flexibility is to develop reflective systems that contain modifiable representations of their own behavior (Dourish, 1995a). Dourish (1995b) addresses the issue of reconciling conflicting changes resulting from parallel activity. Complete locking may be unnecessarily restrictive and inefficient, but permitting divergence requires mechanisms for flagging and reconciling changes.
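One generic way to permit divergence while still flagging conflicts, in the spirit of the mechanisms Dourish discusses though not a description of his system, is to record a version number for each shared object and treat an update based on a stale version as a conflict to be reconciled later.

```python
class SharedObject:
    """Optimistic concurrency: detect, rather than lock out, parallel edits."""

    def __init__(self, value):
        self.value, self.version = value, 0
        self.conflicts = []        # divergent updates awaiting reconciliation

    def read(self):
        return self.value, self.version

    def write(self, new_value, based_on_version):
        if based_on_version == self.version:
            self.value, self.version = new_value, self.version + 1
            return "applied"
        # Someone else updated the object since this client read it.
        self.conflicts.append((new_value, based_on_version))
        return "flagged for reconciliation"

doc = SharedObject("meeting agenda, v1")
_, ver = doc.read()                            # two users read version 0
print(doc.write("agenda edited by Ann", ver))  # applied
print(doc.write("agenda edited by Bo", ver))   # flagged for reconciliation
```

The interesting design work lies in what happens to the flagged updates: whether they are merged automatically, presented to users for manual reconciliation, or kept as parallel versions.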
9.2 Social and Organizational Challenges
Groupware is now achieving successes, but for decades groupware features and applications were developed without success, and failure is still more common than success. Grudin (1994b) presents a detailed account of non-technical challenges to designing, developing, and deploying groupware. The list below summarizes and extends his points.

(1) Disparity in work and benefit. Groupware applications often require that some people do additional work. Often they are not the primary beneficiaries, and may not perceive a direct benefit from the use of the application. Therefore they do not do the work, and in many groups, it is unlikely to be required. This is a very common source of trouble.

(2) Critical mass, prisoner's dilemma, and the tragedy of the commons problems. Even when everyone would benefit, groupware may not enlist the "critical mass" of users required to be useful. Alternatively, it can fail because it is never to any one individual's advantage to use it, the "prisoner's dilemma." Markus and Connolly (1990) detail these problems. The tragedy of the commons describes a situation where everyone benefits until too many people use it. This can be a problem for highways and information highways.

(3) Disruption of social processes. Groupware can lead to activity that violates social taboos, threatens existing political structures, or otherwise demotivates users crucial to its success. Much of our knowledge of social conventions is implicit and cannot be built into today's systems. Even where recognized, as in the area of privacy, addressing these issues is difficult.
(4) Exception handling. Groupware may not accommodate the wide range of exception handling and improvisation that characterizes much group activity. The prevalence of this and its significance for information systems have been brought to light by the detailed observations of ethnographers (e.g., Suchman, 1983). Their papers are among the most valuable contributions to CSCW conferences.

(5) Unobtrusive accessibility. Features that support group processes are used relatively infrequently, requiring unobtrusive accessibility and integration with more heavily used features.

(6) Difficulty of evaluation. The almost insurmountable obstacles to meaningful, generalizable analysis and evaluation of groupware prevent us from learning from experience.

(7) Failure of intuition. Intuitions in research, development, and use environments are especially poor for multi-user applications, resulting in bad management decisions and an error-prone design process. Certain technologies, particularly those that might benefit managers, tend to be viewed too optimistically, and the value of other technologies is overlooked.

(8) The adoption process. Groupware requires more careful implementation (introduction) in the workplace than product developers have recognized. Examining this process, Francik et al. (1991) provide hints of the difficulties facing efforts to introduce a groupware product on a group-by-group basis. Groups often span organization chart unit boundaries. Which group will pay for the dispersed machines? Who decides where computers and the necessary peripherals are placed? What happens as groups reorganize or change their focus? "Shrinkwrap groupware" appears to be impractical.
10. New Approaches

How do we address these challenges, specifically the novel social and organizational challenges? Software product developers in the past relied on market research and consultants, but at least until we have more experience with groupware it is not clear that they can help. Their approaches work better for assessing individual preferences than for understanding the dynamics of groups working together. Another traditional development approach, hiring a domain expert to work with the team, is highly susceptible to the biases of the individual. A system to support a newsroom that looks good to a typographer may not look good to the reporter, proofreader, editor, or other team members.
Nevertheless, the knowledge developed in designing interactive software provides a solid foundation. Gould (1995) summarizes techniques from the field of human-computer interaction, focusing on early and continual user involvement and user examination of prototypes, iterative design, and consideration of all aspects of usability in parallel. Grudin (1991) and Poltrock and Grudin (1994) analyse the use of these techniques in organizational settings. The techniques are clearly valuable, but require an encompassing method of application that is appropriate for the given setting.

Participatory design approaches have also been honed for decades, in particular sociotechnical design from England and collaborative Scandinavian approaches (Bjerknes et al., 1987; Greenbaum and Kyng, 1991; Schuler and Namioka, 1993). These approaches maximize the involvement of users in development, often making them members of the development team, and focus on techniques for mutual communication, education, and contribution. Initially used primarily on large, in-house projects, they are increasingly being adapted to groupware development.

Contextual inquiry, analysis and design is an approach developed by Holtzblatt and her colleagues that draws on strengths of user-centered and participatory design (Beyer and Holtzblatt, 1995; Holtzblatt and Beyer, 1993; Holtzblatt and Jones, 1993). Contextual inquiry centers on interviews conducted in the workplace as work is in progress. The questioning is thus intrusive and perhaps disruptive, but provides an efficient method for gathering data, with the goal of establishing a shared understanding of interviewer and worker about the work practice. The data from a series of interviews is then rigorously analysed from several perspectives to reach an understanding of the work context and practice in a form that can be communicated to other design team members. Contextual inquiry and analysis has been widely used at Microsoft and other organizations.

As noted earlier, the Information Systems field, drawing on social science and management studies, has contributed. Galegher et al. (1990) is a compendium of work from the social sciences, including research contributions to some of the videoconferencing systems described earlier. Orlikowski (1992) conducted an influential study of the introduction of Lotus Notes in a consulting organization. She found, among other things, that the reward structure greatly affected the reception of the technology: Consultants had little incentive to share knowledge and were not inclined to use Notes, whereas the technical support staff benefited from sharing knowledge and did use Notes for this purpose.

Ethnographic or anthropological studies take time to conduct, but can provide a wealth of detailed knowledge of group and organizational behavior. Perin (1991) showed the mixed benefits and costs of email in organizational settings, perhaps explaining its cautious spread over its first
decades. Bowers et al. (1995) describe work processes in a print shop that has adopted a workflow management system, revealing the extensive exception-handling and flexible accommodation to situations that are obstructed by systems based on notions of standard processes. These are examples of a rapidly growing literature from which specific and general lessons can be derived.
11. Future Directions

Throughout this chapter we provided indications of current trends in CSCW and groupware. The tumultuous arrival of the World Wide Web demonstrates the futility of trying to forecast the future, but some statements are safe. We can confidently anticipate the increasing integration of media. Standards and interoperability will continue to drive progress, providing substantial short-term benefits with some long-term cost in efficiency, perhaps. We will see enterprise-wide adoption of technologies, rather than having them appear one group at a time, and of course we should see the integration of most groupware with email, intranets and internets. Successful designers and developers will pay far more attention to social issues and group dynamics, motivating research into organizational and group behavior.

We conclude by showing how synergies among the diverse threads present in CSCW can provide insight into challenges that loom as we work on research, development, and on restructuring our organizations and societies. We draw on knowledge from social science, insights from technology experimentation, and observation of technological change.

As computation is brought to bear to support work in many situations, it is clear that if a system can incorporate greater understanding of work processes, it can better support them. The tendency is often to turn to the "standard policy manual," the official procedures for conducting work in an organization. It does not seem to make sense to try to support nonstandard procedures. However, social scientists have recognized that "standard procedures" should often not be followed literally. They can represent a goal to strive for. They can represent an external face a company wishes to present. They can represent a way to allocate responsibility for a breakdown, in full awareness that corners often have to be cut. The use of "work to rule" to sabotage an organization reflects our awareness that the rules are neither efficient nor generally followed. A system that forces an organization to follow such rules could be counterproductive.

Suchman (1983) examined an apparently routine business process and
showed that in practice it required considerable exception-handling and problem-solving. This and other studies reveal that the reality of work practices is much more chaotic than is generally recognized. The orderly face presented to the outside world often masks a far less orderly internal operation.

This presents a problem for groupware developers, as illustrated by Ishii and Ohkubo (1990). To design a system to support office procedures, they first consulted the organization's "standard procedures manual." Realizing that it might be misleading, they conducted interviews to determine how people actually worked, which turned out to be quite different. They designed a system to support the "actual standard procedures" that people used. Nevertheless, upon completion, they found that the office work involved exception-handling that was beyond the capability of their system to address. Nor did they see hope of developing a system that could adequately cope with the level of exception-handling that they had discovered.

However, a more serious issue looms on the horizon. With the rapid development of communication technologies such as desktop videoconferencing, and information sharing technologies such as the World Wide Web and Lotus Notes, the computer is becoming less of a "computing machine" on our desk and more of a window onto the world. The window is by no means perfectly transparent, since it filters and it selects, but we are moving toward far greater transparency, toward having much more information available when we want it. This has many benefits. But one side-effect, perhaps a long-term benefit or perhaps not, but which will surely be disruptive in the short term, is that the window will reveal much of the underlying "chaos" or non-routine activity that Suchman and Ishii and Ohkubo reported. The masks and myths of smooth, consistent operation are being stripped away. Because few people are aware of the degree of disorder that exists (our memory and our customs tend to suppress awareness of it), as the process of revelation picks up steam, it will often be highly unsettling. We will see the violations, the irregularities, the inconsistencies. Possibly we can use technology to recreate the masks and the myths, but probably not, and it is not clear we should try. Perhaps the new technologies will be suppressed. But if not, their use will surely lead to the rapid evolution of new social practices and organizations.

REFERENCES

Abbott, K. R., and Sarin, S. K. (1994). Experiences with workflow management: Issues for the next generation. Proceedings of CSCW'94, Chapel Hill, NC, October, pp. 113-120.
Agostini, A., De Michelis, G., and Grasso, M. A. (1997). Rethinking CSCW systems: The architecture of Milano. Proceedings of ECSCW'97, Lancaster, September.
Andrews, K., Kappe, F., and Maurer, H. (1995). Hyper-G and Harmony: Towards the next generation of networked information technology. CHI'95 Conference Companion, Denver, CO, May, pp. 33-34.
Baecker, R. (1993). Readings in Groupware and Computer-supported Cooperative Work, Morgan Kaufmann, San Mateo, CA.
Bannon, L., and Schmidt, K. (1991). CSCW: Four characters in search of a context. In Studies in Computer Supported Cooperative Work: Theory, Practice, and Design (J. M. Bowers and S. D. Benford, Eds.), North-Holland, Amsterdam. Reprinted in Baecker (1993).
Beyer, H., and Holtzblatt, K. (1995). Apprenticing with the customer. Communications of the ACM, 38(5), 45-52.
Bjerknes, G., Ehn, P., and Kyng, M. (Eds.) (1987). Computers and Democracy: A Scandinavian Challenge, Gower, Aldershot.
Boeing (1992). Enhanced factory communications. CSCW'92 Technical Video Program, ACM SIGGRAPH Video Series, Issue #87.
Borenstein, N. S. (1992). Computational mail as network infrastructure for computer-supported cooperative work. Proceedings of CSCW'92, Toronto, Canada, October-November, pp. 67-74.
Bowers, J., Button, G., and Sharrock, W. (1995). Workflow from within and without: Technology and cooperative work on the print industry shopfloor. Proceedings of ECSCW'95, Stockholm.
Bowers, J., Pycock, J., and O'Brien, J. (1996). Talk and embodiment in collaborative virtual environments. Proceedings of CHI'96, Vancouver, Canada, April, pp. 58-65.
Brinck, T., and Gomez, L. M. (1992). A collaborative medium for the support of conversational props. Proceedings of CSCW'92, Toronto, Canada, October-November, pp. 171-178.
Bruckman, A., and Resnick, M. (1993). Virtual professional community: Results from the MediaMOO project. Proceedings of the Third International Conference on Cyberspace, Austin, TX, May.
Bullen, C. V., and Bennett, J. L. (1990). Learning from user experience with groupware. Proceedings of CSCW'90, October, Los Angeles.
Chen, H., Hsu, P., Orwig, R., Hoopes, L., and Nunamaker, J. F. (1994). Automatic concept classification of text from electronic meetings. Communications of the ACM, 37(10), 56-73.
Cole, R., and Johnson, E. C. (1996). Lotus Development: TeamRoom, a collaborative workspace for cross-functional teams. In Transforming Organizations Through Groupware: Lotus Notes in Action (P. Lloyd and R. Whitehead, Eds.), Springer-Verlag, Berlin.
Conklin, J., and Begeman, M. L. (1988). gIBIS: A hypertext tool for exploratory policy discussion. Proceedings of CSCW'88, Portland, OR, September, pp. 140-152.
Curtis, P. (1992). Mudding: Social phenomena in text-based virtual realities. Proceedings of the 1992 Conference on Directions and Implications of Advanced Computing, Berkeley, May. Also available as Xerox PARC technical report CSL-92-4.
Damer, B., Kekenes, C., and Hoffman, T. (1996). Inhabited digital spaces. Conference Companion of CHI'96, Vancouver, Canada, April, pp. 9-10.
DeSanctis, G. L., and Gallupe, R. B. (1987). A foundation for the study of group decision support systems. Management Science, 33(5), 589-609.
Dickson, G. W., Lee, J. E., Robinson, L., and Heath, R. (1989). Observations on GDSS interaction: Chauffeured, facilitated and user-driven systems. Proceedings of the 22nd Annual Hawaii International Conference on System Sciences, Kailua Kona, HI, January, pp. 337-343.
Dourish, P. (1995a). Developing a reflective model of collaborative systems. ACM Transactions on Computer-Human Interaction, 2(1), 40-63.
Dourish, P. (1995b). The parting of the ways: Divergence, data management and collaborative work. Proceedings of ECSCW'95, Stockholm.
Dourish, P., and Bellotti, V. (1992). Awareness and coordination in shared workspaces. Proceedings of CSCW'92, Toronto, Canada, pp. 107-114.
Dourish, P., and Bly, S. (1992). Portholes: Supporting awareness in a distributed workgroup. Proceedings of CHI'92, Monterey, CA, May, pp. 541-547.
Dourish, P., Holmes, J., MacLean, A., Marqvardsen, P., and Zbyslaw, A. (1996). Freeflow: Mediating between representation and action in workflow systems. Proceedings of CSCW'96, Boston, MA, November, pp. 190-198.
Drucker, P. F. (1991). The new productivity challenge. Harvard Business Review, 69(6), 69-79.
Ellis, C. A. (1979). Information control nets: A mathematical model of office information flow. Proceedings of ACM Conference on Simulation, Modeling and Measurement of Computer Systems, August, pp. 225-240.
Elrod, S., Bruce, R., Gold, R., Goldberg, D., Halasz, F., Janssen, W., Lee, D., McCall, K., Pedersen, E., Pier, K., Tang, J., and Welch, B. (1992). Liveboard: A large interactive display supporting group meetings, presentations and remote collaboration. Proceedings of CHI'92, Monterey, CA, May, pp. 599-607.
Fish, R. S., Kraut, R. E., Root, R. W., and Rice, R. E. (1992). Evaluating video as a technology for informal communication. Proceedings of CHI'92, Monterey, CA, May, pp. 37-48.
Floyd, C., Züllighoven, H., Budde, R., and Keil-Slawik, R. (Eds.) (1992). Software Development and Reality Construction. Springer-Verlag, Berlin.
Fowler, J., Baker, D. G., Dargahi, R., Kouramajian, V., Gilson, H., Long, D. B., Petermann, C., and Gorry, G. A. (1994). Experience with the Virtual Notebook System: Abstraction in hypertext. Proceedings of CSCW'94, Chapel Hill, NC, October, pp. 133-143.
Francik, E., Rudman, S. E., Cooper, D., and Levine, S. (1991). Putting innovation to work: Adoption strategies for multimedia communication systems. Communications of the ACM, 34(12), 52-63. Reprinted in Readings in Human-Computer Interaction: Toward the Year 2000 (R. M. Baecker, J. Grudin, W. A. S. Buxton, and S. Greenberg, Eds.), Morgan Kaufmann, San Mateo, CA, 1995.
Friedman, A. L. (1989). Computer Systems Development: History, Organization and Implementation. Wiley, Chichester, UK.
Galegher, J., Kraut, R., and Egido, C. (Eds.) (1990). Intellectual Teamwork: Social and Technological Foundations of Cooperative Work. Lawrence Erlbaum Associates, Hillsdale, NJ.
Gaver, W., Moran, T., MacLean, A., Lövstrand, L., Dourish, P., Carter, K., and Buxton, W. (1992). Realizing a video environment: EuroPARC's RAVE system. Proceedings of CHI'92, Monterey, CA, May, pp. 27-35.
Goldberg, Y., Safran, M., and Shapiro, E. (1992). Active Mail: A framework for implementing groupware. Proceedings of CSCW'92, Toronto, Canada, October-November, pp. 75-83.
Gould, J. D. (1995). How to design usable systems. In Readings in Human-Computer Interaction: Toward the Year 2000 (R. M. Baecker, J. Grudin, W. A. S. Buxton, and S. Greenberg, Eds.), Morgan Kaufmann, San Mateo, CA.
Graham, T. C. N., and Urnes, T. (1992). Relational views as a model for automatic distributed implementation of multi-user applications. Proceedings of CSCW'92, Toronto, Canada, pp. 59-66.
Greenbaum, J., and Kyng, M. (Eds.) (1991). Design at Work: Cooperative Design of Computer Systems. Lawrence Erlbaum Associates, Hillsdale, NJ.
Greenberg, S., Hayne, S., and Rada, R. (1995). Groupware for Real-Time Drawing: A Designer's Guide, McGraw-Hill, London.
Greenhalgh, C., and Benford, S. (1995). Virtual reality tele-conferencing: Implementation and experience. Proceedings of the Fourth European Conference on Computer-Supported Cooperative Work, Stockholm, Sweden, September, pp. 165-180.
Greif, I. (Ed.) (1988). Computer-supported Cooperative Work: A Book of Readings. Morgan Kaufmann, San Mateo, CA.
Grønbæk, K., Kyng, M., and Mogensen, P. (1992). CSCW challenges in large-scale technical projects: A case study. Proceedings of CSCW'92, Toronto, Canada, pp. 338-345.
Grudin, J. (1988). Why CSCW applications fail: Problems in the design and evaluation of organizational interfaces. Proceedings of CSCW'88, pp. 85-93. ACM, New York. Reprinted in Groupware: Software for Computer-Supported Cooperative Work (D. Marca and G. Bock, Eds.), IEEE Press, Los Alamitos, CA, pp. 552-560, 1992.
Grudin, J. (1991). Interactive systems: Bridging the gaps between developers and users. IEEE Computer, 24(4), 59-69. Reprinted in Readings in Human-Computer Interaction: Toward the Year 2000 (R. M. Baecker, J. Grudin, W. A. S. Buxton, and S. Greenberg, Eds.), Morgan Kaufmann, San Mateo, CA.
Grudin, J. (1993). Interface: An evolving concept. Communications of the ACM, 36(4), 110-119.
Grudin, J. (1994a). CSCW: History and focus. IEEE Computer, 27(5), 19-26.
Grudin, J. (1994b). Groupware and social dynamics: Eight challenges for developers. Communications of the ACM, 37(1), 92-105. Republished in Readings in Human-Computer Interaction: Toward the Year 2000 (R. M. Baecker, J. Grudin, W. A. S. Buxton, and S. Greenberg, Eds.), pp. 762-774, Morgan Kaufmann, San Mateo, CA, 1995.
Grudin, J., and Palen, L. (1995). Why groupware succeeds: Discretion or mandate? Proceedings of ECSCW'95, Kluwer, Dordrecht, The Netherlands, pp. 263-278.
Haake, J. M., and Wilson, B. (1992). Supporting collaborative writing of hyperdocuments in SEPIA. Proceedings of CSCW'92, Toronto, Canada, October-November, pp. 138-146.
Heath, C., and Luff, P. (1993). Collaboration and control: Crisis management and multimedia technology in London Underground line control rooms. Computer Supported Cooperative Work, 1(1).
Heath, C., Luff, P., and Sellen, A. (1995). Reconsidering the virtual workplace: Flexible support for collaborative activity. Proceedings of the Fourth European Conference on Computer-Supported Cooperative Work, Stockholm, Sweden, September, pp. 83-99.
Holtzblatt, K., and Beyer, H. (1993). Making customer-centered design work for teams. Communications of the ACM, 36(10), 92-103.
Holtzblatt, K., and Jones, S. (1993). Contextual inquiry: A participatory technique for system design. In Participatory Design: Principles and Practices (D. Schuler and A. Namioka, Eds.), Lawrence Erlbaum Associates, Hillsdale, NJ.
Ichikawa, Y., Okada, K., Jeong, G., Tanaka, S., and Matsushita, Y. (1995). MAJIC videoconferencing system: Experiments, evaluation and improvement. Proceedings of the Fourth European Conference on Computer-Supported Cooperative Work, Stockholm, Sweden, September, pp. 279-292.
Inoue, T., Kobayashi, T., Okada, K., and Matsushita, Y. (1995). Learning from TV programs:
Application of TV presentation to a videoconferencing system. Proceedings of UIST'95, Pittsburgh, November, pp. 147-154.
Isaacs, E. A., Morris, T., and Rodriguez, T. K. (1994). A forum for supporting interactive presentations to distributed audiences. Proceedings of CSCW'94, Chapel Hill, NC, October, pp. 405-416.
Isaacs, E. A., Morris, T., Rodriguez, T. K., and Tang, J. C. (1995). A comparison of face-to-face and distributed presentations. Proceedings of CHI'95, Denver, CO, May, pp. 354-361.
Ishii, H. (1990). Cross-cultural communication and computer-supported cooperative work. Whole Earth Review, Winter.
Ishii, H., and Kobayashi, M. (1992). ClearBoard: A seamless medium for shared drawing and conversation with eye contact. Proceedings of CHI'92, Monterey, CA, May, pp. 525-532.
Ishii, H., and Ohkubo, M. (1990). Message-driven groupware design based on an office procedure model, OM-1. Journal of Information Processing, Information Processing Society of Japan, 14(2), 184-191.
Ishii, H., Kobayashi, M., and Grudin, J. (1992). Integration of inter-personal space and shared workspace: ClearBoard design and experiments. Proceedings of CSCW'92, Toronto, Canada, October-November, pp. 33-42.
Johansen, R. (1989). User approaches to computer-supported teams. In Technological Support for Work Group Collaboration (M. H. Olson, Ed.), pp. 1-32. Lawrence Erlbaum Associates, Hillsdale, NJ.
King, J., Ruhleder, K., and George, J. (1992). ODSS and the twilight of the decision support movement: Social segmentation and the legacy of infrastructure. Proceedings of the 25th Hawaii International Conference on Systems Sciences, Vol. IV, pp. 472-481.
Kling, R. (1991). Cooperation, coordination and control in computer-supported work. Communications of the ACM, 34(12), 83-88.
Klöckner, K., Mambrey, P., Sohlenkamp, M., Prinz, W., Fuchs, L., Kolvenbach, S., Pankoke-Babatz, U., and Syri, A. (1995). POLITeam: Bridging the gap between Bonn and Berlin for and with the users. Proceedings of ECSCW'95, Stockholm, Sweden.
Kuutti, K., and Arvonen, T. (1992). Identifying potential CSCW applications by means of Activity Theory concepts: A case study. Proceedings of CSCW'92, Toronto, Canada, pp. 233-240.
Kyng, M. (1991). Designing for cooperation: Cooperating in design. Communications of the ACM, 34(12), 64-73.
Lucas, P., and Schneider, L. (1994). Workscape: A scriptable document management environment. CHI'94 Conference Companion, Boston, MA, April, pp. 9-10.
Macedonia, M. R., and Brutzman, D. P. (1994). MBONE provides audio and video across the Internet. IEEE Computer, April, 30-36.
McGoff, C. J., and Ambrose, L. (1991). Empirical information from the field: A practitioner's view of using GDSS in business. Proceedings of the 24th Annual Hawaii International Conference on System Sciences, Kauai, HI, January, pp. 805-811.
Malone, T. W., and Crowston, K. (1995). The interdisciplinary study of coordination. ACM Computing Surveys, 26(1), 87-119.
Malone, T. W., Grant, K. R., Lai, K.-Y., Rao, R., and Rosenblitt, D. (1987). Semistructured messages are surprisingly useful for computer-supported coordination. ACM Transactions on Office Information Systems, 5, 115-131.
Malone, T. W., Grant, K. R., Lai, K.-Y., Rao, R., and Rosenblitt, D. A. (1989). The Information Lens: An intelligent system for information sharing and coordination. In Technological Support for Work Group Collaboration (M. H. Olson, Ed.), pp. 65-88. Lawrence Erlbaum, Hillsdale, NJ.
Malone, T. W., Lai, K., and Fry, C. (1992). Experiments with Oval: A radically tailorable tool
for cooperative work. Proceedings of CSCW'92, Toronto, Canada, October-November, pp. 289-297.
Mantei, M. M., Baecker, R. M., Sellen, A. J., Buxton, W. A. S., Milligan, T., and Wellman, B. (1991). Experiences in the use of a media space. Proceedings of CHI'91, New Orleans, LA, April-May, pp. 203-208.
Mark, G., Haake, J. M., and Streitz, N. A. (1995). The use of hypermedia in group problem solving: An evaluation of the DOLPHIN electronic meeting room environment. Proceedings of the Fourth European Conference on Computer-Supported Cooperative Work, Stockholm, Sweden, September, pp. 197-213.
Markus, M. L., and Connolly, T. (1990). Why CSCW applications fail: Problems in the adoption of interdependent work tools. Proceedings of CSCW'90, October, Los Angeles, pp. 371-380.
Martin, D. (1995). Smart 2000 conferencing system: What we learned from developing and using the system. In Groupware for Real-Time Drawing: A Designer's Guide (S. Greenberg, S. Hayne, and R. Rada, Eds.), pp. 179-197, McGraw-Hill, London.
Masinter, L., and Ostrom, E. (1993). Collaborative information retrieval: Gopher from MOO. Proceedings of INET'93.
Maurer, H. (1996). HyperWave: The Next Generation Web Solution, Addison-Wesley, Harlow, England.
Medina-Mora, R., Winograd, T., Flores, R., and Flores, F. (1992). The Action Workflow approach to workflow management technology. Proceedings of CSCW'92, Toronto, Canada, October-November, pp. 281-288.
Mitchell, A., Posner, I., and Baecker, R. (1995). Learning to write together using groupware. Proceedings of CHI'95, Denver, CO, May, pp. 288-295.
Moran, T. P., McCall, K., van Melle, B., Pedersen, E. R., and Halasz, F. (1995). In Groupware for Real-Time Drawing: A Designer's Guide (S. Greenberg, S. Hayne, and R. Rada, Eds.), pp. 24-36, McGraw-Hill, London.
Nichols, D. A., Curtis, P., Dixon, M., and Lamping, J. (1995). High-latency, low-bandwidth windowing in the Jupiter collaboration system. Proceedings of UIST'95.
Nunamaker, J. F. Jr., Dennis, A. R., Valacich, J. S., Vogel, D. R., and George, J. F. (1991). Electronic meeting systems to support group work. Communications of the ACM, 34(7), 50-61.
Okada, K., Maeda, F., Ichikawa, Y., and Matsushita, Y. (1994). Multiparty videoconferencing at virtual social distance: MAJIC design. Proceedings of CSCW'94, Chapel Hill, NC, October, pp. 279-291.
Okada, K., Ichikawa, Y., Jeong, G., Tanaka, S., and Matsushita, Y. (1995). Design and evaluation of MAJIC videoconferencing system. Proceedings of INTERACT'95, June, Norway.
Olivetti (1992). The Pandora Multimedia System. CSCW'92 Technical Video Program, ACM SIGGRAPH Video Series, Issue #87.
Olson, J. S., Olson, G. M., Storrøsten, M., and Carter, M. (1992). How a group-editor changes the character of a design meeting as well as its outcome. Proceedings of CSCW'92, Toronto, Canada, October-November, pp. 91-98.
Orlikowski, W. J. (1992). Learning from Notes: Organizational issues in groupware implementation. Proceedings of CSCW'92, November, Toronto, pp. 362-369.
Pankoke-Babatz, U. (1994). CSCW for Government Work: POLIKOM Video. CSCW'94 Video Series.
Perey, C. (1996). Desktop videoconferencing and collaboration systems. Virtual Workgroups, 1(1), 21-31.
Perin, C. (1991). Electronic social fields in bureaucracies. Communications of the ACM, 34(12), 74-82.
Poltrock, S. E., and Grudin, J. (1994). Organizational obstacles to interface design and development: Two participant observer studies. ACM Transactions on Computer-Human Interaction, 1(1), 52-80.
Post, B. Q. (1992). Building the business case for group support technology. Proceedings of the 25th Annual Hawaii International Conference on System Sciences, Kauai, HI, January.
Rao, R., Card, S. K., Johnson, W., Klotz, L., and Trigg, R. H. (1994). Protofoil: Storing and finding the information worker's paper documents in an electronic file cabinet. Proceedings of CHI'94, Boston, MA, April, pp. 180-185.
Root, R. W. (1988). Design of a multi-media system for social browsing. Proceedings of CSCW'88, Portland, OR, pp. 25-38.
Schuler, D., and Namioka, A. (Eds.) (1993). Participatory Design: Principles and Practices. Lawrence Erlbaum Associates, Hillsdale, NJ.
Sellen, A. J. (1992). Speech patterns in video-mediated conversations. Proceedings of CHI'92, Monterey, CA, May, pp. 49-59.
Sharples, M., Goodlet, J. S., Beck, E. E., Wood, C. C., Easterbrook, S. M., and Plowman, L. (1993). Research issues in the study of computer supported collaborative writing. In Computer Supported Collaborative Writing (M. Sharples, Ed.), pp. 9-28. Springer-Verlag, London.
Short, J., Williams, E., and Christie, B. (1976). The Social Psychology of Telecommunications. Wiley, London.
Streitz, N. A., Geißler, J., Haake, J. M., and Hol, J. (1994). DOLPHIN: Integrated meeting support across local and remote desktop environments and LiveBoards. Proceedings of CSCW'94, Chapel Hill, NC, October, pp. 345-358.
Suchman, L. (1983). Office procedures as practical action: Models of work and system design. ACM Transactions on Office Information Systems, 1, 320-328.
Tang, J. C., and Rua, M. (1994). Montage: Providing teleproximity for distributed groups. Proceedings of CHI'94, Boston, MA, April, pp. 37-43.
Tang, J. C., Isaacs, E. A., and Rua, M. (1994). Supporting distributed groups with a montage of lightweight interactions. Proceedings of CSCW'94, Chapel Hill, NC, October, pp. 23-34.
Towell, J. F., and Towell, E. (1995). Internet conferencing with networked virtual environments. Internet Research, 5(3), 15-20.
Trevor, J., Rodden, T., and Blair, G. (1995). COLA: A lightweight platform for CSCW. Computer Supported Cooperative Work, 3(2).
Whittaker, S. (1995). Video as a technology for interpersonal communications: A new perspective. SPIE, 2417, 294-304.
Wilson, B. (1995). WScrawl 2.0: A shared whiteboard based on X-Windows. In Groupware for Real-Time Drawing: A Designer's Guide (S. Greenberg, S. Hayne, and R. Rada, Eds.), pp. 129-141, McGraw-Hill, London.
Wolf, C. G., Rhyne, J. R., and Briggs, L. K. (1995). Using a shared pen-based tool for meeting support. In Groupware for Real-Time Drawing: A Designer's Guide (S. Greenberg, S. Hayne, and R. Rada, Eds.), pp. 81-95, McGraw-Hill, London.
Technology and Schools
GLEN L. BULL
Curry School of Education
University of Virginia
Charlottesville, VA
Abstract

Efforts to integrate technologies into schools have been more effective when they have placed the locus of control with teachers rather than requiring teachers to adapt their teaching style to the technology. Some educational technologies that are changing schools include graphing calculators, electronic library resources, integration of text and graphics in instructional materials developed by teachers, and use of the Internet for dissemination of student work. In the long term educational computing is likely to have profound if somewhat unpredictable effects on K-12 education. In the short term limited access to technology in K-12 schools will constrain the overall impact. However, constraints on the speed with which teachers can be educated to use new technologies will be the primary factor that will limit the extent of use in schools.
1. Technology and Schools
2. Trends in Educational Computing
   2.1 From Mainframes to Microcomputers
   2.2 The Computer as a Teacher
   2.3 The Computer as a Subject of Study
   2.4 Technology and School Reform
   2.5 Productivity Tools
   2.6 Multimedia
   2.7 The Internet
3. Diffusion of Innovation
   3.1 Educational Technology Standards
   3.2 Teacher Education
   3.3 Access to Technology
   3.4 Technological Barriers
   3.5 Examples of Successful Diffusion in Schools
   3.6 Promising Candidates for Successful Diffusion
4. Summary
References
1. Technology and Schools

I believe that the motion picture is destined to revolutionize our educational system and that in a few years it will supplant largely, if not entirely, the use of textbooks.
Thomas Edison, 1922

At ceremonies commemorating the 50th anniversary of ENIAC, Vice President Albert Gore discussed progress made during the last half century of computing. He noted that a 1949 issue of Popular Mechanics predicted that even though ENIAC weighed 30 tons, someday computers might weigh "as little as 1.5 tons." The Vice President then opened a musical greeting card, observing that the digital chip in the one-ounce greeting card had approximately the same capacity as ENIAC.

These sorts of comparisons are useful for illustrating the dramatic progress in computing over the last half century. However, they create the wrong mindset for thinking about educational uses of technology. Quantum leaps and dramatic advances often accompany new technologies. Changes in the education process involve social systems as well as engineering issues. Therefore far-reaching changes take place over a much longer time scale.

While gains in computing capacity have been fairly constant, the effects of new information technologies have been less predictable, as Edison's forecast for motion pictures demonstrates. Almost every public school in the United States has a videocassette recorder (VCR) and television monitor. Even though the technology for displaying motion pictures is ubiquitous and inexpensive, movies have not supplanted textbooks and are not likely to in the near future. Anyone who has watched an instructional film in a darkened classroom understands some of the reasons.

• The content of videotapes is not customized to the curriculum of specific classes. It is easier to integrate books with the existing curriculum than to use part of an instructional tape or a series of tapes.
• While each school may have a VCR and monitor, the teacher may have to reserve it for a particular class, and bring it to the classroom. This makes a videotape less flexible and convenient than a text.
• While almost all homes in the United States have televisions and many have VCRs as well, there is no convenient way to give students sections of tapes to take home for further study. In comparison, it is relatively easy to make an assignment in a textbook.
• Creation of custom content is prohibitively expensive. James Burke, author of the science program Connections, has observed that he has several months and a budget of millions to make each one-hour show. Burke expressed sympathy for the science teacher who has five minutes and a budget of $5 for each class.
• While most teachers receive at least some training in the use of audiovisual materials, far more of their education is devoted to the use of print-based materials.
Some classes are delivered to students via satellite downlinks and/or videotapes, allowing them to take courses that would otherwise not be available to them. The budget for many of these classes often makes it difficult to go beyond "talking heads" though; the budget employed for digital effects in Hollywood films is rarely available for such courses. For these and many other reasons, moving pictures have yet to fulfill Edison's prediction that textbooks would be supplanted by them.

Some uses of technology allow the teacher and learner to customize and adapt the technology to their own needs and style of learning. These uses of technology allow the teacher to appropriate it and assume ownership. Other uses of technology require the teacher to adapt to the technology. The rate of diffusion for these uses is far slower.

The past decades have produced dramatic advances in hardware and software. Although access to technology is an important first step, simply adding more computers to a school is unlikely to have much impact on the education process. The record suggests that change in educational practice is more dependent upon school culture than technological breakthroughs, and therefore is more likely to reflect an on-going incremental process than a quantum shift. A review of recent trends in educational computing provides a background for discussion of the issues.
2. Trends in Educational Computing

Near the beginning of the microcomputing era a book of readings, The Computer in the Classroom: Tutor, Tool, Tutee, categorized educational computing according to three main uses. The computer as a tutor reflected the traditional use of computers for computer-assisted instruction (CAI). The computer as a tool suggested uses such as spreadsheets that in many cases were inspired by the advent of microcomputers. The third category involved use of the computer as a tutee, in which the student teaches the computer. For example, in the process of teaching the Logo turtle to draw a circle, a student might learn about the relationship between the circumference of a circle and its diameter. The second category, "Computer as a Tool," is probably the most widespread use of computers in schools today.
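The Logo turtle exercise mentioned above can be reproduced with Python's standard turtle module: stepping forward a fixed distance and turning one degree, 360 times, traces an approximate circle whose circumference is 360 times the step, so a student can recover the circumference-to-diameter ratio (roughly pi). The step size of 2 units is an arbitrary choice for this sketch.

```python
import math
import turtle

STEP = 2                      # forward distance per one-degree turn (arbitrary)
t = turtle.Turtle()

for _ in range(360):          # the classic Logo idiom: REPEAT 360 [FORWARD 2 RIGHT 1]
    t.forward(STEP)
    t.right(1)

circumference = 360 * STEP
# A regular 360-gon with side STEP has circumradius STEP / (2 * sin(pi / 360)),
# so its "diameter" is STEP / sin(pi / 360).
diameter = STEP / math.sin(math.pi / 360)
print(circumference / diameter)   # approximately 3.1416
turtle.done()                     # keep the drawing window open
```

The point of the tutee approach is that the mathematics emerges from the act of instructing the machine, not from the program itself.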
Shortly after Tutor, Tool, Tutee was published, an instructional designer, Shirl Schiffman, described another use of computers in schools: "use of technology as a toupee to conceal the fact that nothing much is happening underneath" (personal communication, 1981). Computers are sometimes used as window dressing that camouflages lack of instructional relevance. A decade later Alan Kay expressed similar concerns during Congressional hearings on educational technology:

Perhaps the saddest occasion for me is to be taken to a computerized classroom and be shown children joyfully using computers. They are happy, the teachers and administrators are happy, and their parents are happy. Yet, in most such classrooms, on closer examination I can see that the children are doing nothing interesting or growth inducing at all! This is technology as a kind of junk food: people love it but there is no nutrition to speak of. At its worst, it is a kind of 'cargo cult' in which it is thought that the mere presence of computers will somehow bring learning back to the classroom. Here, any use of computers at all is a symbol of upward mobility in the 21st century. (Kay, 1995)
Another way of viewing the use of educational computing is according to locus of control. In some instances, the computer controls the educational agenda. The ultimate examples of this type of use are Integrated Learning Systems (ILSs) sold as complete packages by vendors. These systems track learning and adjust the presentation of content based on students' performance. Complete reports of all the variables are available to the teacher. These systems employ a conventional frame of reference about schooling, and simply attempt to implement it more efficiently by automating the process. In contrast, learner-centered approaches to educational computing have been used as a vehicle for school reform. These approaches employ technology to allow the teacher to adopt the role of facilitator or coach. In these instances, technology is used to re-think the nature of schooling. In general, teachers more readily adopt innovations that they can adapt to their particular style of teaching. Therefore the flexibility of a particular technology also affects the rate at which its use spreads throughout a school system.

Computers have been in widespread use in K-12 schools less than a quarter of a century, a relatively short time to consider how they might best be used. In addition, the technology has been a constantly shifting target. While the first school microcomputers with a few kilobytes of memory and today's multimedia systems with megabytes of memory are technically both digital devices, an order of magnitude separates their respective capabilities.
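The behaviour attributed to Integrated Learning Systems above, tracking performance and adjusting what is presented next, amounts to a simple control loop. The sketch below is a generic illustration under assumed rules (raise difficulty after three consecutive correct answers, lower it after a miss), not the design of any commercial ILS.

```python
import random

def next_difficulty(history, current):
    """Adjust difficulty from the recorded history of right/wrong answers."""
    if len(history) >= 3 and all(history[-3:]):
        return min(current + 1, 5)      # three in a row correct: step up
    if history and not history[-1]:
        return max(current - 1, 1)      # last answer wrong: step down
    return current

def session(student_skill=3, items=10):
    """Simulate presenting items and logging a report for the teacher."""
    difficulty, history = 1, []
    for _ in range(items):
        p_correct = 0.9 if difficulty <= student_skill else 0.3
        correct = random.random() < p_correct
        history.append(correct)
        print(f"difficulty {difficulty}: {'correct' if correct else 'wrong'}")
        difficulty = next_difficulty(history, difficulty)

session()
```

The pedagogical frame of reference stays conventional; the system merely automates the sequencing decisions a drill-and-practice teacher would make.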
TABLE I
TRENDS IN EDUCATIONAL COMPUTING

Approximate era    Educational emphasis
1960s              Computer as a teacher (CAI)
Late 1970s         Computer literacy
Early 1980s        Computer as an instrument of reform (Logo, etc.)
Mid 1980s          Computer as a productivity tool
Late 1980s         Multimedia and hypermedia
Early 1990s        Internet and the World Wide Web
As a result, interest in various capabilities of educational computing has evolved over time. Table I illustrates some of the major trends during the past thirty years. It is misleading to regard these as discrete eras. For example, development of Logo began in the early 1970s even before the advent of microcomputers. Papert (1980) popularized Logo in the early 1980s as microcomputers were becoming widely available in the schools. This popularization contributed to extensive experimentation with Logo and other uses of computing as instruments of educational reform. Experimentation with use of the computer as a teacher and development of CAI continued unabated during this time, overlapping with explorations of other ways of using computers. The sociologist Henry Jay Becker has wryly observed that each new technological advance or educational trend (CAI, Logo, productivity tools, hypermedia, and the Internet) is presented as a panacea for educational ills. By the time the limitations of the new medium become apparent, educational technologists revise their recommendations, presenting a moving target for educators who attempt to implement their advice.
2.1 From Mainframes to Microcomputers
Faculty at the University of Illinois designed a computer-based education system named "PLATO" in the 1960s. The Control Data Corporation (CDC) invested considerable financial resources in Project PLATO, in an effort to develop computer-assisted instruction (CAI) that would allow timeshare computers to serve as tutoring systems. Experiments such as Project PLATO conducted with mainframe and timeshare computers were restricted to a small percentage of the nation's schools. The mainframe version of Project PLATO was discontinued in the mid-1980s in the face of increasing competition from microcomputers. CDC subsequently sold the name to another firm. Although the PLATO software is still marketed today, one of the project's programmers observed:

The PLATO system was designed for Computer-Based Education. But for many people, PLATO's most enduring legacy is the on-line community spawned by its communications features. (Woolley, 1994, p. 5)

The educational computing era in kindergarten through 12th grade (K-12) schools effectively began with the development of the microcomputer. In the second half of the 1970s, school computers such as the Apple II, the Radio Shack TRS-80, and the Commodore PET were acquired in large numbers. These machines typically had 32 to 64 kilobytes of memory and a floppy disk drive.
2.2 The Computer as a Teacher
During the early microcomputing era, the two primary uses of these first computers were learning about the computer in courses such as Introduction to BASIC Programming, and using the computer as an instructional device. Authoring languages such as PILOT were developed in an attempt to allow educators without programming expertise to create tutorial software.

Advances in hardware capacity and provision of authoring tools had a limited effect on teaching practice for several reasons. While development of non-technical authoring tools lowered the threshold for creation of tutorial software, it became apparent that the most difficult aspects of CAI development are creative rather than technical. As an analogy, a novelist who writes in longhand may find that a typewriter or a word processor facilitates the writing process, but a word processor by itself does not transform an average writer into an award-winning novelist.

In its most limited form, tutorial software simply transfers textbook pages to the computer screen and adds an electronic quiz at the end of each unit. In more advanced form, tutorial software may employ animation to illustrate concepts that cannot be conveyed on a static printed page and may use diagnostic tracking to identify and remedy misconceptions. Effective tutorial software still requires several hundred hours of development for every hour of material developed. It also requires graphic designers, instructional designers, content specialists, and others to support the development process.

Aside from the difficulty of production, several factors have limited the use of tutorial software in K-12 schools. It is difficult to identify effective tutorial software. Evaluation formats tend to be superficial, often focusing on issues such as technical quality rather than accuracy of content and instructional effectiveness.
A thorough review of tutorial software that evaluates the accuracy and effectiveness of every program branch can be an expensive and time-consuming process.

The potential for individualized instruction is one of the promises offered by CAI. In practice the reality often falls short. Marshall (1993) reported that teachers she observed typically did not match software to students' diagnosed learning deficits. Instead teachers used the software most readily available in their classrooms. Often all students in a computer lab were assigned to work at the same task whether or not they had mastered prerequisite skills. The most serious deficit Marshall observed was lack of appropriate feedback provided by the software used by students. In many instances students did not understand the remedial feedback provided by the CAI program. As a result, students often spent their time in the computer lab practicing their mistakes.

Another problem is that school budgets are not structured to support acquisition of significant amounts of instructional software. The majority of a school budget supports salaried personnel. Computing equipment is sometimes acquired as a one-time capital cost, but there is rarely a substantial on-going budget for software acquisition and maintenance. Even if finance were not an issue, the traditional school instructional framework would be. Teachers generally are not trained to make use of software as a core component of the instructional process. Further, the majority of classrooms only have one or two computers, and school labs are quickly overbooked, making access for an entire class difficult. Finally, the standards of learning that establish the curriculum differ from state to state, and often from school district to school district. Even tutorial software of high quality may be regarded as intrusive and burdensome by teachers unless it matches the specific objectives of the curriculum they are required to complete by the end of the school year.

These factors have made it difficult for developers of tutorial software to amortize the substantial costs of software development. This, in turn, has limited the number of vendors who have developed significant amounts of high-quality tutorial software for the K-12 market. Those who have developed such software often find that the home market is as important a market for educational software as the public schools.
2.3 The Computer as a Subject of Study

Near the beginning of the microcomputing era Arthur Luehrmann noted many of these factors and proposed an alternative frame of reference. Rather than relying upon professionals to serve as the sole developers of software used by school children, Luehrmann (1980) suggested that students themselves should work directly with computers.
As an analogy, Luehrmann pointed out that in the ancient world, lay persons found it necessary to hire scribes to write a letter for them or read a letter they received. In more modern times universal literacy allowed everyone to perform this function for themselves. While we still have paid professional writers today, individuals can also read and write. Luehrmann extended this analogy to suggest that equivalent use of computers by students could be described as "computer literacy." The concept became quite popular for a time, but often was misinterpreted to mean learning about computers rather than using computers to learn. For this reason, the term fell out of favor among those who felt that memorizing definitions of RAM and ROM was not the best use of students' time. Frequently, "computer literacy" became equated primarily with "BASIC programming" in the early days of educational microcomputing, since BASIC was perceived as one of the primary means for controlling the computer. Presently, all too often "computer literacy class" translates into "typing class" (or "keyboarding" as it is now termed) in the early grades.

The computer is of course a legitimate subject of study in itself. Frequently high schools offer advanced placement (AP) classes in programming. The difficulty occurs when the motivation for studying the computer as an end in itself is fostered by a belief that it is crucial for later success. An early commercial in the 1970s preyed on parents' fears that their child would not be successful in college unless they purchased a microcomputer. In fact, the 16 kilobyte computer advertised was obsolete less than a year later. Similarly, school systems today often develop computer literacy curriculum guidelines that are quickly outdated. These guidelines often focus on learning about the parts of the computer rather than using the computer to learn. One harmful side-effect is that these classes misrepresent the creative potential of computing by focusing on technical jargon.
2.4 Technology and School Reform

Another alternative to CAI was proposed by Seymour Papert (1980). Papert suggested that Logo could be used to create educational "microworlds." Created at Bolt, Beranek, and Newman (BBN) and substantially enhanced at MIT, Logo was the first educational computing language developed specifically for use by K-12 students. Papert contrasted the student who spends a summer in France learning to speak French with the more limited success of students who learn French in school. He suggested that microcomputers could be used to create a learning environment that would be the equivalent of "Mathland." This was one of the first of a number of efforts to use computers to transform the nature of schools.
Papert suggested that students could learn subjects such as mathematics in a substantive, interactive fashion by controlling robotic devices such as the Logo floor turtle, which later transformed into an electronic triangle on the computer screen. Work by Clements and others, which continues to this day, indicates that under some conditions Logo can serve as an effective computer-based environment for conveying mathematical concepts (Clements and Meredith, 1993). However, the overall impact of Logo has been limited. Teachers must have an in-depth understanding of both mathematical concepts and the programming concepts associated with Logo in order to employ it effectively. These prerequisites were not often met. In many instances Logo was simply used as a drawing tool, and became less popular when other alternatives such as paint programs and other graphic tools became available. The educational community that coalesced around efforts to use Logo as an instrument of reform did create a culture that continues to contribute to the on-going dialog about educational technology and constructivism.

Microcomputer-based laboratories (MBLs) are another educational innovation that have the potential to transform some aspects of science education. The concept of microcomputer-based laboratories originated at the Technical Education Research Centers (TERC), a nonprofit research institute in Cambridge, MA. Writing in a TERC publication, Tom Lam identified the "forces" that act to reduce the use of hands-on lab experiences in science education: budget cuts, modular scheduling, and hypersensitivity to safety issues. If doing science with traditional equipment is too costly or too dangerous, students are likely to have few opportunities. MBLs provide a safe and cost-effective option (Lam, 1984).

An MBL employs sensors attached to the computer that permit data to be collected and analysed that would otherwise be too complex, dangerous, or time consuming to gather. For example, analysis of a sound wave requires sampling at a rate of 1000 to 20 000 samples per second, and represents data that cannot be collected by hand. Most research laboratories in universities and commercial laboratories now routinely use computers along with specialized equipment for data acquisition and analysis. MBLs allow a general-purpose educational computer to perform similar functions in science classes. (A minimal sketch of such an acquisition loop appears at the end of this discussion.)

MBLs have not yet had the widespread effect on science education that they potentially might have in the future. This is in part because the use of MBLs has not been tightly integrated with the existing science curriculum (a curriculum designed with the previously mentioned constraints in mind). To some extent, this may be a chicken-and-egg problem.
330
GLEN L. BULL
No matter how instructive an MBL experiment may be, if it does not address the required curriculum it becomes an addition that steals time from other content that must be covered; ironically, the required content may have been chosen not because it was most important, but because it was most feasible. Robert Tinker, director of the TERC technology laboratory during the time that the MBL concept was developed, commented:

Studies show that [MBL] technology has the potential for important improvements in teaching, but many other conditions must be met before these improvements can be realized. In addressing the huge problems facing education, it is important to avoid looking for quick fixes of any kind, especially technological fixes. Improvements will only come by the hard work of excellent teaching, which is an art that has defied simplistic approaches. It cannot be packaged in a computer. (Tinker, 1989, p. 7)
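The MBL description above mentions probes sampled at rates of 1000 to 20 000 samples per second. The following sketch is not drawn from the chapter; it is a minimal, hypothetical illustration of what such an acquisition loop looks like, with a synthesized signal standing in for a real probe so that the example is self-contained.

    import math

    SAMPLE_RATE = 1000      # samples per second, the low end of the range cited above
    DURATION = 2.0          # seconds of data to collect

    def read_sensor(t):
        # Hypothetical stand-in for a probe attached to the computer;
        # a 200 Hz sine wave is synthesized so the sketch runs on its own.
        return math.sin(2 * math.pi * 200 * t)

    # Collect one reading per sampling interval, as an MBL would.
    samples = [read_sensor(n / SAMPLE_RATE) for n in range(int(SAMPLE_RATE * DURATION))]

    # Even this short run yields far more data points than students could
    # record by hand, which is the point made in the text.
    print(f"collected {len(samples)} samples in {DURATION} seconds")

Even at the lowest rate in the cited range, a two-second run produces 2000 readings, which makes clear why this kind of data cannot be gathered with a stopwatch and a notebook.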
The curriculum and evaluation procedures drive the diffusion of new practices as much as, or more than, the mere existence of technological advances. For example, some of the advanced placement (AP) examinations in mathematics offered by the Educational Testing Service (ETS) now permit or even require use of educational calculators, which has encouraged their use in advanced courses. In contrast, no national testing process requires hands-on proficiency with MBL data acquisition and analysis. Ruopp (1993) describes the preliminary outcome of LabNet, a computer network developed for support of project-oriented science education, in the following way:

Because LabNet worked against the grain of established science education, it confronted ideological as well as technical and administrative difficulties. Some teachers rejected the project approach outright. Said one: "I teach objective, high-level, college-prep physics. I don't want my students to get lost in social interactions or 'activities' that do not prepare for college." Not many teachers seem to have fully embraced the philosophy behind the project approach. This is not surprising considering that it is a radically different way to teach science. Rather, many seem to have integrated elements of the approach while sticking to the traditional textbooks, lectures, and teaching of established facts and concepts. (Ruopp, 1993, p. xiii)
Although MBLs are not yet ubiquitous, they have influenced thinking about science education in a fundamental way. Tinker notes that science education journals such as the Physics Teacher now have multiple pages of advertisements for MBL products. The concept of MBLs also led to development of Calculator-Based Laboratories (CBLs) that allow programmable calculators with sensors attached to perform similar functions.

Logo and Microcomputer-Based Laboratories are examples of a continuing thread in educational computing: efforts to employ technology as an agent of reform.
In both instances, students control the computer with the expert guidance of a teacher. Exemplary computer-using teachers report that they use the computer to allow them to assume the role of coach and facilitator. Perhaps one of the best descriptions of this process is found in "Kepler," an article by Jim McCauley (1983-84) in an issue of The Computing Teacher devoted to Logo. His students used Logo to test their hypotheses about Kepler's laws, using the Logo turtle as "an object to think with". They reconstructed the laws themselves and used the information to simulate the orbit of a satellite. The elation of students when the orbit was successful after many false starts contrasts sharply with the experience of students who are simply given the laws and asked to memorize them. However, the need to cover a given amount of material in a limited period of time makes it difficult for most teachers to employ this method. It also takes considerable teaching skill to guide student exploration without dominating it.

McCauley's use of computers to help his students explore the principles of physics reflects a broader trend in education known as "constructivism." Constructivism is motivated by the observation that many subjects cannot and should not be acquired solely through rote memorization. Instead, children should be encouraged to construct their own knowledge systems in a way that allows them to use the information beyond the immediate context in which it is acquired. Students who memorize algebraic formulas may perform well when told which formula to apply, but often are at a loss when they must determine which algorithm is appropriate for a given problem.

Researchers are examining the educational gains made by children using CAI, and asking whether such learning can be applied in off-computer contexts (Means and Olson, 1994). Computer-based drill-and-practice programs may be useful in assisting in memorization of factual information, but is this the best we should expect for such an investment of resources? In justification of a federally funded project looking at innovative uses of technology in teaching, the author observes:

We reason that tutorial uses of technologies (e.g., drill and practice programs, tutoring systems, satellite transmission of lectures) may be useful but are unlikely to transform education. These uses in essence use technology to do the same things that schools have traditionally done for students, albeit perhaps more systematically and efficiently. (Means, 1996)
The best ways of using computers to facilitate a constructivist approach are not well understood at present, but the effort continues because of the limitations of traditional CAI (Means and Olson, 1995). Applications which put more control in the hands of teachers and students may have more potential for affecting students' acquisition and use of knowledge, even though such impact may be harder to isolate and measure.
2.5 Productivity Tools

Application of productivity tools became the next widespread trend in use of computers in K-12 education. Productivity tools can be broadly defined as applications such as word processors, databases, spreadsheets, and drawing tools. The availability of integrated packages such as AppleWorks, ClarisWorks, and Microsoft Works with educational discounts and pricing sustained the popularity of this trend.

Productivity suites offer several other advantages that have led to sustained use in K-12 schools. They can be used for both administrative and instructional tasks. Teachers may begin using them for tasks that save time administratively (such as creation of electronic gradebooks) and later explore instructional uses once they have gained some proficiency with the software. These tools, which have already proved useful to adults in a wide array of career settings, are also useful in a variety of academic areas, from language arts to mathematics. This makes them immediately seem cost-effective in contrast to more specialized software that may be applicable only to a specific subject area. For that reason, a productivity suite is the only piece of software that is virtually guaranteed to be available on the majority of computers in every school district.

Many tutorial or exploratory technology applications are not adopted by schools and teachers because the content they convey does not match the objectives of the particular teacher, district, or state. Thus, the technology application is used only with a few students as "enrichment" or is never adopted at all. In contrast, technology applications that can be used as a tool or a communications vehicle (e.g., word processing and spreadsheet software, drawing programs, networks) can support any curriculum and can be fully assimilated into a teacher's ongoing core practice. (Means, 1996)
Word processing, for example, is particularly well suited to facilitation of writing classes and the language arts. The ease of editing word-processed work makes students more willing to revise it (Dwyer, 1994). Language arts instructors have found that student access to computers complements recent pedagogical trends in literacy education, such as the "process approach" to writing (Moffett, 1981). The process approach encourages students to proceed through a sequence of writing stages such as pre-writing, writing, revision, proofreading, and publishing. Student use of graphical organizers (e.g., Inspiration) or outliners, word processing software, grammar and spell-checkers, and desktop publishing or Internet distribution helps them accomplish high-priority curricular goals.

However, the majority of schools have a limited number of computers per student, whether they are massed in labs or distributed one or two per classroom.
For this reason, the time each student has access to a computer in school generally can be measured in minutes per week, limiting the impact on the overall curriculum. As a number of commentators have noted, it is analogous to a circumstance in which a class of students had to share a single pencil. Under those circumstances, productivity would be limited. In a few instances, such as in a recent initiative in Indiana, every student in a classroom has been provided with a laptop computer for use throughout the school year. It is difficult to extrapolate from these programs, both because productivity gains associated with computers are difficult to measure in business as well as in education, and because some of the gains could possibly be attributed to the novelty of the situation.
2.6 Multimedia
In 1987, the introduction of HyperCard, a hypermedia program developed by Bill Atkinson, led to the next significant trend in educational computing. This program used a card-and-stack metaphor that allowed students to create links between cards in an electronic deck. This authoring program was bundled with every Macintosh sold for several years, during a time when Apple Computer was responsible for a high percentage of sales to the K-12 educational market. It was also during a period in which Apple was incorporating multimedia capabilities into many of its computers. ("Multimedia" now refers to computer programs or presentations that combine text, graphics, sound, animation, and/or video.) HyperCard inspired development of other hypermedia programs based on the same authoring metaphor, such as LinkWay developed by IBM for MS-DOS systems and ToolBook developed for Microsoft Windows. HyperStudio, a cross-platform program available for the Apple II, Macintosh, and Microsoft Windows, is one of the most widely employed hypermedia programs in K-12 schools today. Low-threshold software such as HyperStudio allows students to construct hypermedia projects and term papers that incorporate multimedia elements.

Developers also began incorporating multimedia elements in tutorial software. Multimedia elements significantly extend the time (and expense) of development without necessarily improving the pedagogical quality of the finished product. If video elements are included, a video development team must be added to the development group, or rights to existing video footage must be purchased and incorporated into the application. The overhead is nontrivial. For instance, the first breakthrough multimedia game, Myst, allowed users to explore imaginary worlds stored on a CD-ROM. However, the product took a team of developers that included artists, writers, musicians, and technicians over a year to create. Because of the success of Myst, many other developers have attempted to create similar products,
but relatively few have achieved the same levels of artistic and financial success. The addition of educational objectives makes the design process even more complex.
2.7 The Internet
The Internet is the most recent trend in educational computing. World Wide Web browsers provide a graphical interface with easily constructed hypermedia links to other Internet sites. The Web has dramatically changed every aspect of the Internet, including educational uses. Schools are rapidly identifying approaches to establishing the infrastructure required to support Internet connections (Bull et al., 1994). Many schools have established Web pages on the Internet that allow them to present class work as well as view projects at other schools. The Internet allows teachers to collaborate in ways that otherwise would not be possible. Harris (1994) describes three general classes of Internet-based instructional activities:

• interpersonal exchanges;
• information collections;
• collaborative problem solving.
Each of these categories has a number of activity types. Collaborative problem solving, for example, includes information searches, electronic process writing, sequential creations, parallel problem solving, simulations, and social action projects. One successful model, the Eratosthenes project, had students measure and compare shadows in order to calculate the circumference of the Earth (Ruopp, 1993).

In another instance, a tenth grade reading teacher was faced with the problem of finding meaningful content for a class of high school students with learning disabilities. Because the students could only read at a third grade level, they were not motivated to read many of the materials available at their reading level. The teacher, Jeradi Hochella, linked the high school students with a third grade class. In on-line interactions, the older students portrayed characters from books read by the third grade students. In order to successfully adopt the persona of the characters they were portraying, the high school students found it necessary to master the books describing the characters. The high school students gained confidence by serving as role models, while the third grade students were delighted to be communicating with high school students. Activities of this kind require considerable project management skills and coordination on the part of both teachers, of course.
However, the Internet provides a forum for mentoring, modeling, and sharing. For example, Jeradi Hochella subsequently became the curator of the Language Arts Pavilion of Virginia's Public Education Network (PEN), collaborating with teachers on facilitation of these types of techniques in other classrooms.

Networks in general and the Internet in particular can add another layer of technical complexity to classroom computing that sometimes can frustrate teachers. Anyone perplexed by a balky modem or a SLIP connection with a bad IP boot address is familiar with this feeling. However, in the long term Internet connections will become part of the school infrastructure, just as electricity is today (Bull et al., 1993). The nation's phone companies and cable companies have pledged to contribute services and expertise to link schools to the Internet. John Gage, chief scientist for Sun Microsystems, proposed "NetDay" as a day in which volunteers in California would help wire local schools. These efforts will eventually ensure that all schools are linked.

As the technology becomes commonplace, the serious issues will revolve around the process of educating teachers to take advantage of the new medium. In fact, Levin and his colleagues have devoted an article to discussion of the reasons why a common instructional use of the Internet by novices, electronic pen pals, is a "bad idea" (Levin et al., 1989). Successful collaboration requires teachers to rethink how they organize their classes. That is, collaboration requires that both teachers coordinate the timing of special events, projects, and evaluations. When successful, the effort to collaborate can be rewarding. Al Rogers suggests:

When teachers and their students are "connected" to the world, teaching and learning strategies change. The "world" becomes an indispensable curriculum resource. When students communicate with people in distant and foreign places they begin to understand, appreciate, and respect cultural, political, environmental, geographic, and linguistic similarities and differences. Their view of the world and their place in the world changes. (Rogers, 1994)
There are many factors that affect adoption of technologies in schools other than the technologies themselves. These factors will be discussed in the next section.
3. Diffusion of Innovation

In recent years numerous studies of the process of diffusion of innovation have been undertaken. It is now well recognized that beneficial innovations will not necessarily be adopted.
One of the best-known cases is the typewriter keyboard. The QWERTY keyboard (so called because of the sequence of keys on the top row) was developed in the 1870s. Later keyboards such as the Dvorak keyboard placed frequently occurring characters on the home row, substantially reducing the distance a typist's fingers travel. However, the cost of re-engineering existing keyboards and re-training typists has prevented adoption of this innovation. Everett Rogers (1983) has noted several factors that affect diffusion of an innovation, including the superiority of the new method, compatibility with existing culture and values, complexity of the innovation, the degree to which the innovation may be adopted on a pilot basis, and the extent to which the benefits are observable and visible to others.

Most factors that affect diffusion of innovations present challenges that are specific to K-12 environments. Educational technologies often require extensive retraining on the part of the teacher before they can be successfully employed. Logo is a good example of an educational technology which can produce educational gains in some circumstances. However, the degree of training and insight required by the teacher before gains accrue is so high that these gains never occurred on a widespread basis. Further, the gains achieved are often subtle, requiring longitudinal studies measuring sophisticated factors in order to document them. School boards and communities generally prefer gains that can be produced quickly and that can be easily measured. The developers of Logo hoped that it might be employed to restructure education. For reasons that are apparent in hindsight, this did not occur.

Many educational technologies are not congruent with the existing school culture. These innovations are presented as an addendum to the existing curriculum, as additional subject matter to be taught by the teacher. Since dozens of well-intentioned interest groups generate continual on-going pressure to add new subject matter to the curriculum in areas ranging from health to moral training, methodologies that require increased teaching loads are rarely met with widespread enthusiasm.

One diffusion factor that favors adoption of educational technologies is piloting in a few classrooms prior to adoption on a wide-scale basis. Even this potential advantage is often lost. Two factors interfere with efforts to establish small-scale pilots: equity issues and the desire for quick results. First, parents of children in classes which do not receive the presumed technological benefit may express concern. Equity therefore becomes an issue. Secondly, superintendents, school boards, and state legislatures are often predisposed to look for programs that make the most dramatic impact in the shortest period of time. In the case of superintendents this may be understandable.
The average tenure of an urban superintendent is now less than three years. Therefore a program that requires long-term implementation is susceptible to cancellation by the next administration. For these reasons, a superficial large-scale program may be more likely to receive support than an in-depth program that affects a smaller number of students.

The research in diffusion of educational innovations suggests that it typically requires 25 years on average for an educational innovation to be adopted in a majority of the nation's schools (Rogers, 1983). In some instances, a much longer cycle can be required. For example, the telephone is a technology invented in the 19th century that still has not reached the majority of K-12 classrooms, and is unlikely to before the 21st century. A telephone could potentially bring guest lecturers to a classroom or make it possible to arrange for question-and-answer sessions with authorities in a field while a subject is being studied. (It could also reduce interruptions from announcements made over school intercoms.) Yet teachers are perhaps the only white-collar workers in the United States who do not have ready access to telephones.

The limiting factor in the future is likely to lie in the area of teacher education rather than technology. Sherry Turkle (1996) has observed that simply parachuting 50 computers into a school is unlikely to have any educational impact unless teachers receive the training necessary to employ educational technologies effectively. This is unlikely to happen in the near future for reasons that are more related to educational practice and diffusion of innovation than to computer technology.
3.1 Educational Technology Standards

State standards for teacher licensure and periodic recertification play a significant role in teacher preparation. Design of educational technology licensure standards presents challenges. Standards that are closely tied to existing technologies may quickly become obsolete or meaningless. For example, an educational technology standard in one state requires that all teachers demonstrate proficiency in use of a modem to access on-line databases. However, some schools now have direct Internet connections that make use of dedicated routers rather than modems. On the other hand, a generic standard may be susceptible to such broad interpretation that it is meaningless. A standard that dictates that the teacher be able to "integrate Internet activities" into classroom use does not provide effective guidance, because it does not suggest how teachers should prepare themselves to meet the standard or how they might be evaluated.

Gwen Solomon (1996) has observed that public support for goals that involve learning about the computer is easier to obtain than support for goals that involve using the computer to learn.
This is in part because the goal of understanding how computers work is more easily communicated than the goal of using technology as a tool for exploring other content, and because objectives for learning about computers are more easily established. The content learned about computers in K-12 schools may not necessarily translate into improved job opportunities, even though this is one of the primary motivations for support of these goals by parents. For example, in the 1970s an exploratory unit on computing was likely to have included a module on BASIC programming even though only a few of those participating later became computer programmers. It is arguably the case that an exposure to BASIC programming may be useful in other subject areas, just as an understanding of Latin may be useful in English classes. Some educators would argue, however, that bad practices learned through such exposure may be detrimental in subsequent higher-level programming courses.

In the long term, standards established by teachers and subject matter specialists in specific content areas may prove to be the most useful. The National Council of Teachers of Mathematics (NCTM), for example, has developed national standards that incorporate many educational technologies (NCTM, 1989). The standards developed by NCTM speak directly to ways in which math teachers can incorporate technologies in the teaching of mathematics and are integrated with other math education practices. The national NCTM standards, in turn, are finding their way into state and local standards of learning.
3.2 Teacher Education

Effective preservice and in-service teacher education is a key to use of educational technologies in K-12 schools. One of the most important studies on integration of computers into classroom practice to date was conducted by Sheingold and Hadley (1990). They conducted a national survey of teachers who are accomplished in the use of classroom technology, and found that teachers on average required five to six years of experience to master computer-based teaching practices. A follow-up study by Henry Jay Becker (1994) found that allocation of resources to staff development was a key factor that characterized school districts with exemplary computer-using teachers.

Based on the known characteristics of teachers who are exemplary in their use of educational technologies and research on diffusion of innovation, attributes of effective courses and in-service workshops can be suggested. The content of the courses should ideally be relevant to the discipline taught by the teacher. This may not occur for two reasons. In some instances the course instructor may be more familiar with technology than with the content area. It is difficult to discuss the application of technology to teaching calculus if the instructor is unfamiliar with calculus, for example.
Also, it may be difficult to form a class that only has teachers from a specific content area. Classes of mixed disciplines may be formed in order to achieve the numbers needed to make the class financially feasible. A Congressional Office of Technology Assessment report concluded:

Furthermore, the kind of training, not just availability, is important. Much of today's educational technology training tends to focus on the mechanics of operating new machinery, with little about integrating technology into specific subjects. (US Congress Office of Technology Assessment, 1995, p. 25)
Educational technology workshops should also make use of equipment and software available in the teacher's classroom. It is difficult and expensive to provide an array of different types of equipment and software in an in-service workshop. However, a workshop that covers use of software that is not available to the teacher, or that uses computers different from the ones available in the teacher's classroom, may have limited impact.

Ongoing follow-up support should be provided after the workshop. The time required for teachers to master integration of technologies in teaching is measured in years rather than weeks or months. One of the factors associated with the presence of exemplary computer-using teachers is provision of ongoing staff support. Marshall (1993) found that many teachers report that training is too hurried, without adequate follow-up support. Teachers also indicated that most training is provided just before computer use begins, without adequate time to assimilate the information, and that it typically is not specific enough and is not related to their classrooms. Further, she found that some districts discourage teachers from requesting follow-up training and that often teachers are reassigned after receiving training. While these problems could probably be addressed in any one school district, and in fact there are many school districts that are exemplary in this regard, it is more difficult to achieve uniformity across thousands of school districts nationally. Soloway (1996) characterizes the current status of teacher education and support in the following way:

By and large, technology training for teachers is served by a once-a-semester, one-day workshop where techie, nonteacher types tell teachers how to operate a computer or a particular software package. . . . What's wrong with this picture? (Soloway, 1996, pp. 12-13)
A study by the US Congress Office of Technology Assessment (1995) comes to similar conclusions, reporting that:

There is abundant evidence that "one-shot" or short-duration training programs have little impact. Teachers need time to learn, plan, try things out, reflect on their successes and failures, revise, and try again. That takes time: months, if not years. (US Congress Office of Technology Assessment, 1995, p. 159)
Too often, training is not relevant to the teacher's practice, and does not help address important issues such as how to effectively make use of a single computer in a classroom of 25 students (Mack Tate, personal communication, 1992). Alan Kay drew the following analogy:

Suppose it were music that the nation is concerned about. Our parents are worried that their children won't succeed in life unless they are musicians. Our musical test scores are the lowest in the world. After much hue and cry, Congress comes up with a technological solution: "By the year 2000 we will put a piano in every classroom! But there are no funds to hire musicians, so we will retrain the existing teachers for two weeks every summer. That should solve the problem!" (Kay, 1995)
Pre-service teacher education presents even greater challenges. Many schools assume that recent teacher education graduates will be familiar with the latest educational technologies. Often that is not the case. The National Council for Accreditation of Teacher Education (NCATE) has established standards developed in association with the International Society for Technology in Education (ISTE). However, of necessity the initial standards are somewhat broad, and not all colleges of education hold membership in NCATE. Only a small percentage of universities have established funded plans for amortizing and replacing computing labs as they become outdated. Many colleges of education simply do not have up-to-date equipment, and universities may assign higher priority to provision of computing equipment to other areas such as engineering. However, the problem is broader than lack of equipment, although that is an important element. The limitations include:

(1) Lack of access to technology. Many schools and colleges of education do not have access to the full range of technologies which will be available to their students when they enter the public schools as teachers.

(2) Lack of laboratory space. Space can be even more of a constraining factor than equipment. The majority of the current education buildings were constructed prior to the widespread distribution of microcomputers in the public schools in the 1980s. Establishment of a microcomputing facility which will allow classes of reasonable size to be offered (30-40 students per section) may mean conversion of an existing classroom, or costly installation of additional electrical, environmental control, and networking facilities.

(3) Lack of laboratory staffing and support. This is a problem which varies widely among schools of education.
At some institutions, it is not possible to leave laboratories and computing classrooms unattended, and due to budgetary reductions related to the recession, it may be difficult to provide support for staff or work-study students to monitor the laboratories. A faculty member serving on a state task force to study this problem noted:

This is an extremely important item, and could not be overemphasized. Our experience with our new lab has shown that infusion of new equipment and software demands a huge commitment of personnel, not just for installation, but primarily for continual maintenance, development and upgrading.
(4) Lack of support for faculty. There is a clear pattern which suggests that if university faculties do not have appropriate hardware and software in their offices, they usually do not employ these technologies in their courses. It is important for students to see their instructors model use of instructional technologies in their own courses, but this occurs less frequently than might be desirable.

(5) Curriculum limitations. Limitations in the number of hours of teacher education course work present challenges for all aspects of the curriculum. Ideally, educational technologies should be integrated into all aspects of coursework rather than taught as a separate subject, but this is more difficult in practice than in principle.
One member of a state-wide educational technology task force on preservice education observed:

The fact that pre-service teachers often do not have an opportunity to experience use of technology or to see their professors in education using technology is a critical factor in the molding and formulating of attitudes, methods of solving instructional problems, and generally the performance of that future teacher. To say there is a cost to conduct in-service workshops after that teacher graduates from a teacher education program is understating the case. There is not much one can do to balance one teacher who has not used technology nor seen it used against someone who has used technology, seen technology used by professors, and actually solved problems and constructed curriculum and activities with technology as a central focus. You certainly cannot balance this use against one in-service course. It is parallel to graduation of a physician who has never seen an X-ray, saying, "We will remediate basic deficiencies after graduation through in-service programs." (Ray Ramquist, James Madison University, posting to Ed-Res mailing list, March, 1993)
The US Congress Office of Technology Assessment found that
. . . overall, teacher education programs do not prepare graduates to use technology as a teaching tool. (US Congress Office of Technology Assessment, 1995, p. 184)
While half of the recent graduates of teacher education programs surveyed reported that they had been given preparation in use of drill-and-practice and tutorial software, fewer than one in 10 felt prepared to use formats such as multimedia packages, electronic presentations, collaborations over networks, or problem-solving software (US Congress Office of Technology Assessment, 1995, p. 185). This failure presents a substantial in-service teacher preparation burden that contributes to ineffective use of technology in schools.
3.3 Access to Technology

Access to technology without training has a limited impact on teachers' approaches to instruction. Provision of training without access to technology is equally ineffective. In the early 1990s the ratio of computers to students in US public schools was roughly one computer for every ten students. However, about half of these computers were older eight-bit computers such as the Apple II for which newer software is no longer being written. A few rare teachers are able to assemble a mix of Apple II, DOS, MS-Windows, and Macintosh machines in a single classroom or lab, and use each of them effectively for the specific tasks for which they are best suited. However, most teachers find it sufficiently challenging to become reasonably familiar with a single operating system, much less support a mix of different systems.

Businesses are sometimes generous in donating obsolete computers to schools. With proper planning these donations may be useful. However, they often add to the complexity of support problems. In some instances businesses donate computers because they run obsolete software packages that are no longer supported by vendors and because they are expensive to maintain. The same problems face schools that accept these machines. In worst-case instances, the costs of maintaining older machines may exceed the value of the donations.

The costs of on-going maintenance and training are support costs that should be factored into the overall cost of acquiring technology but are rarely included. For example, schools that install local area networks find there are on-going staffing issues associated with maintenance of the network. In some larger high schools with several librarians, one of the librarians has been replaced with a network technician. Replacement of academic personnel with technical staff raises numerous issues about the nature of schools. At a minimum, it suggests that academic gains achieved through provision of technology should be shown to be sufficient to offset the losses in academic personnel before such steps are taken.
Schools have the option of massing computers in technology labs or distributing them across classrooms. There are some activities that are best accomplished with an entire class. However, a lab setting requires that teachers schedule use of computers in advance. Often the lab schedule for the semester fills up, making it inaccessible to other teachers who have not already scheduled time. The amount of time available to teachers who do use the lab is sometimes more limited than they would prefer. Placing computers in classrooms gives teachers ownership and allows them to rely upon the machines for administrative tasks. Schools that allow teachers to take computers home over the summer and on weekends encourage experimentation.

Teachers who have a single computer and projection system can use the computer as a presentation tool or for class demonstrations. However, the cost of projection systems has made them relatively rare. When a school has one, it often must be shared among all the classrooms and scheduled in advance. The current generation of projection systems has some limitations: they usually require that the lights be dimmed, leaving students in darkness. In the absence of a projection system, a computing station in the classroom can be used by one or two students at a time. In those circumstances, it is often used as a reward for completion of other work.

Tom Snyder, a former science and social studies teacher, has been thinking about the issues associated with the single-computer classroom for a number of years. Tom Snyder Productions now develops software designed to support teachers working in these situations. For example, Decisions, Decisions presents students with hypothetical social issues (such as a flotilla of potential immigrants approaching the mainland shore) and allows them to vote on a series of branching choices. The computer acts as an electronic polling device that tracks decisions and presents consequences, and also as a reference system that provides information from analogous situations in the nation's history. This software, perhaps because it is inspired by a teacher's experiences in the classroom, goes beyond drill-and-practice and rote memorization.

Even though access to computers in schools is often limited, the effort to provide them is worthwhile. However, expectations about what might be accomplished under such conditions should be realistic. It is unrealistic to expect that a ratio of anywhere near one computer per child will be achieved in the near future. As long as access for each child is measured in minutes per week, the impact on education will be limited.
3.4 Technological Barriers

Access is not only a matter of the number of computers but the types of computers as well.
Newer multimedia machines make systems considerably more accessible, allowing teachers and students to devote more time to academic content and less time to technological underpinnings. The potential of computers as learning devices was evident from the beginning. However, the engineering practices required to overcome technological barriers consumed considerable amounts of energy. Earlier work, for example, focused on issues such as ways of controlling a cathode ray tube to form alphanumeric characters. A discussion by one of the pioneers of educational computing, Alfred Bork, illustrated some of the difficulties faced in pioneering efforts:

First it is essential that the computer know whether it is dealing with a graphic terminal or not. It does no good to attempt to draw pictures on a teletype! However, to identify the terminal as graphic is not sufficient. Most existing graphic terminals have different code requirements for drawing pictures. That is, the basic graphic data, usually a 10-bit X and a 10-bit Y, is conveyed to the terminal in different ways. Data that will correctly drive our ARDS 100 will produce garbage on our Tektronix 2002, and vice versa. The computer must not only know that it is dealing with a graphic terminal, but must know precisely which graphic terminal. (Bork, 1971, p. 8)
Two decades later, considerable progress has been made. Interchange standards such as the Graphics Interchange Format (GIF) on the World Wide Web permit images to be displayed with a fair degree of uniformity. Many of the underlying protocols are handled by the software, freeing the instructional designer to focus on content issues. The system still is not perfect, however. For example, only about 210 of 256 colors are shared in common by Web browser palettes for Macintosh and Microsoft Windows systems. Consequently an eight-bit GIF may have a different appearance on Macintosh and MS-Windows unless the designer takes this factor into account. While progress has been made, instructional designers must still know more about underlying hardware than is desirable.
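The palette issue described above is commonly handled by restricting images to the "browser-safe" colors, the set in which each red, green, and blue channel takes one of six evenly spaced values (0, 51, 102, 153, 204, 255), giving a 6 x 6 x 6 cube of 216 colors that both platforms' browser palettes can display without dithering. The following sketch is not from the chapter; it is a minimal, hypothetical illustration of how a designer's chosen color can be snapped to the nearest browser-safe value.

    def snap_to_browser_safe(r, g, b):
        """Map an arbitrary 8-bit RGB color to the nearest browser-safe color."""
        def snap(channel):
            # Each browser-safe channel value is a multiple of 51 (hex 0x33).
            return int(round(channel / 51.0)) * 51
        return snap(r), snap(g), snap(b)

    # A designer's chosen color, and the color a palette-constrained
    # browser on either platform would display without dithering.
    original = (130, 72, 200)
    print(original, "->", snap_to_browser_safe(*original))

An image composed only of such colors renders consistently on Macintosh and MS-Windows browsers, which is one practical way the hardware detail mentioned above still intrudes on instructional design.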
3.5 Examples of Successful Diffusion in Schools

There are two technologies that are significantly affecting the ways in which the average student learns: (1) information technologies in school libraries and (2) graphing calculators. These areas represent examples of successful diffusion of educational technologies in schools.
3.5.1 The Computer as a Reference Librarian

Electronic reference services in the school library significantly affect the resources available to the average student.
There are three factors that have changed the face of many school libraries and media centers in recent years:

(1) Electronic Catalogs. The advent of affordable, microcomputer-based software has made it possible for many schools to develop electronic catalogs of their holdings in recent years. This transition allows students to become familiar with electronic search techniques and algorithms that they will continue to encounter throughout their lives.

(2) CD-ROMs. Almost every school library now has a selection of electronic reference works on CD-ROM. One benefit of CD-ROMs is a fixed, predictable cost that is not contingent upon external connections. Bill Gates describes CD-ROMs as "the new papyrus." They have been so widely accepted in a relatively short period of time that a school librarian recently referred to "old-fashioned CD-ROMs" in a conference presentation.

(3) On-line Reference Resources. School libraries are rapidly acquiring connections to a variety of on-line reference sources on the Internet and the World Wide Web. In the mid-1990s over half of the nation's schools had access to the Internet, and the majority of schools without connections were making plans to obtain them. The majority of these connections are linked to central facilities such as school libraries rather than classrooms. Educational institutions ranging from the Smithsonian to the Library of Congress have made many of their resources available on-line.

Over the course of a quarter century, electronic information technologies have transformed the very concept of school libraries. Because of these resources, students from smaller schools with commensurately smaller budgets and library holdings now often have access to electronic holdings and reference works that rival those of the largest, most cosmopolitan schools. Because libraries and electronic information technologies have now become inseparable, school librarians also function as technology leaders in many schools.

These changes have made education in the appropriate use of electronic resources much more pressing. Sharon Owen, a library media specialist (LMS) in a California middle school, posted the following thoughts to a library media specialists mailing list:

Last Wednesday I observed five electronic presentations by 5 groups of middle school students and teachers (1 teacher for 4-6 students). These presentations were the culmination of a 2½-week Technology Summer Camp at one of the middle schools where I am the LMS. . . . Two of the Camp Teachers are English teachers and I assumed they, at least, knew correct documentation of research. I was wrong. In all five presentations, text appearing on the screen was lifted verbatim from the source [without attribution].
I know the Tech Literacy Mentor will be showing the video of these same presentations at a staff meeting in August. I know our principal and probably most will ooh and aah over these. In my opinion, they missed the whole point of presentation software. To use it, you must first create something to present. I heard many comments like “this is better than taking notes”, “we don’t have to write reports anymore”, and “kids like to watch television better than read”. None of these statements were challenged. (Sharon Owen, Posting to LM-NET Mailing List, July 1996, quoted by permission)
In a subsequent electronic mail message, the author commented that “In reading all the responses to my post, I am seeing that this problem is pervasive-much more so than I thought.” She concludes that it will be important to educate students (and teachers) who now have access to electronic reference materials and electronic presentation and hypermedia tools in their appropriate use. While these issues have always been important concerns at all academic levels, the ubiquity and ease of use of electronic source materials have made appropriate education in their use even more crucial.
3.5.2 Graphing Calculators

Pocket calculators are having a significant effect on the way in which mathematics is learned. There are several reasons why calculators are affecting mathematics in a way that educational computers have not. Today’s pocket calculator has many of the characteristics of the first generation of educational microcomputers: it has memory, is programmable, and has an alphanumeric display. However, educational calculators are inexpensive enough to permit schools to purchase calculators for an entire class for the price of a single computer. Because they are small, they don’t interpose a visual barrier between the students and the instructor in the way that computer monitors sometimes do. Since they are hand-held and battery operated, they can be used in any classroom, unlike desktop computers, which require electrical wiring. Relatively inexpensive projection systems allow the instructor’s calculator to be projected on a class display at the front of the room. (In contrast, computer projection systems are still comparatively expensive and rare in most classrooms.) These factors allow math teachers to employ calculators in every lesson for which calculators are appropriate. An NCTM position paper notes,

Calculators are widely used at home and in the workplace. Increased use of calculators in school will ensure that students’ experiences in mathematics will match the realities of everyday life, develop their reasoning skills, and promote the understanding and application of mathematics. . . . Instruction with calculators will extend the understanding of mathematics and will allow
all students access to rich, problem-solving experiences. This instruction must develop students’ ability to know how and when to use a calculator. . . . Research and experience have clearly demonstrated the potential of calculators to enhance students’ learning in mathematics. The cognitive gain in number sense, conceptual development, and visualization can empower and motivate students to engage in true mathematical problem solving at a level previously denied to all but the most talented. (NCTM, 1991)
When used in the fashion recommended by the NCTM standards, educational calculators can allow the teacher to illustrate mathematical concepts that would be difficult to explain using the chalkboard alone. Students can interactively follow the teacher’s reasoning at their desks. Educational researchers are currently involved in the development of new generations of educational calculators specifically tailored to support the content of the mathematics curriculum. In some instances, they also make it possible to add new topics to the curriculum that could not otherwise be taught.
3.6 Promising Candidates for Successful Diffusion

In the long term, over the next half century, educational computers are likely to have profound if somewhat unpredictable effects on K-12 education. In the short term, over the next decade, limited access to technology in K-12 schools will constrain the overall impact. Even if access were not an issue (and it is), there are limits on how quickly teachers can be educated to use new technologies, notwithstanding the fact that the best practices in which teachers should be educated are themselves a moving target that will shift as the technologies change. There are two areas that are promising even in the short term. One is the customization and adaptation of instructional materials that integrate text and graphics. The second is changing the scope of the audience for school work through publication and dissemination of class work via the Internet.
3.6.1 Integrated Text and Graphics

One of the chief difficulties with many educational technologies is their imperviousness to adaptation or customization by individual teachers. Provision of inexpensive cameras and VCRs does not empower most teachers to develop instructional materials because of the cost and complexity of producing such materials. Transfer of digitized video to a computer-based presentation system does not alter this equation. While a few teachers will create original computer-based animations and digitized video, the vast majority are likely to be consumers rather than creators of such materials for
the foreseeable future. Even if the technologies and expertise were readily available to teachers, time constraints limit development of original multimedia. Although continuing development of user-friendly tools will eventually lower the threshold, it is still not clear how such authoring capabilities might best be used in a standard classroom of today.

However, it is plausible that teachers may begin making more extensive use of documents that integrate text and still graphics. The technologies for manipulating still graphics with paint and draw programs are much more accessible to the average teacher than multimedia programs such as Macromind Director. Productivity suites such as ClarisWorks and Microsoft Works, available on the majority of school computers, include paint and draw programs with easily mastered interfaces. Graphics developed or modified in these programs can readily be incorporated into text documents. Although we may take this easy integration of text and graphics for granted now, it marks a major change in the routine design of documents.

It is useful to take a quick look backward at the technology this new capability supersedes. Writing and illustrations were unified when human beings first recorded their observations in pictures and eventually words. Medieval monks had absolute control over the integration of text and graphics, even though it took months or years to copy a complete book by hand. Gutenberg’s development of moveable type in 1452 changed this relationship. For the next five centuries, text and graphics diverged (see Table II). The typesetting and printing process made it feasible to disseminate text on a widespread basis, but integration of text and graphics was comparatively awkward and difficult. The overhead associated with the laborious engraving process sometimes caused printers to reuse the limited number of illustrations available to them. One of the first books describing Columbus’ discovery of America used an engraving of a boat approaching the shore of a Swiss lake to illustrate the arrival of Columbus in the New World.

TABLE II
METHODS FOR CREATING AND DISSEMINATING ACADEMIC MATERIALS
Era   Technology                                     Application                      Dissemination
1500  Commercial printer                             Typesetting                      Printing
1900  Typewriter                                     Typewriting                      Wet copy duplication (mimeograph, ditto, etc.)
1975  First generation of personal computers         Character-based word processing  Dry copy duplication (Xerox copy)
2000  Personal computer with graphic user interface  Graphics-based word processing   Dry copy duplication and electronic networks
Printed documents shaped the formation of modern academic institutions. Print constitutes the primary method for storing and transmitting academic information. For 500 years the technology for creating documents has been character-based rather than graphics-based. Consequently, specialized skills were required for the integration of graphics with text in instructor-designed documents. For example, prior to the typewriter, doctoral dissertations were typeset. This process did not encourage inclusion of graphics. The first dissertation at the University of Virginia, written by Thomas Marx Barton in 1885, addressed the subject of geometry. There are numerous references to geometric figures in the text, but the lack of any diagrams or illustrations in the dissertation suggests the difficulty of incorporating graphics in the document.

By the early twentieth century, the typewriter was an accepted technology in both commerce and academic institutions. The typewriter allowed instructors to develop their own customized documents, but did not advance the technology of graphic design. In order to integrate a graphic with text, it was necessary to type around an open space on the page, and later draw or paste in the figure.

The introduction of the personal microcomputer in 1975 made it easier for instructors to update and revise lecture notes and teaching materials, but made it no easier to integrate graphics and text (see Fig. 1). Prior to adoption of the graphic user interface (GUI), few word processors provided an easy means of incorporating graphics in a document. The few that did provide a relatively crude means of inserting graphics generally did not offer a way to see the graphic on the display screen as the document was being edited (see Fig. 2). These limitations were compounded by the fact that printing devices often lacked the capability to produce high-quality printouts of computer-generated graphics or integrated text and graphics.
FIG. 1. Text-only materials generated with microcomputers, circa 1975. (Diagram: text-only input, e.g. “Continental Drift,” entering a character-based word processor.)
FIG. 2. Limited graphics capability became routinely available but was tedious to use.
The availability of more sophisticated printers and routine access to computers with graphic user interfaces has changed this state of affairs. Now any instructor with basic word-processing skills can integrate graphics and text in the same academic document (see Fig. 3). The positive effects of integrated text and graphics upon comprehension have been well documented. In a meta-analysis of 87 studies covering thirty years, Levin et al. (1987) concluded that relevant illustrations in text consistently produce moderate to substantial gains in learning. The size of the effect depends upon the type of illustration, with the greatest gains associated with illustrations described as “transformational”. Until now this research has not been directly relevant to most instructors, since the options for producing graphics and integrating them with text have been limited. The graphic user environment alters this equation. A host of mechanisms for producing graphics are emerging in this environment, including image scanners, digital cameras, video digitizers, clip art on CD-ROMs, and charting and graphing software.

FIG. 3. Readily developed integrated text and graphics documents become possible for instructors with access to a computer with a graphical user interface (GUI).

FIG. 4. Integrated text and graphics enhance the impact of a variety of teaching and testing materials which instructors routinely customize for their classes (handouts and lecture notes, transparencies and slides, quizzes and examinations, study guides and worksheets).

Graphics-based word processors allow these diagrams and illustrations to be easily integrated with text on both Macintosh and MS-Windows systems. More importantly, this technology is accessible to teachers and will allow them to customize materials such as handouts for their classrooms. Since K-12 teachers have traditionally produced a variety of handouts and text-based materials for their classes, the new technology serves as a bridge that allows them to extend a previous practice in a more efficient manner (see Fig. 4). Since all students can take paper-based materials home to study, the number and type of computers present in the classroom or at home become less of an issue for this type of use. Since the same materials can also be incorporated into a presentation program or re-saved in a format appropriate for the World Wide Web, this use of computers can eventually serve as a bridge to more dynamic technologies as well.
3.6.2 Every Reader a Potential Publisher

The World Wide Web changed the fundamental nature of the Internet. Every reader has the capability to publish information. This ability affects many aspects of society, including K-12 schools. It enables teachers to readily share instructional materials they develop with other teachers. It provides opportunities for classes to share information and collaborate with other classes. It also provides a window on schools for communities. Parents of students may be among the most interested members of this audience, but many others will also be afforded an opportunity to gain insight into the classroom.

During the last half century television has significantly affected education and society. Children watch television several hours a day, on average. This
mass medium is largely a passive experience involving one-way communication. The majority of the bandwidth in television flows from the transmitter to the receiver, with only a limited back channel on some cable systems. It is possible to use the World Wide Web in a similarly passive fashion; teachers and students could have used the technology in the same way that cable television is employed, for passive channel surfing on the net. Instead, thousands of teacher-developed Web sites began appearing as soon as the technology became available in schools, and many classes chose to publish their own work on the Web despite the relatively primitive state of the first Web editors and low-bandwidth connections to the outside world. Teachers’ efforts have been encouraged and facilitated by groups such as the Global School Network established by Al Rogers.

The proliferation of school Web sites might allow a high school physics teacher, for example, to seek out instructional materials developed by other physics teachers. On one hand, the quality of such materials is not mediated by a commercial publisher or peer review, so it is up to the consumer to evaluate the accuracy and utility of the materials. On the other hand, publication and communication are immediate, leading to dialog and discourse in many instances.

Publishing capability also provides an audience for K-12 students. In the past the English teacher was often the sole audience for a student essay. The immediacy of the Web makes it possible to publish such works, making them available to a larger audience. Numerous studies have found that writing is affected by the intended audience. A term paper that will only be read and graded by a single teacher evokes a different response than purposeful communication. As might be expected, writing created with the purpose of sharing with others is of higher quality on the whole. For example, Riel has found that student writing shared with other students over a network tends to be of higher quality than writing produced for in-class use only (Riel, 1989).

It is estimated that the average child has watched more than 6000 hours of television by the end of kindergarten. One promising indication for the future is that hours spent on the World Wide Web appear to be taken from time spent viewing television (Turkle, 1996). While it is perfectly possible to surf the net in as mindless a way as 100 channels of television are surfed, using the Web at least requires a certain level of literacy. More importantly, the Web has the capacity for two-way interaction: publishing as well as viewing.
4. Summary
In the future, technologies such as “multimedia” that are now taught as separate subjects will become part of the educational computing landscape.
The descriptor “multimedia” will fade from use, just as references to “talking movies” are rarely heard today because all films include audio. Alan Kay outlined this future in his description of the Dynabook, a computer about the size of a writing tablet with a voice recognition interface, a database supported by an artificial intelligence query engine, and high-bandwidth, wireless links to a world wide network. While today’s educational computers do not yet incorporate the capabilities of the Dynabook, they are changing rapidly.

When the 30-ton capability of ENIAC became small enough to fit into a greeting card, the way in which computing was treated as a resource changed. It would have been unthinkably frivolous to use ENIAC for this purpose, but today it is not considered irresponsible to send a musical greeting card that costs five dollars. This shift in thinking can already be seen in the widespread use of graphing calculators. The first educators who wrote about educational microcomputing in the 1970s often compared the single-computer classroom to early schools in which pencils were still relatively rare. This analogy was not far off the mark. As computers become an increasingly inexpensive, commonplace commodity, the nature of schools could well change. If they do, the change is likely to occur relatively gradually, over an extended period of time, so that it no longer appears remarkable by the time universal adoption occurs.

An alternative possibility is that schools may absorb new technological capabilities without changing in fundamental ways. This has previously occurred with several technologies, notably the telephone system, which is used administratively but is not available in most classrooms. Similarly, films and video have had a far greater impact in the home than in schools.

HyperStudio is a good example of the way in which each successive wave of technology may be incorporated as educational technologies evolve. HyperStudio was developed by a teacher, Roger Wagner, who wanted to provide a learner-centered tool that allowed teachers and students to tap the power of the computer without learning a programming language. However, it now includes a scripting language, HyperLogo, for those who wish to augment its capabilities. Recently plug-in modules were developed that allow HyperStudio projects to be accessed and displayed through Web browsers such as Netscape. A community of HyperStudio Web sites developed by teachers has appeared in locations around the world. This community allows teachers to share different instructional concepts as well as HyperStudio stacks. While Wagner’s motivation for developing HyperStudio was the creation of a learner-centered tool, it can also be used for production of traditional CAI, and some teachers have employed it in this way. HyperStudio reflects a lineage that encompasses many different eras of educational computing. This flexibility allows teachers to adapt it to their
own classroom and teaching style. It is therefore unsurprising that, after productivity tools, it is one of the most widespread applications found in K-12 schools today.

Alan Kay (1995) comments, “Working on these ideas [such as the Dynabook] thirty years ago, it felt as though the next great ‘500 year invention’ after the printing press was being born.” At one time books were so valuable that they were chained to walls. In time the printing press made texts inexpensive and freely available. This changed education but had an even greater impact on the world. Educational technologies are ultimately driven by information technologies in society at large. Kuhn suggests that scientific revolutions occur as one generation of scientists retires and is replaced by the next, rather than because the force of an idea changes an existing generation (Kuhn, 1962). To the extent this is true, it applies to educators as well as scientists. For that reason, it is particularly important to ensure that new technologies are incorporated into the education of the next generation of teachers. Future teachers who use these technologies in their own education will incorporate them into the classes they teach in natural and effective ways.
REFERENCES
Becker, H. J. (1994). How exemplary computer-using teachers differ from other teachers: implications for realizing the potential of computers in schools. Journal of Research on Computing in Education, 26(3), 291-321.
Bork, A. M. (1971). Terminals for education. Report from the Physics Computer Project, California University at Irvine, sponsored by the National Science Foundation. ERIC Document #060 624.
Bull, G. L., Cothern, H. L., and Stout, C. (1993). Models for a national public school computing network. Journal of Technology and Teacher Education, 1(1), 43-51.
Bull, G. L., Sigmon, T., Cothern, L., and Stout, C. (1994). Establishing statewide K-12 telecomputing networks. Journal of Machine-Mediated Learning, 4(2), 229-250.
Clements, D. H., and Meredith, J. S. (1993). Research on Logo: effects and efficacy. Journal of Computing in Childhood Education, 4(3-4), 263-290.
Dwyer, D. (1994). Apple classrooms of tomorrow: what we’ve learned. Educational Leadership, 51(7), 4-10.
Harris, J. (1995). Mining the Internet: organizing and facilitating telecollaborative projects. The Computing Teacher, 22(5), 66-69.
Kay, A. (1995). Written remarks to Joint Hearing on Educational Technology in the 21st Century, Science Committee and the Economic and Educational Opportunities Committee, US House of Representatives, 12 October, Washington, DC.
Kuhn, T. S. (1962). The structure of scientific revolutions. International Encyclopedia of Unified Science, 2(2).
Lam, T. (Winter, 1984). Probing microcomputer-based laboratories. Hands On! (Newsletter of the Technical Education Research Centers, Cambridge, MA).
Levin, J. R., Anglin, G. J., and Carney, R. N. (1987). On empirically validating functions of pictures in prose. In The Psychology of Illustration: Basic Research (D. M. Willows and D. A. Houghton, Eds), Springer Verlag, New York, pp. 51-85.
Levin, J. A., Rogers, A., Waugh, M., and Smith, K. (1989). Observations on educational electronic networks: the importance of appropriate activities for learning. The Computing Teacher, 16(8), 17-19.
Luehrmann, A. (1980). Should the computer teach the student, or vice-versa? In The Computer in the Classroom: Tutor, Tool, Tutee (R. Taylor, Ed.), Teachers College Press, New York, pp. 129-135.
McCauley, J. (1983). Kepler. The Computing Teacher, 11(5), 15-22.
Marshall, G. (1993). Computer education myths and realities. In The Technology Age Classroom (T. R. Cannings and L. Finkel, Eds), Franklin, Beedle & Associates, Wilsonville, OR.
Means, B. (1996). Technology and Education Reform (on-line report of a research project sponsored by the Office of Educational Research and Improvement). US Department of Education. Available: http://www.ed.gov/pubs/EdReformStudies/EdTech/
Means, B., and Olson, K. (1994). The link between technology and authentic learning. Educational Leadership, 51(7), 15-18.
Means, B., and Olson, K. (1995). Technology’s Role in Education Reform: Findings from a National Study of Innovating Schools. Prepared for the U.S. Department of Education. SRI International, Menlo Park, CA.
Moffett, J. (1981). Active Voice: A Writing Program across the Curriculum. Boynton/Cook, Upper Montclair, NJ.
National Council of Teachers of Mathematics (1989). Curriculum and Evaluation Standards for School Mathematics. National Council of Teachers of Mathematics, Reston, VA.
National Council of Teachers of Mathematics (1991). Calculators and the education of youth: NCTM position statement [7 paragraphs]. NCTM [on-line document]. Available: http://www.nctm.org/MATH.HTM#1
Papert, S. (1980). Mindstorms: Children, Computers, and Powerful Ideas. Basic Books, New York.
Riel, M. (1989). The impact of computers in classrooms. Journal of Research on Computing in Education, 22(2), 180-189.
Rogers, A. (1994). Global Literacy in a Gutenberg Culture [an abridged version of this article appeared as “Living the Global Village”, Electronic Learning Magazine, May/June, 1994, pp. 28-29]. [On-line document] Available: http://www.gsn.org/gsn/articles/article.gutenberg.html
Rogers, E. M. (1983). Diffusion of Innovations (3rd edn). Free Press/Macmillan, New York.
Ruopp, R. (Ed.) (1993). LabNet: Toward a Community of Practice. Lawrence Erlbaum, Hillsdale, NJ.
Sheingold, K., and Hadley, M. (1990). Accomplished Teachers: Integrating Computers into Classroom Practice. Research report published by the Center for Technology in Education, Bank Street College of Education, NY.
Solomon, G. (April, 1996). The national plan for educational technology. Keynote speech presented at the annual convention of the Society for Instructional Technology in Education, Phoenix, AZ.
Soloway, E. (1996). Log on education: teachers are the key. Communications of the ACM, 39(6), 11-14.
Taylor, R. P. (Ed.) (1980). The Computer in the Classroom: Tutor, Tool, Tutee. Teachers College Press, New York.
Tinker, R. (Spring, 1989). The need for quality. Hands On! 12(1), 2, 7. (Newsletter of the Technical Education Research Centers, Cambridge, MA).
Turkle, S. (April, 1996). Life on the screen: identity in the age of the Internet. Invited lecture presented at the Curry School of Education, University of Virginia, Charlottesville, VA.
US Congress Office of Technology Assessment (April, 1995). Teachers & Technology: Making the Connection (OTA-EHR-616). U.S. Government Printing Office, Washington, DC.
Woolley, D. R. (1994). PLATO: the emergence of on-line community. Computer-Mediated Communication Magazine, 1(3). [On-line document] Available: http://www.december.com/cmc/mag/archive/
Author Index

Numbers in italics indicate the pages on which complete references are given.
A Abbati, D., 6, 13,50 Abbott, K.R., 274,287,308,314 Acharya, A., 136,152 Adams, E.N., 212,264 Agostini, A., 274,314 Agrawal, G., 107, 121, 127, 144, 149 Air Force Operational Test and Evaluation Center., 200, 201,241, 242,245,264 Alippi, C., 166, 195 Ambrose, L., 296,318 Anderson, P., 221,264 Andrews, K., 302,315 Anglin, G.J., 349,354 Arvonen, T., 280,318 Ascher, H., 221,223,264 Atmar, W., 162,194
B Back, T., 161, 166, 173, 175,194 Baden, S.B., 144,150,151 Baecker, R., 275,289,295,315,319 Baker, D.G., 303,316 Balsara, D., 145, 149 Baluja, S., 168, 177, 178,194 Banerjee, P., 146, 151 Banerjee, U., 72,101 Bannon, L., 270,315 Barnard, S.T., 148,149,152 Basili, V.R., 199, 264 Bastani, F.B., 205,266 Beck, E.E., 301,320 Becker, G., 214,264 Becker, H.J., 338,354 Beckman, P., 127,135, 141,149,153 Begeman, M.L., 279, 302,315 Beguelin, A,, 141, I50 Bellotti, V., 280,316
Benford, S., 298,317 Bennett, J.L., 27 1,315 Bennett, R., 151 Berger, M.J., 148,149 Bergman, B., 208,267 Berryman, H., 117,118,153 Bertoni, A,, 167,194 Beyer, H., 3 12,315,317 Bik, A.J.C., 147,149 Bjerknes, G., 279, 312,315 Blair, G., 279,320 Bly, S., 280,316 Bodin, F., 127, 141, 149 Boeing., 288,315 Bokhari, S.H., 148,149 Borenstein, N.S., 286,315 Bork, A.M., 344,354 Bowers, J., 280,299, 313,315 Bozkus, Z., 145,149 Brarnlette, M.F., 168,194 Brezany, P., 147, 150 Briggs, L.K., 295,320 Brinck, T., 292,315 Brooks, B.R., 116, 117,151 Brown, D.E., 191,195 Brownes, J.C., 143,152 Bruce, R., 292,316 Bruckrnan, A,, 298,315 Brutzman, D.P., 290,318 Budde, R., 279,316 Bull, G.L., 334, 335,354 Bullen, C.V., 271,315 Button, G., 280, 313,315 Buxton, W., 289,316,319
C
Camarinopoulos, L., 214,264 Card, S.K., 302,320 Carney, R.N., 349,354
Carter, K., 289,316 Carter, M., 279,295,319 Caselli, S., 6, 13,50 Catkan, N.A.,201,214,266 Chakrabarti, S.,143, 147,150,153 Chmg, C., 133,150,152 Chapman, B.M., 146,153 Chellappa, R., 133, 152 Chelson, P.O., 214,267 Chen, D.K., 71,82,83,87,89,90,93,99, 101 Chen, H., 296,315 Choi, J.-D., 147, I50 Choudhq, A., 107,111,145,146,149,152 Christie, B., 288,320 Clements, D.H., 329,354 Cohoon, J.P., 169, 185,194, 195 Cole, R., 304,315 Colomi, A., 182, 186, 191,194 Conklin, J., 279, 302,315 Connolly, T., 271, 310,319 Conte, G., 6, 13,50 Cooper, D., 31 1,316 Cothern, H.L., 334,335,354 Cox, D.R., 221,222,264 Cozzolino, J.M. Jr., 221,264 Crow, L.H., 219,264 Crowley, K., 110, 145,151,152 Crowston, K.,5,51,216,318 Culberson, J., 174,196 Culler, D.E., 127, I51 Curtis, P., 298,315,319 Curtis, W., 4, 5 , S I Cytron, R., 54.55, 71.72, 101
D Darner, B., 298,315 Daniels, R.M., 6.51 Dargahi, R., 303,316 Das, E., 114, 116, I50 Das,R., 107, 111, 115,116, 117, 118, 120, 121,127, 146,149,150,151,152, 153, 189,195 Davidor, Y., 171, 194 Davis, L., 133, 152, 166,196 Deb,K., 171,194 De Gaus, A., 188,195 De Jong, K.A.,166, 172, 173,195,196
De Michelis, G., 6,50,274,314 Dennis, A.R., 6,51, 296,319 Deprit, E., 143,150,153 DeSanctis, G.L., 282,296,315 Desel, J., 14, 15, 16, 25, 33,51 De Souza, J.M., 221,222,223,265 Dhaeseleer, P., 183, 196 Dickson, G.W., 296,316 Diniz, P., 148,150 Dixon, M., 298,315,319 Dominic, S . , 189,195 Dongma, J.W., 141,150,152 Dorigo,M., 167, 182,186, 191,194 Douglas, C., 151 Dourish, P., 274,280,289,310,316 Drucker P.F., 305,316 Duane, J.T., 219,264 Dunigan, T.H., 72,95,101 Dusseau, A., 127,151 Dwyer, D., 332,354
E Eager, D.L., 63,64, 67,68,70,71,92,102 Easterbrook, S.M., 301,320 Edjlali, G., 135, 136, 142, 149,150,152 Egido, C., 312,316 Ehn, P.. 219,312,315 Ellis, C.A., 3, 10, 13, 18,51, 307,316 Elrod, S., 292,316 Engelbart, D., 271 Esparza, J., 14, 15, 16,25, 33.51
F Faustmann, G., 40,51 Feingold, H., 221,223,264 Fenton, N., 199,264 Feo, J.T., 94, 101 Fink, S.J., 144,150 Fish, R.S., 289,316 Flores, F., 3, 12,5I, 274,280,287,307,319 Flores, R., 3, 12,51,274,280, 287, 307,319 Floyd, C., 279,316 Flynn, L.E., 54,56, 57,58,59,63,101 Fogel, D.B., 160,194 Forman, E.H., 214,264 Foster, I., 127, 150
Fowler, J., 303,316 Fox, G., 107,11 1, 145,146,149,152 Fox, G.C., 184,195
Francik,E.,311,316 Frederick, R., 298,315 Friedman, A.L., 273,316 Fry, C., 302,318 Fuchs, L., 280,318 Fuquay, D., 187,195 Fyfe, D., 151
Gregory, D., 188,195 Greif, I., 271,274,317 Gronbzk, K., 280,317 Grudin, J., 269,271,275,276,278,292,304,
305,310,312,317,31K, 320 Gula, J.A., 10,24,51 Gupta, M., 147,150 Gupta, S., 116,150
H G
Haake, J.M., 280,297,303,317,319,
320 Galegher, J., 312,316 Gallupe, R.B., 282,296,315 Gannon,D., 127,135,141,149,153 Gaudoin, O., 221,222,223,224,264 Gaver, W., 289,316 Geissler, J., 297,320 Geist, A., 141,150 Gelatt, C.D., 158,170,196 George, J., 278,296.318,319 Gerasoulis, A,, 146,150,153 Gemdt, M., 147,150 Gilson,H., 303,316 Girkar, M., 146,152 Goel, A.L., 199,201,205,209, 21 1,217,
218,219,221,222,229,264,265, 266,267 Gold, R., 292,316 Goldberg, D., 292.316 Goldberg, D.E., 161,164,166,171,194, 196 Goldberg, Y., 287,316 Goldstein, S.C., 127,151 Gomez, L.M., 292,315 Goodlet, J.S., 301,320 Cony, G.A., 303,316 Gomals, J., 127,135,153 Gould, J.D., 312,316 Coward, S., 133,152 Goyal, A,, 21 1,265 Graham, T.C.N., 280,317 Grant, K.R., 286,302,318 Grasso, M.A., 6,50, 274,314 Greenbaum, J., 312,317 Greenberg, S., 293.317 Greenhalgh, C., 298,317 Grefenstette. J.J., 166,i69,194,196
Hadley, M., 338,355 Haghighat, M.R., 146,152 Halasz, F., 292,316,319 Harel, D., 4,51 Hams, J., 334,354 Hassan, H., 116,152 Haupt, T., 145,149 Havlak, P., 118,120,150 Hayes, G.S., 6,51 Hayne, S., 293,317 Heath, C., 280,290,31 7 Heath, R.,296,316 Hedge, S.U., 169,195 Hendrickson, B., 148,150 Hermann, B., 201.265 Hillis, W.D., 174,196 Hiranandani, S., 117,118,151,153 Hodoscek, M., 116,117,I51 Hoffman, T., 298,315 Hoffmeister, F., 161,166,173,175,194 Hol, J., 297,320 Holland, J . H . , 162,164,166,167,174,
195 Holmes, J., 274,316 Holtzblatt, K., 312,315,317 Hoopes, L., 296,315 Hsu, P., 296,315 Hunimel, S.F., 54,56,57,58,59, 63,lOf Hurson, A.R., 75,76,77,19, 04,101 Hurtado, C.A., 4,13,19,22, 25,33,35,40. SO, 51, 52 41,44,46,47,49, Hurtley, C.L., 191,195 Husbands, P.,175,192,196 Huss-Lederman, S., 141,152 Hwang,Y.-S., 107,111, 114,115,116,117,
145,146,150,151,152
1
Ianinno, A., 199,205,265,266 Ichikawa, Y., 281,290,317,319 Im, E.J., 143, I50 Inoue, T., 281,290,317 Isaacs, E.A., 289,290,291,318,320 Ishii, H., 281, 282, 292,314,318
J Janssen, W., 292,316 Jelinslu, Z., 205,208,265 Jeong, G., 290,317,319 Jewell, W.S., 214,265 Jiao, J., 146, I50 Johansen, R., 282,318 Johnson, E.C., 304,315 Johnson, W., 302,320 Jones, J., 143,150,153 Jones, S., 312,317
K Kailasanath, K., 151 Kan, S.H., 200,265 Kanoun, K., 221,222,223,265 Kappe, F., 302,315 Kareer, N. et al., 220,265 Kauth, J., 189, 195 Kavi, K., 75,94,101 Kay, A,, 324,340,354,354 Keddara, K., 3, 10, 13, 18,SI Keil-Slawik, R., 279,316 Kekenes, C., 298,315 4, 5,51 Kellner, M.I., Kennedy, K., 117,118, 120,146,147,150,151 Kesselman, C., 127, 151 %do, T., 170, 185, 186,194 King, J., 278,318 Kirkpatrick, S., 158, 170, I96 Kling, R., 276,318 Klockner, K., 280,318 Klotz, L., 302,320 Kobayashi, M., 281,292,318 281, 290,317 Kobayashi, T., Koch, H.S., 214,265 Koelbel, C., 107, 120, 135, 145,151
Kohn, S.R., 144,150,151 Kolvenbach, S., 280,318 Korb, B., 171,194 Kouramajian, V., 303,316 Koza, J.R., 161, 165, 178, 181, 183, 190, 195 Kraut, R.E., 289,312,316 Kremer, W., 209,265 Krishnamurthy, A., 127,143,150,151,153 Krothapalli, V.P., 54,74,80,90,101 Kruskal, C., 57,101 Kubat, P., 214,265 Kuck, D.J., 54,57,58,102 Kuhn, T.S., 354,354 Kunsawe, F.A., 162,195 Kuutti, K., 280,318 Kwan, T.T., 72, I 0 1 Kyng, M., 276,279,280,312,315,317,318 Kyparisis, J., 214,265
L Lachover, H., 4,51 Lai, K.-Y., 286,302,318 Lain, A., 146,151 Lam, T., 329,354 Lamb, D.A., 9 , 5 I Lamping, J., 298,319 Langberg, N., 214,265 Laprie, J., 221,222, 223,265 Lauterbach, K., 6,51 Lavenberg, S.S., 21 1,265 LeBlanc, T.J., 63,64,67,101 Lee, B., 75,76,77,79,94,101 Lee, C.L., 146,152 Lee, D., 292,316 Lee, J.E., 296,316 Leland, R., 148,150 Lemke, M., 145,149 Leung, B., 146,152 Levin, J., 335, 350,354, 355 Levine,S.,311,316 Lewis, P.A., 221,222,264 Li, H., 6 6 7 0 , 101 Liepins, G.E., 176, J9S Lilja, D.J., 58, 59, 61,63,70,90,101,102 Lim, J.T., 75,76,77,79,94,101 Lindland, O.A., 10,24,51 Lisanke, B.F., 188,195
Littlewood, B., 212,214,219,266 Liu, G., 214,266 Long, D.B., 303,316 Loveman, D., 107,135,151 Lovstrand, L., 289,316 Lucas, P., 302,318 Luehrmann, A., 327,355 Luff, P., 280,290,317 Lumetta, S . , 127,151
M McCall, K., 292,316,319 McCanne, R., 201,265 McCauIey, J., 331,355 Macedonia, M.R., 290,318 McGoff, C.J., 296,318 McKissick, J., 198,266 MacLean, A., 274,289,316 Maeda, F., 281,290,319 Mahfoud, S.W., 174,196 Majhi, A.K., 189,196 Malone, T.W., 5,51,276,286, 302,318 Mambrey, P., 280,318 Manchek, W., 141,150 Maniezzo, V., 182, 186, 191,194 Mansour, N., 184,195 Mantei, M.M., 289,319 Mark, G., 297,319 Markatos, E.P., 63,64,67,101 Markus, M.L., 271,310,319 Marquardsen, P., 274,316 Marshall, G., 327,339,355 Martin, D., 292,319 Martin, W.N., 169, 185,194,195 Martini, M.B., 221,222,223,265 Masinter, L., 298,319 Masuda, Y., 21 1,267 Matsushita, Y., 281,290,317,319 Maurer, H., 302,315,319 Mavriplis, D.J., 116,150, I51 Means, B., 331,332,355 Medina-Mora, R., 3, 12,51,274,280, 287, 307,319 Mehrotra, P., 145,151 Meredith, J.S., 329,354 Messer, B., 40,51 Mill, F., 175, 192,196 Miller, B.L., 173, 174, 195
Miller, D.R., 205, 266 Milligan, T., 289,319 Mirchandaney, R., 110, 145,151,152 Mitchell, A,, 295,319 Moffett, J., 332,355 Mogensen, P., 280.317 Moon,B., 107, 115, 116,151,152 Moran, T., 289, 292,316,319 Moranda, P.B., 205,207,208,265,266 Moms, T., 290,291,318 Muhlenbein, H., 191,195 Musa, J.D., 199,200, 205,219,265,266
N Naamad, A., 4,51 Nagel, P.M., 21 1,266 Nakanani, M., 170, 185, 186,194 Namioka, A,, 312,320 Nance, R., 116,152 Narayana, S., 127,141,149 National Council of Teachers of Mathematics, 338, 347,355 Ni, L.M., 57,59,102 Nichols, D.A., 298,315,319 Nicol, D.M., 110, 145,151 Nissen, V., 191, 195 Nugent, C.E., 191,195 Nunamaker, J.F., 296,315,319
0 O’Brien, J., 299,315 Ohba, M., 199,200,218,219,220,266,267 Ohkubo, M., 314,318 Okada, K., 281,290,317,319 Okumoto, K., 199,200, 201, 205,209, 21 1, 214,216,217,219,265,266 Olivetti., 287, 319 Olson, G.M., 279,295,319 Olson, J.S., 219, 295,319 Olson, K., 331,355 Orlikowski, W.J., 312,319 Onvig, R., 296,315 Osaki, S., 199,200,214,218,219,267 Ostrom, E., 298,319 Othera, H. et al., 21 1,266 Otto, S.W., 141,152
Over, J., 4 , 5 , 5 1 Ozekici, S., 201, 214,266
P Padua, D.A., 54,102 Palen, L., 305,317 Pankoke-Babatz, U., 280,318,319 Papert, S., 325,328,355 Parashar, M., 143, I52 Pany, M., 6,51 Parsons, R., 145,152 Parulekar, R., 133,152 Patnaik, L.M., I51, 166, 189,196 Paul, R., 221,222,265 Pedersen, E., 292,316,319 Perey, C., 288,319 Perin, C., 312.319 Petermann, C., 303,316 Peters, L., 13.51 Pier, K., 292,316 Pmgali, K., 121, 152 Plimpton, S., 148,150 Plowman, L., 301,320 Pnueli, A., 4,51 Politi, M., 4, 51 Poltrock, S.E., 312,320 Polychronopoulos, C.D., 54. 57, 58, 72, I O l , 102, 146,152 Ponnusamy,R., 107, 111, 115, 116, 145, 146, 150,151,152 Posner, I., 295,319 Post, B.Q., 297,320 Pothen, A., 148,149,152 F’rahalada Rao, B.P., 169,195 Pratt, T.W., 9, 12,20,51 F’nnz, W., 280,318 Pycock, J., 299,315
Q Quinlan, D., 145, 149, 152
R Rada, R., 293,317 Rammoorthy, C.V., 205,266
Raman, S., 189,196 Ranganathan, M., 136,152 Ranka, S., 145,149 Rao, R., 286,302,318,320 Rault, D.F.G., 133,152 Read, D.A., 72,101 Reisig, W., 51 Resnick, M., 298,315 Rhyne, J.R., 295,320 Ribeiro Filho, J.’L., 166,195 Rice, R.E., 289,316 Richards, D., 169, 185,194,195 Riel, M., 352,355 Robinson, L., 296,316 Robinson, M., 2,51 Rodden, T., 279,320 Rodriguez, T.K., 290,291,318 Rogers, A., 121,152, 335, 352,355 Rogers, E.M., 336,337,355 Rohatgi, V.K., 223,267 Root, R.W., 289,316,320 Rosenblitt, D., 286, 302,318 Rosendale, J. van, 145,151 Ross, S.M., 215,216,267 Rua, M., 289,320 Rudman, S.E., 311,316 Rudolph D.C., 102 Rudolph, G., 167,196 Ruhleder, K., 278,318 Ruml, J., 191, 195 Ruopp, R., 330,334,355
S Sadayappan, P., 54,74,80,90,101 Safran, M., 287,316 Saltz, J.H., 107, 110, 111, 114, 115, 116, 117, 118, 120, 121, 127, 133, 135, 136, 142, 144, 145, 146, 147, 149, 149,150,151,152,153 Satin, S.K., 274,287,308,314 Schick, G.J., 208, 267 Schmidt, K., 270,315 Schneider, L., 302,318 Schneidewind, N.F., 219,267 Scholz, F.W., 21 1,266 Schonberg, E., 54,56, 57, 58,59,63,101 Schouten, D., 146,152 Schreiber, R., 107, 135,lSI
Schuler, D., 312,320 Schultz, R., 13, 51 Schwefel, H.-P., 161, 194 Sellen, A.J., 289, 290,317,319,320 Sevcik, K.C., 66,70,101 Shanthikumar, J.G., 208,210,267 Shapiro, E., 287,316 Sharma, S.D., 107, 115, 147, 151, 152, 153 Sharples, M., 301,320 Sharrock, W., 280,313,315 Shaw, M.J., 173, 174,195 Sheingold, K., 338,355 Sherman, R., 4,51 Shirazi, B., 75,94,101 Shock, C.T., 133,152 Shooman, M.L., 210,267 Short, J., 288,320 Shtull-Trauring, A,, 4,51 Sigmon, T., 334,354 Simon, H.D., 148, 149.152 Singpurwalla, N.D., 214,264,265 Sipkova, V., 147,150 Sknvan, J.A., 21 1,266 Smith, J.M., 186,195 Smith, K., 335,355 Smith, R.M., 110, 145, 151 Snir, M., 141, 152 Soenjoto, J., 21 1,265 Sofer, A., 214,266 Sohlenlkamp, M., 280,318 Solomon, G., 337,355 Soloway, E., 339,355 Spears, W.M., 172,195 Spillane, R., 191, 195 Srinivas, M., 166, 187, 190,195, 196 Starkweather, T., 187, 195 Steele, G. Jr., 107, 135,151 Storr@sten,M., 279,295,319 stout, c.,334, 335,354 Straub, P., 13, 19,22,25, 33, 35,40,41,44, 4 6 , 4 7 , 5 0 , 5 152 ~ Streitz, N.A., 297,319, 320 Stumm, M., 66,70,101 Su, H.M., 72,90,102 Subramaniam, S., 63, 64,67, 68,70,71,92, 102 Suchman, L., 279, 31 1,313,320 Sumita, U., 210,211,267 Sundaresan, N., 127, 135,153
Sunderam, V., 141,150 Sussman,A., 107, 133, 135, 136, 142, 144. 149,149,150,151, 152 Swenson, K.D., 52 Syri, A,, 280,318
T Takagi, K., 170, 185, 186,194 T a n k a , S., 290,317,319 Tandri, S., 66,70, 101 Tang, J.C., 289,290,291,292,316,318,320 Tang, P., 56,102 Taylor, R.P., 354 Thompson, W.E., 214,267 Tinker, R., 330,355 Totty, B.K , 72, 101 Touzeau, P., 13.52 Towell, E., 298,320 Towell, J.F., 298,320 Townshend, J., 133,152 Trachtenberg, M., 205,267 Trakhtenbrot, M.B., 4,51 Trelevan, P.C., 166, 195 Trevor, J., 279,320 Trigg, R.H., 302,320 Trivedi, A.K., 210,267 Tseng, C.-W., 117, I51 Turkle, S., 337. 352,356 Tzen, T.H., 57.59,102
U Ujaldon, M., 146, 147, 152, I5S Urnes, T., 280.317 US Congress Office of Technology Assessment, 339, 341, 342,356 Uysal, M., 114, 116,150
V Valacich, J.S., 296,319 van der Aalst, W.M.P., 13, 33, 50.52 van Melle, B., 292,319 Vecchi, M.P., 158, 170,196 Vemuri, R., 188, 189,195 Vetmeland, R.E., 116, 151
Verrall, J.L., 212,214,266 Vogel, D.R., 296,319 Vollmann, T.E., 191, I95 von Eicken, T., 127, I51 vonHanxleden, R., 120, 146, 147,150, 151 Vose, M.D., 176,195
W Walker, D.W., 141, I52 Wang, L., 148, I52 Waugh, M., 335,355 Weiser, M., 121, I53 Weiss, A., 57, IOI Welch, B., 292,316 Wellman, B., 289,319 Wen, C.P., 143,150,153 Whitley, D., 187, 189, I95 Whittaker, S., 290,320 Wijshoff, H.A.J., 147, I49 Williams, E., 288,320 Wilmoth, R., 116,152 Wilson, B., 280,294,303,317,320 Winograd, T., 3, 12.51, 274,280, 287,307, 319 Wolf, C.G., 295,320 Wolverton, R.W., 208,267 Wood, C.C., 301,320 Woolley, D.R., 326,356 Woronowicz, M.S., 133, I52
Wu, J., 117, 118,153 WU,M.-Y., 145, I49
x Xie, M., 208,211,267
Y Yamada,S., 199,200,214,218,219,220, 267 Ymg, K:Z., 201,205,221,222,229,265, 267 Yang, S.X., 127, 135, 141,149. I53 Yang, T., 146, I50,153 Yelick, K., 127, 143,150, IS!, I53 Yew, P.C., 56,71,72,82,90,93,101,102 Yue, K.K., 58,59,61,63,90,102
2
Zanichelli, F., 6, 13,50 Zapata, E.L., 146, 147,152,153 Zbyslaw, A. 274,316 Zellcowitz, M.V., 9, 12,20,51, 198,267 Zhu, C.Q., 82,102 Zima, H.P., 146, 147,150, I53 Zosel, M., 107, 135, I51 Ziillighoven, H., 279,3316
Subject Index
A A++, 145 Abstraction, 9-10, 24 Academic materials, methods for creating and disseminating, 348 Action Workflow system, 307-308 Activity, addition, 38 Activity node, 8 Actors, 5 Ada, 36,40 Adaptive genetic algorithm (AGA), 188, 190 Adaptive irregular applications, SPMD execution of, 134 Adaptive irregular programs, 109 Adaptive mesh refmement (AMR), 144 Adaptive mutation, 173 Adaptive problems, 114 compiler projects, 145- 147 Adaptive program, schedule generation for, 1I 3 Adjoint-convolution process, 90 Adjusted fitness, 165 Advanced placement (AP) classes in programming, 328 examinations, 330 Affinity scheduling algorithms, 93 comparison of, 68-69 Affinity scheduling methods, 92 Affinity scheduling schemes (AFS), 64,67, 70,92 comparison of, 67 AFOTEC Software Maturity Evaluation Guide, 200,241 AMR++, 145 Annealing, 157-159 Ant colony, 182-183 Ant cycle, 186-187 Ant system, 184 Application-sharing technologies, 292 Applications programmer interface (API), I41 - 142 Array-based multiple structured grid applications, 134
Association for Computing Machinery Special Interest Group in ComputerHuman Interaction (ACMSIGCHI), 274 Asynchronous computer conferencing, 299 Asynchronous shared spaces, 299-304 Authoring program, 333 Autocatalytic behavior, 182 Automated processes, 3-4 attributes, 3 Average closure rate (ACR), 241 -244,25 1, 257.261
B Base model, 34-36 definition, 35 exceptions, 36-37 threading, 35-36 Basic constructs, 25 Basic models, 8 BASIC programming, 328,338 Bayes’ theorem, 213,214 Bayesian models, 212-214 Behavioral anomalies, 36 Behavioral model, 35 Block scheduling, 55 Boltzmann probability factor, 158 Boundary adjustment stage, 184 Bounded system, 16
C C, 40 C++, 125 Calculation balancing stage, 184 Calculator-Based Laboratories (CBLs), 330 Calendars, 304-305 Call semantics, 20 Calls relation, 9 Causal net, 15 CD-ROMs, 333,345,351
Cedar shared-memory multiprocessor, 89, 99 CHAOS, 108-1 17, 148 compilation methods, 117- 124 overview, 108- 11I runtime preprocessing, 111- 116 runtime system, summary, 116-1 17 schedule-generation process, 112 CHAOS++, 124-135 as prototype library, 135 performance evaluation, 133- 134 user interface, 134 CHARMM, 1 16,117 Chen’s runtime parallelization scheme (CRPS), 83-87,89,99, 100 Choice, 8 nodes, 8 Chromosomes, 163 Chunk sizes, 58,59,63,73-74 CICN, 8- 11,26,48 components, 19 definition, 17 -20 marked, 18 net, 41 nodes, 18 semantics, 18 ClearBoard, 293 Clustering stage, 184 Coevolution, 174- 175 Colaborative problem solving, 334 Collaborative systems, 2 Collective behavior, 182 Collective communication routine, 122 Combinatorial optimization, 158 Communication overheads, 147 Communication schedule, 1 12, 114 computation, 139- 141 Communication technologies, 285 -291 Communication time, 94 Compiler methods for irregular applications, 145 - 147 Complex behavior, 182 Complexity metric models, 205 Computational aerodynamics, 133 Computer-assisted designlcomputer-assisted manufacturing (CAD/CAM), 271 Computer-assisted instruction (CAI), 323, 325 educational gains using, 33 1 limitations, 331
potential for individualized instruction, 327 remedial feedback, 327 Computer-assisted software engineering (CASE), 271 Computer literacy, 328 Computer-mediated communication (CMC), 27 3 Computer-supported cooperative work (CSCW), 2,269-320 background, 270-272 forum, 270-272 in Asia, 28 1-282 in North America and Europe, 278 -28 I multidisciplinary challenge, 275 organizational systems, 276-278 project and large-group support, 277-278 research and development contexts, 272-275 research that spans the boundaries, 274-215 small-group application, 276-278 terminology, 27 1 see also Groupware Computers as instructional devices, 326-327 as reference librarian, 344-346 as subject of study, 327-323 in schools, 323-335 Computing capacity, 322 Concurrent engineering, 27 1 Concurrent Graph Library, 144 Connection addition, 37 Consistency of model, 33-34 Consistent labels, 28-29 Consistent models, 34 Constructivism, 331 Context-dependent model, 5 Context-independent model, 5 Control anomalies, 6-7 Control constmcts, 8- 12 Control flow diagrams (CFD), 5, 8 Control ICN. See CICN Control languages, 40 Control model, 4-5, 17-25 definition, 17- 18 Convergence aceleration methods, 162 properties, 189
Cooperation in Science and Technology (COST) framework, 280 Coordination technologies, 304-309 Copy rule, 9, 10 CO-TECH project, 280 Credit application process, 10 Critical path method (CPM), 5, 8 , 9 , 4 3 Crossover points, 183 Crossover rate, 166 Cumulative closed change points (CCP), 254 Cumulative fault curve, 234 Cumulative open change points (OCP), 254 Cyclic scheduling (CYC), 55, 95, 96 Cyclic staggered distribution (CSD), 76,79, 94.96.99
D Dangling activities, 41 Data distribution statement, 145 Data movement routines, 132- 133 Data parallel language, 146 Data partitioning strategies, user-specified. 146 Data processing (DP), 273 Deadlock, 16, 33, 36, 89 definition, 21 Deadlock-freedom, 16, 21,40,47 Debugging models, imperfect, 209-21 1 Deceptive problems, 176 Delayed S-shaped (DSS) model, 218,225. 227-230,232,237-238 Desktop conferencing architectures, 293-294 products, 295 Deterministic algorithm, 189 Deterministic solution refinement scheme, 188 Differential equations, 182 Diffusion of innovation in schools, 335-351 Direct simulation Monte Carlo (DSMC) method, 133 Directed Acyclic Grid Hierarchy (DAGH), 143 Distance leaning, 271 Distributed array accesses, 119
Distributed asynchronous concurrent model, 169 Distributed decision, 40 Distributed memory machines, 110 Distributed memory parallel mechines, 146 Distributed pointer-based data structures, 129- 132 DOACROSS loops, 53-103 allocation, 73 irregular, 71,79-87,93,99 parallelization, 53 - 103 regular, 71 -76,93,94 DOACROSS scheduling algorithms comparison, 77 irregular, 88 DOACROSS scheduling schemes, 71 -90 comparison, 76-79 irregular, comparison, 87-89 DOALL loop scheduling comparative analysis, 59-63 on NUMA multiprocessors, 63 -67 DOALL loops, 53-103 DOALL scheduling algorithms, 55-59,90. 91 comparative analysis, 60 Document management systems, 300 -302 DOLPHIN, 297 Duane model, 2 19 Dvorak keyboards, 336 Dynamic distributed data StrucLures, 134 Dynamic Distributed Finite Element Mesh (DDFEM), 143- 144 Dynamic partitioned affinity scheduling (DPAS), 65,67,70,71,92 Dynamic scheduling, 5 4 ,5 6 ,7 3
E Editing, 180 Educational computing, trends, 323- 335 Educational technology, 336 standards, 337 -338 Educational Testing Service (ErS), 330 Electronic catalogs, 345 Electronic mail (email), 286-287 Elitist ranking, 185 Encapsulation, 180 ENIAC, 322,3.5 1 Epigenesis, 162
Epochs, 168 ESPRIT projects, 280 Euler solver, 124 Euler’s constant, 58 Evolutionary algorithms, 160 Evolutionary processes, 155-196 mappings, 162 see also specific techniques and applications Evolutionary programming (EP), 160- 161 Evolutionary strategies (ES), 160-161 comparison with genetic algorithms, 175-176 (m+n)-ES, 161-162 Exception handling facilities, information processes, 40 Exclusive-OR function, 179 neural networks, 190 Executor, 82-84, 86,87,89,99, 110, 112, 120 EXP NHPP model, 217-220,225,226, 228-229,232,237 Expected number of failures, 203-204 Exponential NHPP, 212
F Factoring, 58-59,63,73,74,91 Failure count data, trend test, 223 Failure rates, 203,207,208, 212 Fault seeding models, 205 Feasible connector, definition, 45-46 Fiduccia Mattheyes (FM) algorithm, 189 Fitness evaluation technique, 180 Fitness representation, 165 Fitness values assignment, 165 scaling, 172- 173 Fixed-size chunking (FS), 57,63,91 Forall statement, 145 Force of mortality, 203 Fortran 77,146 FORTRAN code, 87 FomanD compilation system, 124 Forward Kolmogorov’s differential equations, 208 4-bit parity, neural networks, 190 Free-choice nets, 14-15, 19,25 Function set, 180
G Game theory, 183 Gene-invariant genetic algorithm (GIGA), 174 Generation gap, 166 Generic model, 4 Generic process language, submodels, 4 Genetic algorithms (GAS), 160, 162-168 adaptive, 188, 190 classification, 166- 167 comparison with evolution strategies, 175- 176 constrained optimization, 167- 168 dynamic vs static, 166 elitist vs pure, 166 extensions, 168-176 extinctive vs preservative, 166 family concept, 174 gene-invariant, 174 generational vs steady state, 166-167 hybrid, 170,184 implicit parallelism, 167 initial population generation, 168 intelligent operators in, 170- 171 left vs right extinctive, 166 messy, 171- 172 multimodal function optimization, 173-174 parallelism in, 169 parameterized uniform crossover, 172 parameters, 166 premature convergence, 171 reasons for failure, 176 routing problems, 188 scaling of fitness values, 172-173 selection strategies, 164 simple, 188 speed-up techniques, 193 symbiosis in, 175 use of subpopulations, 168- 169 Genetic operators, 188 Genetic programming (GP), 178-183 main parameters, 180 problem applications, 181- 183 secondary operators, 180 GENITOR, 189 Geographic information systems, 133 Ghost objects, 128, 129 Give-N-Take. 147
Global control unit (GCU), 74, 80,81 Global inspector phase, 85 Global School Network, 350 Globally addressable objects, 127- 129 Goel-Okumoto exponential model. See EXP NHPP model Graphic Interchange Format (GIF), 344 Graphic user environment, 350 Graphic user interface (GUI), 349 Graphical languages, 7 Graphics integration in documents, 349 Graphing calculators, 346-347 Group activity categorization, 282-284 Group decision support systems (GDSS), 277-278 Group support systems (GSS), 278 GroupSystems, 296 Groupware, 271,273 categorization by technology, 285 features that support communication, 284 features that support coordination, 284-285 features that support information-sharing and collaboration, 284 future directions, 3 13 new approaches, 311-313 research and development context, 272 social and organizational challenges, 310-311 technical challenges, 309-310 typologies, 282-285 Guided self-scheduling (GSS), 57-58,63, 66,73, 74, 91,92
H Hash table, schedule generation with, 115 Hazard rate, 203 High Performace C/C++, 135 High Performance Fortran, 135,138,139, 146 Hill-climbing operators, 184, 189- 191 Homogeneous Poisson Process (H) hypothesis, 222 Hybrid genetic algorithms (HGAs), 170, 184 Hydra, 289-290 Hypecard, 333 Hypercube model, 185 Hypermedia, 333
Hyperstudio, 333,352 Hypertext, 303
I Image processing, 133 Image segmentation, 133-134 Imperfect debugging models, 209-21 1 Implicit parallelism, 167 Incremental case composition, 34,43-44 Index analysis, 112- 114 Index array, 114 Indirect indexing, 109 Indirection arrays, 109, 111, 114 Individual decisions, 25 Inflection parameter, 218 Inflection S-shaped model (ISS), 218,225, 227,228,230-233,238-239 Information Control Net (ICN), 10,307 Information Control Networks, 308 Information management technologies, 302-304 Information processes exception handling facilities, 40 models, 4 Information Processing Society of Japan (IPSJ), 281 Information publishing, 350-351 Information systems, 273 multi-threaded, control in, 1-52 Information technology, 273, 322 Initial defect model, 211,218 Input domain models. 205 Input Process Output (IPO) model, 307 Inspector, 82-84,87,99,110, 112, 120 Integrated text and graphics in schools, 347-348 Intelligent operators in genetic algorithms, 170-171 Intensity function, 204,211 Inter-failure time data, trend test, 222-223 Inter-iteration dependencies, 93 -94 International Association for Technology in Education (ISTE), 340 Internet, 334-335.351 Interoperability issues, 135- 143 lnterprocedural partial redundancy elimination, 121- 124 Intraprocedural compilation, 123
Intraprocedural optimization, 123, 124 IPRE analysis, 124 Irregular applications, 105- 153 compiler methods for, 145- 147 irregularly coupled regular mesh, 144- 145 runtime support for, 143- 144 Irregular problems, 112 Irregular programs, 108 optimization, 147 Iteration execution time, 94 Iteration index, 83 Iteration space graph (ISG), 80-8 1, 87 Iterations, 9, 62, 71,72, 75,78,79,89, 92, 95
J Jacobi iterative algorithm, 92 Jelinslc- Moranda (JM) de-eutrophication model, 205-211,214,217-218,224 Job shop scheduling problem (JSP), 19 1- 192
K Kali compiler, 145 KeLP, 144 Kepler, 331 Key fields, 89 Keyboards, 336 K-partition problem, 185
L Label(s), 48 addition, 28 and processes, 31 -32 equality, 28 equivalence, 29-30,48 multiplication, 28 normal form, 28 LabNet, 330 Language constructs, 7 Laplace factor, 222,223 Laplace test, 221 in NHPP models, 224 optimality, 223-224 Laplace trend factor, 233 Laplace trend statistic, 233, 236, 245
Laplace trend test, 222-224 Layer assignment, 189 Leaf nodes, 179 Learning disabilities, 334 Lexically backward data. 55 Lexically backward dependence (LBD), 55, 72,79 Lexically forward data, 55 Lexically forward dependence (LFD), 55,72 Library Media specialists mailing list, 345 Light-weight communication schedules, 116 Linear Recurrence Equation, 94 LinkWay, 333 Lisp, 40, 179, 182 Littlewood-Verrall Bayesian model, 224 LiveBoard, 292 Live system, 16 Livermore loops, 94-96 Load balancing, 70 Local inspector phase, 85 Local-choice nets, 22 Locality-based dynamic scheduling (LDS), 66-67 Logarithm Poisson NHPP, 212 Logo, 328,330,336 Logo floor turtle, 329 Loop allocation techniques, 55 Loop scheduling algorithms. See DOACROSS; DOALL Loops of intermediate parallelism, 54 scheduling algorithms for, 54 see also DOACROSS; DOALL Los Altos Trail, 181 Lotus Notes, 302, 303 LPARX, 144
M Macro expansion, 9 Macromind Director, 348 Mainframes, 325 MAJIC system, 290 Management information systems (MIS), 273 Mapping structure, 129 Markov chains, 167 Markov process models, 205 MASSIVE, 298-299 Mathland, 328
Maximization problem, 156 Mean time to failure (MTTF), 203 Mean value functions, 204, 218,220, 225 -226 Meeting facilitation systems, 281, 296-297 Messy genetic algorithms (MGAs), 171 172 Meta-Chaos, 135-143 applications programmer interface (API), 141-142 communication between two libraries in two different programs, 138 communication between two libraries within same program, 137 communication schedule computation, 139-141 data specification, 138- 139 mechanism overview, 136-138 performance, 142- 143 runtime library, 137 Meta-library, 136 Metropolis criterion, 158 Metropolis procedure, 158- 159 terminology, 158- 159 Microcomputer-based laboratories (MBLs), 329-330 Microcomputers, 323, 325, 349 Mobile objects, 126- 127 Model closure rate (MCR), 241 -244, 257, 26 1 Model CR, 25 1 Model labeling, 29 MOOS, 298 Mortgage loan process model, 11 MUDS, 297-298 Multiblock Parti, 134, 138, 144, 148 Multicast Backbone (MBONE), 290-20 1 Multicast video and audio, 290-291 Multi-chip modules (MCMs), 188 Multimedia in schools, 333-334, 348, 352 Multimodal function optimization, 173- I74 Multiple connection, addition, 38-39.42 Multiple-failure Markov process model, 2 10 Multiple response, 24.40 Multi-headed information systems, control in, 1-52 Multi-user applications, 281, 292 Multi-user whiteboards, 28 1 Musa-Okumoto logarithmic Poisson model, 224 Mutation, 162, 180, 190 ~
Mutation operator, 173 Mutation rate, 166 Myst, 333
N National Council for Accreditation of Teacher Education (NCATE), 340 National Council of Teachers of Mathematics (NCTM), 338 N-bounded system, 16 Netscape, 352 Network model, 169 Networks in schools, 335 Neural networks 4-bit parity, 190 encoding and decoding, 190 exclusive-or function, 190 training, 189-191 weight optimization, 189-191 NHPP models, 197-267 characteristic points, 228 definitions, 214 exponential, 212 Laplace test in, 224 Laplace trend statistic for estimating characteristic points, 233-234 model fitting, 236-239 multiple solutions, 228 parameter estimation, 225 based on characteristic points, 231 properties, 216 software reliability models based on, 217-220 summary, 219 see also specific models Niche, 173 Nodes, 8, 13, 42, 144, 178 with unique identifiers, 129-131 without unique identifiers, 131-132 Non-homogeneous Poisson process. See NHPP models Non-routine processes, 3 Non-uniform memory access (NUMA) systems, 63-67 Normalized fitness, 165 NUMA multiprocessors, DOALL loop scheduling on, 63-67 Numeric parameters, 180
O Office automation, 271, 274 Oliver30 problem, 187 One-point crossover, 185 Online reference resources, 345 Operational test and evaluation (OT&E), 200 Optimal control problems, 181 Optimization processes, 155-196 see also specific techniques and applications Optimum release time, 214 Or-join nodes, 8 Or-join-split node, 9 Or-nodes, 8, 18, 38, 39 adding a connection between, 38 addition, 37 Or-split nodes, 8 Order statistics models, 211-212 Organizational model, 5 Organizational processes, 3 Organizations, 2 Overloaded markings, 47 Overloaded state, 23, 33, 36
P P++, 145 PARADIGM compiler, 146 Parafrase-2, 146 Parallel Compiler Runtime Consortium, 135 Parallel computer architectures, 106-107 Parallelism, 4, 6, 7, 9, 36, 54, 96-97 in genetic algorithms, 169 Parallelization, 147 of DOALL and DOACROSS loops, 53-103 Parameterized uniform crossover, 172 Parascope-based Fortran D compiler, 146 Parasites, 174-175 PARKA, 116 PARTI, 114 Partial order behavior, 15 Partial redundancy elimination (PRE), 121, 147 Particle-in-cell (PIC) codes, 109 Partitioned affinity scheduling, 64-65 Partitioning between two processors, 128 Partitioning problems, 184-185, 189
PERFECT benchmark suite, 71 Permutation, 180 PERT chart, 7 Petri nets, 13-17, 23, 31, 41-42, 45, 47 free-choice, 25 high-level, 13 low-level, 13 mapping between, 20 semantics, 18-22 Pheromone trails, 182 PILOT, 326 Place invariants, 16 Place/transition (P/T) nets. See P/T nets Place/transition (P/T) systems. See P/T system Places, 13 PLATO, 325-326 Pocket calculators, 346 Pointer-based codes, runtime support for, 124-135 Population-based incremental learning (PBIL), 177 Population size, 166 Power law NHPP, 212 Pre-synchronized scheduling (PSS), 74-75, 79-81, 87, 89, 94, 99 Printed documents, 349 Process, definition, 15-16 Process automation, 2 Process behavior, 5 Process Definition Tools, 307 Process model building, 35 components, 4-6 control specification, 7-12 Process modeling languages, 5 Processes and labels, 31-32 Productivity tools in schools, 332-333 Program slicing-based loop transformations, 118-121 Project management skills and coordination, 334 Project PLATO, 325-326 Pseudo-random vectors, 188 P/T nets, 11-15, 18, 42 definition, 19-20 P/T system behavioral properties and invariants, 16-17 definition, 20
dynamic behavior, 16 Public Education Network (PEN), 335 PYRROS, 146
Q Quadratic assignment problem (QAP), 191 Qualitative parameters, 180 QWERTY keyboard, 336
R Rank-based selection, 165 Rasp/VPL, 12, 40 Rate of occurrence of failures (ROCOF), 203-204 Raw fitness, 165 Readiness assessment. See Software readiness assessment Real-time conferencing, 287-290 Real-time shared spaces, 281-299 Recursion, 12 Region, 138 linearization, 138 Remaining open change points (ROCP), 254 Replication, 12 Representation, 162 Research and Development in Advanced Communications Technology in Europe (RACE), 280 Robotic devices, 329 Robotic planning, 181 Roles, 5 Roots, 179 Roulette wheel selection, 164 Routine processes, automation, 3 Routing problems, 188 Runtime parallelization schemes, 82 Runtime support compilation and tools, 105-153 for irregular applications, 143-144 for pointer-based codes, 124-135
S S-expressions, 178, 179 S-invariants, 16, 17, 31
Sampling error, 176 Santa Fe Trail, 181 Satellite downlinks in schools, 323 Scalable dynamic distributed hierarchical array (SDDA), 143 Scaling window, 166 Schedule generation, 112-114, 116 for adaptive program, 113 optimization, 115 with hash table, 115 Scheduling, 304-305 Scheduling algorithms for loops, 54 Schema disruption, 176 Schema theorem, 176 School budgets, 327 School reform, 328-331 Schools access to technology, 342-343 computers in, 323-335 as reference librarian, 344-346 diffusion of innovation in, 335-352 examples of successful diffusion, 344 integrated text and graphics in, 347-348 Internet in, 334-335, 350 limitation, 340-341 multimedia in, 333-334, 348, 351 networks in, 335 productivity tools in, 332-333 promising candidates for successful diffusion, 347-351 technological barriers, 343-344 technology integration, 321-354 Search algorithms, comparison, 192 Search techniques, 177-184, 185-187 classical or calculus-based, 156 enumerative, 157 random, 157 Selection, 162 strategy, 166 Self-scheduling, 56-57, 89, 90 Semi-Markov process models, 205 Semi-master-slave model, 169 SEPIA, 303 Sequencing, 8 Sequential loops, 54 Sequential processes, 6 Shared whiteboards, 292 Shared-information-space technologies, 281-304 Shift work, 303
Signal detection network, 190 Simple control, 24, 25, 33, 35, 40, 44, 48, 49 and threads, 32-33 Simple control property, 17-25 Simple genetic algorithm (SGA), 163-164, 188 Simple process model, 17 Simulated annealing (SA), 157-159, 185-187, 189 algorithm, 159 original formulation problems, 159 Single-input-single-output, 10, 24 Single-response, 24, 48 Slice, 121 Small-group support, 273, 274 Software-Aided Meeting Manager (SAMM), 296 Software availability models, 210-211 Software development, 198-199, 273 Software engineering (SE), 273-274 Software error, 204 Software failure, 204 Software failure data, trend testing, 220-224 Software fault, 204 Software problem reports (SPRs), 197 Software quality, 199 assessment, 204 Software readiness analysis Air Force system, 254-257 commercial system, 244-254 Software readiness assessment, 197-267 background, 202 basic concepts and notations, 202-205 case studies, 241-263 studies on, 214 Software reliability, 197-267 background, 202 basic concepts and notations, 202-205 definition, 204-205 evaluation, 234-239 graphical procedures, 221 statistical techniques, 221-222 Software reliability models, 200, 205 based on NHPP, 217-220 classification, 206 software testing, 200 Sparse codes, 147 Sparse computations, 146 Sparse matrices, 146 SPMD, 87
SPMD execution of adaptive irregular applications, 134 SSPR cumulative fault data, 235 SSPR data, 239 SSPR data and Laplace statistic, 236 SSPR monthly fault data, 235 Staggered distribution (SD) scheme, 75-76, 79, 94-97, 99 Stamps, 114 Standardized fitness, 165 Start node, 18 State invariant, 16 Static chunking (SC), 55, 96 Static irregular applications, 109 Static irregular programs, 111 Static scheduling, 54, 56 Stochastic remainder selection, 164 with replacement selection, 164 without replacement selection, 164 Stochastic universal selection, 165 Strong context preserving crossover (SCPC), 183 Sub-objects, 125 Subpopulations, 168-169, 185 Subthread, definition, 30-31 SUPERB, 147 Superthread, definition, 30-31 Symbiosis, 174-175 in genetic algorithms, 175 Symbolic differentiation, 182 Symbolic expressions, 178 Symbolic integration, 182 Symbolic regression, 181 Synchronization, 28, 64, 89-90 Synchronous master-slave model, 169 Syracuse Fortran 90D compiler, 145, 146
T TABU list, 170 Tabu search, 170, 185-187 Teacher education, 337, 338-342 current status, 339 pre-service, 340 Team rooms, 303-304 TeamFocus, 296, 297 Technical Education Research Centers (TERC), 329-330
Telemedicine, 271 Television in education and society, 350-351 in schools, 322 Terminal set, 180 Termination criteria, 181, 186 Testing-effort function, 219 Thread labels, 27-31 Thread metaphor, 25 goals, 27 Threaded discussions, 299 Threads alternatives between, 38-40 alternatives within, 37-38 and behavior, 27, 31-33 and simple control, 32-33 applications of theory, 33-46 base model, 35-36 definition, 27, 30 Threads of control concept, 26 theory, 25-33 Ticket table, 84-85, 86 ToolBook, 333 Tournament selection, 165 Transitions, 13 Transplant descriptor, 121 Trapezoid self-scheduling (TSS), 59, 63, 91-92 Traveling salesman problem (TSP), 185-187 Trees, 178, 190 Trend testing techniques, 221 Trigonometric identities, 182 Tutorial software, 326, 327 Typewriter as acceptable technology, 349
U Unbalanced connectors, 11, 40-42 generalization, 44-46 Unspecified situations, 44 US Congress Office of Technology Assessment, 330, 341 Usefulness, 23 Useless activities, 22, 24 Useless place, 23 User-specified data partitioning strategies, 146
V Vector loops, 54 Vegetation Index Measurement (VIM), 133 Video conferencing, 287-290, 292 Videocassette recorder (VCR) in schools, 322 Virtual linearization, 138 Virtual processor array declaration, 145 Virtual Reality Modeling Language (VRML), 298 VLSI design and testing problems, 187-189 VPL/Rasp, 47
W Weak context preserving crossover (WCPC), 183 Well-behaved problems, 156 Word processors, 332, 349 graphics-based, 350 Work process modeling, 305 Workflow Enactment Engine, 307 Workflow Management Coalition, 306 Workflow management systems, 305-309 Workgroups, 304 World Wide Web (WWW), 281, 344, 351, 352 Wrapped partitioned affinity scheduling (WPAS), 66, 67, 70, 92
Z Zhu-Yew runtime parallelization scheme (ZYRPS), 82-83 ZYRPS, 87, 89, 99, 100
Contents of Volumes in This Series
Volume 21 The Web of Computing: Computer Technology as Social Organization ROB KLING AND WALT SCACCHI Computer Design and Description Languages SUBRATA DASGUPTA Microcomputers: Applications, Problems, and Promise ROBERT C. GAMMILL Query Optimization in Distributed Data Base Systems GIOVANNI MARIA SACCO AND S. BING YAO Computers in the World of Chemistry PETER LYKOS Library Automation Systems and Networks JAMES E. RUSH
Volume 22 Legal Protection of Software: A Survey MICHAEL C. GEMIGNANI Algorithms for Public Key Cryptosystems: Theory and Applications S. LAKSHMIVARAHAN Software Engineering Environments ANTHONY I. WASSERMAN Principles of Rule-Based Expert Systems BRUCE G. BUCHANAN AND RICHARD O. DUDA Conceptual Representation of Medical Knowledge for Diagnosis by Computer: MDX and Related Systems B. CHANDRASEKARAN AND SANJAY MITTAL Specification and Implementation of Abstract Data Types ALFS T. BERZTISS AND SATISH THATTE
Volume 23 Supercomputers and VLSI: The Effect of Large-Scale Integration on Computer Architecture LAWRENCE SNYDER Information and Computation J. F. TRAUB AND H. WOZNIAKOWSKI The Mass Impact of Videogame Technology THOMAS A. DEFANTI Developments in Decision Support Systems ROBERT H. BONCZEK, CLYDE W. HOLSAPPLE, AND ANDREW B. WHINSTON Digital Control Systems PETER DORATO AND DANIEL PETERSEN
International Developments in Information Privacy G. K. GUPTA Parallel Sorting Algorithms S. LAKSHMIVARAHAN, SUDARSHAN K. DHALL, AND LESLIE L. MILLER
Volume 24 Software Effort Estimation and Productivity S. D. CONTE, H. E. DUNSMORE, AND V. Y. SHEN Theoretical Issues Concerning Protection in Operating Systems MICHAEL A. HARRISON Developments in Firmware Engineering SUBRATA DASGUPTA AND BRUCE D. SHRIVER The Logic of Learning: A Basis for Pattern Recognition and for Improvement of Performance RANAN B. BANERJI The Current State of Language Data Processing PAUL L. GARVIN Advances in Information Retrieval: Where Is That i#*&@$ Record? DONALD H. KRAFT The Development of Computer Science Education WILLIAM F. ATCHISON
Volume 25 Accessing Knowledge through Natural Language NICK CERCONE AND GORDON MCCALLA Design Analysis and Performance Evaluation Methodologies for Database Computers STEVEN A. DEMURJIAN, DAVID K. HSIAO, AND PAULA R. STRAWSER Partitioning of Massive/Real-Time Programs for Parallel Processing I. LEE, N. PRYWES, AND B. SZYMANSKI Computers in High-Energy Physics MICHAEL METCALF Social Dimensions of Office Automation ABBE MOWSHOWITZ
Volume 26 The Explicit Support of Human Reasoning in Decision Support Systems AMITAVA DUTTA Unary Processing W. J. POPPELBAUM, A. DOLLAS, J. B. GLICKMAN, AND C. O'TOOLE Parallel Algorithms for Some Computational Problems ABHA MOITRA AND S. SITHARAMA IYENGAR Multistage Interconnection Networks for Multiprocessor Systems S. C. KOTHARI Fault-Tolerant Computing WING N. TOY Techniques and Issues in Testing and Validation of VLSI Systems H. K. REGHBATI
Software Testing and Verification LEE J. WHITE Issues in the Development of Large, Distributed, and Reliable Software C. V. RAMAMOORTHY, ATUL PRAKASH, VIJAY GARG, TSUNEO YAMAURA, AND ANUPAM BHIDE
Volume 27 Military Information Processing JAMES STARK DRAPER Multidimensional Data Structures: Review and Outlook S. SITHARAMA IYENGAR, R. L. KASHYAP, V. K. VAISHNAVI, AND N. S. V. RAO Distributed Data Allocation Strategies ALAN R. HEVNER AND ARUNA RAO A Reference Model for Mass Storage Systems STEPHEN W. MILLER Computers in the Health Sciences KEVIN C. O'KANE Computer Vision AZRIEL ROSENFELD Supercomputer Performance: The Theory, Practice, and Results OLAF M. LUBECK Computer Science and Information Technology in the People's Republic of China: The Emergence of Connectivity JOHN H. MAIER
Volume 28 The Structure of Design Processes SUBRATA DASGUPTA Fuzzy Sets and Their Applications to Artificial Intelligence ABRAHAM KANDEL AND MORDECHAY SCHNEIDER Parallel Architecture for Database Systems A. R. HURSON, L. L. MILLER, S. H. PAKZAD, M. H. EICH, AND B. SHIRAZI Optical and Optoelectronic Computing MIR MOJTABA MIRSALEHI, MUSTAFA A. G. ABUSHAGUR, AND H. JOHN CAULFIELD Management Intelligence Systems MANFRED KOCHEN
Volume 29 Models of Multilevel Computer Security JONATHAN K. MILLEN Evaluation, Description, and Invention: Paradigms for Human-Computer Interaction JOHN M. CARROLL Protocol Engineering MING T. LIU Computer Chess: Ten Years of Significant Progress MONROE NEWBORN Soviet Computing in the 1980s RICHARD W. JUDY AND ROBERT W. CLOUGH
Volume 30 Specialized Parallel Architectures for Textual Databases A. R. HURSON, L. L. MILLER, S. H. PAKZAD, AND JIA-BING CHENG Database Design and Performance MARK L. GILLENSON Software Reliability ANTHONY IANNINO AND JOHN D. MUSA Cryptography Based Data Security GEORGE J. DAVIDA AND YVO DESMEDT Soviet Computing in the 1980s: A Survey of the Software and its Applications RICHARD W. JUDY AND ROBERT W. CLOUGH
Volume 31 Command and Control Information Systems Engineering: Progress and Prospects STEPHEN J. ANDRIOLE Perceptual Models for Automatic Speech Recognition Systems RENATO DE MORI, MATHEW J. PALAKAL, AND PIERO COSI Availability and Reliability Modeling for Computer Systems DAVID I. HEIMANN, NITIN MITTAL, AND KISHOR S. TRIVEDI Molecular Computing MICHAEL CONRAD Foundations of Information Science ANTHONY DEBONS
Volume 32 Computer-Aided Logic Synthesis for VLSI Chips SABURO MUROGA Sensor-Driven Intelligent Robotics MOHAN M. TRIVEDI AND CHUXIN CHEN Multidatabase Systems: An Advanced Concept in Handling Distributed Data A. R. HURSON AND M. W. BRIGHT Models of the Mind and Machine: Information Flow and Control between Humans and Computers KENT L. NORMAN Computerized Voting ROY G. SALTMAN
Volume 33 Reusable Software Components BRUCE W. WEIDE, WILLIAM F. OGDEN, AND STUART H. ZWEBEN Object-Oriented Modeling and Discrete-Event Simulation BERNARD P. ZEIGLER Human-Factors Issues in Dialog Design THIAGARAJAN PALANIVEL AND MARTIN HELANDER Neurocomputing Formalisms for Computational Learning and Machine Intelligence S. GULATI, J. BARHEN, AND S. S. IYENGAR
Visualization in Scientific Computing THOMAS A. DEFANTI AND MAXINE D. BROWN
Volume 34 An Assessment and Analysis of Software Reuse TED J. BIGGERSTAFF Multisensory Computer Vision N. NANDHAKUMAR AND J. K. AGGARWAL Parallel Computer Architectures RALPH DUNCAN Content-Addressable and Associative Memory LAWRENCE CHISVIN AND R. JAMES DUCKWORTH Image Database Management WILLIAM I. GROSKY AND RAJIV MEHROTRA Paradigmatic Influences on Information Systems Development Methodologies: Evolution and Conceptual Advances RUDY HIRSCHHEIM AND HEINZ K. KLEIN
Volume 35 Conceptual and Logical Design of Relational Databases S. B. NAVATHE AND G. PERNUL Computational Approaches for Tactile Information Processing and Analysis HRISHIKESH P. GADAGKAR AND MOHAN M. TRIVEDI Object-Oriented System Development Methods ALAN R. HEVNER Reverse Engineering JAMES H. CROSS II, ELLIOT J. CHIKOFSKY, AND CHARLES H. MAY, JR. Multiprocessing CHARLES J. FLECKENSTEIN, D. H. GILL, DAVID HEMMENDINGER, C. L. MCCREARY, JOHN D. MCGREGOR, ROY P. PARGAS, ARTHUR M. RIEHL, AND VIRGIL WALLENTINE The Landscape of International Computing EDWARD M. ROCHE, SEYMOUR E. GOODMAN, AND HSINCHUN CHEN
Volume 36 Zero Defect Software: Cleanroom Engineering HARLAN D. MILLS Role of Verification in the Software Specification Process MARVIN V. ZELKOWITZ Computer Applications in Music Composition and Research GARY E. WITTLICH, ERIC J. ISAACSON, AND JEFFREY E. HASS Artificial Neural Networks in Control Applications V. VEMURI Developments in Uncertainty-Based Information GEORGE J. KLIR Human Factors in Human-Computer System Design MARY CAROL DAY AND SUSAN J. BOYCE
Volume 37 Approaches to Automatic Programming CHARLES RICH AND RICHARD C. WATERS Digital Signal Processing STEPHEN A. DYER AND BRIAN K. HARMS Neural Networks for Pattern Recognition S. C. KOTHARI AND HEEKUCK OH Experiments in Computational Heuristics and Their Lessons for Software and Knowledge Engineering JURG NIEVERGELT High-Level Synthesis of Digital Circuits GIOVANNI DE MICHELI Issues in Dataflow Computing BEN LEE AND A. R. HURSON A Sociological History of the Neural Network Controversy MIKEL OLAZARAN
Volume 38 Database Security GÜNTHER PERNUL Functional Representation and Causal Processes B. CHANDRASEKARAN Computer-Based Medical Systems JOHN M. LONG Algorithm-Specific Parallel Processing with Linear Processor Arrays JOSE A. B. FORTES, BENJAMIN W. WAH, WEIJA SHANG, AND KUMAR N. GANAPATHY Information as a Commodity: Assessment of Market Value ABBE MOWSHOWITZ
Volume 39 Maintenance and Evolution of Software Products ANNELIESE VON MAYRHAUSER Software Measurement: A Decision-Process Approach WARREN HARRISON Active Databases: Concepts and Design Support THOMAS A. MÜCK Operating Systems Enhancements for Distributed Shared Memory VIRGINIA LO The Social Design of Worklife with Computers and Networks: A Natural Systems Perspective ROB KLING AND TOM JEWETT
Volume 40 Program Understanding: Models and Experiments A. VON MAYRHAUSER AND A. M. VANS Software Prototyping ALAN M. DAVIS
Rapid Prototyping of Microelectronic Systems APOSTOLOS DOLLAS AND J. D. STERLING BABCOCK Cache Coherence in Multiprocessors: A Survey MAZIN S. YOUSIF, M. J. THAZHUTHAVEETIL, AND C. R. DAS The Adequacy of Office Models CHANDRA S. AMARAVADI, JOEY F. GEORGE, OLIVIA R. LIU SHENG, AND JAY F. NUNAMAKER
Volume 41 Directions in Software Process Research H. DIETER ROMBACH AND MARTIN VERLAGE The Experience Factory and Its Relationship to Other Quality Approaches VICTOR R. BASILI CASE Adoption: A Process, Not an Event JOCK A. RADER On the Necessary Conditions for the Composition of Integrated Software Engineering Environments DAVID J. CARNEY AND ALAN W. BROWN Software Quality, Software Process, and Software Testing DICK HAMLET Advances in Benchmarking Techniques: New Standards and Quantitative Metrics THOMAS CONTE AND WEN-MEI W. HWU An Evolutionary Path for Transaction Processing Systems CALTON PU, AVRAHAM LEFF, AND SHU-WEI F. CHEN
Volume 42 Nonfunctional Requirements of Real-Time Systems TEREZA G. KIRNER AND ALAN M. DAVIS A Review of Software Inspections ADAM PORTER, HARVEY SIY, AND LAWRENCE VOTTA Advances in Software Reliability Engineering JOHN D. MUSA AND WILLA EHRLICH Network Interconnection and Protocol Conversion MING T. LIU A Universal Model of Legged Locomotion Gaits S. T. VENKATARAMAN
Volume 43 Program Slicing DAVID W. BINKLEY AND KEITH BRIAN GALLAGHER Language Features for the Interconnection of Software Components RENATE MOTSCHNIG-PITRIK AND ROLAND T. MITTERMEIR Using Model Checking to Analyze Requirements and Designs JOANNE ATLEE, MARSHA CHECHIK, AND JOHN GANNON Information Technology and Productivity: A Review of the Literature ERIK BRYNJOLFSSON AND SHINKYU YANG
The Complexity of Problems WILLIAM GASARCH 3-D Computer Vision Using Structured Light: Design, Calibration, and Implementation Issues FRED W. DEPIERO AND MOHAN M. TRIVEDI
Volume 44 Managing the Risks in Information Systems and Technology (IT) ROBERT N. CHARETTE Software Cost Estimation: A Review of Models, Process and Practice FIONA WALKERDEN AND ROSS JEFFERY Experimentation in Software Engineering SHARI LAWRENCE PFLEEGER Parallel Computer Construction Outside the United States RALPH DUNCAN Control of Information Distribution and Access RALF HAUSER Asynchronous Transfer Mode: An Engineering Network Standard for High Speed Communications RONALD J. VETTER Communication Complexity EYAL KUSHILEVITZ
Volume 45 Control in Multi-threaded Information Systems PABLO A. STRAUB AND CARLOS A. HURTADO Parallelization of DOALL and DOACROSS Loops - a Survey A. R. HURSON, JOFORD T. LIM, KRISHNA M. KAVI AND BEN LEE Programming Irregular Applications: Runtime Support, Compilation and Tools JOEL SALTZ, GAGAN AGRAWAL, CHIALIN CHANG, RAJA DAS, GUY EDJLALI, PAUL HAVLAK, YUAN-SHIN HWANG, BONGKI MOON, RAVI PONNUSAMY, SHAMIK SHARMA, ALAN SUSSMAN AND MUSTAFA UYSAL Optimization Via Evolutionary Processes SRILATA RAMAN AND L. M. PATNAIK Software Reliability and Readiness Assessment Based on the Non-homogeneous Poisson Process AMRIT L. GOEL AND KUNE-ZANG YANG Computer-Supported Cooperative Work and Groupware JONATHAN GRUDIN AND STEVEN E. POLTROCK Technology and Schools GLEN L. BULL