HYBRID PARALLEL EXECUTION MODEL FOR LOGIC-BASED SPECIFICATION LANGUAGES
SERIES ON SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING
Series Editor-in-Chief: S. K. Chang (University of Pittsburgh, USA)

Vol. 1: Knowledge-Based Software Development for Real-Time Distributed Systems, by Jeffrey J.-P. Tsai and Thomas J. Weigert (Univ. Illinois at Chicago)
Vol. 2: Advances in Software Engineering and Knowledge Engineering, edited by Vincenzo Ambriola (Univ. Pisa) and Genoveffa Tortora (Univ. Salerno)
Vol. 3: The Impact of CASE Technology on Software Processes, edited by Daniel E. Cooke (Univ. Texas)
Vol. 4: Software Engineering and Knowledge Engineering: Trends for the Next Decade, edited by W. D. Hurley (Univ. Pittsburgh)
Vol. 5: Intelligent Image Database Systems, edited by S. K. Chang (Univ. Pittsburgh), E. Jungert (Swedish Defence Res. Establishment) and G. Tortora (Univ. Salerno)
Vol. 6: Object-Oriented Software: Design and Maintenance, edited by Luiz F. Capretz and Miriam A. M. Capretz (Univ. Aizu, Japan)
Vol. 7: Software Visualisation, edited by P. Eades (Univ. Newcastle) and K. Zhang (Macquarie Univ.)
Vol. 8: Image Databases and Multi-Media Search, edited by Arnold W. M. Smeulders (Univ. Amsterdam) and Ramesh Jain (Univ. California)
Vol. 9: Advances in Distributed Multimedia Systems, edited by S. K. Chang, T. F. Znati (Univ. Pittsburgh) and S. T. Vuong (Univ. British Columbia)

Forthcoming titles:
Acquisition of Software Engineering Knowledge, edited by Robert G. Reynolds (Wayne State Univ.)
Monitoring, Debugging, and Analysis of Distributed Real-Time Systems, by Jeffrey J.-P. Tsai, Steve J. H. Yong, R. Smith and Y. D. Bi (Univ. Illinois at Chicago)
HYBRID PARALLEL EXECUTION MODEL FOR LOGIC-BASED SPECIFICATION LANGUAGES
Jeffrey J.-P. Tsai
Bing Li
University of Illinois at Chicago

World Scientific
Singapore • New Jersey • London • Hong Kong
Published by World Scientific Publishing Co. Pte. Ltd.
P O Box 128, Farrer Road, Singapore 912805
USA office: Suite 1B, 1060 Main Street, River Edge, NJ 07661
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
HYBRID PARALLEL EXECUTION MODEL FOR LOGIC-BASED SPECIFICATION LANGUAGES Copyright © 2001 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN 981-02-4096-1
Printed in Singapore by Uto-Print
Contents

1  Introduction  1

2  Current Approaches  7
   2.1  Data Dependency Analysis  7
   2.2  OR-Parallelism  7
   2.3  AND-Parallelism  9
   2.4  Backtracking  10

3  Overview of the New Approach  13
   3.1  Non-monotonic Inheritance Expansion  15
   3.2  Static Data Dependency Analysis  15
   3.3  Automatic Transformation  17
   3.4  Hybrid AND-OR Parallel Execution  17
   3.5  Simplified OR-Parallel Model  18
   3.6  Backtracking Elimination  19

4  FRORL Requirements Specification Language and Its Decomposition  21
   4.1  Knowledge Representation through Object-Oriented Model  22
   4.2  The Modeling Primitives of FRORL  23
        4.2.1  Object Frame  23
        4.2.2  Activity Frame  25
        4.2.3  Reserved Words  27
   4.3  Decomposition of a FRORL Requirements Specification  27
        4.3.1  FRORL and its Non-monotonic Inheritance Mechanism  28
        4.3.2  Transformation of FRORL into Horn-clause Logic-based Specification  29
        4.3.3  Algorithms for Non-Monotonic Inheritance Handling  30

5  Rewriting and Data Dependency, Control Flow Analysis of a Logic-Based Specification  53
   5.1  Rewriting of a Logic-Based Specification  54
        5.1.1  Equal-Introduction  56
        5.1.2  Equal-Substitution  58
        5.1.3  Decomposition  60
        5.1.4  Simplification  66
   5.2  Data Dependency and Control Flow Analysis  74
        5.2.1  Matrix Representation of Mode Information  75
        5.2.2  Data Flow and Dependency Analysis Algorithm  78

6  Hybrid AND-OR Parallelism Implementation  89
   6.1  The Usage of Mode Information in the Parallel Model  89
   6.2  AND-OR Parallel Execution  90
   6.3  Synchronization in OR-Parallel Execution Model  92
   6.4  Calculation of the Currently Executable Predicate Set  95
   6.5  Hybrid Execution Algorithm  98
   6.6  Comparison with the Conventional BFS and DFS  100
   6.7  Advantages of the New Approach  101
   6.8  Analysis of Non-functional Requirements in the New Parallel Execution Model  105

7  Efficiency Considerations and Experimental Results  107
   7.1  Execution Evaluation  107
   7.2  Communication Evaluation  107
   7.3  Criteria for Simulation and Evaluation  108
   7.4  A Simulator for Parallel Logic-based Specification Evaluation  114
   7.5  Experimental Results and Comparison  121
        7.5.1  Simulation for Various Values of Error_Rate  122
        7.5.2  Simulation for Values of Deep_Jumping_Distance  124
        7.5.3  Simulation for Values of Deep_Jumping_Rate  126
        7.5.4  Simulation for Values of Variable_Related_Rate  127
        7.5.5  Simulation for Values of Nondeterminism_Rate  128

8  Mode Information Support for Automatic Transformation System  161
   8.1  Architecture of a Logic-based Specification Transformation System  162
   8.2  Determination of Control Sequence  162
   8.3  Data Type Generation and Procedural Function Formation  167
        8.3.1  Rules-based Data Type Inference  168
        8.3.2  Data Type Propagation Analysis and Generation  171
        8.3.3  Data Type Analysis in a FRORL Specification  172
        8.3.4  Procedural Program Generation  173
   8.4  Intelligent Backtracking for Transformation System  183

9  Describing Non-Functional Requirements in FRORL  185
   9.1  Functional Requirements vs. Non-functional Requirements  185
   9.2  Issues in Non-functional Requirements  186
   9.3  Non-functional Requirements Modeling in FRORL  190
   9.4  Adjusting Non-functional Requirements  196

10  Summary  201
List of Tables

7.1   TEST RESULTS FOR ERROR_RATE = 0.0 (a)  130
7.2   TEST RESULTS FOR ERROR_RATE = 0.0 (b)  131
7.3   TEST RESULTS FOR ERROR_RATE = 0.50 (a)  132
7.4   TEST RESULTS FOR ERROR_RATE = 0.50 (b)  133
7.5   TEST RESULTS FOR ERROR_RATE = 0.99 (a)  134
7.6   TEST RESULTS FOR ERROR_RATE = 0.99 (b)  135
7.7   TEST RESULTS FOR SMALL DISTANCE RANGE (a)  136
7.8   TEST RESULTS FOR SMALL DISTANCE RANGE (b)  137
7.9   TEST RESULTS FOR MEDIUM DISTANCE RANGE (a)  138
7.10  TEST RESULTS FOR MEDIUM DISTANCE RANGE (b)  139
7.11  TEST RESULTS FOR LARGE DISTANCE RANGE (a)  140
7.12  TEST RESULTS FOR LARGE DISTANCE RANGE (b)  141
7.13  TEST RESULTS FOR DEEP_JUMPING_RATE = 0.1 (a)  142
7.14  TEST RESULTS FOR DEEP_JUMPING_RATE = 0.1 (b)  143
7.15  TEST RESULTS FOR DEEP_JUMPING_RATE = 0.5 (a)  144
7.16  TEST RESULTS FOR DEEP_JUMPING_RATE = 0.5 (b)  145
7.17  TEST RESULTS FOR DEEP_JUMPING_RATE = 0.9 (a)  146
7.18  TEST RESULTS FOR DEEP_JUMPING_RATE = 0.9 (b)  147
7.19  TEST RESULTS FOR VARIABLE_RELATED_RATE = 0.0 (a)  148
7.20  TEST RESULTS FOR VARIABLE_RELATED_RATE = 0.0 (b)  149
7.21  TEST RESULTS FOR VARIABLE_RELATED_RATE = 0.5 (a)  150
7.22  TEST RESULTS FOR VARIABLE_RELATED_RATE = 0.5 (b)  151
7.23  TEST RESULTS FOR VARIABLE_RELATED_RATE = 0.9 (a)  152
7.24  TEST RESULTS FOR VARIABLE_RELATED_RATE = 0.9 (b)  153
7.25  TEST RESULTS FOR NON_DETERMINISM_RATE = 1 (a)  154
7.26  TEST RESULTS FOR NON_DETERMINISM_RATE = 1 (b)  155
7.27  TEST RESULTS FOR NON_DETERMINISM_RATE = 2 (a)  156
7.28  TEST RESULTS FOR NON_DETERMINISM_RATE = 2 (b)  157
7.29  TEST RESULTS FOR NON_DETERMINISM_RATE = 3 (a)  158
7.30  TEST RESULTS FOR NON_DETERMINISM_RATE = 3 (b)  159
List of Figures

3.1  An overview of the system  16

4.1  An example decomposition process  28
4.2  The flow chart of "NM_ImageGenerator_for_Object_Frame"  35
4.3  The flow chart of "NM_ImageGenerator_for_Activity_Frame"  36
4.4  The flow chart of the algorithm "inherit_from_parents"  37
4.5  Graphical explanation of the procedure "inherit_from_parents"  38

5.1  An illustration of data transformation by a clause head literal  77
5.2  The partial result matrix for clause 2  84
5.3  The final result matrix for clause 4  85
5.4  The final result matrix for clause 2  85
5.5  Initial attempt to construct a matrix for clause 6  86
5.6  The final result matrix for clause 7  87
5.7  The final result matrix for clause 3  87
5.8  The final result matrix for clause 5  88

6.1  An illustration of the new parallel execution model  91
6.2  A comparison of execution sequence between current and new models  96
6.3  A simple comparison between current and the new parallel evaluation model  104
6.4  An example illustrating reduced duration of communication channel  104
6.5  An illustration of early-shortcut  105
6.6  Parallel evaluation and its reference to non-functional specification  106

7.1  An example evaluation tree  113

9.1  Non-functional requirements for security attributes  189
9.2  Hierarchical model of the object frames for security  193
9.3  Illustration of a FRORL specification with non-functional requirements attached  195
9.4  Illustration of rule 37  197
9.5  Illustration of rule 38  198
9.6  Illustration of rule 39  198
9.7  Illustration of rule 40  199
Chapter 1
Introduction

Software maintenance is a crucial phase in the life of any application. Errors that go undiscovered in a requirements specification can result in maintenance costs exceeding ten times the original development cost [95, 96]. Errors in the requirements phase can be reduced through validation and verification of the requirements specification. Many logic-based requirements specification languages have been developed to achieve these goals [9, 27, 37, 38, 47, 53, 54, 55, 56, 57, 58, 59, 67, 73, 77, 80, 82, 90, 91, 94]. Through the execution of a logic-based requirements specification, customers and developers can uncover disagreements or misunderstandings during the requirements phase. By reasoning over the logic-based requirements specification, developers can verify desired properties such as consistency and liveness. Developers can also verify whether non-functional requirements describing the executional behavior of the specification have been satisfied. However, executing and reasoning over logic-based requirements specifications is very slow. An effective way to improve their performance is to execute and reason over them in parallel. Toward this goal, this book presents a hybrid parallel execution model for Horn-clause logic-based requirements specification languages such as FRORL [93]. This model is built on a static data dependency and mode analysis mechanism. Another application of the data dependency and mode analysis, a transformation system which transforms a logic-based specification into a procedural language program, is also discussed.

The logic foundation of FRORL is Horn-clause logic augmented with non-monotonic inheritance. This book first discusses the automatic non-monotonic inheritance handling of a FRORL specification. After the automatic expansion mechanism is applied to a FRORL specification to eliminate non-monotonic inheritance, the specification contains only Horn-clause logic-based specification clauses.
In order to evaluate the generated specification in parallel, so that multiple solutions can be explored at the same time and the executional behavior can be checked against certain requirements, the executional behavior of a Horn-clause logic-based specification has to be studied.

A Horn-clause logic-based specification consists of a set of clauses. For each clause, the clause head and body are formed by predicates. The execution of a Horn-clause logic-based specification is based on the resolution principle. Data values are transferred among clauses and within the bodies of clauses based on the unification rule. The execution of a logic-based specification is similar to a depth-first search of an AND-OR tree [24, 78, 103], where AND nodes represent the predicates within a clause, and OR nodes represent the clauses unifiable with a calling predicate. In order for an answer to be generated from the specification, a successful search has to be found. A unit clause succeeds if it is a built-in predicate or is included in the input database. The success of a predicate is defined by the clauses which are unifiable with the predicate, i.e., the clauses whose heads have the identical name and are unifiable with the current predicate. If any of the clauses unifiable with the predicate succeeds, then the predicate succeeds. The success of a clause is defined by the predicates in the clause body: only if all the predicates in the clause body succeed can the clause itself be proved successful. Along with the resolution process, data values are transferred within clause bodies and among clauses by data sharing and unification.

The parallelism of a Horn-clause logic-based requirements specification can be extracted from an AND-OR execution model, which follows the top-down searching strategy of the AND-OR tree. The parallel execution of OR-branches is relatively easy to implement because the executions of the clauses unifiable with a predicate are independent of each other. The parallel execution of AND-branches is much more complex, since it may create binding conflicts during execution [33]. An effective and common method to solve this problem is to rely on data flow and dependency analysis, which allows exactly one producer and multiple consumers for each parameter in a clause.

Currently, most approaches adopt the method of dynamic mode inference to do data flow analysis [17, 32, 64, 88, 104]. The basic idea of this approach is to rely on the user's query to generate data dependency information, then use this information to spawn processes which can be executed in parallel. These methods can be described as demand-driven models which execute in a top-down fashion. Even though these approaches can extract AND-parallelism during the execution, they have a common drawback: the mode information can only be obtained during the actual execution, and this dynamic determination of parameter modes and parallel execution combinations puts tremendous overhead on the model, which may jeopardize any advantage these parallel execution models might obtain.
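To make the search model concrete, the following minimal Python sketch (our illustration, not taken from the book) evaluates a propositional Horn-clause specification as an AND-OR tree: a predicate succeeds if any of its defining clauses succeeds (OR node), and a clause succeeds only if every predicate in its body succeeds (AND node). Unification and variable binding are deliberately omitted.

    # Minimal AND-OR evaluation of a propositional Horn-clause specification.
    # A specification maps each predicate name to the list of clause bodies
    # whose heads define it; a unit clause (fact) has an empty body.
    SPEC = {
        "a": [["b", "c"]],       # a :- b, c.
        "b": [["d"], ["e"]],     # b :- d.    b :- e.
        "c": [[]],               # c.  (unit clause)
        "d": [[]],               # d.  (unit clause; "e" is undefined)
    }

    def succeeds(predicate):
        # OR node: the predicate succeeds if ANY unifiable clause succeeds.
        return any(clause_succeeds(body) for body in SPEC.get(predicate, []))

    def clause_succeeds(body):
        # AND node: the clause succeeds only if ALL body predicates succeed.
        return all(succeeds(p) for p in body)

    print(succeeds("a"))   # True: b is provable via d, and c is a fact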
A few other approaches use data dependency analysis by static mode inference [16, 28, 31], which is based on the analysis of the structure of a logic program to determine the components that can be executed in parallel; but these approaches suffer from either being unable to generate all the solutions to the input logic program, or sacrificing the degree of parallelism of the program. For example, in [16], Chang et al. implement static data dependency analysis by considering only the worst-case clause-invoking pattern, which eliminates the possible advantage of generating multiple solutions of a logic program at the same time. As a result, the approach has to explicitly handle the backtracking problem during the parallel execution of the AND-OR tree, which can still slow the system execution.

In this book, a new parallel execution model for a Horn-clause logic-based requirements specification, based on a novel static data dependency analysis, is proposed. This model can preserve maximum parallelism while guaranteeing to generate all the solutions of a logic-based specification without backtracking. The power of this model comes from the fact that the static data dependency analysis is capable of representing all execution sequences and all the multiple solutions implied by a logic-based specification [98, 94]. Since the complete mode information is available through mode inference even without actual execution, the new parallel execution model can apply extra analysis at compile time compared to other approaches and obtain more parallel execution information about the underlying logic-based specification. The overall parallel execution behavior of the logic-based specification can thus be improved.

Specifically, the new execution model extracts the necessary mode information from the static data dependency analysis to facilitate a combination of top-down and bottom-up searching of the AND-OR tree of a logic-based specification. In this process, all solutions implied by the logic-based specification can be generated and explored based on the same data dependency information, which preserves the correctness of the results while eliminating backtracking from consideration. In the top-down execution direction, this hybrid model adopts a method similar to the current approaches: a predicate is executable only if all of its input parameters have received values from some other predicates within the same clause. This execution direction mainly reflects the interrelation among the predicates within a clause. On the other hand, the new execution along the bottom-up direction relies on the detection of ground variables from the static data dependency information to facilitate the evaluation of those predicates with all of their required values available. This direction mainly deals with the possible adjustment of the execution sequence among clauses of a logic-based specification. By adopting this strategy, the overhead of the parallel execution of a logic-based specification can be reduced.
In this book, other improvements related to this new parallel evaluation model are also discussed. Among them are reduced process duration, reduced communication cost, a reduced total number of nodes to be searched in the AND-OR tree, and a simplified OR-parallel synchronization data structure. It is possible that under certain conditions parallel execution is actually worse than sequential execution because of the extra management overhead. Using a simulator, all the different combinations of the properties which control the structure of the AND-OR search tree can be explored. By adjusting these control parameters, AND-OR trees with different structures are evaluated by the sequential, the currently available parallel, and the new parallel execution models. Significant improvement has been achieved in most of the situations. The execution behavior of the specification explored by the new parallel execution model can also be used to verify whether the non-functional requirements attached to the functional requirements written in FRORL have been satisfied. Early discovery of mismatch information can be fed back to the specification phase to adjust the non-functional requirements accordingly.

Another application of the static mode information is to facilitate the automatic transformation of a FRORL logic-based specification into a procedural language program. This automatic transformation system can be used to implement new software development paradigms like rapid prototyping, transformational implementation, and operational specification. A user can describe a problem using the formal specification language FRORL; the equivalence-preserving transformation system can then be applied to transform the specification into an executable model written in a procedural language, and the behavior of the model can be shown to the user. The transformation system has to make use of the mode information to find the multiple execution paths within a logic-based specification.

The book is organized as follows. Chapter 2 summarizes current approaches to automatic data flow analysis and the parallelization of a logic-based specification. Chapter 3 provides an overview of the proposed new system and how it handles the major problems in the parallelizing process. Chapter 4 introduces the requirements specification language FRORL. Chapter 5 discusses the static data dependency analysis method and the mode information it generates. Chapter 6 explains in detail the new AND-OR parallelism based on the static data dependency information. Chapter 7 discusses a simulator for the new parallel execution model and compares the experimental results it generates under various conditions with current approaches; the reasons why improvements can be achieved are discussed along with each comparison table. Chapter 8 discusses an application of the mode information to realize an automatic transformation system. Chapter 9 gives another application: how to use FRORL to describe non-functional requirements. We conclude the book in Chapter 10 with a brief summary.
The research presented in this book would have been impossible without the assistance and contributions of our graduate students involved in the development of the FRORL software development environment. T. Weigert contributed to the development of the FRORL language discussed in Chapter 4. R. Sheu and B. Li implemented the data dependency and data flow analysis techniques presented in Chapter 5. B. Li developed the simulator for the new parallel execution model of Chapter 6 and conducted the experiments described in Chapter 7. A. Liu contributed the application of FRORL to describing non-functional requirements presented in Chapter 9. M. Lu helped with the editing of the book. The work described in the book was supported, in part, by Fujitsu America, Inc.; by the United States National Science Foundation and the U.S. Defense Advanced Research Projects Agency under Grant No. CCR-9633536; and by the U.S. Air Force Rome Laboratory under Grant No. F30602-95-1-0035.
Chapter 2
Current Approaches

Many problems exist in realizing parallelism in logic programs. Various approaches have been developed to deal with the three main problems: 1) data dependency analysis; 2) realization of AND-OR parallelism; and 3) reducing the effect of backtracking in parallel execution.
2.1 Data Dependency Analysis
Traditionally, data flow and data dependency analysis have been investigated within the framework of logic programming. One approach to dealing with the multi-directionality of a logic program has been to provide mode information for each clause (e.g., Parlog, Concurrent Prolog). Bruynooghe [10] and Smolka [87] have investigated the issue of verifying the consistency of mode declarations for arguments to predicates, as supplied by programmers. Automatic mode inference has been discussed by Mellish [70] and Reddy [80], and more recently by Bruynooghe [11] and Mannila and Ukkonen [68]. The most promising approach to mode inference so far has been presented by Debray in [29, 30]. Debray uses an interpreter to reason over a four-valued lattice of modes: don't know, empty, free, and ground. Its purpose is to enable a compiler to optimize a logic program. Mannila and Ukkonen [68] limit themselves to the simple modes ground and unground. Reddy relies on syntactic analysis to assign modes to predicates.
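To illustrate, the following small Python sketch (our encoding, not Debray's implementation) shows such a four-valued mode lattice and the join operation an analysis would use to merge the mode information inferred from alternative clauses: "empty" sits at the bottom, "free" and "ground" are incomparable in the middle, and "don't know" is the top.

    # A sketch of a four-valued mode lattice:
    #   EMPTY  <=  FREE, GROUND  <=  DONT_KNOW
    EMPTY, FREE, GROUND, DONT_KNOW = "empty", "free", "ground", "dont_know"

    def join(m1, m2):
        """Least upper bound of two modes in the lattice."""
        if m1 == m2:
            return m1
        if m1 == EMPTY:
            return m2
        if m2 == EMPTY:
            return m1
        return DONT_KNOW     # free vs ground, or anything vs dont_know

    # Merging the mode of one argument as inferred from two clauses:
    print(join(GROUND, GROUND))   # 'ground'    -- definitely bound
    print(join(FREE, GROUND))     # 'dont_know' -- could be either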
2.2 OR-Parallelism
If an OR-branch has more than one clause which can be unified with a predicate (a goal), then these clauses can be executed concurrently. The successful evaluation of any one of these clauses indicates the successful evaluation of the invoking predicate.
This kind of parallelism is called OR-parallelism. Because the executions of the unifiable clauses in this case are independent of each other, the implementation of OR-parallelism is relatively simple. The main problem in handling OR-parallelism is how to represent the different bindings of the same variable generated from different branches. According to our observations, two classes of approach exist in this area.

Solution OR-parallelism: Almost all systems and approaches realizing OR-parallelism fall into this category. The major problem which needs to be handled is the synchronization of the multiple solutions generated from different clauses [14, 17, 42, 64]. David H. D. Warren provided a concise summary of the current approaches available within this category [102]. These methods developed various synchronization and ordering algorithms to identify and control the solutions generated from the different branches of the clause. The final result of the OR predicate is basically a join of all the possible results generated from the unifiable OR-clauses. In order to ensure the proper join and facilitate the further cross-product formation at the AND-nodes one level higher, various data structures and identifications attached to a result have been presented, which determine the bindings generated by different clauses. These data structures basically have to form a hierarchical structure to store the values generated at each level of the AND-OR search tree. The major data structures used in this context can be categorized as follows:

• Binding arrays [13, 102];
• Synchronization timestamps [41, 92];
• Version vectors [42].

The data structures listed above have the following common characteristics, which lead to inefficient implementations:

• Binding data structures defined at each OR-branch point come into existence when the OR-parallel point is first entered and continue to exist throughout the parallel execution process, until all the branches of the OR-parallel point have finished execution;

• Binding data structures are dynamically modified along with the execution at each branch of the OR-parallel point. A modification may come from the deeper-level predicates in an OR-parallel branch when that branch finishes execution and generates the corresponding binding.
The first point forces a system to bear a large overhead in maintaining many complex synchronization data structures. To make things worse, some of the data structures may not be necessary for the current OR-parallel execution if some processes are not interested in the variables at the current OR-branch point [102]. The second point makes the control of the synchronization data structures complex. Specifically, the cost of creating and accessing variable bindings becomes prohibitively high. In order to cope with this problem, some of the current approaches consider an AND-OR search tree as a hierarchical structure and use an inheritance mechanism to reduce the complexity of maintaining the synchronization data structures [41, 92]. Since all possible OR-parallel branches are explored simultaneously, if all the solutions are guaranteed to be found when the OR-branches are explored, there is virtually no need for backtracking. The merging of OR-parallel processes has to cooperate with the spawning of AND-parallel processes. For example, in [64] the results are gathered at OR-branches by taking a join, and only selected information is passed to the AND-branches above to reduce the complexity of control at the AND-parallel nodes.

OR-parallelism combined with backtracking: This model [103] adopts a lazy approach to spawning processes. Only when a processor is idle and asks for a process will another branch at an OR-parallel point be generated and executed. If a failure occurs at the current branch, then the remaining untouched branches of the OR-parallel point are searched. This approach is thus a combination of the traditional sequential and the parallel execution models. The incompleteness of the parallelism, which originates from the inadequate number of processes at evaluation time, requires the approaches in this category to retain the backtracking mechanism. This kind of approach can be considered a special case of the previous approaches, with a special focus on OR-parallelism and a limitation on the number of processes which can exist at the same time. The implementation therefore has to provide the controls which usually occur in a sequential realization, like task switching and backtracking. This is a mixture of OR-parallelism and the general sequential evaluation of a logic program; when the number of processes existing at any time is limited to one, the execution is sequential.
2.3 AND-Parallelism
With AND-parallelism, all predicates within a clause body must be true for the clause to be successful. The exploration of the predicates within a clause body can be carried out concurrently. Handling binding conflicts [33] is the major task in any implementation, and a number of approaches exist.

Programmer's annotation: Programmer's annotation is the simplest method of handling AND-parallelism. It requires the users to explore and define explicitly the parallel components of a logic program [25, 83]. Even though the control is relatively simple, human involvement is definitely not our intention.

Independent AND-parallelism: In this kind of parallelism, there is no variable dependency between AND-parallel goals. Predicates which share variables must have these variables bound to ground terms upon entering the clause [43, 46, 102]. This approach restricts AND-parallelism to a relatively simple case similar to OR-parallelism. A comparison of different independent AND-parallel models can be found in [40].

Data flow analysis for dependent AND-parallelism: One of the most popular methods is to assign only one producer for each parameter and allow multiple consumers of the same parameter in the program [16, 17, 28, 64], as sketched below. Data flow analysis is used to determine the producer and the consumers. With a single producer, the dependency of variables between predicates can be monitored, and the spawning of processes is based on this data flow information. The data flow analysis can be done either dynamically or statically; dynamic analysis is the dominant method at the current time. Dynamic analysis generates binding information from the user's input at each step of unification; this variable instantiation changes during execution and is adjusted dynamically. Static analysis [16] generates binding information which is independent of the user's input, so that the overhead of parallelization can be reduced. But currently available static analysis approaches impose various limitations or restrictions on the results, because they limit either the number of solutions generated or the degree of parallelism. Other approaches impose similar sequential restrictions on a logic program to achieve dependent AND-parallelism [84]. Also, the accuracy of statically generated mode information is worse than that of dynamically generated information.
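A minimal Python sketch of the single-producer/multiple-consumer discipline (our simplification; the predicate names and mode sets are hypothetical): each body predicate is annotated with the variables it produces and consumes, and a predicate becomes executable as soon as the producers of all variables it consumes have finished, yielding waves of predicates that can run in parallel.

    # Sketch of dependent AND-parallelism scheduling: for each variable
    # exactly one body predicate is the producer; a predicate may run as
    # soon as the producers of all variables it consumes have completed.
    body = {                         # hypothetical clause body with modes
        "p": {"produces": {"X"}, "consumes": set()},
        "q": {"produces": {"Y"}, "consumes": {"X"}},
        "r": {"produces": set(), "consumes": {"X"}},
        "s": {"produces": set(), "consumes": {"Y"}},
    }

    done, ready_rounds = set(), []
    while len(done) < len(body):
        ground = {v for p in done for v in body[p]["produces"]}
        wave = [p for p in body
                if p not in done and body[p]["consumes"] <= ground]
        if not wave:                 # no progress: remaining goals blocked
            break
        ready_rounds.append(wave)    # these predicates can run in parallel
        done.update(wave)

    print(ready_rounds)              # [['p'], ['q', 'r'], ['s']]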
2.4 Backtracking
Handling backtracking is not critical in AND-parallelism and OR-parallelism, because it is possible to totally eliminate backtracking from the parallel execution model [17, 64] under the following conditions.
• AND-OR parallelism is supported, so that all branches at all choice-points can be searched simultaneously.

• All solutions implied by the logic program can be expressed and generated from the parallel execution model.

Even though some of the interesting developments in handling backtracking did not necessarily arise in the context of parallel logic program models, they are still worth mentioning. Many approaches in the logic program parallelization community still handle backtracking explicitly because they cannot guarantee the above two criteria. Basically, there are two different kinds of backtracking mechanisms: shallow backtracking and deep backtracking. Shallow backtracking backtracks to alternative candidate clauses when the head of the current candidate clause cannot be successfully unified with the calling literal. Deep backtracking backtracks to any previously invoked procedure which has a choice-point; this is also called intelligent backtracking.

Intelligent backtracking: Conventionally, whenever a failure occurs during a search, backtracking is done to the nearest choice-point to explore other branches. In intelligent backtracking, instead of backtracking to the nearest choice-point, backtracking goes to the choice-point where the variable which caused the failure can be bound to another value [60]. Research in intelligent backtracking which handles other related aspects, like cut, tail recursion, etc., also exists. In [23], the pruning of paths which could be explored several times by the conventional backtracking scheme is studied. Improving backtracking by using data dependency information was first proposed in [24]. Later, various research efforts considered intelligent backtracking in the AND-parallelism of the AND-OR parallel execution model of a logic program [8, 22, 34, 105].

Semi-intelligent backtracking: Semi-intelligent backtracking uses static data dependency analysis for AND-parallelism. Chang and Despain [15] use this technique to find the optimal choice-point for intelligent backtracking. Since their mode information is not sufficient to support the exploration of multiple solutions, backtracking is necessary in their model. The binding status considered is only the worst-case binding combination, which may involve inaccurate binding information. As a result, their approach can only achieve semi-intelligent backtracking.

Elimination of backtracking: The AND-OR parallel execution model can completely eliminate backtracking from the execution of a logic program [17, 64].
Few results have been produced, because most current approaches are unable to find all the solutions implied by the underlying logic program. Another problem is that in order to eliminate backtracking, a lot of nodes have to be searched simultaneously, so the total number of processes existing at any time is much larger than in backtracking models. In practice, the management of these processes and the communication cost among them may offset any advantage achieved by eliminating backtracking. So even though many approaches declare that they have excluded backtracking from their models, nobody so far has addressed the problem of how to manage the vast number of processes generated during the parallel evaluation process; backtracking elimination has stayed at a theoretical level. Instead, most of the current approaches realize parallelism while retaining some degree of backtracking.
Chapter 3
Overview of the New Approach

In the previous chapter we explored the current approaches for dealing with data dependency analysis, AND-parallelism, OR-parallelism, and backtracking of a logic program. In this chapter, we outline a new approach and discuss how it handles the problems mentioned in Chapter 2. Figure 3.1 shows the major components of the model and their interrelations.

The system starts from a FRORL logic-based requirements specification with a non-monotonic inheritance mechanism. There are two different kinds of frames within a FRORL specification, namely, object frames and activity frames. Object frames describe entities within the problem domain; activity frames describe actions taking place within the problem domain. The system then applies the inheritance expansion algorithms to eliminate the non-monotonic inheritance within a FRORL specification [98]. There are three algorithms to implement the expansion of the object and activity frames in a FRORL specification. After eliminating the inheritance mechanism from a FRORL specification, all that remains is a pure Horn-clause logic-based requirements specification. In order to facilitate static data flow and dependency analysis, rewriting rules are applied to the Horn-clause logic-based specification; there are four kinds of rewriting operations. The rewritten logic-based specification is then subjected to the static data flow and dependency analysis. The mode information generated here is used to support both a hybrid AND-OR parallel execution model and an automatic transformation system, which transforms a logic-based specification into a procedural language program. The executional behaviors demonstrated during AND-OR parallel execution can be compared with the functional and non-functional requirements described in FRORL to see if any mismatches
exist.

Basically, our approach is a combination of the following techniques. A comparison of the new model with best-first search and depth-first search will be discussed in a later chapter.

Automatic non-monotonic inheritance expansion: This is the starting point of the whole execution model. The object and activity inheritance hierarchies in a FRORL specification are expanded so that each frame is represented by its ultimate frame image. The expanded logic-based specification has a format similar to a Horn-clause logic-based specification.

Static data flow analysis: In this step the multiple execution sequences in a logic-based specification are found. These results are consulted during parallel execution to determine those predicates which can be evaluated in parallel. In order to facilitate the data flow analysis, non-monotonic expansion is followed by rewriting rules, which include equal-introduction, equal-substitution, decomposition and simplification. The logic-based specification is thus converted into a form suitable for the application of data flow and dependency analysis. Note that these rewriting operations often need to be applied repeatedly: in the process of transformation, new conclusions of the analysis may cause further operations to become applicable, so the rewriting process stops only when the specification reaches a stable state. Based on the data dependency and flow analysis, mode status is inferred.

Automatic transformation: The transformation of FRORL into a procedural language program discussed in this book relies on an integration of the analytical approach for transforming the nondeterminism of logic-based specifications and the knowledge-based approach for generating target code. This helps to avoid some of the complexity pitfalls of approaches like deductive program synthesis. The main problem in the transformation system is the efficient handling of backtracking. The approach presented here uses the accurate mode information to support intelligent backtracking.

Hybrid AND-OR parallel execution model: This AND-OR parallel model uses the statically gathered mode information and the user's input to find all the predicates which can be executed in parallel. These predicates may come from different levels of the corresponding AND-OR tree. The parallel execution of the predicates determined by the static data dependency information is carried out in a bottom-up fashion. If there is no predicate which can be executed in this way, then the general top-down execution model is used.
Simplified OR-parallel synchronization model: The synchronization data structures at OR-parallel nodes can be realized with reduced complexity compared to the current approaches. Under the new AND-OR execution model depicted above, the duration of the synchronization data structures can also be reduced.

Backtracking elimination: Since our approach can generate all the solutions based on the static mode information, it can eliminate backtracking from consideration by implementing the new AND- and OR-parallelism. The cost of managing parallelism is also reduced. In fact, by reducing the number of processes generated and the duration of these processes, our research is the first to consider what the management cost of the total parallelism will be. We also find a way to reduce the complexity so that the elimination of backtracking actually becomes practical.
3.1 Non-monotonic Inheritance Expansion
This is the first transformation step in the system. The purpose of this step is to transform a FRORL specification with an inheritance mechanism into a Horn-clause logic-based specification containing no inheritance structures. This step is necessary since inheritance structures cannot be handled by the subsequent transformation steps, and almost all conventional procedural programming languages do not support inheritance. The transformation of a FRORL specification into a Horn-clause logic-based specification can follow two different approaches, based upon different ways of handling non-monotonic inheritance. The first approach is to expand every frame into its ultimate set, thereby eliminating the inheritance mechanism used in FRORL; this approach transforms a FRORL specification into a pure Horn-clause logic-based specification without inheritance. The second approach is to preserve the non-monotonic inheritance mechanism of FRORL during the transformation process; this approach requires a meta-interpreter along with the specification. Every time the specification is called, this meta-interpreter is invoked to generate the corresponding specification without inheritance. Currently the first approach is used in our system, and a detailed comparison of the two approaches can be found later.
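The "ultimate frame image" idea can be illustrated with a small Python sketch (ours, not the book's expansion algorithms of Section 4.3.3): walking up the a_kind_of hierarchy, a frame inherits every parent attribute except those it overrides locally; the fact that a local value defeats an inherited one is what makes the inheritance non-monotonic. The frames and attribute values below are hypothetical.

    # Sketch of non-monotonic inheritance expansion: a frame's ultimate
    # image contains its local attributes plus every inherited attribute
    # that is not overridden locally (local values defeat parent values).
    frames = {   # hypothetical frames, parent listed under 'a_kind_of'
        "phone_call":     {"a_kind_of": None,
                           "attrs": {"caller_id": "?", "billing": "flat"}},
        "three_way_call": {"a_kind_of": "phone_call",
                           "attrs": {"2nd_callee_id": "?",
                                     "billing": "premium"}},  # overrides parent
    }

    def ultimate_image(name):
        frame = frames[name]
        parent = frame["a_kind_of"]
        image = dict(ultimate_image(parent)) if parent else {}
        image.update(frame["attrs"])   # local attributes win (non-monotonic)
        return image

    print(ultimate_image("three_way_call"))
    # {'caller_id': '?', 'billing': 'premium', '2nd_callee_id': '?'}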
3.2 Static Data Dependency Analysis
The system relies on a new data dependency analysis algorithm, which can find all the possible mode combinations within a logic clause, to implement the new parallel execution of a logic-based specification.
[Figure 3.1: An overview of the system. The diagram shows the pipeline: a FRORL logic-based requirements specification undergoes non-monotonic inheritance expansion, yielding a Horn-clause logic-based requirements specification; rewriting (equal-introduction, equal-substitution, decomposition, simplification) produces a rewritten logic-based specification; static data flow and dependency analysis yields the data flow and dependency matrix, which drives execution control sequence determination; together with the user's query, this feeds the hybrid AND-OR parallel execution model, and intelligent backtracking operates over the adjusted rewritten logic-based specification.]
A comparison with the other approaches realizing data dependency analysis [98, 97] shows that our approach has the following advantages: (1) a four-valued lattice represents the mode values in a data dependency matrix; (2) multiple modes are preserved in the resulting data dependency matrix. These two points, especially the second one, are very important for the new parallel execution model: they are the basis for eliminating backtracking from the model. The static mode information is also used in the hybrid AND-parallel model to find the predicates at different levels of the AND-OR tree which can be executed simultaneously.
The static analysis algorithm is followed by the AND-OR parallelism generation process.
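As a flavor of what the analysis produces, the following Python sketch (our rendering; the book's matrix representation is defined in Section 5.2.1) keeps one row per feasible mode combination of a clause's arguments, so that several execution directions survive into run time instead of a single worst-case row. The append/3 modes are a standard illustration, not taken from the book.

    # Sketch of a static data dependency matrix: one matrix per clause,
    # one row per feasible mode combination of the clause head arguments
    # ('g' = ground/input, 'f' = free/output).  Keeping several rows is
    # what preserves the multiple execution directions of the clause.
    dependency_matrix = {
        # hypothetical predicate append(X, Y, Z) with three usable modes
        "append/3": [
            {"X": "g", "Y": "g", "Z": "f"},   # forward: concatenate
            {"X": "f", "Y": "f", "Z": "g"},   # backward: split Z
            {"X": "g", "Y": "f", "Z": "g"},   # subtract prefix X from Z
        ],
    }

    def feasible_rows(pred, known_ground):
        """Rows consistent with the variables already bound at run time."""
        return [row for row in dependency_matrix[pred]
                if all(row[v] == "g" for v in known_ground)]

    print(feasible_rows("append/3", {"Z"}))   # the two rows with Z ground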
3.3 Automatic Transformation
The automatic transformation is supported by the mode information generated above [98, 99, 97]. This mode information is essential for adjusting the execution sequence of a logic-based specification so that the resulting execution sequence can be expressed in a sequential, procedural language. Next, functional procedures are generated for each clause in the specification, and the backtracking control mechanism behind a logic-based specification is implemented by a revised WAM (Warren Abstract Machine). This WAM is included in a run-time system which selects the actual procedures and executes them based on the decisions made by the WAM. As another effect, this run-time system is also responsible for resolving the multiple execution directions of clauses in the logic-based specification. Mode information can be used to achieve the effect of intelligent backtracking.
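For a flavor of what such a transformation produces, consider the classic list-concatenation clauses; once mode analysis fixes the direction append(+X, +Y, -Z), the nondeterministic clause pair collapses to a deterministic procedure. The Python rendering below is our sketch, not the book's generated code (the book's system targets conventional procedural languages).

    # Sketch: once mode analysis fixes append(+X, +Y, -Z) as the execution
    # direction, the nondeterministic clause pair
    #     append([], Y, Y).
    #     append([H|T], Y, [H|Z]) :- append(T, Y, Z).
    # can be transformed into a deterministic procedural function.
    def append_ggf(x, y):
        # Base clause: append([], Y, Y).
        if not x:
            return list(y)
        # Recursive clause: append([H|T], Y, [H|Z]) :- append(T, Y, Z).
        head, tail = x[0], x[1:]
        return [head] + append_ggf(tail, y)

    print(append_ggf([1, 2], [3]))   # [1, 2, 3]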
3.4 Hybrid AND-OR Parallel Execution
In order to handle AND-OR parallelism, a bottom-up scheme combined with a top-down scheme is used. Current approaches always work from the top of an AND-OR tree and search all the way down it. In contrast, because our approach can find all the mode combinations before the user's query is available, some information about operations which will be executed several steps later may be available in advance while the execution goes on; if we carry out those operations first, efficiency may be improved. The approach uses the result generated from the static data flow analysis, combined with the user's input, to find the portion of the AND-OR tree whose execution behavior is deterministic under the current variable bindings. This portion of the AND-OR tree is executed in a bottom-up fashion. The checking process is applied whenever a new unification occurs which changes the binding of a variable. The algorithm uses the generation of ground variables as the starting point for detecting statically executable portions, so the dynamic checking need only focus on the detection of newly ground variables. Other approaches rely totally on the user's input query to initiate the analysis; their execution is thus based on the dynamic determination of data flow information, the analysis is dynamic, and the parallel processing executes in a top-down manner, which inevitably puts tremendous overhead on the system.
In our model, instead of being derived dynamically, all the possible variable binding and execution information is included in the statically generated data dependency matrix, and the execution sequence is adjusted by this matrix. A very important property of the data dependency matrix is that it contains all the possible execution sequences based on the multiple data dependency information. The only role the user's query plays is to initiate the analysis. At each step of the execution, especially whenever a new binding of a variable is generated, this information is used to restrict the multiple data dependency choices existing in the data dependency matrix and the possible execution sequences they imply. This restriction may leave only one possibility for the execution of a clause; the execution of this clause, along with the deterministic portions of other clauses, is carried out in the next step. So a large part of the analysis in our approach is done statically instead of dynamically.

The execution proceeds as follows. We start from the root of the AND-OR tree, which corresponds to the user's query. At each step of unification, where a predicate is unified with the heads of the unifiable clauses, a variable binding is generated which restricts the meaning and execution behavior of the clause. The system locates the branch of the AND-OR tree (which, based on the multiple bindings, may not consist of the predicates of a single clause) whose execution behavior is restricted to a single choice, and then executes this branch in a bottom-up fashion. All the remaining branches which cannot be restricted to a single choice are executed in the general top-down way; this is where multiple solutions of a query arise. Even in this case, the static data dependency matrix can still help to improve performance: since all the solutions are represented in the data dependency matrix, the place where these multiple choices occur is also decidable from the same matrix, so the generation of multiple choices can be predicted by the parallel execution model.
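One step of this control loop can be sketched in Python as follows (our paraphrase; the clause names and mode rows are hypothetical): a newly ground variable filters each clause's matrix rows, and a clause whose feasible rows collapse to a single choice is handed to the eager bottom-up evaluation, while the rest remain in the lazy top-down expansion.

    # Sketch of one step of the hybrid control loop: a newly ground
    # variable narrows the clause's dependency-matrix rows; a clause whose
    # rows collapse to a single choice is evaluated eagerly (bottom-up),
    # the rest are expanded in the usual top-down fashion.
    matrix = {   # hypothetical per-clause mode rows ('g' ground, 'f' free)
        "c1": [{"X": "g", "Y": "f"}, {"X": "f", "Y": "g"}],
        "c2": [{"X": "g", "Y": "g"}, {"X": "g", "Y": "f"}],
    }

    def partition(clauses, ground_vars):
        bottom_up, top_down = [], []
        for c in clauses:
            rows = [r for r in matrix[c]
                    if all(r[v] == "g" for v in ground_vars if v in r)]
            (bottom_up if len(rows) == 1 else top_down).append(c)
        return bottom_up, top_down

    # After unification grounds X: c1 keeps one row, c2 still keeps two.
    print(partition(["c1", "c2"], {"X"}))   # (['c1'], ['c2'])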
3.5 Simplified OR-Parallel Model
The problem of preserving the multiple choices generated from the OR-branches can be handled with the help of the same data dependency matrix. As mentioned above, the major problem of OR-parallelism is synchronizing the multiple bindings of the same variable from different branches. The only thing we need to do here is to build a simple synchronization mechanism on top of the data dependency matrix, considering that all the binding information is supported by the matrix.

Based on the analysis of the AND-OR parallelism, there exist two execution models, top-down and bottom-up, and the synchronization of OR-parallelism is closely related to these two models.
During the execution, if we find that the current parallel execution of a logic program is in the top-down fashion similar to the current approaches, then the OR-parallelism synchronization is identical to that of the current approaches, using any of the currently available data structures and methods. If the execution proceeds in a bottom-up fashion, then the binding data structures are also generated in a bottom-up fashion, which limits the binding data structures to a single level. Compared with other approaches, which have to keep a binding data structure for all the OR-processes spawned so far, this is a major improvement. Also, the creation of these binding data structures can be delayed until all the variable bindings have been generated from all the branches of an OR-parallel point. The current approaches have to generate a binding data structure upon the first entry of an OR-branch, and keep changing it to reflect the changing binding status of variables. Our approach shortens the life cycle of the binding data structures by deferring them until they are actually needed to generate the complete binding combinations from the OR-branches.
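The difference in life cycle can be made concrete with a small Python sketch (ours, purely illustrative): instead of allocating a binding structure when the OR point is entered and mutating it as branches run, the structure is built once, from the finished branches' results.

    # Contrast of binding-structure life cycles at an OR-parallel point.
    def run_branches(values):
        # Stand-in for evaluating all OR-branches in parallel; each branch
        # returns its own completed binding for the shared variable X.
        return [{"X": v} for v in values]

    def eager_or_node(values):
        store = {}                                  # allocated on entry...
        for i, result in enumerate(run_branches(values)):
            store[i] = result                       # ...mutated per branch
        return store

    def deferred_or_node(values):
        results = run_branches(values)      # branches run structure-free
        return dict(enumerate(results))     # built once, at the end

    print(deferred_or_node(["a", "b"]))  # {0: {'X': 'a'}, 1: {'X': 'b'}}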
3.6 Backtracking Elimination
The new execution approach above deals with the AND-OR parallelism of a logic-based specification, i.e., it applies a new hybrid execution model to improve the efficiency of the system. Because all the execution possibilities and all the possible binding statuses of variables in a logic program can be found in the data dependency matrix, it is possible to eliminate backtracking from the model entirely. The basic idea is to explore all possible AND- and OR-branches in parallel; since all possible paths are searched, backtracking can be eliminated from the scheme. The most significant property of our method is that it can preserve the multiple solutions a logic-based specification can generate even before any actual execution.
Chapter 4
FRORL Requirements Specification Language and Its Decomposition

Early artificial intelligence systems relied on first-order predicate logic as a language to represent domain knowledge. Even though this scheme is general and semantically clear, it has been found to be inadequate for organizing large knowledge bases and for encoding complex objects. As a result, various frame-based languages have emerged. These languages are designed to support the natural representation of structured objects and taxonomies, and they have proven to be well suited for representing many useful relations.

FRORL (Frame-and-Rule Oriented Requirement Specification Language) [93] was developed to facilitate the specification, analysis and development of a software system. FRORL uses frame representation for object-oriented modeling and employs production rules for specifying actions and constraints of the real world. Object-oriented design enhances the naturalness of representation. A frame representation is used for the structural description of domain objects, while a production rule specifies the behaviors of objects and the interactions/relationships among them. A frame, as a node, lies in a cycle-free hierarchical network. The links which connect the frames represent the abstract relations among the frames. The root node represents the most general concept of the family of all its descendants, while nodes in the lower levels of the hierarchy stand for more specific instances of those concepts. The nodes in the lower levels may have abstract relations to their higher-level nodes.

The surface syntax of FRORL is based on the concepts of frames and production rules which may bear hierarchical relationships amongst each other,
relying on multiple inheritance. However, to provide thorough semantic foundations, FRORL has been based on a non-monotonic variant of Horn-clause logic. Using the machinery of Horn-clause logic, various properties of a FRORL specification can be analyzed; the underlying logic of FRORL is sound and complete. Among the external properties of FRORL are formality, object-orientedness, and coverage of a wide spectrum of life-cycle phases; its intrinsic properties include modularity, provision for incremental development, inheritance, refinement, reusability, prototyping, executability, a knowledge base environment, theorem proving capability and graphics support.
4.1 Knowledge Representation through Object-Oriented Model
In an object-oriented model, objects, classification, inheritance, and communication with messages are the components of the model. An object is an encapsulation of attributes, or an abstraction of something in the problem domain. A class groups homogeneous objects with identical properties as its instances. An object is referenced by a unique identifier and possibly by one or more keys. A class may have several superclasses from which it inherits properties, and may also have several subclasses. A class definition includes a set of attributes describing the instances of the class. All the information concerning an object is stored in its attributes; attributes describe the state of an object, and this state is captured in the instance variables. The class hierarchy captures the hierarchical relations between a class and its subclasses (also, a class and its superclasses). All subclasses of a class inherit all properties defined for the class, and can have additional properties local to them. The notion of property inheritance along the hierarchy facilitates top-down design of the database as well as of applications. The behavior of an object is captured in the messages to which it responds; the messages define the semantics and the functionality of the object.

Using FRORL to specify a software system is similar to the object-oriented design strategy as described in [21]*. As far as inheritance goes, the three reserved words an_instance_of, a_kind_of, and a_part_of represent the hierarchical relations of instantiation, generalization, and aggregation, respectively, among frames. Generalization allows a class to capture common characteristics shared by a group of individuals. For example, if X is a generalization of Y, then the abstract relation says that Y is a_kind_of X. Instantiation instantiates an individual from a class of entities. For example, if X represents a class of EECS graduate students, and Dean Schulz is an EECS graduate student, then the abstract relation says that Dean Schulz is an_instance_of X. Aggregation is
the abstraction by which an entity is constructed from its constituent entities. For example, if X is composed of W, Y, and Z, then the abstract relation says that Z is a_part_of X. Developers are encouraged to apply the proposed three kinds of abstract relations to represent hierarchical inheritance among frames. The requirements specifications will thus be more concise, and one can obtain more inferential information via the abstract relations.

* Many researchers have also applied the object-oriented approach to deductive databases to offer structured objects and deductive capability. See [50].
The system using FRORL is based on the following overall strategy. Given an informal description of the problem domain, we construct a representation of the problem and the domain using frames and production rules. People tend to have little difficulty in constructing object hierarchies using frames, and production rules can be easily understood by domain experts. FRORL also supports operational interpretation: the production rules serve to guide the execution mechanism in the application domain. Through the operational interpretation of the FRORL representation, the system behavior can be understood and validated against the informal user requirements. The machinery of the FRORL base logic [56, 57, 59] is available to perform property checking of the specifications. The prototype specifications are continually tested and modified until the model of the prototype is shown to meet the original requirements and constraints.
4.2 The Modeling Primitives of FRORL
The FRORL modeling primitives are objects and activities. Each object in the model stands for some entity in the world being modeled. Changes taking place in the world and relations between entities of the world are represented as activities in the requirement model. Each object and activity has certain properties, assumptions, or constraints associated with it. The information related to objects and activities is expressed using the frame representation. The structure of the executable specification written in FRORL is modularized into different activities. Because of this modularization, frames are relatively easy to reuse.
4.2.1 Object Frame
In FRORL, the objects are the entities that represent "things" in the world being modeled. An object also can be seen as a data format (a data structure) or a data type in a software system. Each object has its own attributes or properties. The syntax for the object frame is given below:
Object: object_name
    abstract_relation: parent_name;
    attribute_name1: value1;
    ...
    attribute_namen: valuen.
There are two kinds of slots in each object frame: abstract relations and attributes. An abstract-relation slot represents the inheritance relation between the current frame and its parent frame; its name can be a_kind_of, a_part_of, or an_instance_of, and its value set, parent_name, permits multiple object names because FRORL allows the representation of multiple inheritance. An attribute slot represents a particular property of an object. Each property has a value set associated with it from which it can draw its value; the value set may contain either a single element or a list of elements. We introduce the following example of object frames to illustrate their inheritance properties. The object phone_call, which is a general class, has two attributes, caller_id and callee_id, as its main characteristics. The object three_way_call is a subclass of phone_call: a three-way call means that three parties can be on the same line and talk simultaneously. According to the proposed approach, we can say that three_way_call is a kind of phone_call; one major attribute of three_way_call is 2nd_callee_id. John_Joe_David_call is an individual of the three_way_call class. According to the definition of abstract relations, three_way_call is the class of John_Joe_David_call, so we can say that John_Joe_David_call is an instance of three_way_call. These three hierarchical objects bear inheritance relations to one another: three_way_call and John_Joe_David_call naturally inherit all the attributes present in the phone_call class. We can depict any special condition or treatment in the object John_Joe_David_call, because John_Joe_David_call already has the general attributes of the three_way_call class. The three objects are as follows.
Object: phone_call
    caller_id:;
    callee_id:.

Object: three_way_call
    a_kind_of: phone_call;
    2nd_callee_id:.

Object: John_Joe_David_call
    an_instance_of: three_way_call;
    caller_id: 226-7083;
    callee_id: 996-7251;
    2nd_callee_id: 413-5352.

Example 1 This example is adopted from the Local Switching System General Requirement (LSSGR) protocol [93], written in FRORL. The LSSGR protocol is proposed for the automatic completion of reverting telephone calls. Here we present only two objects with their meanings.
object: reverting_call
    caller_id:;
    callee_id:.

object: two_party_call
    a_kind_of: reverting_call;
    party_line_type: two_party.

The first frame defines a reverting call to be a call in which a party-line customer dials the number of another party on the same line. The second frame shows that reverting call service allows a two-party line customer to complete a call to another party sharing the same line without the assistance of an operator. Based on the functionality of the two frames, the second one should be a child of the first one, which establishes the hierarchy.
4.2.2 Activity Frame
The changes that take place in the actual world are referred to as activities in FRORL. Each activity is represented as a frame. The activity frame has four slots, namely parts, precond (precondition), actions, and alt_actions (alternate actions). The structure of the activity frame is given below.
Activity: activity_name(attr_value1, ..., attr_valuen)
    parts: attr_value1: value1;
           ...
           attr_valuen: valuen;
    precond: boolean combination of activities or facts;
    actions: sequence of activities or facts;
    alt_actions: sequence of activities or facts.
The parts slot contains the objects, or the attributes of objects, participating in an activity. The precond slot holds the constraints associated with an activity: if the contents of the precond slot are satisfied, then the actions slot will be executed. The contents of the actions slot can be either other activities or facts† about the world. The alt_actions slot contains the sequence of activities to be performed, or the assertions to be made, in case the precond slot is not satisfied. Activities are initiated with values as arguments, or arguments may be passed as uninstantiated variables which receive values as a result of performing the activity; variables may also be instantiated to lists. We use an example to illustrate the activity frame representation. Suppose that in a telephone system a caller wants to dial a number. The dial tone must be available before dialing; otherwise the switching system gives an alarm (a reorder tone) and the caller has to redial the number. In this system we can create an activity called dial_a_number(Phone_number), whose parts slot is composed of the phone number to be dialed. If a dial tone (sound(dial_tone)) is available, the activity proceeds to complete the call (complete_the_call(Phone_number)); otherwise the reorder tone is turned on (reorder_tone()) and the caller has to re-dial the number (dial_again(Phone_number)).
Activity: dial_a_number(Phone_number)
    parts: Phone_number;
    precond: sound(dial_tone);
    actions: complete_the_call(Phone_number);
    alt_actions: reorder_tone();
                 dial_again(Phone_number).

Example 2 Two activity frames from the LSSGR protocol are discussed in the following, along with their meaning.
activity: complete(X)
    parts: X: reverting_call;
    actions: initiate_busy_tone(X);
             ringing(X).

activity: initiate_busy_tone(X)
    parts: X: reverting_call;
    actions: apply_busy_tone(X.caller_id);
             wait_for_disconnect(X).

†A fact is an assertion about the world; it can be either an asserted object or an activity which has no precondition or action.
The first activity frame means that the system should attempt to complete reverting calls automatically by initiating a busy tone to the calling party and then ringing the two parties; a busy tone should be applied to the line as soon as the call is recognized as a reverting call. The second activity frame indicates that the communication line should be monitored for disconnect and that the tone should be removed on disconnect.
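The operational reading of an activity frame, that is, evaluate precond and execute actions if it holds, otherwise execute alt_actions, can be made concrete with a small interpreter sketch. The Python fragment below is only an illustration of this semantics; the dictionary encoding of frames and the sample facts are our assumptions, not part of FRORL:

    # Sketch of the operational reading of an activity frame: if every
    # literal in precond holds, execute actions; otherwise execute
    # alt_actions.  Frames are plain dicts here, purely for illustration.
    def run_activity(frame, env, activities, facts):
        if all(p(env) in facts for p in frame.get("precond", [])):
            branch = frame.get("actions", [])
        else:
            branch = frame.get("alt_actions", [])
        for step in branch:
            literal = step(env)               # instantiate the literal
            if literal in activities:         # a sub-activity: recurse
                run_activity(activities[literal], env, activities, facts)
            else:                             # otherwise assert it as a fact
                facts.add(literal)

    # Hypothetical encoding of wait_for_disconnect(X); the real frame
    # recurses in alt_actions, omitted here to keep the run finite.
    wait_for_disconnect = {
        "precond": [lambda e: ("is_disconnected", e["X"])],
        "actions": [lambda e: ("remove_busy_signal", e["caller_id"])],
    }
    facts = {("is_disconnected", "call1")}
    run_activity(wait_for_disconnect, {"X": "call1", "caller_id": "226-7083"},
                 {}, facts)
    assert ("remove_busy_signal", "226-7083") in facts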
4.2.3 Reserved Words
Reserved words comprise the keywords and built-in operations of the FRORL system. Some of the keywords in FRORL are object, activity, parts, precond, actions, alt_actions, and so on. Built-in operations, such as findstate(X), get_word(X), and increase(X,Y,Z), are predefined to save time for developers; common primitive functions can be defined as built-in operations and embedded in the system for direct access. There are two kinds of built-in operations: domain-independent and domain-specific. Domain-independent operations are basic operations for general applications; their contents do not change from one application to another. Domain-specific operations are just the converse. Developers should avoid redefining reserved words with other meanings. To support the executability of the specification, we provide a number of frequently used operations in five categories: state operations, conditional operations, arithmetic operations, list operations, and input/output operations; they are organized in the knowledge base. Domain-specific operations are primitive operations in some application domain. For example, the operations send(Command) and receive(Command) are basic operations in a fire control system. If the system provides these particular operations, developers only need to define higher-level activities. Although the definition of such operations is highly recommended, it is not decisive for forming a correct FRORL specification. Experience shows that these primitive operations can ease the user's burden in forming a complete specification.
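As a rough illustration of how such a knowledge base of built-in operations might be organized, consider the sketch below; the category assignments and arities are assumptions made for the example, not FRORL's actual tables:

    # Sketch: built-in operations grouped into the five categories named
    # above; names are taken from the text, arities are illustrative.
    BUILTINS = {
        "state":       {"findstate": 1},
        "conditional": {"ge": 2},            # e.g. the test X >= Y
        "arithmetic":  {"increase": 3},      # increase(X, Y, Z)
        "list":        {"append": 3},
        "io":          {"get_word": 1},
    }

    def lookup(name):
        # Return the category and arity of a built-in operation, or None.
        for category, ops in BUILTINS.items():
            if name in ops:
                return category, ops[name]
        return None

    assert lookup("increase") == ("arithmetic", 3)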
4.3 Decomposition of a FRORL Requirements Specification
This is the first transformation step in the system. The purpose of this step is to transform a FRORL specification with inheritance into a Horn-clause logicbased specification without inheritance structures, since inheritance cannot be
handled by the subsequent analysis steps. A simple decomposition of a FRORL specification is given in Figure 4.1. After discussing the non-monotonic inheritance mechanism of FRORL, three algorithms to achieve this transformation will be presented. Finally, an LSSGR protocol example [93] and its transformation process will be illustrated.
Figure 4.1: An example decomposition process
4.3.1 FRORL and its Non-monotonic Inheritance Mechanism
As mentioned before, one of the important properties of FRORL is the multiple inheritance relation among hierarchical frames. Property inheritance reduces redundant description and allows one to concentrate on the differences between parent and child frames rather than on what must be described at each level of the specification representation. In FRORL, we use the three reserved words an_instance_of, a_kind_of, and a_part_of to represent the hierarchical relations of instantiation, generalization, and aggregation, respectively, among frames. Instantiation allows an instance to capture the common characteristics of its superclass shared by a group of entities in a problem domain. Generalization allows a superclass to capture the common characteristics of several subclasses. Aggregation is the abstraction by which an entity is declared to be a component of another entity. If child entities and parent entities are related by instantiation (an_instance_of) or generalization (a_kind_of), the children naturally inherit the full set of properties from the parent unless the children have exception declarations. An aggregation (a_part_of) abstract relation means that a frame declares itself to be a component part of
its parent (whole) frame; the child entity inherits only partial properties from the parent entity.
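The difference between full inheritance (an_instance_of, a_kind_of) and partial inheritance (a_part_of) can be illustrated with a small data model. The sketch below is an assumed encoding rather than the book's implementation; in particular, the partial_slots parameter is a hypothetical way of recording which properties a component inherits from its whole:

    from dataclasses import dataclass, field

    AIO, AKO, APO = "an_instance_of", "a_kind_of", "a_part_of"

    @dataclass
    class Frame:
        name: str
        slots: dict = field(default_factory=dict)    # directly defined slots
        parents: list = field(default_factory=list)  # (link, parent name) pairs

    def visible_slots(frame, frames, partial_slots=()):
        # Aio/Ako links pass every parent slot not overridden locally;
        # an Apo link passes only the declared partial slots.
        result = {}
        for link, pname in frame.parents:
            for slot, val in visible_slots(frames[pname], frames).items():
                if link == APO and slot not in partial_slots:
                    continue
                result.setdefault(slot, val)
        result.update(frame.slots)                   # local definitions win
        return result

    frames = {
        "reverting_call": Frame("reverting_call",
                                {"caller_id": None, "callee_id": None}),
        "two_party_call": Frame("two_party_call",
                                {"party_line_type": "two_party"},
                                [(AKO, "reverting_call")]),
    }
    assert set(visible_slots(frames["two_party_call"], frames)) == {
        "caller_id", "callee_id", "party_line_type"}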
4.3.2 Transformation of FRORL into Horn-clause Logic-based Specification
Our ultimate goal is the transformation of FRORL into a procedural language program. To realize it, careful study of the logical semantics and of the non-monotonic inheritance mechanism is necessary. As a logic-based specification language, FRORL embodies non-determinism in the selection of the next clause in an execution sequence. In addition, its non-monotonic inheritance mechanism makes it very difficult to apply other transformation processes, such as data flow analysis or execution sequence determination, directly to a FRORL specification. Consequently, we have to find an intermediate form such that (1) data flow and dependency analysis can be applied to it relatively conveniently, and (2) it provides a way to handle FRORL's non-monotonic inheritance feature. In this system, the Horn-clause logic-based specification is selected to play this role, since the logical foundation of FRORL is Horn-clause logic extended with non-monotonic inheritance. The transformation of a FRORL specification into a Horn-clause logic-based specification can follow two different approaches, based on the characteristics of non-monotonic inheritance. The first approach is to expand every frame into its ultimate set of slots and thereby eliminate the inheritance mechanism of FRORL. The second approach is to preserve the non-monotonic inheritance mechanism of FRORL during the transformation process. At first glance, the first approach seems to achieve the transformation much more easily. However, it has a serious drawback when a FRORL specification contains large and deep frame hierarchies: the expansion of frames can grow at an exponential rate, a problem further complicated by multiple inheritance among the frames. Maintenance of the expanded clauses presents another problem: whenever a parent frame changes, the frames of its children and descendants must be regenerated in order to maintain consistency. To realize the second approach, it would be natural to keep the inheritance mechanism in the transformed specification by means of inheritance meta-interpreters. This approach, however, has its own problem: the efficiency of execution when querying goals suffers, since the specification interpreter has to perform the inheritance inference every time, making the already slow execution of a Horn-clause logic-based specification interpreter even slower. Clearly, the choice between the two approaches is a trade-off between computing time and storage space. We choose to expand every frame, producing a Horn-clause logic-based specification.
4.3.3 Algorithms for Non-Monotonic Inheritance Handling
In this section we discuss in detail the transformation of a FRORL specification into a Horn-clause logic-based specification. This requires special procedures to generate the expanded image of a given frame in an inheritance structure. In the following, we focus on three algorithms for handling multiple inheritance. Before the algorithms for image expansion are presented, it is necessary to introduce some definitions to facilitate the expansion.

Definition 1 An inheritance structure of FRORL is defined as the 3-tuple G = (F, Map, L), where F is the set of frame nodes, Map is a function from a pair of frames in F to an inheritance link, and L is the set of possible inheritance links, as defined in the following.

    F: Name × Argument × {<slot> : <slot_value>}*
    Map: F × F → L is a mapping function from a pair of frames to a link.
    L: {Aio, Ako, Apo} is a finite set of inheritance links, in which Aio, Ako, and Apo represent an_instance_of, a_kind_of, and a_part_of respectively.

Definition 2 In an inheritance structure G, PList(Inh, F) is a parent list, i.e. the set of frames which correspond to the slot values of the inheritance slot Inh of a frame F, where Inh ∈ {Aio, Ako, Apo}. In an inheritance structure G, CList(Inh, F) is the set of child frames of a given frame F, where Inh ∈ {Aio, Ako, Apo}.

This definition introduces the parent frame set and the child frame set. Specifically, the frames in the PList set are the ones that must be considered during expansion.

Definition 3 The inheritance distance, ID, of an attribute A in a slot Slot of a given frame F is the number of inheritance links required to "trace back" to an ancestor frame that includes the attribute A. That is,

    ID(A, Slot, F) = 0    if A ∈ F
    ID(A, Slot, F) = 1    if A ∈ P ∧ DirReach(P, Inh, F)
    ID(A, Slot, F) = n    if A ∈ P^n ∧ IndReach(P^n, Inh, F);

where P^n is obtained by the following recursive definitions:

    IndReach(P^1, Inh, F) = DirReach(P^1, Inh, F);
    IndReach(P^n, Inh, F) = DirReach(P^n, Inh, P^(n-1)) ∧ IndReach(P^(n-1), Inh, F), where n ≥ 2.
This distance is needed when a conflict arises in multiple inheritance: if the attributes inherited along different paths contradict each other, then the attribute with the smaller ID value is inherited by the current frame. The value is maintained by incrementing it by one whenever the search goes one level deeper.

Definition 4 An ancestor frame P is directly reachable, DirReach(P, Inh, F) = True, from a given frame F through the inheritance links of FRORL:

    DirReach(P, Inh, F) = True ⇔ P ∈ PList(Inh, F) ∧ Inh ∈ {Aio, Ako, Apo}

An ancestor frame P is indirectly reachable, IndReach(P, Inh, F) = True, from a given frame F through the inheritance links of FRORL:

    IndReach(P, Inh, F) = True ⇔ DirReach(P′, Inh, F) ∧ DirReach(P, Inh, P′)
    IndReach(P, Inh, F) = True ⇔ DirReach(P′, Inh, F) ∧ IndReach(P, Inh, P′)

If an ancestor frame is directly reachable, then it must be in the PList set. As mentioned above, these are the frames we need to consider when expanding a frame.

Definition 5 An inheritance image, Image(F), of a given frame F is a frame with all the attribute slots inherited from the ancestor frames reachable through inheritance links. This image information is exactly what we need to find for each frame in the inheritance hierarchy.

Definition 6 A group of frames G = {F1, ..., Fn} in FRORL is consistent iff there does not exist a pair <Fi, Fj>, where Fi, Fj ∈ G, such that
IndReach(Fi, Inh, Fj) ∧ IndReach(Fj, Inh, Fi).

This definition ensures that a FRORL specification is acyclic, so that the search never enters an infinite loop.

Definition 7 An ancestor frame P is reachable, Reach(P, Inh, F) = True, from a given frame F through the inheritance links of FRORL:

    Reach(P, Inh, F) = True ⇔ DirReach(P, Inh, F) ∨ IndReach(P, Inh, F)

This definition makes clear that only frames reachable from the frame currently being expanded can contribute to its attribute set.

Definition 8 The slot attribute membership function, Attribute(F, Slot), for an attribute slot Slot of a given frame F, is true iff Slot exists syntactically in the frame F, i.e., Slot is a syntactically "visible" attribute slot of F:

    Attribute(F, Slot) = True ⇔ F.Slot exists.

The attributes which should exist in the final image of an expanded frame come from two sources: the set of attributes explicitly defined in the current frame, and the set of attributes inherited through the hierarchy. The purpose of forming the final image of a frame is to find these two sets for each frame.

Definition 9 The set of non-inherited (direct) attribute slots, DirSlot(F), of a given frame F is:

    DirSlot(F) = {Slot | Attribute(F, Slot) = True}

The set of inherited (indirect) attribute slots, IndSlot(F), of a given frame F is:

    IndSlot(F) = {Slot | Attribute(P, Slot) = True ∧ (DirReach(P, Inh, F) ∨ IndReach(P, Inh, F))}

The attributes in DirSlot correspond to the first set mentioned above, while those in IndSlot are inherited through the hierarchy. These two sets combine to form the complete definition of a specific frame.
Definition 10 The function for a set of slot values in predicate form, SlotValue(F, Slot, V), takes any two variables from the set {F, Slot, V} as bound inputs, with the third variable unbound. The result of invoking this predicate function is to bind the third variable to a proper value; the bound value can be a unique value or a multiple-value set. It can also act as a constraint-checking predicate by taking all three values as input:

    SlotValue(F, Slot, V) = True ⇔ Slot ∈ IndSlot(F) ∪ DirSlot(F) ∧ Slot.value = V.
This is a very useful function for retrieving frame slot-value information during expansion.

Example 3 Consider the object frames of the simplified FRORL specification of the LSSGR protocol introduced in the previous section.
object: reverting_call
    caller_id:;
    callee_id:.

object: two_party_call
    a_kind_of: reverting_call;
    party_line_type: two_party.
We have the following results:

    PList(ako, two_party_call) = {reverting_call}                    (Definition 2)
    CList(ako, reverting_call) = {two_party_call}                    (Definition 2)
    ID(two_party, party_line_type, two_party_call) = 0               (Definition 3)
    ID(Value_caller_id, caller_id, two_party_call) = 1               (Definition 3)
    DirReach(reverting_call, ako, two_party_call) = True             (Definition 4)
    IndReach(reverting_call, ako, two_party_call) = False            (Definition 4)
    Image(two_party_call) = {caller_id:; callee_id:;
                             party_line_type: two_party.}            (Definition 5)
    Consistent(reverting_call, two_party_call) = True                (Definition 6)
    Reach(reverting_call, ako, two_party_call) = True                (Definition 7)
    Attribute(two_party_call, party_line_type) = True                (Definition 8)
    Attribute(two_party_call, caller_id) = False                     (Definition 8)
    DirSlot(two_party_call) = {party_line_type}                      (Definition 9)
    IndSlot(two_party_call) = {caller_id, callee_id}                 (Definition 9)
    SlotValue(two_party_call, party_line_type, two_party) = True     (Definition 10)
    SlotValue(two_party_call, party_line_type, X) ⇔ X = two_party    (Definition 10)
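Since Definitions 2 through 9 are simple computations over the inheritance graph, the results above can be checked mechanically. The following Python sketch reproduces several of them for the two-frame hierarchy; the encoding, with ako edges stored from child to parent, is our assumption for illustration:

    # Two-frame hierarchy of Example 3; ako edges point child -> parent.
    direct_slots = {
        "reverting_call": {"caller_id", "callee_id"},
        "two_party_call": {"party_line_type"},
    }
    ako = {"two_party_call": ["reverting_call"], "reverting_call": []}

    def dir_reach(p, f):                     # Definition 4
        return p in ako[f]

    def ind_reach(p, f):                     # Definition 4 (two or more links)
        return any(dir_reach(p, q) or ind_reach(p, q) for q in ako[f])

    def ind_slot(f):                         # Definition 9
        return {s for p in direct_slots
                if dir_reach(p, f) or ind_reach(p, f)
                for s in direct_slots[p]}

    def inheritance_distance(attr, f):       # Definition 3; assumes attr reachable
        if attr in direct_slots[f]:
            return 0
        return 1 + min(inheritance_distance(attr, p) for p in ako[f])

    assert dir_reach("reverting_call", "two_party_call")
    assert not ind_reach("reverting_call", "two_party_call")
    assert ind_slot("two_party_call") == {"caller_id", "callee_id"}
    assert inheritance_distance("caller_id", "two_party_call") == 1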
In the following we introduce three algorithms to perform the non-monotonic image generation. These algorithms deal with the hierarchical structure of a FRORL specification and generate the expanded specification from the original one in a top-down manner. They start from the root of an activity or object frame hierarchy and generate the expanded images layer by layer: first the expanded image of the root frame, then the expanded images of the direct children of the root, then of their direct children, and so forth. When the three algorithms are arranged in this way, it can be shown that even though inheritance may involve retrieving attributes from more than one level toward the root of the hierarchical structure, the formation of the attribute set of a frame only needs to consider the level immediately above it in the inheritance hierarchy, instead of possibly traversing a large number of levels. Consider two levels i and j, i < j, in an inheritance hierarchy of frames. Assume there is an inheritance path from a frame Fi at level i to another frame Fj at level j, that an attribute Attr is defined in the frame Fi, and that no other definition of the same attribute exists along the inheritance path from Fi to Fj. According to the formal definition of the inheritance structure, in order to define the attribute Attr in the frame Fj, the search must go all the way back along the inheritance path to the frame Fi to find the definition of the attribute. In our algorithm, however, instead of going back to the frame Fi at level i, only one level of retrieval is needed: we only have to go up one level, to the frame Fj-1 at level j-1 of the inheritance path, to find the desired definition. This holds because our algorithm adopts a top-down mechanism, which guarantees that the attribute in question has already been included in the frame Fj-1 at level j-1 before the expansion mechanism reaches the frame Fj at level j. Note that the first algorithm is used only for the image expansion of object frames in an inheritance hierarchy; the algorithm for the expansion of activity frames is introduced as the second algorithm. These two algorithms call a common subroutine to accomplish the expansion.
The algorithm for the expansion of object frames and the algorithm for the expansion of activity frames have different main functions. This difference comes from the fact that the inheritance mechanisms for object frames and for activity frames are different, based on the formal definition of FRORL [93]. Considering the bodies of these two main functions, however, both call the same subroutine, inherit_from_parents. It is possible to share an identical subroutine because all the differing aspects of inheritance are considered and dealt with in the respective main algorithms, and only the operations common to both are defined in the subroutine.
Figure 4.2: The flow chart of "NM_ImageGenerator_for_Object_Frame"
Figure 4.3: The flow chart of "NM_ImageGenerator_for_Activity_Frame"
Figure 4.4: The flow chart of the algorithm "inherit_from_parents"
Figure 4.5: Graphical explanation of the procedure "inherit_from_parents"
The flow charts of the algorithms are provided in Figures 4.2 through 4.4. Figure 4.5 gives an intuitive idea of what conditions (a) through (e) mean in the flow chart of the subroutine inherit_from_parents. The example starts by generating the expanded image of a frame F from its parent frames; here the specific parent frame G is considered. The value of Image(F) is initially the value defined in the frame F itself. In the flow chart:

• Condition (a) means that if a slot of the parent is not defined in the current frame being expanded, it is simply combined into the image;

• Condition (b) is the case where a slot of the parent frame is also defined in the current frame. Within case (b): case (c) is when the same slot in the current frame is empty; case (d) is when the current slot contains the keyword super; and case (e) is when the current slot is neither empty nor contains the keyword super.

The detailed algorithm follows.

Assumption: The input inheritance network is consistent.
Input: A group of consistent frames G = {F1, ..., Fn} in the hierarchy.
Output: The expanded inheritance images for G.

Algorithm 1 NM_ImageGenerator_for_Object_Frame(G, Inh, Image(G))
    NotEmpty(G);    % check that the inheritance network is non-empty %
    while (not (all the object frames in G have been expanded))
        Select an unexpanded object frame F from G such that
            (1) F has no antecedent, or
            (2) all F's antecedents have been expanded;
        Image(F) = Null;
        for each Slot ∈ DirSlot(F)
            Image(F) = Image(F) + Slot;
            for each attribute A in Slot
                ID[A, Slot, F] = 0;
        if (F has no parent frames) then
            Mark F as "expanded"
        else
            Parents = {P | DirReach(P, Inh, F)};
            Call inherit_from_parents(Parents, F);
            Mark F as "expanded"
    Output the expanded images of the object frames in G;

Example 4 Consider the expansion of the object frames of Example 3 using Algorithm 1 above.
object: reverting_call
    caller_id:;
    callee_id:.

object: two_party_call
    a_kind_of: reverting_call;
    party_line_type: two_party.
Since only the frame reverting_call meets the two requirements, it is selected as the current frame for expansion.
    Image(reverting_call) = { caller_id, callee_id }.
On the other hand, the expansion of the frame two_party_call first places the attribute party_line_type in its resulting image, and then invokes the kernel algorithm with the calling pattern inherit_from_parents({reverting_call}, two_party_call).

The expansion algorithm for activity frames has some special features, dictated by the way non-monotonic inheritance and overriding are handled within an activity hierarchy. These relationships are mediated by the inheritance hierarchies that exist between the objects partaking in the respective activities. An activity frame A may inherit properties from another activity frame B iff (1) the frames A and B have the same name, and (2) the frame B has as its part an object frame which is an antecedent of another object frame that is a part of the frame A. Because of this special case, some new operations must be added before the inherit_from_parents procedure can eventually be called. Another difference is that an activity frame has only four kinds of slots, namely parts, precond (precondition), actions, and alt_actions (alternate actions).

Algorithm 2 NM_ImageGenerator_for_Activity_Frame(G, Inh, Image(G))
    NotEmpty(G);    % check that the inheritance network is non-empty %
    while (not (all the activity frames in G have been expanded))
        Select an unexpanded activity frame F from G such that
            (1) F has empty parts, or
            (2) all those activity frames with antecedents of F's parts as their parts meet the following criterion: if they have the same name as F, then they have been expanded;
        Image(F) = Null;
        for each Slot ∈ DirSlot(F)
            Image(F) = Image(F) + Slot;
            for each attribute A in Slot
                ID[A, Slot, F] = 0;
        if (F has empty parts) or (the activity frame set of (2) is empty) then
            Mark F as "expanded"
        else
            Parents = {P | P ∈ activity frames which belong to (2)};
            Call inherit_from_parents(Parents, F);
            Mark F as "expanded"
    Output the expanded images of the activity frames in G;

Example 5 Consider the expansion of the activity frames of Example 2 using Algorithm 2 above.
activity: complete(X)
    parts: X: reverting_call;
    actions: initiate_busy_tone(X);
             ringing(X).

activity: initiate_busy_tone(X)
    parts: X: reverting_call;
    actions: apply_busy_tone(X.caller_id);
             wait_for_disconnect(X).
In this case, the parts sets of both activity frames are non-empty, but the object frame in their parts is the root of the object frame hierarchy, so the final expanded frame of each activity is the activity itself; there is no need to call the kernel function inherit_from_parents. The following is the procedure inherit_from_parents.

Algorithm 3 Inherit_from_Parents(Parents, F)
    Mark all frames in Parents as "unparsed";
    while (there is a frame Focus in Parents which is unparsed)
        AllSlots = {Slot | Slot ∈ DirSlot(Focus)};
        for each Slot ∈ AllSlots
            if Slot ∉ Image(F) then
                Image(F) = Image(F) + Slot;
                for each attribute A in Slot
                    ID[A, Slot, F] = ID[A, Slot, Focus] + 1;
            else
                for each CurSlot ∈ Image(F)
                    if (Slot_name(CurSlot) == Slot_name(Slot)) then
                        if (CurSlot ≠ empty) and (∄ keyword "super" in CurSlot) then
                            for each attribute A in CurSlot
                                if (ID[A, CurSlot, F] < ID[A, Slot, Focus]) then
                                    Do not change the value of CurSlot;
                                if (ID[A, CurSlot, F] == ID[A, Slot, Focus]) then
                                    Report "Confusion";
                                if (ID[A, CurSlot, F] > ID[A, Slot, Focus]) then
                                    CurSlot.A = Slot.A;
                                    ID[A, CurSlot, F] = ID[A, Slot, Focus] + 1;
                        if (∃ keyword "super" in CurSlot) then
                            for each attribute Slot.A, A ∉ CurSlot
                                CurSlot = CurSlot + Slot.A;
                                ID[A, CurSlot, F] = ID[A, Slot, Focus] + 1;
                        if (CurSlot == Null) then
                            CurSlot = Slot;
                            for each attribute A in CurSlot
                                ID[A, CurSlot, F] = ID[A, Slot, Focus] + 1;
        Mark Focus as "parsed";
    Return (Image(F));
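A compact Python rendering of this merge step may make the ID bookkeeping easier to follow. It is a sketch under simplifying assumptions: each slot carries a single value, the empty-slot and keyword super cases are collapsed into one test, and conflicting values are compared by their distance as seen from F. It is not the authors' implementation:

    # Sketch of Algorithm 3: merge the slots of already-expanded parent
    # images into Image(F).  An image maps slot -> (value, ID); a parent
    # slot at distance pid is seen from F at distance pid + 1.
    def inherit_from_parents(image, parents):
        for focus in parents:
            for slot, (pval, pid) in focus.items():
                dist = pid + 1
                if slot not in image or image[slot][0] is None:
                    image[slot] = (pval, dist)        # new or empty slot
                    continue
                val, vid = image[slot]
                if vid > dist:                        # parent value is closer
                    image[slot] = (pval, dist)
                elif vid == dist and val != pval:     # equal-distance conflict
                    raise ValueError("confusion on slot " + slot)
                # vid < dist: keep F's own, closer value unchanged
        return image

    # Example 6: expanding two_party_call against reverting_call.
    child = {"party_line_type": ("two_party", 0)}
    parent = {"caller_id": (None, 0), "callee_id": (None, 0)}
    print(inherit_from_parents(child, [parent]))
    # {'party_line_type': ('two_party', 0),
    #  'caller_id': (None, 1), 'callee_id': (None, 1)}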
The three algorithms above generate the expanded images of object and activity frames respectively.

Example 6 Consider the calling pattern inherit_from_parents({reverting_call}, two_party_call), which is generated during the expansion of the two_party_call object frame in Example 3. The contents of the two frames are:
object: reverting_call
    caller_id:;
    callee_id:.

object: two_party_call
    a_kind_of: reverting_call;
    party_line_type: two_party.

Note that the following conditions hold when the above algorithm is called:
    Image(two_party_call) = {party_line_type};
    ID[two_party, party_line_type, two_party_call] = 0;
    ID[ , caller_id, reverting_call] = 0;
    ID[ , callee_id, reverting_call] = 0;
After the execution of the if part of the first for statement within the while statement, the image of two_party_call is changed to:

    Image(two_party_call) = {party_line_type, caller_id, callee_id};
    ID[two_party, party_line_type, two_party_call] = 0;
    ID[ , caller_id, two_party_call] = 1;
    ID[ , callee_id, two_party_call] = 1;

The else portion of the same if statement handles the case where the same attribute appears in the parent frame and the current frame at the same time; in this case the three conditions mentioned above have to be considered. The final image of the frame hierarchy G is the combination of the results of the algorithms. After the final expanded image is generated, we have to adjust the format into Horn-clause logic form; because of the nature of a FRORL specification, this step is straightforward. In the following example the complete LSSGR protocol is presented to further demonstrate the execution behavior of the decomposition mechanism. It consists of the description of the LSSGR protocol and its transformation into a Horn-clause logic-based specification.

Example 7 Here we present an example based on a subsection of the Local Switching System General Requirement (LSSGR) protocol for the automatic completion of reverting telephone calls, to give an intuitive idea of how the transformation system works [93]. This part of the LSSGR protocol defines a reverting call to be a call in which a party-line customer dials the number of another party on the same line. Reverting call service allows a two- or four-party line customer to complete a call to other parties sharing the same line without the assistance of an operator. The customer perspective on reverting calls depends on whether the telephone company elects to provide automatic completion. It also depends on the class of service of the line, namely the number of parties on the line and the type of ringing. Depending on their particular type of party-line service, customers may or may not be required to dial a calling-party identifying digit. We define the following object frames, based on the informal description of the requirements:
object: reverting_call
    caller_id:;
    callee_id:.

object: two_party_call
    a_kind_of: reverting_call;
    party_line_type: two_party.

object: four_party_semisel_iddigit_call
    a_kind_of: reverting_call;
    party_line_type: four_party_semisel_iddigit.

object: four_party_semisel_call
    a_kind_of: reverting_call;
    party_line_type: four_party_semisel.

object: four_party_fullysel_call
    a_kind_of: reverting_call;
    party_line_type: four_party_fullysel.
Unless the telephone company has elected the option of denying reverting calls, the system should attempt to complete reverting calls automatically by taking the following steps: initiate a busy tone to the calling party and then ring the two parties.
activity: complete(X)
    parts: X: reverting_call;
    actions: initiate_busy_tone(X);
             ringing(X).
A busy tone should be applied to the line as soon as the call is recognized as a reverting call, or once an identifying digit has been detected (in the case of a four-party semi-selective line with identifying digit). The line should be monitored for disconnect, and the tone should be removed on disconnect. An invalid identifying digit leads to a failure response. We have the following two initiate_busy_tone activities.
activity: initiate_busy_tone(X)
    parts: X: reverting_call;
    actions: apply_busy_tone(X.caller_id);
             wait_for_disconnect(X).

activity: initiate_busy_tone(X)
    parts: X: four_party_semisel_iddigit_call;
    precond: correct_identifying_digit(Digit);
    alt_actions: failure_response(X.caller_id);
                 fail.
Note that the latter activity is defined only for a four_party_semisel_iddigit_call. It inherits from any initiate_busy_tone activity defined for objects higher up in the hierarchy, in this case from reverting_call. It has no actions defined of its own, but inherits its actions from the first initiate_busy_tone activity. The child activity, however, adds a special precondition, and an alternate action should this precondition fail. Upon detection of a disconnect, the busy tone is removed from the calling party's line, as shown below. If a disconnect is not detected, the line is monitored continuously: the alternate action calls wait_for_disconnect recursively and thus creates a continuously running process which monitors the calling party's line until the precondition succeeds and the action fires, at which point the busy signal is removed.
activity: wait_for_disconnect(X)
    parts: X: reverting_call;
    precond: is_disconnected(X);
    actions: remove_busy_signal(X.caller_id);
    alt_actions: wait_for_disconnect(X).
Ringing should be applied to both the calling and called parties. The following activities show how ringing is applied in the general case: Normal ringing is applied to the calling party's line first, and after a 0.5 second delay the called party's line is rung, followed by another delay to allow the ringing isolator circuitry to turn off. The ringing continues in this manner until either party goes off-hook (again the continuously running process is represented as a recursive call).
activity: ringing(X)
    parts: X: reverting_call;
    precond: is_off_hook(X.caller_id);
    actions: stop_ring(X);
    alt_actions: ring(X).

activity: ringing(X)
    parts: X: reverting_call;
    precond: is_off_hook(X.callee_id);
    actions: stop_ring(X);
    alt_actions: ring(X).

activity: ring(X)
    parts: X: reverting_call;
    actions: apply_normal_ring(X.caller_id);
             delay_for_0.5_seconds;
             apply_normal_ring(X.callee_id);
             delay_for_0.5_seconds;
             ringing(X).
This describes the general case of the application of ringing, and it is a case of the over-abstraction typical of requirements specifications. Subclasses of the reverting-call class provide exceptional cases to this abstraction, and FRORL allows for a design strategy in which these exceptions can be ignored initially. Exceptional cases in the LSSGR protocol occur for four-party lines with identifying digits when the calling and called parties are on the same side of the line, in which case only the called party should be rung. In the case of a four-party line without an identifying digit, a special reverting ringing code is applied to the called party's line. Using FRORL we can state these exceptions easily by overriding the ring activity, as shown below.
activity: ring(X)
    parts: X: four_party_semisel_iddigit_call;
    precond: parties_on_same_side_of_line(X);
    actions: apply_normal_ring(X.callee_id);
             delay_for_0.5_seconds;
             ringing(X).

activity: ring(X)
    parts: X: four_party_semisel_call;
    actions: apply_normal_ring(X.caller_id);
             delay_for_0.5_seconds;
             apply_reverting_ring(X.callee_id);
             delay_for_0.5_seconds;
             ringing(X).
After we apply the object frame expansion algorithm to the above five object frames, we can get the following expanded images:
object: reverting_call
    caller_id:;
    callee_id:.

object: two_party_call
    a_kind_of: reverting_call;
    caller_id:;
    callee_id:;
    party_line_type: two_party.

object: four_party_semisel_iddigit_call
    a_kind_of: reverting_call;
    caller_id:;
    callee_id:;
    party_line_type: four_party_semisel_iddigit.

object: four_party_semisel_call
    a_kind_of: reverting_call;
    caller_id:;
    callee_id:;
    party_line_type: four_party_semisel.

object: four_party_fullysel_call
    a_kind_of: reverting_call;
    caller_id:;
    callee_id:;
    party_line_type: four_party_fullysel.
The expanded activity frame images listed here are obtained after we apply the corresponding algorithm to them.
activity: complete(X)
    parts: X: reverting_call;
    actions: initiate_busy_tone(X);
             ringing(X).

activity: initiate_busy_tone(X)
    parts: X: reverting_call;
    actions: apply_busy_tone(X.caller_id);
             wait_for_disconnect(X).

activity: initiate_busy_tone(X)
    parts: X: four_party_semisel_iddigit_call;
    precond: correct_identifying_digit(Digit);
    actions: apply_busy_tone(X.caller_id);
             wait_for_disconnect(X);
    alt_actions: failure_response(X.caller_id);
                 fail.

activity: wait_for_disconnect(X)
    parts: X: reverting_call;
    precond: is_disconnected(X);
    actions: remove_busy_signal(X.caller_id);
    alt_actions: wait_for_disconnect(X).

activity: ringing(X)
    parts: X: reverting_call;
    precond: is_off_hook(X.caller_id);
    actions: stop_ring(X);
    alt_actions: ring(X).

activity: ringing(X)
    parts: X: reverting_call;
    precond: is_off_hook(X.callee_id);
    actions: stop_ring(X);
    alt_actions: ring(X).

activity: ring(X)
    parts: X: reverting_call;
    actions: apply_normal_ring(X.caller_id);
             delay_for_0.5_seconds;
             apply_normal_ring(X.callee_id);
             delay_for_0.5_seconds;
             ringing(X).

activity: ring(X)
    parts: X: four_party_semisel_iddigit_call;
    precond: parties_on_same_side_of_line(X);
    actions: apply_normal_ring(X.callee_id);
             delay_for_0.5_seconds;
             ringing(X).

activity: ring(X)
    parts: X: four_party_semisel_call;
    actions: apply_normal_ring(X.caller_id);
             delay_for_0.5_seconds;
             apply_reverting_ring(X.callee_id);
             delay_for_0.5_seconds;
             ringing(X).
After we obtain the expanded images of all the object and activity frames we can then use the general FRORL representation method to generate the Horn-clause logic-based specification from the expanded FRORL specification. The result is as follows.
caller_id(X) ← reverting_call(X).
callee_id(X) ← reverting_call(X).
reverting_call(X) ← two_party_call(X).
caller_id(X) ← two_party_call(X).
callee_id(X) ← two_party_call(X).
party_line_type(X, two_party) ← two_party_call(X).
reverting_call(X) ← four_party_semisel_iddigit_call(X).
caller_id(X) ← four_party_semisel_iddigit_call(X).
callee_id(X) ← four_party_semisel_iddigit_call(X).
party_line_type(X, four_party_semisel_iddigit) ← four_party_semisel_iddigit_call(X).
reverting_call(X) ← four_party_semisel_call(X).
caller_id(X) ← four_party_semisel_call(X).
callee_id(X) ← four_party_semisel_call(X).
party_line_type(X, four_party_semisel) ← four_party_semisel_call(X).
reverting_call(X) ← four_party_fullysel_call(X).
caller_id(X) ← four_party_fullysel_call(X).
callee_id(X) ← four_party_fullysel_call(X).
party_line_type(X, four_party_fullysel) ← four_party_fullysel_call(X).
complete(X) ← reverting_call(X); initiate_busy_tone(X); ringing(X).
initiate_busy_tone(X) ← reverting_call(X); apply_busy_tone(X.caller_id); wait_for_disconnect(X).
initiate_busy_tone(X) ← four_party_semisel_iddigit_call(X); correct_identifying_digit(Digit); apply_busy_tone(X.caller_id); wait_for_disconnect(X).
initiate_busy_tone(X) ← four_party_semisel_iddigit_call(X); failure_response(X.caller_id); fail.
ringing(X) ← reverting_call(X); is_off_hook(X.callee_id); ring(X).
ring(X) ← reverting_call(X); apply_normal_ring(X.caller_id); delay_for_0.5_seconds; apply_normal_ring(X.callee_id); delay_for_0.5_seconds; ringing(X).
ring(X) ← four_party_semisel_iddigit_call(X); parties_on_same_side_of_line(X); apply_normal_ring(X.callee_id); delay_for_0.5_seconds; ringing(X).
ring(X) ← four_party_semisel_call(X); apply_normal_ring(X.caller_id); delay_for_0.5_seconds; apply_reverting_ring(X.callee_id); delay_for_0.5_seconds; ringing(X).
Chapter 5
Rewriting and Data Dependency, Control Flow Analysis of a Logic-Based Specification

After expanding the non-monotonic inheritance structure of FRORL, we have derived a Horn-clause logic-based specification. In this chapter, we convert the Horn-clause logic-based specification resulting from the first step of the transformation system into an intermediate representation, to facilitate the data flow and dependency analysis.* Rewriting is accomplished via a sequence of preparatory transformation steps. The preparatory steps are equivalence-preserving, in the sense that they preserve both semantics and input/output relations. The main reason for rewriting in the transformation system is to facilitate the automatic data dependency and control flow analysis. The contributions that rewriting brings to the system are:

1. Rewriting makes explicit the constraints on variable bindings, etc., that are implicit in the original specification. Consider the following clause:

    r(U, V, V) ← ...
The information passing realized by the unification of a query with a clause head is to unify variables from the query and the clause-head predicate and to pass the values to the clause body. In the clause above, there is an additional unification inside the clause head, created by the repeated variable name V. This may seem natural, but the mode analysis requires that this kind of implicit unification be represented explicitly inside the clause body. Equal-introduction and equal-substitution realize this rewriting of the format.

2. Rewriting isolates the explicit mode constraints supplied by the user, or imposed by built-in predicates of the specification language, as the initial mode information with which to start the mode analysis. Consider an example clause:
    r(U, V) ← U >= V.

The clause checks whether U is greater than or equal to V. From the definition of the built-in predicate ">=", we know that when this clause is invoked there should be two ground values bound to U and V, respectively. Decomposition analyzes the mode constraints supplied by the user and defined by built-in predicates; it is this mode information that is used to start the automatic mode analysis (see the sketch after this list). Mode-analysis operations and restrictions are built on these original modes and will be discussed in a later section.

3. Rewriting eliminates redundancy from a specification. Redundancy in a logic-based specification derives from two sources: the original specification, and the first three rewriting steps (equal-introduction, equal-substitution, and decomposition). An example of a redundant predicate in a clause is a unification involving a variable that appears nowhere else within the clause.
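As a small illustration of point 2, the initial modes contributed by built-in predicates can be collected as in the sketch below; the mode table is an assumption made for the example:

    # Sketch: seed the mode analysis with the argument modes imposed by
    # built-in predicates ("+" = must be ground on entry, "-" = may be free).
    BUILTIN_MODES = {">=": ("+", "+"), "car": ("+", "-"), "cdr": ("+", "-")}

    def initial_modes(body_literals):
        modes = {}
        for pred, args in body_literals:
            for arg, mode in zip(args, BUILTIN_MODES.get(pred, ())):
                if mode == "+":
                    modes[arg] = "+"          # variable must arrive bound
                else:
                    modes.setdefault(arg, "-")
        return modes

    # r(U, V) <- U >= V : both U and V must be ground when r is invoked.
    assert initial_modes([(">=", ("U", "V"))]) == {"U": "+", "V": "+"}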
5.1 Rewriting of a Logic-Based Specification
In this section, four basic operations that transform a logic-based specification into an equivalent intermediate form are identified and expressed as rules. To facilitate the discussion we use the following notation to represent a transformation rule. The same rule pattern is also used in other sections to describe adjustment operations, so instead of a format specific to this section, a general format suitable for all the related operations is given:
RULE-<number> (<domain name>)
    <pattern>
    <condition>
    ==> <replacement>
Here Number provides an ordering number for all the rules used in the canonicalization phase. Domain indicates at which step of the adjustment phases the rule should be applied, since this rule format is reused in subsequent sections. Pattern describes the situation to which a rule may apply; one may refer to components of the specification by variables constructed from the syntactic category they are intended to match plus an identifying number. Such variables are bound to the matching part of the specification and can be used later in the rule with this binding. Condition states syntactic or semantic constraints on the matched pattern; only if these constraints are met is a match of the pattern considered successful. There may be more than one condition in a rule, in which case they are connected by the logical operator "and". Replacement describes how to modify the specification when the rule pattern matches successfully; the replacement may either state a substitution for the matched pattern, or declare attributes on parts of the specification (such as non-functional specification, etc.). These rules are developed to handle all structures which may occur in a logic-based specification, and the scope of each rule is defined so that every possible occurrence of the corresponding structure is handled by the rule.

Equal-Introduction: New variables are introduced for the arguments of a clause head and assigned values through unification in the clause body. The purpose of this operation is to move value assignment from the clause head to the clause body in order to allow further simplification of the clause body.
Equal-Substitution: Non-variable terms in the clause body partaking in unifications are replaced by variables. The purpose of this operation is to create opportunities to further simplify the clause.

Decomposition: Built-in predicates and functions in a clause body are decomposed by expanding them into a sequence of simpler operations. The predicates and functions thus become more amenable to mode analysis.

Simplification: Clauses of the specification are simplified, either by manipulating the clause form (folding and unfolding of literals; clause pruning) or by partial evaluation.

As the transformation process proceeds, further opportunities for rewriting may arise, or other opportunities may have to be explored as mode information is uncovered. This requires that the rewriting transformations be executed several times, until no more changes occur. The rewriting transformation is described using the rule format given above. In the system, the writer of a rewriting rule may assume that the unification operator is treated as commutative, and that the conjunctions in a clause body are treated as being connected by an associative and commutative operator.
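The iterate-until-no-change discipline can be phrased as a small fixpoint driver over rules in the pattern/condition/replacement style introduced above. This generic sketch is ours, not the system's actual rule engine, and the toy rule and clause encoding are assumptions:

    # Sketch: apply rewrite rules (condition, transform) to a clause body
    # until a fixed point is reached, i.e. until no more changes occur.
    def rewrite_to_fixpoint(body, rules):
        changed = True
        while changed:
            changed = False
            for condition, transform in rules:
                if condition(body):
                    body = transform(body)
                    changed = True
        return body

    # Toy rule: drop a redundant trivial unification "X = X".
    drop_trivial_eq = (
        lambda b: any(op == "=" and lhs == rhs for op, lhs, rhs in b),
        lambda b: [lit for lit in b
                   if not (lit[0] == "=" and lit[1] == lit[2])],
    )

    body = [("=", "A1", "A1"), ("call", "q", "Y")]
    assert rewrite_to_fixpoint(body, [drop_trivial_eq]) == [("call", "q", "Y")]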
5.1.1 Equal-Introduction
The meaning of "equal" is not the equality of arithmetic expression but rather, it refers to the unification between two terms. As such, "equal" encompasses both matching and de-structuring of the involved arguments. According to the procedural interpretation, "equal" could be both an assignment between the left-side literal and the right-side literal, or a testing condition for any two ground terms. However, it could also be the unification of two free variables (which happens when another clause is called within the body of a clause). t Equal-introduction creates a new variable for each argument in the clause head. In the clause body, arguments and the corresponding newly-introduced
variables are unified. This process may introduce redundancies, which are eliminated in the subsequent transformation steps. Equal-introduction is accomplished through the following set of transformation rules.

RULE-1 (Equal-Introduction)
    <pred>(var1, ..., varn) ←
    var1, ..., varn are mutually distinct variables
    ==> prohibit Equal-Introduction

†It is necessary to remove "unwanted" arguments from clause heads, since we want the entries of the matrix to be variables rather than compound terms or literals.
This rule simply says that if a clause has no body and all its arguments are mutually distinct variables (neither constants nor compound terms), then no further equal-introduction need be applied to that clause.

RULE-2 (Equal-Introduction)
    <pred>(..., trm1, ...) ← lit1 ∧ ... ∧ litm
    var1 is a new variable
    ==> <pred>(..., var1, ...) ← var1 = trm1 ∧ lit1 ∧ ... ∧ litm
New variables are introduced for the arguments of the clause head (which then contains only variables as arguments); these variables are given their original values through unification in the clause body.

Correctness Proof: The effect of the rule is to introduce a distinct variable symbol for each of the parameters of the clause-head predicate, and to unify each variable symbol with the corresponding parameter in the clause body. In this process, the semantics of the clause is unchanged and all possible execution sequences are preserved. The correctness thus follows.

We shall illustrate the rewriting transformation, as well as the data flow and data dependency analysis, by successive refinement of the following example specification. The example demonstrates flow analysis of a dynamic specification; using the approach, we can obtain the information necessary to transform it into a procedural language program. Note that in the previous section we used the LSSGR protocol example to show the decomposition of a FRORL specification into a Horn-clause logic-based specification. The specification generated has a form similar to a Prolog program. Even though it illustrates the decomposition process quite well, that example is not suitable for use here, because the generated Horn-clause logic-based specification is too simple to illustrate the transformation steps that follow. Consequently, we will pick one clause from that specification and illustrate the effect of each subsequent transformation step on it; the clause is selected according to how well it reflects the effect of the corresponding transformation. Later in the book we will provide the complete procedural program generated from this LSSGR specification; the transformation is straightforward for this specific example, and details can be found in [98]. The following example is used to illustrate the rewriting and procedural-language program generation process.

Example 8 Consider the following set of clauses.
p(X) ← assert(q(X)) ∧ q(Y) ∧ r(X, Y).
r(U, V) ← s(U, N, W) ∧ t(W, V).
s([], 0, []).
s([X|L1], f(N), [f(N)|L2]) ← s(L1, N, L2).
t([], []).
t([H|L1], [g(H)|L2]) ← t(L1, L2).
We apply the equal-introduction rules to the above specification and obtain
p(A1) ← A1 = X ∧ assert(q(X)) ∧ q(Y) ∧ r(X, Y).
r(A1, A2) ← A1 = U ∧ A2 = V ∧ s(U, N, W) ∧ t(W, V).
s(A1, A2, A3) ← A1 = [] ∧ A2 = 0 ∧ A3 = [].
s(A1, A2, A3) ← A1 = [X|L1] ∧ A2 = f(N) ∧ A3 = [f(N)|L2] ∧ s(L1, N, L2).
t(A1, A2) ← A1 = [] ∧ A2 = [].
t(A1, A2) ← A1 = [H|L1] ∧ A2 = [g(H)|L2] ∧ t(L1, L2).
The example clause from the LSSGR protocol is selected as follows; again, only this one clause is shown here.
party_line_type(X, four_party_fullysel) ← four_party_fullysel_call(X).

The transformed specification clause looks like:

party_line_type(A1, A2) ← A1 = X ∧ A2 = four_party_fullysel ∧ four_party_fullysel_call(X).

Rule-2 is used to generate the above result.
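Rule-2 can be prototyped directly on a small clause representation. In the sketch below a clause is a pair of head arguments and body literals, and unifications are rendered as ('=', variable, term) tuples; this encoding is made up for illustration and is not the transformation system itself:

    # Sketch of equal-introduction (Rule-2): replace each clause-head
    # argument by a fresh variable Ai and prepend "Ai = original argument"
    # to the clause body.
    def equal_introduce(head_args, body):
        new_args, eqs = [], []
        for i, arg in enumerate(head_args, 1):
            fresh = "A%d" % i
            new_args.append(fresh)
            eqs.append(("=", fresh, arg))
        return new_args, eqs + body

    # party_line_type(X, four_party_fullysel) <- four_party_fullysel_call(X).
    head, body = equal_introduce(
        ["X", "four_party_fullysel"],
        [("call", "four_party_fullysel_call", ["X"])])
    print(head, body)
    # ['A1', 'A2'] [('=', 'A1', 'X'), ('=', 'A2', 'four_party_fullysel'),
    #               ('call', 'four_party_fullysel_call', ['X'])]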
5.1.2 Equal-Substitution
Equal-substitution is designed to replace terms in the body of a rule which are given values through a unification. The effect of equal-substitution is to
simplify the literals in the clause body. After applying equal-substitution, some literals expressing unifications can be eliminated later, in the simplification step, if the information they carry becomes redundant. Basically, equal-substitution reduces the degree of groundedness‡ of a clause as much as possible according to the partial ordering§ used to define the degree of groundedness.

RULE-3 (Equal-Substitution)
    <pred>(...) ← ... ∧ trmi = vark ∧ ... ∧ trmj = vark ∧ ...
    vark is a variable ∧ vark does not appear elsewhere within the clause
    ==> <pred>(...) ← ... ∧ trmi = trmj ∧ ...

This rule makes explicit, in the clause body, the data dependency due to the transitivity of unification that is implicit in the original clause.

Correctness Proof: The only effect of the two unifications is to introduce a transitive unification. Because vark does not appear elsewhere in the clause, correctness is preserved when the transitive unification is eliminated.

RULE-4 (Equal-Substitution)
    <pred>(...) ← ... ∧ trmi = trmj ∧ ...
    trmj is a constant
    ==> prohibit Equal-Substitution

Because a constant is already minimal according to the partial ordering used to measure the degree of groundedness, equal-substitution should not be applied in this case (less additional complexity is then introduced by the later canonicalizing transformations).

RULE-5 (Equal-Substitution)
    <pred>(...) ← ... ∧ trmi = trmj ∧ ...
    trmi, trmj are compound terms, predicates, functions, or variables
    ==> (<pred>(...) ← ... ∧ trmi = trmj ∧ ...) mgu{trmi, trmj}

This rule maintains consistency throughout the clause. Applying it creates opportunities for the elimination of newly-introduced unnecessary variables, as discussed under simplification.

‡The degree of groundedness measures the degree of boundness of a clause: the more bound variables a clause has, the lower its degree.
§The partial ordering is defined so that constants are more ground than variables, which are more ground than function or predicate symbols, which in turn are more ground than literals.
Correctness Proof: If trmi and trmj are unifiable, then applying the most general unifier of the two terms does not change the semantics of the clause. Correctness thus follows.

Example 9 Consider Example 8: equal-introduction left several unifications behind, which are now put into a consistent representation.

p(A1) ← A1 = X ∧ assert(q(A1)) ∧ q(Y) ∧ r(A1, Y).    (by Rule-5)
r(A1, A2) ← A1 = U ∧ A2 = V ∧ s(A1, N, W) ∧ t(W, A2).    (by Rule-5)
s(A1, A2, A3) ← A1 = [] ∧ A2 = 0 ∧ A3 = [].
s(A1, A2, A3) ← A1 = [X|L1] ∧ A2 = f(N) ∧ A3 = [A2|L2] ∧ s(L1, N, L2).    (by Rule-5)
t(A1, A2) ← A1 = [] ∧ A2 = [].
t(A1, A2) ← A1 = [H|L1] ∧ A2 = [g(H)|L2] ∧ t(L1, L2).
Following is the example from the LSSGR protocol; it is likewise generated by Rule-5.

party_line_type(A1, A2) ← A1 = X ∧ A2 = four_party_fullysel ∧ four_party_fullysel_call(A1).

5.1.3 Decomposition
A logic-based specification will also rely on built-in predicates and functions. Usually the semantics of these components requires several operations of a procedural programming language to realize. If we consider each of these predicates and functions as a unit when performing flow analysis, we find it very difficult to follow the execution sequence and to provide detailed information to facilitate the transformation. Therefore, we need to decompose the built-in features of a logic-based specification language into several components that can provide more information during flow analysis than the built-in feature considered as a unit. Although the rules shown below are restricted (for space reasons) to only a few of the built-in features that a logic-based specification may provide, the strategy is quite general. The essential point is to extract the operations contained in a built-in predicate or function, as well as the constraints that the built-in predicate may impose on its arguments. This information is then used to simplify the flow and dependency analysis.
The discussion of built-in predicates is mainly concerned with the operations themselves rather than the possible side effects these operations may have on a database‖. We first introduce some rules used for the decomposition of built-in literals (for familiarity's sake we use the names of the standard LISP destructor operations). The car function retrieves the first element of a list. The cdr function returns the tail of a list (i.e., the list with its first element removed). Furthermore, we define a function listcnstr to construct a single-element list from an individual element. In the following rules, different mode information is extracted along with the decomposition. At this stage, the mode information can be understood as indicating the groundedness of arguments in built-in predicates (with "ground" denoted by "+" and "unground" denoted by "-").

RULE-6 (Decomposition)
trm2 = [trm1|list]
listcnstr(trm1, list2) ∧ append(list2, list, trm2)
mode listcnstr("+", "-"), append("+", "+", "-")

RULE-7 (Decomposition)
trm2 = [trm1|list]
car(trm2, trm1) ∧ cdr(trm2, list)
mode car("+", "-"), cdr("+", "-")
Correctness Proof: A list structure represents either decomposing a list into its first element and sublist, or combining an element and a sublist into a list. These semantics are captured by the two rules, and correctness follows. Initially we have no basis on which to choose between the two options available. But with the additional mode information collected during data dependency analysis, it may turn out that our initial choice was wrong and that the mode assignment obtained under the chosen decomposition of a list is inconsistent. Data dependency analysis follows a backtracking control schema, and the discovery of an inconsistent mode assignment brings us back to this point, where we switch the decomposition of the list to the other alternative available.

‖This is in line with other approaches presented in the literature [29, 68, 80].
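Operationally, the two alternative decompositions of trm2 = [trm1|list] can be pictured by the following Python sketch. The function names mirror the rules above; the bodies are only an illustration of the mode signatures (which arguments must be ground, which are produced), not of the actual transformation machinery.

# The two readings of trm2 = [trm1 | list] offered by Rule-6 and Rule-7.

def car(lst):            # car("+", "-"): consumes a ground list,
    return lst[0]        # produces its first element

def cdr(lst):            # cdr("+", "-"): consumes a ground list,
    return lst[1:]       # produces its tail

def listcnstr(elem):     # listcnstr("+", "-"): element -> singleton list
    return [elem]

def append(l1, l2):      # append("+", "+", "-"): both inputs ground
    return l1 + l2

# Destructive reading (Rule-7): trm2 is ground; trm1 and list are produced.
trm2 = [1, 2, 3]
trm1, rest = car(trm2), cdr(trm2)          # trm1 = 1, rest = [2, 3]

# Constructive reading (Rule-6): trm1 and list are ground; trm2 is produced.
assert append(listcnstr(trm1), rest) == trm2

Which of the two readings is chosen is exactly the decision that data dependency analysis may later revisit by backtracking.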
RULE-8 (Decomposition)
trm1 = trm2 <op> trm3
⇒ <op> is one of +, -, *, /, mod (arithmetic operators)
<op>(trm2, trm3, trm1)
mode <op>("+", "+", "-")

RULE-9 (Decomposition)
trm1 is trm2
assign(trm2, trm1)
mode assign("+", "-")
Correctness Proof: When dealing with arithmetic operations, we assume that their operands are always ground and the results are always unground, which is the only possible mode combination involving arithmetic operators.

RULE-10 (Decomposition)
trm1 <op> trm2
⇒ <op> is one of ==, \=, <, >, >=, =<
<op>(trm1, trm2), let <op> be a testing condition
mode <op>("+", "+")

Here ==, \=, <, >, >=, =< stand for the logic operators "equal", "not equal", "less than", "greater than", "greater or equal", and "less or equal".

RULE-11 (Decomposition)
<pred>(trm1)
⇒ <pred> is one of var, nonvar, atom, integer, atomic
"<pred>"(trm1), let "<pred>" be a testing condition
mode is not <pred>("-")

Correctness Proof: In the cases involving logic operators, the operands have to be bound to provide a nontrivial logic value. In Rule-11, trm1 can be either bound or unbound, based on the context, but trm1 cannot receive a value by executing this test, which is what the rule rules out.

These two rules are used to handle test conditions. If the result of a literal in a clause body is "true" or "false" and there is no other side effect of
invoking the literal, then we call it a testing condition. Later on, when we generate code in a procedural programming language implementing the logic-based specification, these testing conditions serve as the condition parts of conditional statements. Some built-in predicates considered in these two rules are used to test whether arguments are ground or not, so in these cases the mode information we obtain can only be used to eliminate some impossible mode combinations; it cannot provide a definite mode status like the previously discussed rules.

RULE-12 (Decomposition)
"="(trm1, trm2)
⇒ mode(trm1) = "-"
mode "="("-", "+")
Correctness Proof: In a unification, if one of the arguments receives a value during execution, then the other argument must provide this value. Unification cannot bind both of its operands. This rule infers the mode of one operand in a unification when the mode status of the other operand is restricted. Note that matching rules modulo commutativity of "=" allows us to cut down on the number of rules, since we need not consider the opposite case here.

RULE-13 (Decomposition)
<fun>(trm1, ..., trmn)
"<fun>"(trm1, ..., trmn)
mode <fun>("+", ..., "+")

If there is a function name in the clause being analyzed, then the modes of all arguments of the function can be set to input. In this discussion it is assumed that a function always requires that all of its arguments be ground and that the value of the function depends on the arguments.¶

Correctness Proof: Correctness follows from the assumption above.

RULE-14 (Decomposition)
<pred1>(...) ← ... ∧ assert(<pred2>(trm2_1, ..., trm2_n)) ∧ ... ∧ <pred2>(trm1_1, ..., trm1_n) ∧ ...
⇒ pred2 has not been defined elsewhere within the specification ∧ θ is a most general unifier between trm2_1, ..., trm2_n and trm1_1, ..., trm1_n
(<pred1>(...) ← ...)θ

¶This assumption reflects the usual meaning of a function call, but may be too strict for some advocates of logic programming.
Correctness Proof: If a clause is asserted and called only within the same clause, and there is no other definition of the same clause within the specification, then this pair of assertion and call can be resolved without affecting the logic value of the clause, provided there exists a most general unifier θ between the arguments of the predicate <pred2> within the assertion and the arguments of the predicate <pred2> outside the assertion.

RULE-15 (Decomposition)
lit1 ∧ lit2
lit1, lit2
mode "+" ∧ "+"
Correctness Proof: This rule is used to construct the compound goals in the query; it reflects exactly the meaning of the 'and' construct. As a result of the application of this rule we can assume that both literals have input mode.

An assumption of the don't-care nondeterminism of the logic-based specification language is that no binding of variables occurs before successful evaluation of the precondition. Therefore, no output can be generated from variables occurring in the precondition, and we may safely assume that all variables occurring in a precondition are of mode "+", as expressed in the following rule.

RULE-16 (Decomposition)
... ∧ <pred>(..., trm, ...) ∧ ...
⇒
mode <pred>(..., "+", ...)
Correctness Proof: This follows directly from the assumption just stated: since no binding of variables occurs before successful evaluation of the precondition, no output can be generated from variables
occurring in the precondition, and we may safely assume that all variables occurring in a precondition are of mode "+".

Many other built-in predicates may occur in a logic-based specification. Again, their mode information is implicit and can be used to eliminate redundant mode combinations during flow analysis. These built-in predicates include:

• File Handling
• Procedure Augmentation
• Clause Handling
• Term Modification

Obtaining the mode information contained in these built-in predicates is straightforward. Whenever the predicates discussed here occur in a logic-based specification, we first restrict the possible mode combinations to a set which meets the mode requirements of these built-in predicates, and then build procedures in a target language which realize the semantics contained in these predicates.

Example 10 Note that in Example 9 the body of clause p contains an assert statement, and there is no other clause with head q(). So we can apply Rule-14 to simplify p. After applying the decomposition rules we have

p(A1) ← =(A1, X) ∧ r(A1, A1).    by Rule-10, Rule-14
r(A1, A2) ← =(A1, U) ∧ =(A2, V) ∧ s(A1, N, W) ∧ t(W, A2).    by Rule-10
s(A1, A2, A3) ← =(A1, []) ∧ =(A2, 0) ∧ =(A3, []).    by Rule-10
s(A1, A2, A3) ← car(A1, X) ∧ cdr(A1, L1) ∧ "f"(N, f(N)) ∧ =(A2, f(N)) ∧ listcnstr(A2, [A2]) ∧ append([A2], L2, A3) ∧ s(L1, N, L2).    by Rule-6, Rule-10, Rule-13
t(A1, A2) ← =(A1, []) ∧ =(A2, []).    by Rule-10
t(A1, A2) ← car(A1, H) ∧ cdr(A1, L1) ∧ "g"(H, g(H)) ∧ listcnstr(g(H), [g(H)]) ∧ append([g(H)], L2, A2) ∧ t(L1, L2).    by Rule-6, Rule-13
Note that in the second clause of s we decomposed the first list structure into a car/cdr pair and the second into a listcnstr/append pair, which is just one of the possible decompositions. These alternative mode combinations could be used to derive other possible data dependencies and, as a consequence,
different execution sequences. We discuss only one of them for simplicity's sake (the same holds for the second clause of t); a procedural reading of the chosen decomposition is sketched below. The decomposition process discussed here has no effect on the example clause from the LSSGR protocol; it remains unchanged.
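Under the mode assignment that the flow analysis of Section 5.2 eventually selects for the second clause of s (first argument input, the other two output), the decomposed clause reads as an ordinary recursive procedure. The following Python rendering of that reading is purely illustrative; in particular, the body given to the uninterpreted function f is an arbitrary placeholder.

# Procedural reading of the decomposed s clause under mode s("+", "-", "-"):
# s consumes the list A1 and produces A2 = f(N) and the list A3.

def f(n):
    return n + 1                     # arbitrary stand-in for the function f

def s(a1):
    if a1 == []:                     # base clause: s([], 0, []).
        return 0, []
    x, l1 = a1[0], a1[1:]            # car(A1, X), cdr(A1, L1)
    n, l2 = s(l1)                    # recursive call s(L1, N, L2)
    fn = f(n)                        # "f"(N, f(N)); =(A2, f(N))
    return fn, [fn] + l2             # listcnstr/append build A3

print(s([7, 8, 9]))                  # -> (3, [3, 2, 1])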
5.1.4 Simplification
Simplification is a process of removing redundancy from a specification. Redundancy in a logic-based specification derives from two sources: either it already exists in the original specification, or it is introduced by the canonicalizing transformation steps described above. To eliminate redundancy, we rely on the following operations:

Folding merges a clause body with another clause in the specification. This operation can create a chance to eliminate a specification clause from the description.

Clause pruning eliminates clauses that are either unreachable or implied by some other clauses in the specification.

Partial evaluation symbolically computes the result of calls to literals in a clause body.

Definition 11 (Folding literals) Let the following be clauses in a logic-based specification:

P(Arg1, ..., Argi) ← Q1_1 ∧ ... ∧ Q1_n.
U(Arg1, ..., Argj) ← Q2_1 ∧ ... ∧ Q2_m.

If there is a substitution θ such that ∀s ∈ {1, ..., m} · Q2_sθ = B_s, where B_s ∈ {Q1_1, ..., Q1_n}, and no other clause whose head can unify with U(Arg1, ..., Argj) exists in the specification, then

Folding(P) = P(Arg1, ..., Argi) ← C1_1 ∧ ... ∧ C1_{n-m} ∧ U(Arg1, ..., Argj)θ

where for all t such that 1 ≤ t ≤ n - m, C1_t ∈ {Q1_1, ..., Q1_n} - {B_1, ..., B_m}.
The condition on folding preserves correctness because it guarantees that the specification does not contain other clauses whose heads can unify with the literal to be folded. Ignoring this condition might create a false
line of reasoning due to the presence of other clauses defining the predicate U. We may also encounter the situation where two different clauses have the same or unifiable bodies but different clause heads. This situation is legal, since it is possible that two such clauses mean different things in a logic-based specification. We assume that, in such situations, the two clauses carry different meanings and that the software developers are responsible for differentiating them.

Example 11 Consider the following two clauses.

P(A, B, C, L) ← Q(A, B, D) ∧ R(B, C, F) ∧ B > F ∧ S(D, C, L).
U(X, Y, Z, W) ← X > T ∧ S(Z, Y, W) ∧ R(X, Y, T).
Then the substitution θ = {X/B, Y/C, Z/D, T/F, W/L} results in the body of clause U being contained in the body of clause P. Thus we can fold clause P and obtain

P(A, B, C, L) ← Q(A, B, D) ∧ U(B, C, D, L).

Clause pruning is designed to eliminate any clause which is implied by some other clause(s) or is not reachable from any other clause. A clause is not reachable when it cannot be invoked by any other clause or by a query. During clause pruning, we again consider only the logical meaning of clauses and ignore the possible side effects they may have on input/output or the database. So even if two clauses may produce different results with respect to input/output or modifications of a logic database, if they meet the following criteria, one of them can be considered redundant and be eliminated. Determining whether a clause is not reachable from any other clause is a straightforward task: we can scan through the clauses and check whether any clause contains a literal unifiable with the clause suspected to be unreachable. However, determining whether a clause is implied by some other clause(s) may be harder to achieve.

Definition 12 A variable substitution is a substitution between two literals, Q1 and Q2, that is not ground.

Definition 13 (Clause Pruning) Given two clauses of a specification
P(Arg1, ..., Argi) ← Q1_1 ∧ ... ∧ Q1_n.
P(Arg1, ..., Argi) ← Q2_1 ∧ ... ∧ Q2_m.
if we can find a variable substitution θ such that ∀s ∈ {1, ..., m} · Q2_sθ ∈ {Q1_1, ..., Q1_n}, then we say that the first clause is implied by the second clause.

RULE-17 (Simplification)
... clause_i ... clause_j ...
⇒ clause_i is implied by clause_j
... clause_j ...

If a clause is implied by another clause in the specification, then the implied clause can be eliminated from the specification.

Correctness Proof: Assume there is an answer generated from the implied clause. By combining the unifier for the answer of the implied clause with the variable substitution, the result is an answer for the implying clause.

Another kind of clause pruning can occur within a clause body; its effect is to eliminate a literal involved in a unification if one of its parameters is not used elsewhere within the same clause.

RULE-18 (Simplification)
pred(...) ← ... ∧ var_i = var_j ∧ ...
⇒ no other literal contains var_i
pred(...) ← ... ∧ ...

Correctness Proof: Since no literal other than the unification contains var_i, this unification is redundant and can be deleted.

Next is an example of simplification within a clause body.

Example 12 Consider

P(Arg1, Arg2, Arg3) ← Arg1 = [] ∧ Arg2 = C ∧ Arg3 = [].
P(Arg1, Arg2, Arg3) ← Q(Arg1, Arg2) ∧ S(Arg2, Arg3) ∧ T(Arg3, Z).
P(Arg4, Arg5, Arg6) ← Q(Arg4, Arg5) ∧ S(Arg5, Arg6).
We apply the simplification and elimination operations to the above clauses and obtain

P(Arg1, Arg2, Arg3) ← Arg1 = [] ∧ Arg3 = [].
P(Arg4, Arg5, Arg6) ← Q(Arg4, Arg5) ∧ S(Arg5, Arg6).

after applying the variable substitution {Arg4/Arg1, Arg5/Arg2, Arg6/Arg3}.

Partial evaluation can be applied to literals in a clause to simplify the representation, especially when all the parameters of these literals are ground. Here we consider only partial evaluation of literals with all parameters ground, and of fact clauses. The following is an example of the former kind of transformation; it requires the partial evaluation of a testing literal.

Example 13 Given the following clauses,
P(X, Y) ← 10 < 9 ∧ Y = 4.
P(X, Y) ← 10 >= 9 ∧ Y = 16.

If we evaluate the first literal (a testing condition), we realize that the first clause will always be false. Therefore, we can apply clause pruning to eliminate the first clause. For the second clause, partial evaluation of the first literal shows it to be always true, and therefore redundant. We can change the clause to

P(X, Y) ← Y = 16.
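The implied-clause test of Definition 13 lends itself to a direct sketch: search for a single variable substitution under which every body literal of the implying clause occurs in the body of the implied clause. The encoding below (literals as tuples, variables as capitalized strings) is an assumption made for illustration, and it glosses over the clause heads, which in Definition 13 are covered by the same substitution.

# Sketch of the clause-pruning test (Definition 13): clause 1 is implied
# by clause 2 if some variable substitution maps every body literal of
# clause 2 into the body of clause 1.

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def match(lit1, lit2, subst):
    """Extend subst so that lit2 under subst equals lit1, or return None."""
    if lit1[0] != lit2[0] or len(lit1) != len(lit2):
        return None
    s = dict(subst)
    for a1, a2 in zip(lit1[1:], lit2[1:]):
        if is_var(a2):
            if s.get(a2, a1) != a1:
                return None
            s[a2] = a1
        elif a2 != a1:
            return None
    return s

def implied_by(body1, body2, subst=None):
    """True if every literal of body2 maps into body1 under one substitution."""
    subst = subst or {}
    if not body2:
        return True
    first, rest = body2[0], body2[1:]
    for lit in body1:
        s = match(lit, first, subst)
        if s is not None and implied_by(body1, rest, s):
            return True
    return False

# Example 12: the clause with body Q /\ S /\ T is implied by the clause
# with body Q /\ S under {Arg4/Arg1, Arg5/Arg2, Arg6/Arg3}.
body1 = [('Q', 'Arg1', 'Arg2'), ('S', 'Arg2', 'Arg3'), ('T', 'Arg3', 'Z')]
body2 = [('Q', 'Arg4', 'Arg5'), ('S', 'Arg5', 'Arg6')]
print(implied_by(body1, body2))   # -> True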
As far as partial evaluation of fact clauses is concerned, we do not presuppose that a fact clause (database) is available to bind with the rule clauses (procedures) in order to achieve partial evaluation of the rule clauses. To preserve multiple interpretations of the specification (as is typical for logic-based specifications), our transformational approach relies only on the set of rule clauses, without any assumption about fact clauses. However, there are situations in which a set of logic clauses contains "predefined" fact clauses which represent a constant or known fact in the problem domain. In situations like these we can apply fact propagation to bind the facts with the literals which invoke the fact clauses. A fact clause is a clause without a body whose arguments are all ground. There may exist multiple fact clauses for the same predicate (with different
values for the arguments). This raises the issue of when, and how, facts should be propagated. Since fact propagation multiplies the number of clauses which invoke the fact clause, a specification with a large number of fact clauses would generate many partially evaluated clauses with identical clause heads, resulting in a loss of the design information exhibited by the original clause structure. We therefore need to restrict the operation of fact propagation to a small number of fact clauses. The choice of how many fact clauses to use as the criterion for fact propagation depends on the implementation of the rule base.

Definition 14 (Fact propagation) If a logic-based specification contains the following clauses

P(Arg1, ..., Argn) ← Q1_1 ∧ ... ∧ Q1_n.
U(c1, ..., cm) ←.
where c1, ..., cm are constants, and if there exists a most general unifier θ such that Q1_kθ = U(c1, ..., cm)θ, then

Unfolding(P) = P(Arg1, ..., Argn) ← (Q1_1 ∧ ... ∧ Q1_{k-1} ∧ Q1_{k+1} ∧ ... ∧ Q1_n)θ
The following rule handles the simplification of a fact clause.

RULE-19 (Simplification)
<pred1>(...) ← ... ∧ <pred2>(trm2_1, ..., trm2_n) ∧ ...
⇒ there is a clause <pred2>(trm1_1, ..., trm1_n) ← ∧ there is no other clause with head <pred2> ∧ θ is a most general unifier between trm1_1, ..., trm1_n and trm2_1, ..., trm2_n ∧ trm1_1, ..., trm1_n are ground
<pred1>(...) ← (... ∧ ...)θ

Correctness Proof: The conditions for this rule to be applicable include: 1) the unifying clause is a fact clause; 2) there is only one clause (the fact clause) defining the predicate; 3) the predicate and the fact clause are unifiable. This rule actually realizes one step of resolution.
Example 14 Given the following clauses,

P(X, Y, Z) ← Q(X, W) ∧ S(W, Y) ∧ Z is (Y + 10).
Q(Arg1, Arg2) ← Arg1 is 2 ∧ Arg2 is 10.
S(Arg1, Arg2) ← Arg1 is 10 ∧ Arg2 is 4.

We apply fact propagation and get

P(X, Y, Z) ← X is 2 ∧ Y is 4 ∧ Z is (Y + 10).

Furthermore, we apply partial evaluation within the clause to obtain

P(X, Y, Z) ← X is 2 ∧ Y is 4 ∧ Z is 14.
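Fact propagation (Definition 14, handled by Rule-19) can likewise be sketched as a single resolution step against a ground fact. The Python below is a simplified illustration under the same tuple encoding; it propagates bindings into the remaining body literals but, for brevity, does not apply them to the clause head.

# Sketch of fact propagation: resolve one body literal against a ground
# fact and push the resulting bindings into the remaining body.

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def subst_term(t, subst):
    if isinstance(t, tuple):
        return tuple(subst_term(x, subst) for x in t)
    return subst.get(t, t)

def propagate_fact(body, fact):
    """fact is a ground literal such as ('Q', 2, 10)."""
    for i, lit in enumerate(body):
        if lit[0] != fact[0] or len(lit) != len(fact):
            continue
        subst, ok = {}, True
        for a, c in zip(lit[1:], fact[1:]):
            if is_var(a):
                if subst.get(a, c) != c:    # same variable, two values
                    ok = False
                    break
                subst[a] = c
            elif a != c:                    # constant mismatch
                ok = False
                break
        if ok:
            rest = body[:i] + body[i + 1:]
            return [subst_term(l, subst) for l in rest]
    return body                             # no literal matched the fact

# Example 14, after Q and S have been partially evaluated to facts:
body = [('Q', 'X', 'W'), ('S', 'W', 'Y'), ('is', 'Z', ('+', 'Y', 10))]
body = propagate_fact(body, ('Q', 2, 10))   # binds X/2, W/10
body = propagate_fact(body, ('S', 10, 4))   # binds Y/4
print(body)                                 # -> [('is', 'Z', ('+', 4, 10))]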
We continue the original example. Once again, various rules can be used to simplify the program obtained at the current stage.

Example 15 The final result of applying the canonicalizing transformations to the example specification is as follows (the first and the second clause are obtained by applying Rule-19).

(1) p(A1) ← r(A1, A1).
(2) r(A1, A2) ← s(A1, N, W) ∧ t(W, A2).
(3) s(A1, A2, A3) ← =(A1, []) ∧ =(A2, 0) ∧ =(A3, []).
(4) s(A1, A2, A3) ← car(A1, X) ∧ cdr(A1, L1) ∧ "f"(N, f(N)) ∧ =(A2, f(N)) ∧ listcnstr(A2, [A2]) ∧ append([A2], L2, A3) ∧ s(L1, N, L2).
(5) t(A1, A2) ← =(A1, []) ∧ =(A2, []).
(6) t(A1, A2) ← car(A1, H) ∧ cdr(A1, L1) ∧ "g"(H, g(H)) ∧ listcnstr(g(H), [g(H)]) ∧ append([g(H)], L2, A2) ∧ t(L1, L2).
The only applicable simplification for the LSSGR protocol example is the deletion of the redundant unification by Rule-18:

party_line_type(A1, A2) ← A2 = four_party_fullysel ∧ four_party_fullysel_call(A1).

The final Horn-clause logic-based specification for the LSSGR protocol is:
caller_id(A1) ← reverting_call(A1).
callee_id(A1) ← reverting_call(A1).
reverting_call(A1) ← two_party_call(A1).
caller_id(A1) ← two_party_call(A1).
callee_id(A1) ← two_party_call(A1).
party_line_type(A1, A2) ← "="(A2, two_party) ∧ two_party_call(A1).
reverting_call(A1) ← four_party_semisel_iddigit_call(A1).
caller_id(A1) ← four_party_semisel_iddigit_call(A1).
callee_id(A1) ← four_party_semisel_iddigit_call(A1).
party_line_type(A1, A2) ← "="(A2, four_party_semisel_iddigit) ∧ four_party_semisel_iddigit_call(A1).
reverting_call(A1) ← four_party_semisel_call(A1).
caller_id(A1) ← four_party_semisel_call(A1).
callee_id(A1) ← four_party_semisel_call(A1).
party_line_type(A1, A2) ← "="(A2, four_party_semisel) ∧ four_party_semisel_call(A1).
reverting_call(A1) ← four_party_fullysel_call(A1).
caller_id(A1) ← four_party_fullysel_call(A1).
callee_id(A1) ← four_party_fullysel_call(A1).
party_line_type(A1, A2) ← "="(A2, four_party_fullysel) ∧ four_party_fullysel_call(A1).
complete(A1) ← reverting_call(A1) ∧ apply_busy_tone(A1.caller_id) ∧ wait_for_disconnect(A1).
initiate_busy_tone(A1) ← four_party_semisel_iddigit_call(A1) ∧ correct_identifying_digit(Digit) ∧ apply_busy_tone(A1.caller_id) ∧ wait_for_disconnect(A1).
initiate_busy_tone(A1) ← four_party_semisel_iddigit_call(A1) ∧ not(correct_identifying_digit(Digit)) ∧ failure_response(A1.caller_id) ∧ fail.
wait_for_disconnect(A1) ← reverting_call(A1) ∧ is_disconnected(A1) ∧ remove_busy_signal(A1.caller_id).
wait_for_disconnect(A1) ← reverting_call(A1) ∧ not(is_disconnected(A1)) ∧ wait_for_disconnect(A1).
ringing(A1) ← reverting_call(A1) ∧ is_off_hook(A1.caller_id) ∧ ...
ringing(A1) ← ... ∧ delay_for_0.5_seconds ∧ ringing(A1).
ring(A1) ← ...
5.2 Data Dependency and Control Flow Analysis
In the previous section the basic approach was summarized, with special focus on the hybrid execution model. In this section the method for static data dependency analysis is elaborated.

A Horn-clause logic-based requirements specification is highly dynamic: a predicate in a logic-based specification may have different meanings in different environments. Most obviously, many predicates in a logic-based specification can be used in several directions, depending on the modes of their arguments. For example, append([1,2], [3,4], X) appends two lists and returns the result in X; append([1,2], X, [1,2,3,4]) obtains the list which could be appended to [1,2] in order to yield the list [1,2,3,4]; and append([1,2], [3,4], [1,2,3,4]) tests whether the two lists appended indeed yield the list [1,2,3,4]. As a consequence, terms in a logic-based specification may have multiple meanings. Most prominently, the list notation [H|T] can either denote the construction of a list from an element H and a list T, or it can denote the destruction of an existing list into its head (H) and tail (T) components. Which of the two meanings is indeed intended by the specification depends on the sequence of clause evaluation. The order of clause evaluation can be determined within the semantic domain of the logic-based specification, and the sequence of evaluation may change from input (query) to input (query). Such multi-directionality is in the nature of a logic-based specification language and constitutes the main
difficulty in parallelizing a logic-based specification. In order to preserve the completeness of a logic-based specification, we have to explicitly locate all its possible data transformation paths in the parallelizing process. This phenomenon can be characterized via the mode information of parameters and data flow analysis in a logic-based specification [16, 29, 30, 44, 68, 87, 98].

As pointed out before, one of the main features of logic-based specification languages is that the same clause may be used to perform computations in several directions. The arguments of a clause may serve both as sources of input for the computation described by the clause and as the location of output of that very computation. Of course, no standard procedural programming language provides such a feature. Therefore, to transform a logic-based specification into its procedural form, we must first eliminate the multi-directionality that may be present in the clauses of the logic-based specification. We need to analyze a specification in order to deduce information with respect to the flow of data between the clauses of a specification, and the data dependency between the clauses of a specification. Using this information we can obtain the execution control sequence for a given specification.

Traditionally, data flow and data dependency analysis have been investigated within the framework of logic programming. One approach to dealing with the multi-directionality of a logic program has been to provide mode information for each clause (e.g., in Parlog and Concurrent Prolog). Bruynooghe [10] and Smolka [87] have investigated the issue of verifying the consistency of mode declarations for arguments to predicates, as supplied by programmers. Automatic mode inference has been discussed by Mellish [70] and Reddy [80], and more recently by Bruynooghe [11] and Mannila and Ukkonen [68]. The most promising approach to mode inference so far has been presented by Debray in [29, 30]. Debray uses an interpreter to reason over the four-valued lattice of modes don't know, empty, free, and ground; its purpose is to enable a compiler to optimize a logic program. Mannila and Ukkonen [68] limit themselves to the simple modes ground and unground. Reddy relies on syntax analysis to assign modes to predicates.

The approach presented in this section relies on reasoning over a lattice of mode values, similar to Debray and Warren's. However, we do not rely on a symbolic execution through an interpreter to obtain mode information. Rather, we present an analytical approach using a matrix of mode information which is built up by an interaction of algorithms and the application of transformation rules.
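As a concrete illustration of the multi-directionality discussed at the start of this section, the following Python generator sketches a relational append in which None marks an unbound argument; each of the three uses of append mentioned above corresponds to one calling pattern. This is a toy model for intuition only.

# A relational sketch of append/3: None stands for an unbound argument,
# and every consistent binding of the three arguments is enumerated.

def append_rel(xs, ys, zs):
    if zs is None:                       # mode ("+", "+", "-"): construct
        yield xs, ys, xs + ys
    else:                                # zs ground: destruct or test
        for k in range(len(zs) + 1):
            left, right = zs[:k], zs[k:]
            if (xs is None or xs == left) and (ys is None or ys == right):
                yield left, right, zs

print(list(append_rel([1, 2], [3, 4], None)))          # construct
print(list(append_rel([1, 2], None, [1, 2, 3, 4])))    # solve for Y
print(list(append_rel([1, 2], [3, 4], [1, 2, 3, 4])))  # test: one answer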
5.2.1 Matrix Representation of Mode Information
In this section, we present a novel approach to mode inference which relies on a matrix representation of clauses to perform data flow and dependency analysis
and to generate the corresponding execution control sequence. We reason over the four-valued lattice of the following mode values:

• don't know, both ("?")
• input mode ("+")
• output mode ("-")
• empty (" ")

For each clause in the specification, we construct a two-dimensional matrix, with the literals of the clause along one dimension and the variables of the clause along the other. This matrix is filled in with mode values by applying algorithms and constraining rules. Initially, the arguments of literals are marked as don't know ("?"), since no mode information about the parameters of a predicate is yet available. Arguments that do not occur in a corresponding literal are marked with " ". The next step is to collect and apply all known mode constraints of the literals within the rule clause, such as those of user-defined predicates and the mode information for all built-in operators used in the rule clause.

The mode values of the arguments of all the literals in a clause body follow the normal definition of data flow: input is marked with "+" and output with "-". In other words, an argument which generates data (exports a value for the argument) is said to have output mode, and an argument which consumes data (needs a value to bind to) is said to have input mode (data is viewed as flowing into the predicate across its "boundary"). But for a predicate in a clause head, the mode convention for its arguments is the opposite of that for the arguments of predicates in the clause body. This is because a clause head serves dual roles: on the one hand, it unifies with the calling (invoking) literal (and thus passes data between the calling literal and the clause); on the other hand, it communicates data to and from the clause body through variables sharing the same name. We can interpret the communication of data to a clause in two steps: first there is a unification between the calling literal and the clause head of the invoked rule clause; then the data is passed between the head of the clause and the literals forming the clause body. The modes of the arguments of a clause head are normally viewed in terms of the former role. However, when performing the analysis of a particular clause we treat the clause as a single unit, and therefore consider the modes of the arguments of the clause head from the latter point of view. Figure 5.1 gives an example of the two-stage process of data communication.

If a clause C2 with the clause head P(..., Var_k, ...) is called within the body of a clause C1, then the literal P in C1 may provide data to the clause C2 via unification of the argument Var_k of P in C2 against the parameter
Figure 5.1: An illustration of data transformation by a clause head literal
Arg_k of the call to P within the clause C1. In that case, the mode of Arg_k of P in C1 is output, whereas Var_k in C2 has mode input, for the balance of flow to be preserved. However, in addition to receiving data from Arg_k in C1, the variable Var_k of P in the clause C2 also provides data to the predicates in the body of C2 that share this variable. Since mode analysis is performed within the boundaries of a clause, the mode conventions for the clause head have to match those for the clause body. A "+" ("-") in the clause head therefore has the opposite meaning in the clause body, and vice versa. The same holds for data flow communicated from the head of the clause C2 back to the calling literal P in C1.
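A minimal sketch of the matrix construction: rows are indexed by the literals of the clause (head first), columns by its variables, and each cell starts as "?" where the variable occurs and " " where it does not. The dictionary representation below is our own assumption for illustration; the text does not prescribe a concrete data structure.

# Build the initial mode matrix MM: (literal, variable) -> mode value.
# Each literal is a tuple (name, arg1, ..., argn); the head comes first.

def init_matrix(clause):
    variables = []
    for lit in clause:
        for a in lit[1:]:
            if a not in variables:
                variables.append(a)
    mm = {}
    for i, lit in enumerate(clause):
        for v in variables:
            mm[(i, v)] = "?" if v in lit[1:] else " "
    return mm, variables

# Clause (2): r(A1, A2) <- s(A1, N, W) /\ t(W, A2).
clause = [('r', 'A1', 'A2'), ('s', 'A1', 'N', 'W'), ('t', 'W', 'A2')]
mm, vs = init_matrix(clause)
for i, lit in enumerate(clause):
    print(lit[0], [mm[(i, v)] for v in vs])
# r ['?', '?', ' ', ' ']
# s ['?', ' ', '?', '?']
# t [' ', '?', ' ', '?']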
5.2.2 Data Flow and Dependency Analysis Algorithm
After we establish the matrix representation of mode information, we apply the data dependency and flow analysis algorithm to the mode matrix of each clause. Applications of the algorithm are interleaved with applications of the constraining rules. Through this process, we gradually build up the mode information for the clause under consideration. In order to handle multiple mode combinations more conveniently, we also introduce a new mode status "B", which stands for "both". It has a meaning similar to "?"; the difference between "?" and "B" concerns the stage of analysis. "?" represents that we have no idea yet about the mode of an argument; "B" implies that we have already explored the argument and that both input and output modes are possible for it.

If we restrict our attention to an argument position in a literal, then we can refer to literals as "producers" and "consumers". If a literal consumes the value for a variable generated by other literals, we call it a consumer. If a literal produces (binds) the value for an unground variable, we call it a producer. The following algorithm is based on two fundamental assumptions:

Unique-Producer Assumption: Any argument, other than a constant, with the mode "+" demands exactly one producer for the value consumed by it.

Existence-of-Consumers Assumption: A producer of a value (with the exception of dead-end values, which are generated but not used by any other literal in the clause) has at least one consumer.

From the initial matrix, the algorithm below calculates the mode matrix for a given clause. The matrix initially contains only " " and "?" information, with the exception of built-in predicates, and of predicates for which information is already available from previous runs of the algorithm on other clauses. The
algorithm is exhibited in two parts: the initialization of the matrix, and the dynamic analysis. In the following, let Lit be a sequence formed from the multi-set of the literals mentioned in a clause (note that a literal can occur more than once in the sequence), and let Arg be a sequence constructed from the set of arguments in the clause. Further, let Arg(l) be the subsequence of those elements of Arg that occur in the literal with predicate symbol l (i.e., Arg(l) = {v ∈ Arg | MM_{l,v} ≠ " "}, where MM_{l,v} denotes the mode status of the variable v in the literal l in the matrix). In the algorithm, ∃! means "there exists exactly one". The data dependency and flow analysis algorithm deduces mode information by reasoning from the modes already known from the first part (lines 1-16) to generate the remaining unknown modes in the matrix (lines 17-29).

Algorithm 4 (Data Dependency and Flow Analysis)
1   types MM: Lit × Arg → {"+", "-", "?", "B", " "}; Arg: set of arguments (initially empty)
2   for each literal l ∈ Lit
3     push l onto todo-stack
4     Arg ← Arg + Arguments(l);
5     if ∃! v ∈ Arg(l) · v is not a constant
6       then MM_{l,v} ← "-"
7   for each argument v ∈ Arg
8     if v is a constant
9       then for each l ∈ Lit with MM_{l,v} ≠ " "
10        MM_{l,v} ← "+"
11    if v is a predicate
12      then if all arguments of v
13           depend on constants
14        then MM_{l,v} ← "+"
15        else
16          MM_{l,v} ← "B"
17  while todo-stack is not empty
18    l ← pop todo-stack
19    for each v ∈ Arg(l)
20      if MM_{l,v} = "+" ∧ v is not a constant
21        then if ∀l' ∈ Lit · MM_{l',v} = "+"
22          then fail
23        if ∃! l' ∈ Lit · MM_{l',v} = "?"
24          then MM_{l',v} ← "-"
25      if MM_{l,v} ∈ {"?", "B"}
26        then if no l' ≠ l has MM_{l',v} ∈ {"B", "?"}
27          then MM_{l,v} ← "-"
28          else MM_{l,v} ← "B"
29  call rule-based flow analysis
The algorithm examines the mode matrix in two directions. In the horizontal direction, the analysis is mainly based on the mode constraints existing among the arguments of a single literal, which may be a built-in or user-defined predicate. In the vertical direction, the analysis is based on the two assumptions mentioned above.

Line 1 initializes the data dependency and control flow matrix; the "+" and "-" modes at this step are generated from user-provided mode constraints and from the decomposition performed in the rewriting process. This line also creates the set which holds all the arguments within the clause and initializes it to empty. Lines 2 to 4 push each literal of the clause onto a stack, todo-stack, which contains all remaining literals to be analyzed; at the same time these lines pick each argument of literal l and save it in Arg. Lines 5 and 6 set a variable's mode to "-" if it occurs only once in its column: in this case it cannot have the "+" mode, which would require another appearance of the same variable to provide a value. Lines 7 to 16 analyze the arguments in Arg one by one to decide their modes, where possible, based on the available modes. Among them, lines 8 to 10 set the modes of all occurrences of a constant within its column to "+", since a constant always provides values to others. Lines 11 to 16 handle the case of a predicate: if all of the predicate's arguments depend on constants, then the mode of the predicate itself is set to "+" (line 14); otherwise its mode is "B" (line 16).

Lines 17 to 29 apply mode inference operations and constraints to find new modes from the modes already in the matrix. This is done by checking each literal on todo-stack; note that l represents the current literal being checked (line 18). Lines 20 to 22 force a failure if an argument other than a constant has mode "+" in all its occurrences; the intuitive idea is that then all the occurrences are consumers and no producer exists. Lines 23 and 24 change the only occurrence of an argument with unsearched mode to "-" (a producer) if all the remaining occurrences of the same argument are consumers. Lines 25 to 28 deal directly with the undecided modes in the matrix: if no other cell within the same column has the undecided mode "B" or the unsearched mode "?", then the mode of the current cell should be "-" (line 27); otherwise nothing can be said about its mode (line 28). Line 29 calls the subroutines defined by Rule-20 through Rule-24. Those rules could also be written as conditional statements inside the algorithm, as in the cases above; but because some of them are too long to be put in the algorithm, and some deserve special attention for their importance, we write them in the format of rules instead. Note that the effect of line 29 is identical to a case-by-case check of Rule-20 through Rule-24.
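To show the vertical (column) direction of the reasoning concretely, here is a hedged Python sketch of lines 17-28 of Algorithm 4 as applied to a single column. It is a simplified model over the matrix representation sketched earlier; the row-direction constraints and the rule-based analysis called at line 29 are omitted.

# Column-direction inference of Algorithm 4 over all occurrences of one
# variable v; modes: "+" consumer, "-" producer, "?" unsearched, "B" both.

def infer_column(column):
    occupied = [m for m in column if m != " "]
    # Unique-producer assumption (lines 20-22): a column consisting
    # solely of consumers has no producer, so the analysis fails.
    if occupied and all(m == "+" for m in occupied):
        raise ValueError("no producer for this variable")
    unknown = [i for i, m in enumerate(column) if m == "?"]
    # Lines 23-24: a single unsearched occurrence among consumers
    # must be the producer.
    if len(unknown) == 1 and all(
            m in ("+", " ") for i, m in enumerate(column) if i != unknown[0]):
        column[unknown[0]] = "-"
        return column
    # Lines 25-28: otherwise the undecided occurrences stay ambiguous.
    for i in unknown:
        column[i] = "B"
    return column

print(infer_column(["+", "?", " "]))   # -> ['+', '-', ' ']
print(infer_column(["?", "?", " "]))   # -> ['B', 'B', ' ']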
Besides the operations described in the algorithm, the following rules constitute additional mode constraints and result in adjustments of modes within a single column or row of the mode matrix. Some of them are used to eliminate the mode status "B" from the matrix.

RULE-20 (FlowAnalysis)
MM_{l,v} = "-" ⇒ (MM_{l',v} = "B" ∨ MM_{l',v} = "?") ∧ l' ≠ l
let MM_{l',v} ← "+"

RULE-21 (FlowAnalysis)
MM_{l,v} = "B" ⇒ every other occurrence of v in the column is marked "+"
let MM_{l,v} ← "-"

These two rules reduce the number of arguments marked with mode "B", based on the above two assumptions. Further mode information to trim "B" could consist of mode declarations provided by the user, or mode information obtained by queries to the user. An example of the former is given below.

Example 16 Consider the predicate append(X, Y, Z) ← ... The mode combinations

append("+", "-", "-")

and

append("-", "+", "-")

make no sense when considering the procedural meanings implied by them. By eliminating them from the mode analysis, we can generate a more accurate mode status.**

**This applies to append as a user predicate, rather than append as obtained from the decomposition of a list construct, since the latter has a determined mode assignment.

From this we can see that the user should explicitly describe the mode constraints among the arguments of a user-defined predicate in order to ease the analysis process. When users build a logic-based specification, they should also provide the mode constraints with each predicate they define. Alternatively,
when the operation of the transformed program does not meet the user's requirements, the user can force the selection among the multiple procedural interpretations generated from the several possible mode combinations. Further rules constituting mode constraints are:

RULE-22 (FlowAnalysis)
MM_{l,v} = "-" ⇒ MM_{l',v} = "-" ∧ l' ≠ l
fail
According to the unique-producer assumption, any mode information that includes more than one literal marked "-" for the same variable can be discarded. Thus this rule can be used to eliminate choices for modes marked "B", or to terminate a particular search branch during flow analysis.

RULE-23 (FlowAnalysis)
MM_{l,v} ⇒ l is a clause head ∧ l' is a recursively called literal which matches with l ∧ v and v' are the corresponding arguments
let MM_{l',v'} ← opposite MM_{l,v}

This rule handles the case in which a recursive literal in the clause body calls the same clause head. Because of the different sign conventions between predicates in the clause head and body, we have to adjust the sign when this case occurs.

RULE-24 (FlowAnalysis)
MM_{l,v} ⇒ l is a clause head ∧ l' is a recursively called literal which matches with l ∧ v and v' are the corresponding arguments ∧ MM_{l',v'} ≠ opposite MM_{l,v}
fail
If the corresponding modes are not opposite, as discussed earlier, then the mode assignments constructed so far are inconsistent, and we have to backtrack and attempt to find a different assignment. In contrast to the data dependency and flow analysis technique presented here, Debray and Warren's approach lacks the power to handle multiple-mode
combinations. As a result, the mode pattern of the arguments of a predicate has to be generated by taking the least upper bound of all the possible success patterns generated by different calling patterns. Our approach explicitly considers the multiple-mode possibility of a logic-based specification and is flexible enough to determine all possible mode combinations. For Debray and Warren, a logic program corresponds to a single mode transformation process, whereas in our approach it may result in different data dependencies and execution sequences; therefore, different target programs may be generated.

In addition, our approach performs significantly better than Debray and Warren's method. The general procedure of mode analysis in their approach consists of a computation of the set of success patterns and calling patterns of a predicate, and a computation of the mode of a predicate by taking the least upper bound of all the success patterns. The first computation contributes most to the complexity of the algorithm. In order to compute the success pattern, the control of the algorithm must go from left to right, according to the assumed evaluation order, through each predicate of a clause, and then calculate the possible mode changes of all parameters of the predicates in the clause body. This contributes O(n²) to the algorithm. But in order to calculate the mode modifications of each predicate in the clause body, the control must recursively transfer to the analysis of the success patterns of the clauses with those predicates as their heads. So the overall complexity is O(n³). The best case occurs when all the predicates in the clause body are defined by fact clauses; in this case the complexity reduces to O(n²). In the worst case (if there is a recursive call to the clause itself), the algorithm may not terminate.

Mode analysis in our approach proceeds sequentially through the matrix. If there exists a "B" in the matrix, then some modifications of the entries of the matrix need to be applied. However, because of the close relationship between the modes in the column direction, which is captured by the two assumptions, and the row direction, which is captured by the constraints on the arguments within a predicate, a change of only a few of the "B"s to either "+" or "-" can completely specify all the modes in a matrix. So the average complexity of our algorithm is O(n²). The best case occurs when all the parameters of the predicates in a clause body are constants or ground variables, in which case all the predicates become test conditions and the mode analysis can be accomplished in linear time. The worst case arises when all the cells marked with mode "B" are totally independent, so that the analysis of each predicate in the clause body must return to modify the matrix of the clause itself. Such backtracking may raise the complexity of the algorithm to O(n³).

Example 17 Example 15 shows the state of our example after finishing the canonicalizing transformations of the logic-based specification. We are now ready to enter the stage of flow analysis. For clause (2), we select one of two possible choices (see Figure 5.2). In
this case, r then has mode "+" in clause (1). From the mode information of r, we further partially infer the modes of the parameters of s and t. Because column W contains no information about the mode status, it is marked "B" in the two entries of that column; the mode status may change when more specific mode information is obtained at a later stage. Since the only entry in column N is not a constant, its mode has to be "-". (This entry is a "dead end", in that the value of this variable is not needed during the computation.) This information can be used to reduce the possible mode combinations for the clauses with head s. In the data dependency matrices shown in the following, we circle the entries that result either from data flow analysis on other clauses or from Algorithm 4.

Figure 5.2: The partial result matrix for clause 2

Figure 5.3 shows the data dependency matrix obtained for clause (4). The selected decomposition for the list constructs of the FRORL specification allows us to eliminate from consideration the situation where argument A3 of clause (4) has mode "+". This in turn completely determines the mode information for clause (2). Note that the mode information for the literal = is determined by Rule-13. The final matrix for clause (2) is given in Figure 5.4. To analyze clause (6) we obtain the mode of the clause head t from the matrix of clause (2). However, as Figure 5.5 shows, the resulting matrix is not consistent: Rule-23 of the second stage of the data flow analysis detects that the entries for the argument A1 of clause (6) conflict, in both having mode "-". However, the failure of this particular decomposition for the list construct in clause (6) does not jeopardize the mode analysis: when we fail to find a consistent mode assignment, we backtrack to the decomposition phase of canonicalization and choose an alternate decomposition.†† As a consequence, we also have to change this clause to

††The decomposition of the list may also need to be changed in case the user requests other possible mode combinations, in which case we might have to explore all alternatives that can be generated from using different ways of decomposing lists.
Figure 5.3: The final result matrix for clause 4
Figure 5.4: The final result matrix for clause 2
Figure 5.5: Initial attempt to construct a matrix for clause 6 (fail by Rule-22)
(7) t(A1, A2) ← car(A1, H) ∧ cdr(A1, L1) ∧ "g"(H, g(H)) ∧ car(A2, g(H)) ∧ cdr(A2, L2) ∧ t(L1, L2).
Figure 5.6 shows the successful decomposition choice and the obtained mode assignments. In Figure 5.7 and Figure 5.8 we show the final data dependency matrices for clauses (3) and (5), respectively. Note that for clause (3), the meanings of the = predicates vary based on the different mode combinations of their arguments: the = predicate in the second row is a logical comparison, since both of its arguments are ground, while the ones in the third and fourth rows are simply assignment expressions (e.g., A2 will receive a value from the constant "0"). For the LSSGR protocol example, the data flow and dependency combinations are simple. For the first parameter (A1), either the value comes from outside through the function call, or the parameter receives its value from the call and transfers it to the outside. For the second parameter (A2), the literal within the clause body is either an assignment or a test condition.
Figure 5.6: The final result matrix for clause 7
Figure 5.7: The final result matrix for clause 3
Figure 5.8: The final result matrix for clause 5
Chapter 6

Hybrid AND-OR Parallelism Implementation

After the static mode analysis is finished, the system uses the generated mode information to actually parallelize the original Horn-clause logic-based specification.
6.1 The Usage of Mode Information in the Parallel Model
As mentioned above, most current approaches put tremendous overhead on the parallel execution system by performing dynamic analysis of the data flow (also termed demand-driven analysis). Few of them employ static data flow analysis, and none solves the problem satisfactorily, the main reason being that a static approach cannot guarantee that it will generate all the solutions implied by a logic program. In our approach, by using the four-valued mode domain and generating all possible mode combinations with the bi-directional mode "B" in the matrix for each parameter of a predicate, all the solutions of a logic-based specification can be found from the matrix statically. The most significant advantage of this data dependency analysis is that the data flow information is available before any execution of the logic-based specification and before any parallelization is attempted. Making use of this early-available mode information is the key contribution of the new approach. At each step of the dynamic evaluation of the logic-based specification, when the binding of a variable changes, the static data flow and dependency matrix is consulted to see whether there exist predicates in the logic-based specification clauses (not necessarily clauses from the same level) which can be
evaluated in parallel. The results generated from this simultaneous evaluation enable the further evaluation of other predicates in those clauses. These predicates can be executed in parallel and can come from different clauses which may reside at different levels of the search tree. If there are no predicates which can be evaluated in this bottom-up fashion, then the general top-down evaluation process is invoked.

The current approaches which employ mode information to solve binding conflicts generate and modify mode information at each step of the logic program. As a result, after each step of evaluation the mode analysis process is re-employed and the mode status of arguments is changed; this changed mode information is then used to facilitate further parallel evaluation of the logic program. So the parallel evaluation proceeds level by level from a search-tree perspective, and after each step mode analysis has to be invoked to find a more specific mode for the arguments of the predicates at the lower level. Since in our model the complete mode information is available before any actual evaluation of the logic-based specification, the parallel evaluation can advance more than one level in the search tree: a predicate at a deeper level of the search tree can be invoked as soon as all its arguments which require values have been bound by other predicates within the same clause, and this decision is made by checking the static mode information. In this case, the evaluation goes in a bottom-up fashion; that is, the parallel evaluation explores predicates at a deeper level of the search tree, collects the values there, and passes them to other predicates within the same clause. The decision of when and where this bottom-up evaluation should be applied is made solely on the basis of the static mode information. The static mode information thus serves to find predicates, from various levels of the search tree, which can be executed in parallel.
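The bottom-up trigger itself is a cheap check against the static matrix: a predicate can fire as soon as every argument position that its mode row marks as input is already bound. A hypothetical sketch (the table layout is our own assumption):

# Decide from a predicate's static mode row whether it can fire now:
# every "+" (input) position must already have a binding.

def ready_to_fire(mode_row, args, bindings):
    return all(args[i] in bindings
               for i, m in enumerate(mode_row) if m == "+")

# s has mode ("+", "-", "-"): it fires as soon as its first argument is bound.
bindings = {"A1": [1, 2, 3]}
print(ready_to_fire(("+", "-", "-"), ("A1", "N", "W"), bindings))  # True
print(ready_to_fire(("+", "+"), ("A1", "V"), bindings))            # False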
6.2 AND-OR Parallel Execution
In order to handle AND-parallelism, a bottom-up approach combined with a top-down evaluation scheme is used. Current approaches always work from the top of an AND-OR tree and search all the way down the tree. In contrast, since our approach can find all the mode combinations before the user's query is available, it is possible that, while the execution continues, some information about predicates which would be executed several steps later in the old top-down model is available in advance. If we carry out those operations first, then more mode information for unsearched predicates can be gathered in advance, and the efficiency of the whole process is improved. The approach uses the results generated from the static data flow analysis, combined with the user's input, to find the portion of the AND-OR tree whose execution behavior is deterministic under the current variable bindings. This portion of the AND-OR tree is executed in a bottom-up fashion. Compare this to other
approaches, which rely completely on the user's input query to initiate the analysis: their execution is based entirely on the dynamic determination of the data flow information, so the analysis is completely dynamic and executes in a top-down manner, which puts a tremendous overhead on the system. An example execution sequence of the new parallel model is illustrated in Figure 6.1. In the figure, the left-side branch represents the new bottom-up evaluation, and the right side represents the general top-down evaluation sequence. The new execution behavior at the left branch is supported by the static mode information included in the data dependency matrices.

Figure 6.1: An illustration of the new parallel execution model

The overhead of the new model is the cost of the static mode analysis, which the new model applies to obtain the mode information that drives it. In the previous section the cost of static mode analysis was discussed, and it was shown that when a mode can be obtained, the cost of the new approach is better than that of current approaches in the best and average cases, and the same in the worst case. So among the approaches applying modes to resolve binding conflicts, ours is substantially better than the current approaches in obtaining mode information. The new hybrid evaluation model also uses the mode matrices during execution to obtain the actual mode status for the arguments of the predicates. This portion of the cost is the same among all the approaches which utilize mode information. The cost of a mode check for a predicate is O(n·m), where n is the average number of predicates within a clause and m is the number of arguments within a clause. If we assume there is a central process which contains the mode matrices for all the
specification clauses, then the extra overhead is two communication channels to send the current bindings to, and receive the currently executable predicates from, the central process. In other words, one of these two channels sends the mode status of the input query; the other returns the predicates that have become executable in parallel. These predicates are then assigned to processes and begin evaluation. The new approach shows significant improvement over the current approaches when the corresponding search tree is tall.

The algorithm that checks for predicates executable in parallel begins by generating grounded variables; dynamic checking of predicates executable in parallel focuses only on the detection of newly generated grounded variables. In our new model, instead of relying only on dynamic evaluation, all possible variable bindings and execution information are included in the statically generated data dependency matrix. A very important property of our data dependency matrix is that it contains all the possible execution sequences, based on the multiple data dependency information. The only role a user's query plays is to initiate the analysis. At each step of the execution, and especially whenever a new binding of a variable is generated, this information is used to restrict the multiple data dependency choices existing in the data dependency matrix and the possible execution sequences they imply. This restriction may leave only one possibility for the execution of a clause. So a large part of the analysis in our approach is done statically instead of dynamically.

The execution starts from the root of an AND-OR tree, which corresponds to the user's query. At each unification step, a variable binding is generated which restricts the meaning and execution behavior of the clause. The system locates the branch of the AND-OR tree (which, given the multiple bindings, need not consist of predicates from a single clause) whose execution behavior is restricted to a single choice, and then executes this branch in a bottom-up fashion. All the remaining branches which cannot be restricted to a single choice are executed top-down. This is how multiple solutions of a query are derived. Even in the top-down case, the static data dependency matrix can still help: since all the solutions have been predicted and represented in the data dependency matrix, the place where a multiple choice occurs can be determined from the same matrix. So the generation of multiple choices remains under the control of the system.
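The central-process arrangement can be pictured with two queues standing in for the two channels: workers push newly bound variables in, and the central process pushes back every predicate whose input positions are now covered. The following toy model is an assumption-laden sketch (single-threaded, queue-based) rather than a description of the actual runtime:

# Toy model of the central mode-matrix process and its two channels.

from queue import Queue

def central_process(in_q, out_q, mode_table, stop):
    bound = set()
    while True:
        var = in_q.get()              # channel 1: a newly bound variable
        if var is stop:
            break
        bound.add(var)
        for pred, (mode_row, args) in mode_table.items():
            if all(a in bound for a, m in zip(args, mode_row) if m == "+"):
                out_q.put(pred)       # channel 2: now-executable predicate

in_q, out_q, STOP = Queue(), Queue(), object()
table = {"s": (("+", "-", "-"), ("A1", "N", "W")),
         "t": (("+", "-"), ("W", "A2"))}
for v in ("A1", "W", STOP):
    in_q.put(v)
central_process(in_q, out_q, table, STOP)
while not out_q.empty():
    print(out_q.get())                # s (after A1), then s and t (after W)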
6.3 Synchronization in OR-Parallel Execution Model
The problem of preserving the multiple choices generated from the OR-branches can be handled with the help of the same data dependency matrix. The major problem of OR-parallelism is to synchronize the multiple bindings of the
same variable generated from different branches. A simple synchronization mechanism can be built from the mode information included in the data dependency matrices, considering that all the binding information is supported by these matrices. Based on the analysis of the AND-OR parallelism, there are two execution models, top-down and bottom-up, and the realization of OR-parallelism is closely related to these two models. During execution, if we find that the parallel execution of the logic program proceeds in a top-down fashion similar to the current approaches, then the OR-parallel synchronization structure is identical to that of the current approaches, using any of the currently available data structures and methods. If the execution proceeds in a bottom-up fashion, then the binding data structure is also generated in a bottom-up fashion, which simplifies the data structures proposed for synchronization to only one level of the search tree. Compared with other approaches, which have to keep a binding data structure for all the processes existing at the same time, this is a major improvement of our approach. Also, the creation of all these binding data structures can be delayed until all the variable bindings have been generated from all branches of an OR-parallel point. The current approaches have to generate a synchronization data structure upon the first entry of an OR-parallel branch point, and keep changing it to reflect changes in the binding status of variables whenever the execution in any of its OR-parallel branches changes. Our approach reduces the life cycle of the synchronization data structures by waiting until they are actually needed to generate the complete binding combinations from the OR-branches. The complexity of the OR-parallel binding data structure is also sharply reduced because we use a one-level control structure, whereas current approaches have to use a synchronization hierarchy when dealing with the OR-parallel synchronization problem. Current approaches use various data structures to synchronize the multiple bindings generated from different subtrees of an OR-branch [13, 14, 17, 41, 42, 64]. The following common characteristics are shared by almost all of them:

• Binding data structures defined at each OR-branch point are created when the OR-parallel point is first entered and exist throughout the whole parallel execution process, until all the branches of the OR-parallel point have finished execution;

• Binding data structures are dynamically modified along with the execution of each branch of the OR-parallel point. A modification may come from predicates very deep in an OR-parallel branch, when that branch finishes execution and generates the corresponding binding.

The first point forces a system to bear a large overhead by maintaining all
those complex synchronization data structures. To make things worse, some of the data structures may not be necessary for the current OR-parallel execution if some processes are not interested in the current variables at the current OR-branch point [102]. The second point also makes the control of the synchronization data structures complex. Specifically, the costs of creating and accessing variable bindings become prohibitively high. In order to cope with this problem, some of the current approaches consider an AND-OR search tree as a hierarchical structure and use an inheritance mechanism to reduce the complexity of maintaining the synchronization data structures [41, 92]. Based on the two points elaborated above, the synchronization data structure in our model can be made sharply simpler than the conventional data structures used now. The simplicity comes from two facts: (1) the lifetime of the synchronization data structure is shorter than in the conventional approach, since it is not needed until all the solutions have been generated and gathered from the lower level; and (2) there is no need to maintain more than one level of the synchronization data structure. Because of the hybrid approach we adopt here, the control of the synchronization is much simpler than in the conventional approach while achieving the same result. The simplification of the synchronization data structure comes from the control structure of the AND-OR parallelism, and the choice of a specific synchronization data structure does not affect the improvement the new model can deliver. Many synchronization data structures have been proposed and studied; all of them consider both the control structure of the synchronization data and the operations which can be applied to it. Because of the simplicity of the new approach, all the current data structures, like arrays, hash tables, version vectors, etc., can still be used, only in sharply simplified form. The detailed data structure will not be discussed here; instead, some design principles are listed below, followed by a small sketch.

• The synchronization data structure is used to hold the results generated from the lower level; it contains the bindings of the variables of the predicate which initiates the OR-parallel branch.

• The results in the synchronization data structure are retrieved to take the cross product and form the complete result for the specification clause.

• No special control mechanism has to be developed to handle the multilevel synchronization problem. The control of the synchronization data structure in the new approach only needs to deal with adjacent levels to generate answers.
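To make the one-level design concrete, the following C sketch shows such a synchronization structure and its cross-product merge, under our simplifying assumptions (not the book's) that a binding can be represented by a plain integer and that the OR-parallel point has exactly two branches. In line with the principles above, the structure is filled only after both branches have finished, and no deeper level is ever referenced.

    #include <stdio.h>

    /* One-level synchronization structure: each OR-branch deposits its  */
    /* complete list of bindings here only after it has finished, so the */
    /* structure lives just long enough to take the cross product.       */
    struct or_sync {
        int count[2];        /* solutions gathered from each branch */
        int binding[2][8];   /* the bindings themselves             */
    };

    /* Cross product of the two branches' answers, forming the complete  */
    /* result for the specification clause.                              */
    static void merge(const struct or_sync *s)
    {
        for (int i = 0; i < s->count[0]; i++)
            for (int j = 0; j < s->count[1]; j++)
                printf("answer: (%d, %d)\n",
                       s->binding[0][i], s->binding[1][j]);
    }

    int main(void)
    {
        struct or_sync s = { { 2, 3 }, { { 1, 2 }, { 10, 20, 30 } } };
        merge(&s);   /* 2 x 3 = 6 combined answers */
        return 0;
    }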
6.4 Calculation of the Currently Executable Predicate Set
In order to find the predicates which can be evaluated in parallel, the data dependency matrices are searched. The predicates are calculated dynamically according to the binding status of the clause and the static data flow and dependency matrix. All the currently evaluable predicates are gathered into a data structure which has enough space to hold the syntax structure of any of the clauses in the logic-based specification. This data structure is called EXEC; it holds all the predicates at the current stage whose parameter value requirements have been satisfied by the other predicates within their respective clauses. This set includes the parallelly executable predicates at any time in the process. The formal definition of EXEC follows.

Definition 15 EXEC is a set which contains the predicates from the logic-based specification clauses which can be executed in parallel. A predicate is in EXEC if all of the input variables of the predicate have been bound by other predicates within the same clause.

Here we assume that EXEC is a data structure defined to be able to hold the syntax structure of predicates appropriately. The predicates within this set can come, at any time during the process, from clauses at different levels of the search tree, based on the binding status of the parameters within the predicate. The bindings generated from the execution of this set enable both the calculation of the new EXEC set and the general top-down evaluation of the logic-based specification, according to the binding status of the other predicates in the clause. If predicates from more than one level are found to be executable, then they are included in the new EXEC set; otherwise, the general top-down evaluation is invoked. So the proper calculation and application of this set becomes the central point of the whole approach. Figure 6.2 provides a simple comparison of the EXEC set under the conventional and new parallel execution models. In the following, an example is used to illustrate how this mechanism works. The example is based on the data flow and dependency matrices generated in Example 2, where the generation of the mode information is discussed.

Example 18 Let us consider the case where the specification is called with the mode combination r(bound, unbound), i.e., r is called with its first parameter bound to a list and its second parameter unbound. The currently available approaches will first check the predicates in the body of
Figure 6.2: A comparison of execution sequences between the current and new models
clause (1), which corresponds to the AND-parallelism at the first level. In this case, there is no possible parallel execution, so a process corresponding to predicate s is created, the OR-parallelism of the predicate s is explored, and so on, checking in a top-down fashion. Instead, in our approach, after the user's query is available, the input mode combination of r(+, -) is checked against the data dependency matrices to find the possible execution set EXEC. From the matrices for clause (1), we can see that the list provided by the clause head predicate r is consumed by the first parameter of the predicate s. Since the predicate is a user-defined predicate, the search goes into the clauses unifiable with the predicate s, which define an OR-branch. Based on the algorithm described below, all the predicates in clause (2) can be executed simultaneously, since for each of these predicates all the parameters with the input mode have been satisfied. Thus, after examining the data dependency matrix for clause (2), EXEC = {=1, =2, =3}. Meanwhile, the search for executable predicates is also carried out in clause (3). In this case, the two predicates car and cdr are detected to be executable as well. Finally, EXEC becomes {=1, =2, =3, car, cdr}. In this example, instead of only the predicate s in clause (1) being executed, the five predicates in EXEC can be executed in parallel. After the execution of the predicate s in clause (1) is completed, the value bound to X1 by the execution of predicate s is passed to the predicate t in the same clause, and the same process goes on for t. The EXEC set corresponding to the first step of the execution of predicate t will be {=1, =2, car, cdr}, which comes from clauses (4) and (5) respectively. Please notice that within a clause, the executional behavior of our approach is similar to Chen and Wu's model [17]. A predicate within a clause has to have
all its input-mode parameters satisfied by some producers in order to be executable. So after the parallel execution of the above EXEC set has been carried out, the next EXEC set, {s}, will be executed. The same process goes on until the parameters B1 and B3 in clause (3) receive their values from the execution of the clause. As this example shows, the execution of the new model relies on the statically built data dependency matrix to spawn as many parallelly executable processes as early as possible during the dynamic evaluation of the logic-based specification. In the following, an algorithm is introduced which calculates the EXEC set at each step, whenever a new binding is introduced in the execution of the logic-based specification. The algorithm uses EXEC to hold the predicates it finds to be parallelly executable. In the algorithm, EXEC is an external variable, similar to an external variable in the C language. The reason is that in any invocation of the algorithm, the predicates included in EXEC may come from various clauses of the logic-based specification, and the algorithm has to call itself recursively for each of the clauses involved in the calculation. In order to preserve the predicates in the EXEC set in the process of a multiple-clause call, and to form the complete image after the call is finished, the data structure must be external. The same holds for the data structure CLAUSE, which holds the clauses contributing to the current EXEC set. The other predicates of the clauses in this set will be searched for further evaluation when the current parallel evaluation of EXEC is over. expanded is a two-dimensional data structure. It is built up for each of the clauses in the logic-based specification, which form its first dimension. The second dimension is similar to EXEC and holds the predicates in each clause which have been expanded or searched. It is similar to the dead set in the conventional DFS or BFS algorithms [66]. The only difference is that we have to consider the case when a clause is recursively called: a predicate may then be removed from the expanded set, which means that it has to be re-evaluated under the new binding status in the new round of evaluation. expanded1 is a duplicate of expanded, except that it is defined locally; it is used to hold partial results during the execution. In the algorithm, we assume there are N clauses in the logic-based specification, and that the specification is called with the predicate Pred_i, with arity a, in the clause body of Clause_j. Further assume that clause Clause_j has P predicates within its body, marked from 1 to P.

Algorithm 5 Calculate_Current_EXEC(Pred_i(para_1, ..., para_a), Clause_j)
External: EXEC; External: CLAUSE;
Static: expanded[1..N]; expanded1[1..N];
for each l, (1 <= l <= P) ...
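The printed body of Algorithm 5 is illegible in the source beyond the loop header, so the following self-contained C sketch is reconstructed solely from the prose description and Example 18; the tables and names in it are our assumptions. A predicate whose input modes are satisfied is either added to EXEC (if it is primitive) or, if it is user-defined, followed into every unifiable clause one level down; the expanded table plays the role of the dead set and prevents re-expansion.

    #include <stdio.h>

    #define N_CLAUSES 3
    #define P_PREDS   3

    /* Tiny hand-made tables standing in for the data dependency        */
    /* matrices, shaped after Example 18: clause 0 calls a user-defined */
    /* predicate whose OR-branch consists of clauses 1 and 2.           */
    static int satisfied[N_CLAUSES][P_PREDS] = {
        { 1, 0, 0 },   /* clause 0: only its first predicate is ready  */
        { 1, 1, 1 },   /* clause 1: three primitives ready (=1,=2,=3)  */
        { 1, 1, 0 },   /* clause 2: two primitives ready (car, cdr)    */
    };
    static int user_defined[N_CLAUSES][P_PREDS] = {
        { 1, 0, 0 }, { 0, 0, 0 }, { 0, 0, 0 },
    };
    static int unif[N_CLAUSES] = { 0, 1, 1 };  /* OR-branch of the demo */

    static int EXEC[N_CLAUSES][P_PREDS];       /* external, as in the text */
    static int expanded[N_CLAUSES][P_PREDS];   /* the "dead set"           */

    static void calculate_current_exec(int clause)
    {
        for (int l = 0; l < P_PREDS; l++) {
            if (expanded[clause][l] || !satisfied[clause][l])
                continue;                 /* searched already, or not ready */
            expanded[clause][l] = 1;
            if (user_defined[clause][l]) {
                /* follow the OR-branch into every unifiable clause */
                for (int c = 0; c < N_CLAUSES; c++)
                    if (unif[c])
                        calculate_current_exec(c);
            } else {
                EXEC[clause][l] = 1;      /* primitive: executable now */
            }
        }
    }

    int main(void)
    {
        calculate_current_exec(0);
        for (int c = 0; c < N_CLAUSES; c++)
            for (int l = 0; l < P_PREDS; l++)
                if (EXEC[c][l])
                    printf("EXEC contains predicate %d of clause %d\n", l, c);
        return 0;
    }

Run on this toy input, the sketch collects five predicates into EXEC, mirroring the {=1, =2, =3, car, cdr} set of Example 18.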
6.5 Hybrid Execution Algorithm
In this subsection, another algorithm is introduced, which explores the hybrid AND-OR parallelism of a logic-based specification. This algorithm is initially called when the user's query becomes available. The user's query predicate and the clauses unifiable with it constitute the initial calls to the previous algorithm, which calculates the first EXEC set. After the set is constructed, all predicates in this set are evaluated in parallel by creating a process for each of them. Then the same exploration of EXEC continues for the clauses
which have contributed to the previous set, and the parallel evaluation proceeds with the next EXEC set generated. In this recursive process, if the look-ahead goes only one level deep, then we know that the new scheme does not contribute to the current evaluation of the clause, and the conventional top-down evaluation process is adopted. If the evaluation of a clause terminates, then the results generated from the evaluation of the clause are gathered to form a complete answer set and passed to the upper level of the search tree. This part is discussed further where the data structure for the OR-parallel synchronization is discussed. When no clause in the current CLAUSE set contributes to the calculation of the previous EXEC set, the process terminates. Suppose the query is a predicate Pred with an arity of m. Both sets CLAUSE and CLAUSE1 hold the clauses which contribute to the calculation of the previous EXEC set. Since CLAUSE will change in the process of calculating the next EXEC set, the duplicate CLAUSE1 is created to keep the set consistent during that calculation.

Algorithm 6 Parallel_Evaluation(Pred(para_1, ..., para_m))
EXEC = ∅;
for each clause Clause_i, s.t. Head(Clause_i) == Pred
    Calculate_Current_EXEC(Pred(para_1, ..., para_m), Clause_i); ...
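The printed body of Algorithm 6 is likewise truncated, so the following C skeleton only sketches its control flow as the prose describes it: build the first EXEC set from the clauses unifiable with the query, then alternate the parallel evaluation of EXEC with the calculation of the next EXEC set until no clause contributes. All helper functions are stand-in stubs of our own.

    #include <stdio.h>

    /* Stand-in stubs; a real implementation would use                   */
    /* Calculate_Current_EXEC (sketched in the previous section) and     */
    /* genuine process creation.  The demo simply runs three rounds.     */
    static int round_no = 0;
    static int  exec_empty(void)        { return round_no > 3; }
    static void build_first_exec(void)  { round_no = 1; }
    static void evaluate_exec_in_parallel(void)
    {
        printf("round %d: one process per predicate in EXEC\n", round_no);
    }
    static void build_next_exec(void)   { round_no++; }

    /* Sketch of Algorithm 6: the first EXEC set comes from the user's   */
    /* query; evaluation and EXEC calculation then alternate until the   */
    /* CLAUSE set stops contributing.                                     */
    static void parallel_evaluation(void)
    {
        build_first_exec();
        while (!exec_empty()) {
            evaluate_exec_in_parallel();
            build_next_exec();   /* from the clauses in CLAUSE */
        }
        /* finished clauses have their answers merged bottom-up into the */
        /* answer set of the level above (see Section 6.3)               */
    }

    int main(void) { parallel_evaluation(); return 0; }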
6.6 Comparison with the Conventional BFS and DFS
As mentioned above, the EXEC set contains the currently executable predicates from the AND-OR search tree. This set is similar to the ACTIVE set in conventional breadth-first and depth-first search, which contains the nodes of the search tree that have been expanded but are waiting for evaluation. The difference of our approach compared to DFS and BFS, from the viewpoint of heuristic search, is that all the nodes in the ACTIVE set are evaluated in parallel, and the resulting nodes are also calculated simultaneously for the new ACTIVE set. Much research has been done in the artificial intelligence community on developing better heuristics to guide the sequential and parallel search of the expansion tree, such as A* [72], AO* [69], SSS* [89], B* [6], and alpha-beta [51]. All these searching techniques can be considered special cases of a broader technique called the Branch-and-Bound algorithm [61]. Research has also been carried out on how to improve execution efficiency and how to find better heuristics for new search algorithms [18, 101, 106]. Among them, [63] discusses heuristic parallel combinatorial OR-tree searches under approximations and anomalies; the authors analyze how search efficiency is affected by the relationship among the number of nodes expanded in a parallel search, the number of processors used, and the complexity of the problem to be solved. The results obtained there raise some interesting questions regarding the development of the system discussed here. The relation between general parallel heuristic search and the new method proposed here can be summarized in the following two points.

• The calculation of the active set. Both methods calculate the current nodes which can be evaluated in the next step. In the heuristic search domain, this decision determines whether breadth-first search, depth-first search, or another heuristic like IDA* [52, 100] is used. In the new approach, the search is in the logic domain, and the selection of the EXEC set is based on the binding status of the parameters of the predicates.

• The relation between the number of nodes expanded and the processors available. The effect observed in [63] is also valid in the new system, especially when we also want to consider the various effects on the behavior of the system resulting from the variable number of processors, the reduced number of processes created during the evaluation, the compressed data structures used in combining the results generated in the parallel search, and the optimal evaluation sequence of the predicates in the search tree.

These points will be discussed further in the following sections, where the hybrid model of AND-OR parallel evaluation is discussed.
6.7 Advantages of the New Approach
As a summary, the advantages of the new approach are listed and corresponding examples are given in this subsection. The main benefits of this new model for the analysis and parallelization of a Horn-clause logic-based specification are:

• Reduced number of total processes: During the AND-parallel execution, we do not need to generate all the intermediate processes along the way from the initial calling predicate to the currently executable predicates; if those intermediate nodes do not contribute to the binding of the related variables, we can skip the generation of these intermediate processes and alleviate the overhead of the parallelizing scheme. During the OR-parallel execution, the creation of the binding structure can also be delayed until the variable bindings are actually generated from the branches of the OR-parallel points;

• Local communication channel: At each step of the execution, the processes existing at the same time are divided into several clusters by the clauses. These clauses have predicates which are currently executable, or are active because of the static analysis. As a result, the total idle time of the communication channels needed by the parallel processes is substantially smaller than in current approaches, where a communication channel may be idle for a long time before a message is collected from one branch and sent to a deeper level of another branch;

• Early discovery of failure and early short-cut: By consulting the static data dependency matrices before the actual exploration begins, the search can go through several clauses, gather the currently evaluable predicates, and evaluate them in parallel. Considering that the evaluation of a logic-based specification is equivalent to the search of a tree, this evaluation sequence means that the search can go deeper into the search tree and evaluate the predicates there. This is similar to best-first search, where the active set may contain nodes from all levels of a search tree. The evaluation of predicates deep in the search tree may result in a failure. In this case, if most of the remaining predicates within the clause have not been evaluated yet, the failure of the predicate will terminate the search of the clause without searching all those predicates, as most of the current top-down approaches would, thus saving a lot of memory space and processor time. This process may go on along with the evaluation; as a consequence, the time and space saved by the possibility of short-cuts is maximized. Since at each step the search goes deeper into the AND-OR tree of a logic program and the evaluation is applied
to the clauses which have currently executable predicates, it is possible to apply the classical short-cut algorithm earlier than other approaches and to reduce the portion of the AND-OR tree which must be searched to generate the complete set of multiple solutions.

• Multiple solutions generation and backtracking elimination: The new execution approach deals with the AND-OR parallelism of a logic-based specification, i.e., it uses a new hybrid execution model to improve the efficiency of the system. Because all the execution possibilities and all possible binding statuses of variables in a logic-based specification can be found in the data dependency matrix, it is possible to totally eliminate backtracking from the model. The basic idea is to explore all possible AND- and OR-branches in parallel; thus all possible paths are searched and backtracking is eliminated from the scheme. The most significant property of our method is that it can preserve the multiple solutions a logic-based specification can generate even before any actual execution. Since all the paths of the AND-OR execution tree and all the possible solutions have been considered, there is no need for backtracking in this model. The main mechanism supporting this property is a merging algorithm which finds all the combinations of the answers generated from the OR-parallel sub-branches. A survey of the currently available merging algorithms can be found in [102], which describes the different existing ways to coordinate the solutions generated from different branches. Compared with the approaches mentioned in the survey, the mechanism proposed here does not need complex synchronization structures to arrange the sequence of the answers generated from the subtrees, because in our approach not only does OR-parallelism exist, but the system also knows how many solutions will be generated from each of the sub-branches. The only overhead we have here is a tag which distinguishes the solutions generated one level deep in one OR-parallel sub-branch from the others. In our approach, a multiple solution is represented by the mode "B", which stands for "mode can be both", in the data flow and dependency matrix. As soon as the user's query is available, the possible mode combinations are reduced, but any remaining "B" mode which cannot be eliminated from the matrix by the user's input query represents a multiple
solution which could be generated from the corresponding sub-branch. The substitution of the "B" mode by either mode "+" or "-" provides the multiple solutions implied by the logic-based specification [98].

• Delayed synchronization data structure creation: Since the bottom-
up execution will first go deeper into the AND-OR tree hierarchy, a process at an OR-parallel merging point will not be created until all the variable bindings have actually been generated from the evaluation of the OR-branches below the OR-parallel point. This eliminates the overhead present in current approaches of maintaining the synchronization data structures throughout the whole process of evaluating the branches below the current OR-parallel point. Note that this case only considers the behavior under the new paradigm; if the evaluation has to proceed in a top-down fashion, it is still necessary to create the corresponding synchronization data structures. This advantage can be considered a byproduct of the delayed creation of processes in the bottom-up evaluation.

• One-layered synchronization data structure: According to the bottom-up execution model, all the bindings generated from the different branches of an OR-parallel predicate are already available before the merging algorithm at the same OR-branch point is called. So there is no need to maintain a synchronization data structure for more than one level below the OR-parallel branch point. This sharply reduces the complexity of creating and maintaining the synchronization data structures that most current approaches share, without any limitation on the level of information a binding list needs to maintain. Similarly, if the execution cannot detect possible bottom-up evaluation segments, then the top-down evaluation of the OR-branches, similar to the current approaches, has to be applied. So the handling of the OR-parallelism also takes on a hybrid character.

In the following, examples are provided to support the above claims. The examples are shown with tree structures, but it should be clear that they represent the AND-OR search tree during a specific execution.

Example 19 Consider the simple AND-OR tree of the execution of a logic-based specification illustrated in Figure 6.3. In this example, the new approach goes deep into the search tree in a single step, instead of going one step at a time to reach the leaf of the tree where the value is bound, as the current approaches do. By adopting the new AND-parallel execution model, the value can be transferred directly to the place where it is requested. Also, the values are returned without the interference of the intermediate nodes.

Example 20 Consider another AND-OR tree of the execution of a logic-based specification, shown in Figure 6.4. The communication of the new model, which is shown at the top, is restricted to the local branches, while it is possible to have communication over a long path in the old model, shown at the bottom.

Example 21 Consider another AND-OR tree of the execution of a logic-based specification, shown in Figure 6.5. Because of the failure of one of the
Figure 6.3: A simple comparison between the current and the new parallel evaluation models
Figure 6.4: An example illustrating the reduced duration of communication channels
AND-branches, the search of all the remaining branches can be eliminated from consideration. For the same AND-branch there may be more than one binding possibility; the special problem here is to distinguish which bindings are useless and which remain useful. Current approaches lack the power to distinguish the actual binding status, and thus very few of them actually discuss this problem. With the help of the static mode information, it is believed that useful results can be generated here.
Figure 6.5: An illustration of the early short-cut
6.8 Analysis of Non-functional Requirements in the New Parallel Execution Model
The parallel execution model is used to analyze the non-functional requirements of a FRORL specification. Any mismatch can be fed back to the user, who can then adjust the functional or non-functional specifications. After data dependency analysis and parallel evaluation, the executional behavior of the specification can be evaluated against the non-functional requirements specification. The evaluation of non-functional requirements is carried out during the new parallel execution. Results regarding the execution behavior and the functionality the system performs are gathered and evaluated by the parallel evaluation system by referring to the non-functional requirements specification. This non-functional reference is realized by the objects which act as parts of the
activity frame constituting the parallel evaluation. The result is then reported to the user, who can further adjust the non-functional requirements specification. Because of the improvements resulting from the parallel evaluation model, the comparison and its results can be gathered much faster than with general parallel evaluation approaches, and the adjustment of the non-functional requirements specification can be implemented with higher efficiency. The following is a simple example showing how the new parallel evaluation model refers to the non-functional specification during the execution process.
Figure 6.6: Parallel evaluation and its reference to the non-functional specification

In Figure 6.6, each node in the AND-OR tree represents an action defined by an activity frame. The reference to the non-functional requirements is realized by the objects which are parts of the corresponding activity frame.
Chapter 7
Efficiency Considerations and Experimental Results

In this chapter, the techniques used to improve efficiency are discussed, including backtracking elimination and the early discovery of failure with early short-cut. These techniques result from the hybrid evaluation model. Another efficiency result is the improvement in intercommunication among processors: by evaluating the predicates in a bottom-up fashion, the communication among processors can be restricted to within a clause, and the total cost is thus reduced. Finally, the whole system is simulated and evaluated with various examples.
7.1 Execution Evaluation
The execution evaluation is represented by Total_Nodes_Searched, Total_Evaluation_Step, Total_CPU_Round, and Degree_of_Parallelism. They represent, respectively, the total number of nodes searched in the tree to finish the evaluation, the total number of steps needed to finish an evaluation, the total accumulated CPU time to finish the same evaluation, and the average number of processes existing at each step of the evaluation. A detailed discussion of these criteria is given in the subsequent subsections.
7.2 Communication Evaluation
A parallel execution of a logic-based specification always has to handle communication among processors. This subsection discusses the reduction in communication costs the new model may bring to the parallel execution of a logic-based
specification. As mentioned earlier, the parallel execution of a logic-based specification can go deep into the search tree before evaluating all the remaining predicates of the clause in which the calling predicate resides. The main advantage of this evaluation sequence is that the clauses are assessed locally, without interference from other clauses residing at the same or higher levels of the search tree. This evaluation sequence forces most of the communication channels to exist between processes which represent the predicates of the same clause at a deeper level of the search tree. As a result, the communication channels at a lower level have to be created only when all the values have been generated from a deeper level of the search tree. This tends to reduce the duration of the communication channels between processes at the lower level and thus the risk that a lower-level communication channel will become a communication bottleneck [17]. In the current approaches, a pair of communication channels from a node near the root has to be created if there are two AND-parallel branches initiating from the node and these two branches have data dependencies between them. These two channels have to remain in existence throughout the whole search of the two AND-parallel branches; only after the value has been generated from one branch and transferred to the other may these two channels be deleted. The problems of idle channels and processes, and of communication bottlenecks, are thus created. The reason for this phenomenon is that when these two deeper processes are evaluated, all the processes on the way to them have already been generated and evaluated to a certain degree, and communication channels have to be created for all these intermediate processes. Instead, in the new approach, by adopting a bottom-up evaluation model and restricting the evaluation to clauses deep in the search tree, the communication channel is restricted to within the same clause boundary and is created only when it is actually needed, that is, when a value has been generated by deeper nodes and must be transferred to lower nodes or other branches. The efficiency improvement in communication is further evaluated in the next subsection, where the system is simulated with examples.
7.3 Criteria for Simulation and Evaluation
In this section, examples are used to demonstrate the performance improvements of the new parallel execution model proposed in this book. The simulation and evaluation adopt some novel criteria, developed to reflect the contribution of the new approach while staying in line with the common interests in this area. The criteria for the simulation and evaluation of the parallel execution of
a logic program can be summarized as follows:

• Total Execution Time: The time required to finish the execution of a logic program. This includes the time to run the program on a CPU and the time to coordinate the execution of the processes created during execution. The coordination time is pure overhead [18, 62, 65, 85];

• CPU Time: The CPU time required to execute the logic program [15, 60];

• Degree of Parallelism: The number of processes existing at a given point in time during the execution [15, 19];

• Number of Processes: The total number of processes created during the execution process [1, 7, 62];

• Total Nodes Searched: The total number of nodes searched [106];

• Total Memory Usage: The total amount of memory needed, which can be evaluated by the number of pages used by the evaluation model [18] or by the total memory allocated [39];

• Average Speed Up: The average speed-up of the parallel execution model [63, 85, 22];

• Total Number of Instructions Executed: The total number of instructions executed by a search algorithm; the instructions are defined by the simulator used [15, 60].

In this book, some other criteria are developed to compare the model with the current parallel execution models. We believe these criteria best demonstrate the advantages of the new model. First, some assumptions and conditions under which the criteria hold are presented; the criteria themselves are then discussed in detail.

Synchronization Overhead: The synchronization overhead is primarily due to communication costs, so the analysis focuses on them. Because of the distributed static data dependency analysis algorithm, the overhead of deciding the parallelly executable set becomes a problem of deciding the extra communication cost of realizing the static data dependency analysis. This is because when enough information regarding the mode combination is available within a clause, the corresponding process can decide the correct currently executable set locally, and thus puts no extra overhead on the execution model. The effect of this is to focus on the communication overhead for both the execution cost and the parallelizing execution cost. Other minor overheads, like the OR-parallel
synchronization, also contribute to the new model. Considering that the new bottom-up execution model can greatly reduce the cost of developing and maintaining the OR-parallel execution data structures, the effect of this kind of overhead is not as significant under the new execution model.

Intra-Process Communication and Inter-Process Communication: The communication cost within a process can be considered zero, while the communication among processes contributes most of the parallel communication overhead. Communication within a clause is necessary for the data transferred among predicates of the same clause which share parameters. Communication among clauses consists of either the parameters shared by the calling predicates and the clause heads being called, or the mode information sent to decide the correct execution sequences among the predicates within a clause and among clauses. In the evaluation, we assume the communication cost is represented by the duration of the channels between processes. The time segment is the same as the execution time of the process; that is, the duration increments by one step whenever the corresponding process extends its execution time by one step.

Execution Time of a Process: Since the analysis of the overhead of the new model and its comparison with other methods mainly focus on the costs of the processes created and of the communication, in order to make the comparison more convenient we assume that the execution of any process takes a unit time segment to finish if all of its input parameters are satisfied by values provided by other predicates within the same clause. This is consistent with the assumption that the overhead of parallel execution mainly comes from the management of parallelly executable predicates and the communication overheads involved in the parallel execution.

In the following, the criteria for the new execution model are discussed along with examples which demonstrate their meaning.

1. Total Evaluation Steps: The total number of steps needed to finish an evaluation of a logic-based specification. It is closely related to the degree of parallelism in the parallel execution model. Since the degree of parallelism can be increased in the new hybrid evaluation model, it is quite possible that the total number of steps needed will also be reduced.

2. Total Nodes Searched: The total number of nodes of the search tree searched by the different search models. By applying short-cuts, some of the
nodes in the search tree do not need to be searched in order for a complete answer set to be generated. The reduction in the total number of nodes searched varies among the different approaches.

3. Total CPU Time: The sum of the execution times of all the processes created in the parallel execution of a logic-based specification. Because of the delayed creation of processes in the new model, the sum of the durations, or execution times, of all processes is likely to be shorter than in the general models.

4. Total Communication Channel Cost: Because of the shorter duration of the communication channels in the new model, its total communication channel cost will be lower than in current approaches. The communication channel cost of a parallel execution model comes from shared parameters among predicates and from parallel execution management. The parallel execution management cost in the new model is caused by the message passing needed to calculate the currently executable sets. For the old models, the same amount of communication cost comes from the implementation of the producer-consumer restriction, which corresponds to the executable-set decision cost caused by the mode restriction in the new model. In summary, the reduced communication channel cost results from the fact that a channel is created only when it is actually needed, while in the current models a channel is created when the parallel branch point is first entered and may stay idle for a long time.

5. Weighted Communication Channel Cost: If we consider that the level number of nodes increases from root to leaf in an evaluation tree, then the lower a node's level is, the more likely it is that the node will become a communication bottleneck [17]. So it is advantageous to reduce the duration of a lower-level process. The higher-level processes become more localized compared with the lower-level predicates within the same evaluation tree, and the cost of maintaining the higher-level predicates is much lower than for the lower-level ones. In this book, the weight of a channel is assumed to be the height of the complete search tree minus the average level of the two adjacent nodes involved in the communication channel; the weighted cost of a channel is its duration multiplied by this weight. Based on this, if the height of the tree is H, and the calling and called predicates reside at levels h and h + 1 respectively, then the weight of the channel between these two predicates is H - (h + (h + 1)) / 2, and the total weighted communication channel cost is the sum of the weighted costs of the channels existing between each pair of nodes within the search tree. In the
simulator, we use the maximum possible height of a tree to approximate the actual height of a specific evaluation tree. The larger the value, the more likely the channel is at a lower level, and thus prone to becoming a bottleneck.

6. Ratio of Deeper Channel: This value is defined as the weighted communication channel cost divided by the total communication channel cost, which ranges from 0.5 to positive infinity. It reflects the degree of local communication: the smaller the value, the more the communication involves nodes in the deeper branches, which puts relatively little communication overhead on the system.

7. Degree of Parallelism: As above, this is defined as the number of processes parallelly executable at any time during the evaluation process. In the new execution model there are two directions in which parallelly executable processes are generated; as a result, it is more likely that more predicates can be evaluated at the same time than in the general models. From another perspective, the more predicates that can be evaluated in parallel, the less time it takes to evaluate the whole logic-based specification, which is related to the reduction in the total evaluation time. If there are n steps to finish a parallel evaluation, and at each step i there are p_i parallelly executable processes, then the degree of parallelism is equal to (p_1 + ... + p_n) / n.

In the following, an example is used to illustrate the effects of the criteria discussed above.

Example 22 In this example, an execution tree is given to which parallel evaluation is applied. Only the top-down evaluation direction is discussed here; the bottom-up evaluation has a similar effect on the values of the above criteria. Assume the tree structure of Figure 7.1 is considered. Assume the execution of A starts at time 1; consequently, nodes B and C start execution at time 2; nodes D, E, and F start at time 3; nodes E and F end at 4; node G starts at 4 and ends at 5; node C ends at 5; node D ends at 6; node B ends at 7; and finally, A stops at 8. In summary, the start and end times for each node are:

Start(A) = 1    End(A) = 8
Start(B) = 2    End(B) = 7
Start(C) = 2    End(C) = 5
Start(D) = 3    End(D) = 6
Start(E) = 3    End(E) = 4
Start(F) = 3    End(F) = 4
Start(G) = 4    End(G) = 5
Figure 7.1: An example evaluation tree

For this example, the criteria have the following values:

Total Evaluation Steps = 8;
Total CPU Time = (8 - 1) + (7 - 2) + (5 - 2) + (6 - 3) + 2 * (4 - 3) + (5 - 4) = 21

Regarding the communication costs, if it is assumed that the value transfer of one variable costs one unit of resource, then the communication cost equals the total number of variables transferred between the calling and called predicates. If the following channel start and end times hold for each pair of calling and called predicates:

Channel_Start(A, B) = 1    Channel_End(A, B) = 7
Channel_Start(A, C) = 1    Channel_End(A, C) = 7
Channel_Start(B, D) = 2    Channel_End(B, D) = 6
Channel_Start(B, E) = 2    Channel_End(B, E) = 4
Channel_Start(C, F) = 2    Channel_End(C, F) = 4
Channel_Start(D, G) = 3    Channel_End(D, G) = 5

then

Total Communication Channel Cost = 2 * (7 - 1) + (6 - 2) + 2 * (4 - 2) + (5 - 3) = 22
In this example, we have H = 4, which is used for the weighted communication cost calculation, according to the height of the tree. The following weighted communication costs hold for each pair of calling and called nodes:

Weighted_Commu_Channel_Cost(A, B) = (7 - 1) * (4 - (1 + 2) / 2) = 15.0;
Weighted_Commu_Channel_Cost(A, C) = (7 - 1) * (4 - (1 + 2) / 2) = 15.0;
Weighted_Commu_Channel_Cost(B, D) = (6 - 2) * (4 - (2 + 3) / 2) = 6.0;
Weighted_Commu_Channel_Cost(B, E) = (4 - 2) * (4 - (2 + 3) / 2) = 3.0;
Weighted_Commu_Channel_Cost(C, F) = (4 - 2) * (4 - (2 + 3) / 2) = 3.0;
Weighted_Commu_Channel_Cost(D, G) = (5 - 3) * (4 - (3 + 4) / 2) = 1.0;
and the total weighted communication cost is the sum of the costs over all calling-called pairs:

Total Weighted Communication Cost = 15.0 + 15.0 + 6.0 + 3.0 + 3.0 + 1.0 = 43.0

The Ratio of Deeper Channel in this case is 43.0 / 22 ≈ 1.95. If we assume that all the nodes at each level can be evaluated in parallel without any dependency, then based on the definition of the degree of parallelism, we have

Degree of Parallelism = (1 + 3 + 6 + 5 + 3 + 2 + 1) / 7 = 3.00
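The numbers of Example 22 are simple enough to recompute mechanically. The following self-contained C program re-derives Total CPU Time, Total Communication Channel Cost, and Total Weighted Communication Cost from the start/end times and channel endpoints given above; the per-channel parent levels (A at level 1, B and C at level 2, D at level 3) are read off the tree of Figure 7.1.

    #include <stdio.h>

    int main(void)
    {
        /* Start/end times of nodes A..G, from Example 22. */
        int start[] = { 1, 2, 2, 3, 3, 3, 4 };
        int end[]   = { 8, 7, 5, 6, 4, 4, 5 };
        /* Channels A-B, A-C, B-D, B-E, C-F, D-G: level h of the calling */
        /* node, channel start, and channel end.  H = 4 is the height.   */
        int h[]  = { 1, 1, 2, 2, 2, 3 };
        int cs[] = { 1, 1, 2, 2, 2, 3 };
        int ce[] = { 7, 7, 6, 4, 4, 5 };
        int H = 4;

        int cpu = 0;
        for (int i = 0; i < 7; i++)
            cpu += end[i] - start[i];

        int chan = 0;
        double weighted = 0.0;
        for (int i = 0; i < 6; i++) {
            int dur = ce[i] - cs[i];
            chan += dur;                          /* one unit per step */
            weighted += dur * (H - (h[i] + (h[i] + 1)) / 2.0);
        }

        printf("Total CPU Time                    = %d\n", cpu);        /* 21   */
        printf("Total Communication Channel Cost  = %d\n", chan);       /* 22   */
        printf("Total Weighted Communication Cost = %.1f\n", weighted); /* 43.0 */
        return 0;
    }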
7.4 A Simulator for Parallel Logic-based Specification Evaluation
A simulator has been implemented to simulate the executional behavior of logic-based specifications. The simulator is written in C and consists of about 3,000 lines of source code; it currently runs in a SUN/UNIX environment. The simulator first builds a search tree based on various conditions which may affect the behavior of an AND-OR search. After the search tree is built, the system simulates the traditional sequential, the general parallel, and the new parallel execution models, and outputs various results illustrating the executional behavior of the search algorithms according to the evaluation criteria defined above. In the following, the structure of the simulator is discussed in detail. The simulator consists of ten files. The functions realized by these files include: (1) creation of a search tree; (2) assignment of restriction values to the created tree, to simulate the interrelations among clauses and among variables within a clause in a logic-based specification; (3) searching of the
expansion tree based on the sequential evaluation principle; (4) searching of the expansion tree based on the conventional parallel evaluation method; (5) searching of the expansion tree based on the new hybrid parallel evaluation method; and (6) some other functions that help realize the functionality of the simulator. The key points of each of these functions are discussed in detail below:

1. Node structure in the search tree: The searches based on the sequential, conventional parallel, and new hybrid parallel models are all done within a pre-generated search tree. A node in this tree is represented by a struct in C, and an arc is represented by a pointer inside the node. The complete node structure is shown below. As will be discussed later, three trees are generated in a simulation process, one for each search method; the following node structure is used for the two trees adopting parallel evaluation methods. Since sequential AND-OR searching is much simpler than parallel searching, the node structure for the sequential search is simpler than the following structure; in fact, its fields are a subset of the fields mentioned below, namely node_id, truth_value, num_child, status, next, and parent.

struct node {
    char node_id[Max_Depth];

    char var_related[TOTAL_VAR_RELATE][Max_Depth];
    int  var_init_counter;
    int  var_depen_counter;

    char deep_jump[MAX_DEGREE_DEEP_JUMP][Max_Depth];
    int  jump[Max_Fan_Out];
    int  total_deep_jump;

    int  truth_value;

    int  num_child;

    int  backup;

    flag status;

    struct node *next[Max_Fan_Out];
    struct node *parent;
};

The first field, node_id, is used to identify one node from another. The formation of this node ID is identical to that in [63]; thus the maximum length of the node ID equals the maximum depth of the search tree. The second field, var_related, is used to simulate the variable dependencies among predicates within a logic-based specification: its first dimension represents the maximal number of variable relations this predicate may have with other predicates, and its second dimension holds the node IDs of all the other predicates inside the clause which are related to the predicate under consideration. The third and fourth fields, var_init_counter and var_depen_counter, represent the number of variable dependencies initiated from the node and the number of dependencies coming into the node, respectively. These two fields are used to decide when a predicate is executable: if a node has its var_depen_counter equal to zero, it may be executed in the next run, and once the node is evaluated, all the variable dependencies inside other nodes which were initiated from this node are released. The fifth field, deep_jump, designates the possible deep jumps within the search tree. This concept is the cornerstone of the new bottom-up searching strategy: go deep into the evaluation tree and start execution from there towards the top of the tree. The next field, jump, represents the actual direction of a deep jump from the current node; this is accomplished by the coding scheme for node IDs. For example, if the deep jump goes into the child branches rooted at children 2 and 4, then the jump field has two elements, 2 and 4, respectively. The maximum number of elements of this field is the maximum fan-out of a node. The next field of the node structure, total_deep_jump, is the total number of jumps from the current node; it defines how many elements exist in deep_jump and jump. The field truth_value represents the truth value of the predicate the current node denotes. Truth values are initially assigned only to the leaf nodes, which represent the facts on which the final decision of the truth values of the other nodes is based. The evaluation of the truth value of an intermediate node follows the AND-OR rule, with the root node of the search tree always assumed to be an OR-parallel node. The truth values of the nodes are necessary, since the short-cut bases its pruning decisions on these values. The next field, num_child, denotes how many children the current node has. The following field is called backup: if its value is true, the node has been expanded by deep jumping and is waiting for the termination of the evaluation of its child nodes. Whenever a deep jump occurs, the simulator puts all the affected nodes into the currently evaluable set and marks them as waiting for evaluation values from their child nodes, which represents the bottom-up evaluation of the new approach. The state of the current node, whether a deep-jumped node or an ordinary node, is further denoted by the next field, status, an enumeration data type with the value domain untouched, searched, and done, representing that the current node has not been expanded, has been expanded but not finished, and has finished evaluation, respectively. These values describe the current state a node is in and are used intensively in the simulator to control the execution sequence.
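As an illustration of how var_init_counter and var_depen_counter drive executability, here is a minimal C sketch, with assumed and simplified field names, of the step performed when a node finishes evaluation: each dependency the node initiated is released, and any dependent whose counter drops to zero becomes executable in the next run.

    #include <stdio.h>

    struct simnode {
        int var_depen_counter;           /* dependencies still unmet      */
        int num_dependents;              /* dependencies this node feeds  */
        struct simnode *dependents[4];
    };

    /* Called when node n finishes: release the dependencies it initiated */
    /* and collect the nodes that thereby become executable.              */
    static void finish_node(struct simnode *n,
                            struct simnode *ready[], int *nready)
    {
        for (int i = 0; i < n->num_dependents; i++) {
            struct simnode *d = n->dependents[i];
            if (--d->var_depen_counter == 0)
                ready[(*nready)++] = d;  /* executable in the next run    */
        }
    }

    int main(void)
    {
        struct simnode consumer = { 1, 0, { 0 } };
        struct simnode producer = { 0, 1, { &consumer } };
        struct simnode *ready[4];
        int nready = 0;

        finish_node(&producer, ready, &nready);
        printf("%d node(s) became executable\n", nready);  /* prints 1 */
        return 0;
    }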
being a deep jumped node or being an ordinary node, is further denoted in the next field, s t a t u s , which is an enumeration data type with the value domain of untouched, searched, and done, which represent that the current node has not been expanded, has been expanded but not finish, and evaluation has been finished, respectively. These values describe the current state a node is in and is used intensively in the simulator to control the execution sequence. 2. C r e a t i o n of search tree: In the searching tree creation phase, a lot of parameters are used to control the characteristics of the final produced tree. Among them, the following points worth us to pay special attention. The study of the behavior will also be carried out by adjusting these parameters, generating different trees and comparing the executional behavior of the two classes of searches. • Max_Depth: This is the maximal depth a search tree can be. Even though a specific branch may not reach this maximal depth, this limitation still holds for every branches within a searching tree. The adjustment of this parameter can simulate either a shallow search or a deep search to obtain the meaningful results. • Max_Fan_Out: This is the number of maximal children a node can have. This parameter is mainly focus on the AND-parallel branches, which denotes maximally how many predicates exist inside a clause. The number of children of an OR-parallel branch is denoted and controlled by another parameter, Nondeterminism_Rate. The adjustment of this parameter can simulate how many predicates an average clause can have, and further affect other properties of the clause like the number of variable dependencies relations of predicates within a clause. • Fan_Out_Balance: This indicator shows the number of children the majority of nodes could have. For example, if this indicator has value of 0.8, then about 80number of children more than half of Fan_Out_Balance, most of them quite near Fan_Out_Balance; on the other hand, it also shows that about 20have relatively small number of children, some of them may have only one child. • Variable_Related_Rate: This is the rate predicates relate with each other by means of shared variable. If two predicates are related by shared variable, then the execution sequence of them will obey the O n e - P r o d u c e r - m u l t i p l e - c o n s u m e r s rule, thus the variable block initiation and variable block receiving is necessary above. In this simulator, we assume that whenever a predicate shares variable with other predicates within the same clause, then it will has thus
kind of relation with all the predicate nodes to the right of the current node. Adjusting this parameter can simulate the chance that predicate correlation occurs.

• Max_Degree_Deep_Jumping: The maximal number of deep jumps a node can have. The number of deep jumps a node can have cannot exceed the number of children it has; beyond that, it is explicitly restricted by this parameter. Since deep jumping plays such an important role in the new execution model, careful adjustment of and control over deep jumping in the simulator proves worthwhile. The sequential search tree does not involve this parameter.

• Deep_Jumping_Rate: The chance that a deep jump actually occurs at a specific node within the search tree. It directly controls how many deep jumps a search tree may have. Adjusting this parameter can simulate the effect deep jumps have on the new execution model: the higher the chance that a deep jump occurs, the more significant the influence of the bottom-up evaluation on the traditional evaluation model, and the more clearly the character of the new model is demonstrated to the user. In the simulator, we assume that a jump always goes to the leftmost child of the lower-level nodes in the search tree, as the leftmost children tend to have the fewest input-mode parameters, or to have all their input-mode parameters already satisfied by the calling predicate, which is the parent node in the search tree. The sequential search tree does not involve this parameter.

• Min_Deep_Jump_Distance: The minimal distance a deep jump should have. This parameter, together with the following one, defines the possible range of distances a deep jump can have, and thus also the influence a deep jump may have on the new search scheme. It is straightforward that the larger these two
parameters are, the more significant the effect the new search scheme has on the AND-OR parallel evaluation of a logic-based specification. Neither the sequential search tree nor the conventional parallel search methods involve this parameter.

• Max_Deep_Jump_Distance: The maximal distance a deep jump can have. This parameter and the one above define the possible range of deep jump distances; see the previous item for a detailed description. The sequential search tree does not involve this parameter.

• Nondeterminism_Rate: This parameter defines the maximal possi-
ble number of clauses which are unifiable with any predicate within the search tree [31]. This parameter is a complement of the Max_Fan_Out defined above, which mainly deals with AND-parallelism; this parameter mainly involves the nondeterminism a logic-based specification has. Adjusting this parameter can simulate the various execution behaviors when the degree of OR-parallelism is adjusted. It may also affect the results derived from the short-cut: the short-cut may have more significance when the degree of nondeterminism is low, where a failure can be broadcast to higher levels without being truncated by an unexplored OR-parallel branch. The actual degree of nondeterminism is distributed linearly over 1 to Nondeterminism_Rate.

• Error_Rate: This is the rate which decides the truth assignment to the leaves of a search tree. Initially, truth values are assigned only to the leaves; later the evaluation assigns a truth value to each intermediate node based on the AND-OR evaluation model. The parameter works in this way: if we have an Error_Rate of 0.4, then about 40% of the leaves are assigned the fail truth value, while the rest are to the contrary. Adjusting this parameter can simulate the effect an early short-cut may have on the new execution model. It is obvious that the higher the Error_Rate, the more likely an early short-cut will occur.

The actual search tree is built incorporating the above parameters. Three trees are built for a single execution of the simulator: one for the sequential search, one for the conventional parallel search, and another for the new hybrid searching method. These three trees have exactly the same properties regarding the above parameters; the last tree carries the deep jumping information while the first two do not, because deep jumping is applicable only to the new searching method. The three trees have exactly the same number of children for each node in the tree, the same truth value for each leaf node, and the same variable dependency property for each node. Only by arranging the three trees in this way, so that they share the same basic properties, does the comparison of the searching methods become meaningful.

3. The sequential tree searching method: The sequential search is basically a depth-first searching process augmented with an AND-OR evaluation process. A non-leaf node's truth value is based on its children's truth values and on the level of the search tree the node resides in; the latter decides whether the current node is an AND-parallel or an OR-parallel node.

4. The conventional parallel tree searching method: A node's status
can be one of untouched, searched, and done. These represent the three major steps in the general searching phase. At first, all the nodes in the tree are assigned the status value untouched, which is the initial state of the search; then the root of the tree is assigned the status searched, which represents that the search of the node has already started but has not finished. The search then goes into the deeper levels of the tree until a definite logic value is found and assigned to the node currently under search, based on the AND-OR principle. During the search process, the nodes one layer below the current nodes that have status searched can be assigned a process and be actually evaluated in the next step if they have no uncleared variable dependency; if a process is assigned and the evaluation goes on in the next step, the status of the node is changed to searched. The search goes on until a leaf node is reached. In this case, the truth value of the node is gathered and passed to the nodes one level higher, and a possible short-cut begins, based on the level the leaf node resides in. The short-cut mainly occurs at the AND-level and propagates along the search path upward until an OR-parallel point where there is more than one unsearched OR-branch besides the current branch. The short-cut is realized by marking all the sibling nodes of the nodes on the path from the point where the short-cut occurs to the point where the short-cut should stop, including the nodes on the path itself, with the status done, which means the search has been finished. If no short-cut occurs, then the node in the current layer is marked as done. Recursively, if all the child nodes of a node have status done, then the node itself can finish evaluation and be marked as done in the next round of the evaluation. Of course, a leaf node always has the status done by the initial assignment. The search process stops when the root node is assigned the status done and there is no node in the tree having status searched or untouched. The behavior of the whole process of this search is recorded step by step into a data structure for later analysis, according to the criteria defined above. During the course of the execution, the communication cost is accumulated and stored in a history data structure whenever each round of evaluation is finished. This communication cost is a logical cost used to simulate the actual expenses of communication.
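To make this machinery concrete, the following is a minimal sketch (not the authors' simulator code) of the node record and of one synchronous round of the conventional parallel evaluation; all names, such as Node and one_round, are illustrative assumptions. Leaves are created with status done and a preassigned truth value, and only the AND-level failure short-cut described above is shown.

    #include <stddef.h>

    typedef enum { UNTOUCHED, SEARCHED, DONE } Status;

    typedef struct Node {
        Status        status;
        int           truth;          /* meaningful once status == DONE        */
        int           is_and_level;   /* AND- or OR-node, decided by its level */
        int           dep_uncleared;  /* uncleared variable dependency?        */
        int           implicit;       /* searched implicitly via a deep jump   */
        size_t        nchildren;
        struct Node **children;
        struct Node  *parent;
    } Node;

    /* One synchronous round over all nodes: a SEARCHED node assigns a process
     * to each dependency-free UNTOUCHED child, and once all of its children
     * are DONE it combines their truth values by the AND-OR principle and is
     * marked DONE itself.  A failed AND-child finishes the node immediately
     * (the short-cut); marking the remaining siblings DONE is left implicit. */
    void one_round(Node **nodes, size_t n)
    {
        for (size_t k = 0; k < n; k++) {
            Node *nd = nodes[k];
            if (nd->status != SEARCHED) continue;

            int all_done = 1;
            int value    = nd->is_and_level ? 1 : 0;
            for (size_t i = 0; i < nd->nchildren; i++) {
                Node *c = nd->children[i];
                if (c->status == UNTOUCHED && !c->dep_uncleared)
                    c->status = SEARCHED;              /* assign a process */
                if (c->status != DONE) { all_done = 0; continue; }
                if (nd->is_and_level && !c->truth) {   /* short-cut        */
                    nd->truth = 0; nd->status = DONE; all_done = 0; break;
                }
                value = nd->is_and_level ? (value && c->truth)
                                         : (value || c->truth);
            }
            if (all_done && nd->status == SEARCHED) {
                nd->truth  = value;
                nd->status = DONE;
            }
        }
    }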
5. The new hybrid searching method: The portion of the simulator for the new hybrid searching method has a basic execution structure similar to the one discussed above for the general method, except that the execution has to take into consideration the deep jumping based on the static data dependency analysis. This deep jumping information comes from an extra function designed specifically for the new hybrid
search. The function assigns the deep jumping information to the search tree based on the restrictions outlined in the parameters for the search tree. Whenever a search reaches a deep jumping node, the nodes on the way from the starting point to the end point of this deep jump, along with all their sibling nodes, are assigned the status of either searched or done, based on whether the node has an uncleared variable dependency. But when recording the corresponding steps into the history data structure for later analysis, these in-between nodes are not included, because these nodes do not actually need to be generated until the search of all of their children has been finished. So in the simulator we simulate the deep jump from the searching function for the general searching method. The nodes searched implicitly by means of deep jumping become explicit once all of their children have finished evaluation; the node itself is then invoked in the next step for a one-step evaluation, assigned the status value done, and finishes evaluation too. The same short-cut principle also applies in this case.

6. The execution history data structure: There are two main data structures used in the simulator. One is used to store the communication cost within each step of the execution. In this simulator we focus on how many times communication between a parent and a child processor is necessary; there is no limitation on how much information can be transferred within each communication. So even if five values are transferred between a parent and a child node when the parent calls the child, only one communication is counted in the history; likewise, a message from a child to a parent node also takes only one step to finish, no matter how many values are actually transferred. The other data structure records how many processes (or nodes) exist at each round of the evaluation; the processes existing at each round are decided by the EXEC set at each step of the execution. The data structure also records the starting and ending times of each process occurring during the whole course of execution.
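Continuing the sketch above (and reusing its Node type), the fragment below illustrates, under assumed names, the deep-jump marking of item 5 and the history records of item 6; the fixed-size arrays and the exact status choice per node are assumptions of this sketch, not the simulator's actual layout.

    #define MAX_ROUNDS 4096

    typedef struct {
        long msgs[MAX_ROUNDS];   /* one count per parent-child message,     */
                                 /* however many values the message carries */
        long procs[MAX_ROUNDS];  /* size of the EXEC set in each round      */
    } History;

    void count_message(History *h, int round) { h->msgs[round]++; }

    /* Mark the nodes between the start and end point of a deep jump,
     * together with all their siblings: DONE when a node has no uncleared
     * variable dependency, SEARCHED otherwise.  Nodes flagged implicit are
     * excluded from the step-by-step history until their children finish. */
    void apply_deep_jump(Node **path, size_t len)
    {
        for (size_t i = 0; i < len; i++) {
            Node *p = path[i]->parent;
            if (p == NULL) continue;
            for (size_t j = 0; j < p->nchildren; j++) {
                Node *s = p->children[j];   /* the path node or a sibling */
                s->status   = s->dep_uncleared ? SEARCHED : DONE;
                s->implicit = 1;
            }
        }
    }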
7.5
Experimental Results and Comparison
In this section, the simulator is used to demonstrate the executional behavior of the sequential, conventional parallel, and new searching methods. For each set of simulations, a behavior comparison is provided for the three search methods: sequential, conventional parallel, and the new parallel method. The parameters describing the search tree are adjusted over their full ranges to explore all possible combinations of the searches. Fifteen pairs of tables result from the execution of this simulation in this subsection. Every three pairs of tables are grouped
to simulate a certain criterion and demonstrate the executional behavior of the sequential, conventional parallel, and new hybrid parallel execution models under various depths and widths of the search tree. The five criteria, one for each of the five groups, are:

1. Error_Rate of the leaf nodes in the search tree. The three values used are 0.0, 0.5, and 0.99. The higher the value, the more leaf nodes have the fail truth value.

2. Deep_Jump_Distance, the minimum and maximum deep jump distance. This serves as the basis for the new evaluation model. The three pairs of values used for the minimum and maximum deep jump distance are 1 and 2, 2 and 4, and 4 and 7, respectively. The higher the range of values, the deeper a jump can be.

3. Deep_Jumping_Rate of deep jumping. The three values used are 0.1, 0.5, and 0.9. The higher the value, the more likely deep jumping will occur within the search tree.

4. Variable_Related_Rate of variable dependency. This occurs within a clause. This simulation can demonstrate the degree of dependency for AND-parallelism within a clause. The three values used are 0.0, 0.5, and 0.99, respectively. The higher the value, the more variable dependencies exist within clauses.

5. Nondeterminism_Rate, the average number of clauses unifiable with a calling predicate. The three values used for simulation are 1, 2, and 3, respectively. The value of the Nondeterminism_Rate is equal to the maximum number of clauses unifiable with a calling predicate.

In the tables, mh means the minimum height of the search tree, xh means the maximal possible height of the tree, mw means the minimum width of the AND-parallel branches, which is also the minimum number of predicates within a clause, and xw is the maximum width of the AND-parallel branches.
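For reference, the swept parameters can be pictured as one configuration record per generated tree. The following is a hypothetical sketch with illustrative field names, initialized with the fixed settings used in Section 7.5.1 below. A consistency check worth noting: the Rate Improvement and Rate Change rows of the tables agree with (old − new)/old; for instance, the CPU-round entry of Table 7.1 gives (63 − 45)/63 = 28.57%.

    /* Hypothetical tree-generation parameter block; the field names mirror
     * the parameters described in the text. */
    typedef struct {
        int    max_depth;               /* Max_Depth                           */
        int    max_fan_out;             /* Max_Fan_Out (AND-parallel branches) */
        double fan_out_balance;         /* Fan_Out_Balance                     */
        double variable_related_rate;   /* Variable_Related_Rate               */
        int    max_degree_deep_jumping; /* Max_Degree_Deep_Jumping             */
        double deep_jumping_rate;       /* Deep_Jumping_Rate                   */
        int    min_deep_jump_distance;  /* Min_Deep_Jump_Distance              */
        int    max_deep_jump_distance;  /* Max_Deep_Jump_Distance              */
        int    nondeterminism_rate;     /* Nondeterminism_Rate (OR-branches)   */
        double error_rate;              /* Error_Rate (failing leaves)         */
    } TreeParams;

    /* Fixed settings of Section 7.5.1; error_rate then sweeps 0.0, 0.5, 0.99. */
    const TreeParams error_rate_group = {
        .fan_out_balance        = 0.3,
        .variable_related_rate  = 0.5,
        .deep_jumping_rate      = 0.8,
        .min_deep_jump_distance = 3,
        .max_deep_jump_distance = 5,
        .nondeterminism_rate    = 1,
        .error_rate             = 0.0,
    };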
7.5.1
Simulation for Various Values of Error_Rate
The following three tables are used to show the effect of the error rate on the execution. The error rate shows how significant the deep jumping is with regard to the short-cut, and how great a behavior improvement it can achieve. The following assumptions hold for all three groups of simulations.

• Variable_Related_Rate = 0.5
• Fan_Out_Balance = 0.3
• Deep_Jumping_Rate = 0.8
• Min_Deep_Jump_Distance = 3
• Max_Deep_Jump_Distance = 5
• Nondeterminism_Rate = 1

These define the relation rate between variables within a clause, the trend for the number of predicates within a clause, the rate at which a deep jump occurs, the minimum and maximum distance of a deep jump, and the average number of clauses unifiable with a calling predicate.

(1) Error_Rate = 0.0: Table 7.1 and Table 7.2 illustrate the results. From these tables, we can see that even though the total evaluation step count is better in the parallel cases, the number of total nodes searched shows no improvement. The total evaluation steps and total CPU rounds in the new evaluation model have both shown improvement over the conventional parallel evaluation model. The improvement for both communication cost and weighted communication cost is obvious. The improvement in the ratio of locality shows a positive change of the new model over the conventional parallel execution model. Even though the total evaluation steps decrease, the degree of parallelism turns to the negative side. This can be explained by noticing that the total CPU rounds decrease by 30.79% while the number of total nodes searched remains the same, which means that a node stays waiting for search for a shorter time, thus tending to decrease the total number of nodes waiting for search in the new model; this turns out to be a desirable behavior. Regarding the two parallel evaluation models under the criteria of Total_Evaluation_Step, Total_CPU_Round, Total_Communication_Cost, and Total_Weighted_Communication_Cost, the rates of improvement increase along with the increase in depth and decrease in width of the search tree, on average from around 5% up to 30% and even 70%. So a conclusion can be drawn from this analysis: the new hybrid evaluation model is more suitable for evaluation with deep search.

(2) Error_Rate = 0.50: Table 7.3 and Table 7.4 illustrate the results. Considering Total_Evaluation_Step, there is one case in which both parallel evaluations behave worse than the sequential model, namely the case with the widest and shortest search tree. This is because a sequential search can go deep quickly, find a failure at a leaf, and produce a short-cut, while in a parallel evaluation model the execution has to go level by level until a failure is discovered or the whole tree has been searched. For the remaining five cases, two of them show both parallel models being better than or equal to the sequential model, while
three of them show the conventional parallel model being worse than the sequential one, while the new execution model turns out to be better than the conventional one. These cases reveal two aspects: (1) sometimes sequential evaluation is better than parallel evaluation; (2) there are cases in which the behavior of the conventional evaluation is worse than the sequential evaluation while the new parallel evaluation model is better than both of them. A similar situation happens for Total_Node_Searched; the only difference is that both parallel execution models are better than the sequential case in most of the cases. The same trend in favor of deep search holds in this round of evaluation, as can be seen from most of the criteria in the tables. In the first column, where the widest and shallowest search tree is used, the overhead of communication in the new hybrid model becomes so strong that it actually overrules any improvement the new model may derive; the same trend partially continues into the second column. In Ratio_of_Locality, all evaluations show improvements ranging from 3.93% up to 12.22%.

(3) Error_Rate = 0.99: Table 7.5 and Table 7.6 illustrate the results. Most of the cases generated in this round have the same results as in the previous tables. For the average value comparison, all the average values generated show little or no improvement, but all the columns showing a change in average demonstrate a positive change direction.

(4) From the above three groups of tables, we can see that with increasing error rate, all of the average improvements of the new execution model have values substantially better than the previous case in the corresponding column. For each row, where the depth of the evaluation changes, it can be seen that the improvement for deeper evaluation is generally better than for shallow evaluation. This is true because the new approach can go into deeper levels more quickly than the general approach. So while the error rate increases, those leaf nodes with error as their truth value can be discovered earlier than the general approach can discover them, and short-cuts can also be discovered much earlier. The new approach focuses on deep jumps, which are directly related to the depth of a tree. For shallow searches, the search does not go deep into the tree, and the advantage of the new approach cannot overcome its overhead. So for deep search processes, the advantage of the new approach is obvious.
7.5.2
Simulation for Values of Deep_Jump_Distance
In the following, the effect of various depths and widths of search trees is analyzed in relation to the deep jump distance. The evaluation uses the following conditions.
• Variable_Related_Rate = 0.5
• Fan_Out_Balance = 0.3
• Deep_Jumping_Rate = 0.6
• Nondeterminism_Rate = 1
• Error_Rate = 0.8

(1) Min_Deep_Jump_Distance = 1, Max_Deep_Jump_Distance = 2: Table 7.7 and Table 7.8 illustrate the results. In this group of tests, Error_Rate is chosen to be 0.8, and the tests show the characteristics of the high error rate tests above. Since the deep jump distance is shorter than in the previous case, the overall improvement is reduced to a certain degree. The improvement between the shallow and wide search trees and the deep and narrow trees shows an obvious gap with regard to Total_Evaluation_Step, Total_CPU_Round, Total_Communication_Cost, and Total_Weighted_Communication_Cost. This is another proof that deep search is more suitable for the new evaluation model. This group of tests is closely related to the nature of the new approach, since deep jumping is unique among the three search mechanisms.

(2) Min_Deep_Jump_Distance = 2, Max_Deep_Jump_Distance = 4: Table 7.9 and Table 7.10 illustrate the results. The same gap in improvement still exists for this group of tests. In Total_Communication_Cost, the first three tests, for shallow and wide search trees, show negative improvement because of the overhead in communication, while the latter three show very significant improvements. The same holds for Total_Weighted_Communication_Cost: even though two of the first three have positive improvement, the values are small compared with the last three. Ratio_of_Locality shows positive improvement in all columns. Even though Degree_of_Parallelism shows negative improvement, the actual situation may not be so bad, since both the total number of nodes explored and the total steps to finish the evaluation decrease in the new hybrid evaluation model.

(3) Min_Deep_Jump_Distance = 4, Max_Deep_Jump_Distance = 7: Table 7.11 and Table 7.12 illustrate the results. Along with the increase in deep jump distance, the case with relatively shallow search and wide fan-out (the third column) also shows relatively satisfactory improvement. So the depth and width of search tree suitable for the new evaluation model are related to the minimum and maximum deep jump distance.

(4) Obviously the behavior of the new execution model improves substantially when the deep jump distance increases. This best demonstrates the improvement of this new model over traditional evaluation models. The execution also improves with increased depth; this point was analyzed when the effect of Error_Rate was discussed.
7.5.3
Simulation for Values of Deep_Jumping_Rate
The next three groups of tables examine the effect of the rate at which deep jumps occur during the search process. The following assumptions hold for all of them.

• Variable_Related_Rate = 0.5
• Fan_Out_Balance = 0.3
• Min_Deep_Jump_Distance = 2
• Max_Deep_Jump_Distance = 4
• Nondeterminism_Rate = 1
• Error_Rate = 0.5

(1) Deep_Jumping_Rate = 0.1: Table 7.13 and Table 7.14 illustrate the results. Even with such a small deep jumping rate, Total_CPU_Round still shows more than 22% improvement, and Total_Node_Searched more than 16%. Total_Weighted_Communication_Cost also shows satisfactory improvement. Along with the increase in depth and decrease in width, the percentage of communication overhead over the total communication cost of the hybrid evaluation model decreases from around 10% to around 0.5%; this once again favors deep searches over shallow ones. For the shallow ones, there is actually no improvement from the new evaluation model.

(2) Deep_Jumping_Rate = 0.5: Table 7.15 and Table 7.16 illustrate the results. Along with the strengthening of Deep_Jumping_Rate, some shallow searches (like the second test) begin to show improvement in some items under the new parallel searching mechanism. But under the criteria related to communication, these tests still demonstrate that shallow searching is not suitable for the new evaluation model. Ratio_of_Locality shows substantial improvement in this group of tests, where all cases result in positive improvements. Also, along with the increase in the deep jumping rate, the communication overhead becomes more and more significant compared with the communication cost needed to finish the new parallel evaluation.

(3) Deep_Jumping_Rate = 0.9: Table 7.17 and Table 7.18 illustrate the results. The test in the first column shows a big overhead in communication costs, which is related to the shallow and wide search tree. Ratio_of_Locality shows consistent improvement. Degree_of_Parallelism has all negative improvements, which is not as bad as it first seems to be. Consider the fourth test, where Degree_of_Parallelism shows
about a 41% decrease, but Total_Node_Searched and Total_CPU_Round show improvements of 77% and 90%, respectively, both of which contribute to the decrease in Degree_of_Parallelism.

(4) As expected, increases in the deep jumping rate improve the performance of the new execution model more and more substantially. Since the advantage of this new approach comes solely from the deep jumping in the execution and the hybrid execution model, the result is quite noticeable.
7.5.4
Simulation for Values of Variable_Related_Rate
In the following three groups of tables, the effect of Variable_Related_Rate is discussed. The following assumptions hold for all of them.

• Deep_Jumping_Rate = 0.5
• Fan_Out_Balance = 0.3
• Min_Deep_Jump_Distance = 2
• Max_Deep_Jump_Distance = 4
• Nondeterminism_Rate = 1
• Error_Rate = 0.5

(1) Variable_Related_Rate = 0.0: Table 7.19 and Table 7.20 illustrate the results. In this case, where there is no variable dependency among predicates, Total_Node_Searched under the parallel execution models is much larger than that of the sequential model. This can be understood as follows: while the execution goes deep in the sequential model, both of the parallel models explore all the nodes at the level the search has reached; as a result, the number of nodes searched is much larger than in the sequential case. The analysis also shows that, between the two parallel evaluation models, the new hybrid model has significant improvement in Total_Node_Searched, Total_CPU_Round, and Total_Communication_Cost.

(2) Variable_Related_Rate = 0.5: Table 7.21 and Table 7.22 illustrate the results. Along with the increase in Variable_Related_Rate, the first test case shows no improvement of the new execution model over the conventional parallel model. Total_Node_Searched in the new parallel evaluation model shows improvement over the sequential execution model in most of the cases. Compared with the previous group of tests, the communication costs show constant improvement of the new model over the conventional model.

(3) Variable_Related_Rate = 0.99:
Table 7.23 and Table 7.24 illustrate the results. The new model shows no improvement in execution in the first two test cases, but Total_Node_Searched under the two parallel execution models shows constant improvement over the sequential model.

(4) From the above three groups of tables we can see that with increases in the variable dependency rate, the improvement of the new model decreases. The improvement, in the worst case, still achieves an average rate of about 20%. The reason for this is that the increase in variable dependency decreases the number of predicates which can be evaluated in parallel in both parallel evaluation cases. For the new execution model, the chance for deep jumping is also reduced by this strong coupling between predicates sharing variables. So even though both parallel execution models are hurt by the strengthened predicate dependency, the new execution model is hurt more than the conventional model. Along with increases in the value of Variable_Related_Rate, the percentage of overhead decreases in both the communication and the weighted communication cost.
7.5.5
Simulation for Values of Nondeterminism_Rate
The following three groups of tables are used to demonstrate the effects of the degree of nondeterminism on the new evaluation model. Nondeterminism is defined as the average number of clauses unifiable with the calling predicates in the specification. The following conditions hold for all the evaluations in these tables.

• Variable_Related_Rate = 0.5
• Fan_Out_Balance = 0.3
• Deep_Jumping_Rate = 0.6
• Min_Deep_Jump_Distance = 2
• Max_Deep_Jump_Distance = 5
• Error_Rate = 0.6

(1) Non_Determinism_Rate = 1: Table 7.25 and Table 7.26 illustrate the results. Along with the increase in depth of the search tree, the ratio of the nodes searched by the sequential method to those searched by the two parallel methods decreases, which means that the sequential search is more suitable for deep search trees. That deep search trees favor the sequential search is not so clear from Total_Evaluation_Step.
(2) Non_Determinism_Rate = 2: Table 7.27 and Table 7.28 illustrate the results. The communication costs for the first four sets of tests all show negative improvement, but they are positive for the last two sets. The total nodes searched for both of the parallel models show constant improvement over the sequential model. The sequential search under Total_Node_Searched and Total_Evaluation_Step shows behavior worse than the two parallel evaluation models in all the test cases.

(3) Non_Determinism_Rate = 3: Table 7.29 and Table 7.30 illustrate the results. Total_Evaluation_Step and Total_Node_Searched in both of the parallel models show significant improvement over the sequential model (except for the first criterion in the first test case), especially in the fourth test case. Degree_of_Parallelism for the first time shows a positive average change for the three groups of tests under this category.

(4) The tests for various values of Non_Determinism_Rate compare sequential and parallel searches. The results in these three groups of tables show that as the value of Non_Determinism_Rate increases, the two parallel evaluation models demonstrate better evaluation behavior. Also, along with the increase in Non_Determinism_Rate over the three steps of execution, the average Degree_of_Parallelism increases significantly from negative to positive values.
                       mh=3 xh=5    mh=6 xh=9    mh=8 xh=13   mh=15 xh=20
                       mw=6 xw=10   mw=5 xw=7    mw=3 xw=5    mw=1 xw=3
Ttl Evalu. Steps S     19           268          515          107
Ttl Evalu. Steps O     17           145          200          57
Ttl Evalu. Steps N     16           142          195          48
Rate Improvement       5.88%        2.07%        2.50%        15.79%
Ttl Node Searched S    18           267          514          106
Ttl Node Searched O    18           267          514          106
Ttl Node Searched N    18           267          514          106
Rate Change            —            —            —            —
Ttl CPU Rounds O       63           1,358        3,593        1,381
Ttl CPU Rounds N       45           1,276        3,102        642
Rate Change            28.57%       6.04%        13.67%       53.51%
Ttl Com Cost O         22           270          358          62
Ttl Com Cost N         28           278          359          54
Static Overhead        2            2            8            26
Rate Change            -27.27%      -2.96%       -0.28%       12.90%
Ttl Wt Com Cst O       85.0         1,581.0      3,139.0      1,117.0
Ttl Wt Com Cst N       98.0         1,591.0      3,137.5      965.0
Static Wt Overhead     7.0          11.0         52.0         233.0
Rate Change            -15.29%      -0.63%       0.05%        13.61%
Ratio of Locality O    3.864        5.856        8.768        18.016
Ratio of Locality N    3.500        5.723        8.740        17.870
Rate Improvement       9.42%        2.27%        0.32%        0.81%
Degree of Paral. O     3.647        9.359        17.960       24.211
Degree of Paral. N     3.688        9.408        18.282       24.479
Rate Improvement       1.12%        0.52%        1.79%        1.11%

Table 7.1: TEST RESULTS FOR ERROR_RATE = 0.0 (a)
(S: sequential search; O: conventional parallel search; N: new hybrid search.)
                       mh=20 xh=30   mh=30 xh=50   Avg.
                       mw=1 xw=2     mw=1 xw=2
Ttl Evalu. Steps S     25            34            161.33
Ttl Evalu. Steps O     48            66            88.833
Ttl Evalu. Steps N     34            46            80.167
Rate Improvement       29.17%        30.30%        9.76%
Ttl Node Searched S    24            33            160.33
Ttl Node Searched O    24            33            160.33
Ttl Node Searched N    24            33            160.33
Rate Change            —             —             —
Ttl CPU Rounds O       577           1,090         1,343.7
Ttl CPU Rounds N       202           313           930.00
Rate Change            64.99%        71.28%        30.79%
Ttl Com Cost O         46            64            137.0
Ttl Com Cost N         32            44            132.5
Static Overhead        8             12            9.667
Rate Change            30.43%        31.25%        3.28%
Ttl Wt Com Cst O       1,357.0       3,168.0       1,741.2
Ttl Wt Com Cst N       944.0         2,178.0       1,485.6
Static Wt Overhead     152.0         410.0         144.17
Rate Change            30.43%        31.25%        14.68%
Ratio of Locality O    29.500        49.500        19.251
Ratio of Locality N    29.500        49.500        19.139
Rate Improvement       —             —             0.58%
Degree of Paral. O     12.000        16.500        13.946
Degree of Paral. N     11.294        16.435        13.931
Rate Improvement       -5.88%        -0.39%        -0.11%

Table 7.2: TEST RESULTS FOR ERROR_RATE = 0.0 (b)
                       mh=3 xh=5    mh=6 xh=9    mh=8 xh=13   mh=15 xh=20
                       mw=6 xw=10   mw=5 xw=7    mw=3 xw=5    mw=1 xw=3
Ttl Evalu. Steps S     5            10           10           23
Ttl Evalu. Steps O     7            10           11           19
Ttl Evalu. Steps N     6            8            6            8
Rate Improvement       14.29%       20.00%       45.45%       57.89%
Ttl Node Searched S    11           22           20           26
Ttl Node Searched O    4            14           12           66
Ttl Node Searched N    4            11           7            16
Rate Change            —            21.43%       41.67%       75.76%
Ttl CPU Rounds O       15           49           50           383
Ttl CPU Rounds N       11           27           15           42
Rate Change            26.67%       44.90%       70.00%       89.03%
Ttl Com Cost O         7            24           20           81
Ttl Com Cost N         6            24           11           26
Static Overhead        2            2            4            18
Rate Change            14.29%       —            45.00%       67.90%
Ttl Wt Com Cst O       28.5         124.0        190.0        851.5
Ttl Wt Com Cst N       22.0         108.0        92.5         291.5
Static Wt Overhead     7.0          11.0         38.0         193.0
Rate Change            22.81%       12.90%       51.32%       65.77%
Ratio of Locality O    4.071        5.167        9.500        10.512
Ratio of Locality N    3.667        4.500        8.409        11.212
Rate Improvement       9.92%        12.91%       11.48%       -6.66%
Degree of Paral. O     2.000        4.800        4.455        20.105
Degree of Paral. N     1.833        4.250        3.667        11.750
Rate Improvement       -8.35%       -11.46%      -17.69%      -41.56%

Table 7.3: TEST RESULTS FOR ERROR_RATE = 0.50 (a)
                       mh=20 xh=30   mh=30 xh=50   Avg.
                       mw=1 xw=2     mw=1 xw=2
Ttl Evalu. Steps S     25            34            17.833
Ttl Evalu. Steps O     27            35            18.167
Ttl Evalu. Steps N     13            15            9.333
Rate Improvement       51.85%        57.14%        48.63%
Ttl Node Searched S    24            33            22.667
Ttl Node Searched O    24            33            25.500
Ttl Node Searched N    11            13            10.333
Rate Change            54.17%        60.61%        59.48%
Ttl CPU Rounds O       325           563           230.83
Ttl CPU Rounds N       67            93            42.50
Rate Change            79.38%        83.48%        81.59%
Ttl Com Cost O         47            65            40.667
Ttl Com Cost N         21            26            19.000
Static Overhead        8             12            7.667
Rate Change            55.32%        60.00%        53.28%
Ttl Wt Com Cst O       1,133.5       2,689.5       836.17
Ttl Wt Com Cst N       471.0         1,041.0       337.67
Static Wt Overhead     152.0         410.0         135.17
Rate Change            58.45%        61.29%        59.62%
Ratio of Locality O    24.117        41.377        15.791
Ratio of Locality N    22.429        40.038        15.043
Rate Improvement       7.00%         3.24%         4.74%
Degree of Paral. O     12.000        16.057        9.903
Degree of Paral. N     10.154        15.267        7.820
Rate Improvement       -15.38%       -4.92%        -21.03%

Table 7.4: TEST RESULTS FOR ERROR_RATE = 0.50 (b)
                       mh=3 xh=5    mh=6 xh=9    mh=8 xh=13   mh=15 xh=20
                       mw=6 xw=10   mw=5 xw=7    mw=3 xw=5    mw=1 xw=3
Ttl Evalu. Steps S     5            10           10           17
Ttl Evalu. Steps O     7            10           11           19
Ttl Evalu. Steps N     6            8            6            8
Rate Improvement       14.29%       20.00%       45.45%       57.89%
Ttl Node Searched S    11           22           20           20
Ttl Node Searched O    4            14           12           66
Ttl Node Searched N    4            11           7            16
Rate Change            —            21.43%       41.67%       75.76%
Ttl CPU Rounds O       15           49           50           383
Ttl CPU Rounds N       11           27           15           42
Rate Change            26.67%       44.90%       70.00%       89.03%
Ttl Com Cost O         7            24           20           81
Ttl Com Cost N         6            23           11           26
Static Overhead        2            2            4            18
Rate Change            14.29%       4.17%        45.00%       67.90%
Ttl Wt Com Cst O       28.5         124.0        190.0        854.5
Ttl Wt Com Cst N       22.0         90.5         92.5         291.5
Static Wt Overhead     7.0          11.0         38.0         193.0
Rate Change            22.81%       27.02%       51.32%       65.89%
Ratio of Locality O    4.071        5.167        9.500        10.549
Ratio of Locality N    3.667        3.935        8.409        11.212
Rate Improvement       9.92%        23.84%       11.48%       -6.28%
Degree of Paral. O     2.000        4.800        4.455        20.105
Degree of Paral. N     1.833        4.250        3.667        11.750
Rate Improvement       -8.35%       -11.46%      -17.69%      -41.56%

Table 7.5: TEST RESULTS FOR ERROR_RATE = 0.99 (a)
                       mh=20 xh=30   mh=30 xh=50   Avg.
                       mw=1 xw=2     mw=1 xw=2
Ttl Evalu. Steps S     25            34            16.833
Ttl Evalu. Steps O     27            35            18.167
Ttl Evalu. Steps N     13            15            9.333
Rate Improvement       51.85%        62.86%        48.63%
Ttl Node Searched S    24            33            21.667
Ttl Node Searched O    24            33            25.500
Ttl Node Searched N    11            13            10.333
Rate Change            54.17%        60.61%        59.84%
Ttl CPU Rounds O       325           563           230.83
Ttl CPU Rounds N       67            93            42.50
Rate Change            79.38%        83.48%        81.59%
Ttl Com Cost O         47            65            40.667
Ttl Com Cost N         21            26            18.833
Static Overhead        8             12            7.667
Rate Change            55.32%        60.00%        53.69%
Ttl Wt Com Cst O       1,133.5       2,689.5       836.67
Ttl Wt Com Cst N       471.0         1,041.0       334.75
Static Wt Overhead     152.0         410.0         135.17
Rate Change            58.45%        61.29%        59.99%
Ratio of Locality O    24.117        41.377        15.797
Ratio of Locality N    22.429        40.038        14.948
Rate Improvement       7.00%         3.24%         5.37%
Degree of Paral. O     12.000        16.057        9.903
Degree of Paral. N     10.154        15.267        7.820
Rate Improvement       -15.38%       -4.92%        -21.03%

Table 7.6: TEST RESULTS FOR ERROR_RATE = 0.99 (b)
                       mh=3 xh=5    mh=6 xh=9    mh=8 xh=13   mh=15 xh=20
                       mw=6 xw=10   mw=5 xw=7    mw=3 xw=5    mw=1 xw=3
Ttl Evalu. Steps S     5            10           10           17
Ttl Evalu. Steps O     7            10           11           19
Ttl Evalu. Steps N     6            9            9            13
Rate Improvement       14.29%       10.00%       18.18%       31.58%
Ttl Node Searched S    11           22           20           20
Ttl Node Searched O    4            14           12           66
Ttl Node Searched N    4            8            7            35
Rate Change            —            42.86%       41.67%       46.79%
Ttl CPU Rounds O       15           49           50           383
Ttl CPU Rounds N       11           33           30           142
Rate Change            26.67%       32.65%       40.00%       62.92%
Ttl Com Cost O         7            24           20           81
Ttl Com Cost N         6            20           13           45
Static Overhead        2            4            6            24
Rate Change            14.29%       16.67%       35.00%       44.44%
Ttl Wt Com Cst O       28.5         124.0        190.0        854.5
Ttl Wt Com Cst N       22.0         104.0        131.5        512.0
Static Wt Overhead     7.0          18.0         53.0         280.0
Rate Change            22.81%       16.13%       30.79%       40.08%
Ratio of Locality O    4.071        5.167        9.500        10.549
Ratio of Locality N    3.667        5.200        10.115       11.378
Rate Improvement       9.92%        -0.64%       -6.47%       -7.86%
Degree of Paral. O     2.000        4.800        4.455        20.105
Degree of Paral. N     1.833        4.667        4.667        17.154
Rate Improvement       -8.35%       -2.77%       4.67%        -14.68%

Table 7.7: TEST RESULTS FOR SMALL DISTANCE RANGE (a)
                       mh=20 xh=30   mh=30 xh=50   Avg.
                       mw=1 xw=2     mw=1 xw=2
Ttl Evalu. Steps S     25            34            16.833
Ttl Evalu. Steps O     27            35            18.167
Ttl Evalu. Steps N     18            24            13.167
Rate Improvement       33.33%        31.43%        27.52%
Ttl Node Searched S    24            33            21.667
Ttl Node Searched O    24            33            25.500
Ttl Node Searched N    16            22            15.333
Rate Change            33.33%        33.33%        39.87%
Ttl CPU Rounds O       325           563           230.83
Ttl CPU Rounds N       137           255           101.33
Rate Change            57.85%        54.71%        56.10%
Ttl Com Cost O         47            65            40.667
Ttl Com Cost N         30            43            26.167
Static Overhead        18            22            12.667
Rate Change            36.17%        33.85%        35.66%
Ttl Wt Com Cst O       1,133.5       2,689.5       836.67
Ttl Wt Com Cst N       712.0         1,784.5       544.33
Static Wt Overhead     301.0         743.0         233.67
Rate Change            37.19%        33.65%        34.94%
Ratio of Locality O    24.117        41.377        15.797
Ratio of Locality N    23.733        41.500        15.932
Rate Improvement       1.59%         -0.30%        -0.85%
Degree of Paral. O     12.000        16.057        9.903
Degree of Paral. N     10.611        15.292        9.037
Rate Improvement       -11.57%       -4.76%        -8.74%

Table 7.8: TEST RESULTS FOR SMALL DISTANCE RANGE (b)
                       mh=3 xh=5    mh=6 xh=9    mh=8 xh=13   mh=15 xh=20
                       mw=6 xw=10   mw=5 xw=7    mw=3 xw=5    mw=1 xw=3
Ttl Evalu. Steps S     5            10           10           17
Ttl Evalu. Steps O     7            10           11           19
Ttl Evalu. Steps N     6            8            9            10
Rate Improvement       14.29%       20.00%       18.18%       47.37%
Ttl Node Searched S    11           22           20           20
Ttl Node Searched O    4            14           12           66
Ttl Node Searched N    4            11           7            23
Rate Change            —            21.43%       41.67%       65.15%
Ttl CPU Rounds O       15           49           50           383
Ttl CPU Rounds N       11           27           30           73
Rate Change            26.67%       44.90%       40.00%       80.94%
Ttl Com Cost O         7            24           20           81
Ttl Com Cost N         6            24           18           35
Static Overhead        2            2            4            16
Rate Change            14.29%       —            10.00%       56.79%
Ttl Wt Com Cst O       28.5         124.0        190.0        854.5
Ttl Wt Com Cst N       22.0         108.0        160.0        374.5
Static Wt Overhead     7.0          11.0         34.0         188.0
Rate Change            22.81%       12.90%       15.79%       56.17%
Ratio of Locality O    4.071        5.167        9.500        10.549
Ratio of Locality N    3.667        4.500        8.889        10.700
Rate Improvement       9.92%        12.91%       6.43%        -1.43%
Degree of Paral. O     2.000        4.800        4.455        20.105
Degree of Paral. N     1.833        4.250        4.889        15.200
Rate Improvement       -8.35%       -11.46%      9.74%        -24.40%

Table 7.9: TEST RESULTS FOR MEDIUM DISTANCE RANGE (a)
                       mh=20 xh=30   mh=30 xh=50   Avg.
                       mw=1 xw=2     mw=1 xw=2
Ttl Evalu. Steps S     25            34            16.833
Ttl Evalu. Steps O     27            35            18.167
Ttl Evalu. Steps N     16            17            11.000
Rate Improvement       40.74%        51.43%        39.45%
Ttl Node Searched S    24            33            21.667
Ttl Node Searched O    24            33            25.500
Ttl Node Searched N    13            15            12.167
Rate Change            45.83%        54.55%        52.29%
Ttl CPU Rounds O       325           563           230.83
Ttl CPU Rounds N       105           122           60.333
Rate Change            67.69%        78.33%        73.86%
Ttl Com Cost O         47            65            40.667
Ttl Com Cost N         26            30            23.167
Static Overhead        10            16            8.333
Rate Change            44.68%        53.85%        43.03%
Ttl Wt Com Cst O       1,133.5       2,689.5       836.67
Ttl Wt Com Cst N       597.5         1,206.0       411.33
Static Wt Overhead     185.0         552.0         162.83
Rate Change            47.29%        55.16%        50.84%
Ratio of Locality O    24.117        41.377        15.797
Ratio of Locality N    22.981        40.200        15.156
Rate Improvement       4.71%         2.84%         4.06%
Degree of Paral. O     12.000        16.057        9.903
Degree of Paral. N     11.125        15.529        8.804
Rate Improvement       -7.29%        -3.29%        -11.10%

Table 7.10: TEST RESULTS FOR MEDIUM DISTANCE RANGE (b)
                       mh=3 xh=5    mh=6 xh=9    mh=8 xh=13   mh=15 xh=20
                       mw=6 xw=10   mw=5 xw=7    mw=3 xw=5    mw=1 xw=3
Ttl Evalu. Steps S     5            10           10           17
Ttl Evalu. Steps O     7            10           11           19
Ttl Evalu. Steps N     6            8            5            8
Rate Improvement       14.29%       20.00%       54.55%       57.89%
Ttl Node Searched S    11           22           20           20
Ttl Node Searched O    4            14           12           66
Ttl Node Searched N    4            11           6            16
Rate Change            —            21.43%       50.00%       75.76%
Ttl CPU Rounds O       15           49           50           383
Ttl CPU Rounds N       11           27           11           39
Rate Change            26.67%       44.90%       78.00%       89.92%
Ttl Com Cost O         7            24           20           81
Ttl Com Cost N         6            24           9            25
Static Overhead        2            2            2            10
Rate Change            14.29%       —            55.00%       69.14%
Ttl Wt Com Cst O       28.5         124.0        190.0        854.5
Ttl Wt Com Cst N       22.0         108.0        72.5         274.5
Static Wt Overhead     7.0          11.0         23.0         129.0
Rate Change            22.81%       12.90%       61.84%       67.88%
Ratio of Locality O    4.071        5.167        9.500        10.549
Ratio of Locality N    3.667        4.500        8.056        10.980
Rate Improvement       9.92%        12.91%       15.20%       -4.09%
Degree of Paral. O     2.000        4.800        4.455        20.105
Degree of Paral. N     1.833        4.250        3.200        13.625
Rate Improvement       -8.35%       -11.46%      -28.17%      -32.23%

Table 7.11: TEST RESULTS FOR LARGE DISTANCE RANGE (a)
                       mh=20 xh=30   mh=30 xh=50   Avg.
                       mw=1 xw=2     mw=1 xw=2
Ttl Evalu. Steps S     25            34            16.833
Ttl Evalu. Steps O     27            35            18.167
Ttl Evalu. Steps N     12            16            9.167
Rate Improvement       55.56%        54.29%        49.54%
Ttl Node Searched S    24            33            21.667
Ttl Node Searched O    24            33            25.500
Ttl Node Searched N    10            14            10.167
Rate Change            58.33%        57.58%        60.13%
Ttl CPU Rounds O       325           563           230.83
Ttl CPU Rounds N       56            107           41.833
Rate Change            82.77%        80.99%        81.88%
Ttl Com Cost O         47            65            40.667
Ttl Com Cost N         19            27            18.333
Static Overhead        6             8             5.000
Rate Change            59.57%        58.46%        54.92%
Ttl Wt Com Cst O       1,133.5       2,689.5       836.67
Ttl Wt Com Cst N       423.0         1,083.5       330.58
Static Wt Overhead     111.0         292.0         95.500
Rate Change            62.68%        59.71%        60.49%
Ratio of Locality O    24.117        41.377        15.797
Ratio of Locality N    22.263        40.130        14.933
Rate Improvement       7.69%         3.01%         5.47%
Degree of Paral. O     12.000        16.057        9.903
Degree of Paral. N     10.000        16.750        8.276
Rate Improvement       -16.67%       4.32%         -16.43%

Table 7.12: TEST RESULTS FOR LARGE DISTANCE RANGE (b)
                       mh=3 xh=5    mh=6 xh=9    mh=8 xh=13   mh=15 xh=20
                       mw=6 xw=10   mw=5 xw=7    mw=3 xw=5    mw=1 xw=3
Ttl Evalu. Steps S     5            10           10           23
Ttl Evalu. Steps O     7            10           11           19
Ttl Evalu. Steps N     7            10           9            16
Rate Improvement       —            —            18.18%       15.79%
Ttl Node Searched S    11           22           20           26
Ttl Node Searched O    4            14           12           66
Ttl Node Searched N    4            14           10           47
Rate Change            —            —            16.67%       28.79%
Ttl CPU Rounds O       15           49           50           383
Ttl CPU Rounds N       15           49           33           224
Rate Change            —            —            34.00%       41.51%
Ttl Com Cost O         7            24           20           81
Ttl Com Cost N         7            24           17           63
Static Overhead        0            0            2            2
Rate Change            —            —            15.00%       22.22%
Ttl Wt Com Cst O       28.5         124.0        190.0        851.5
Ttl Wt Com Cst N       26.5         124.0        158.5        701.5
Static Wt Overhead     0.0          0.0          15.0         33.0
Rate Change            7.02%        —            16.58%       17.62%
Ratio of Locality O    4.071        5.167        9.500        10.512
Ratio of Locality N    3.786        5.167        9.324        11.135
Rate Improvement       7.00%        —            1.85%        -5.93%
Degree of Paral. O     2.000        4.800        4.455        20.105
Degree of Paral. N     2.000        4.800        3.778        16.438
Rate Improvement       —            —            -15.20%      -18.24%

Table 7.13: TEST RESULTS FOR DEEP_JUMPING_RATE = 0.1 (a)
                       mh=20 xh=30   mh=30 xh=50   Avg.
                       mw=1 xw=2     mw=1 xw=2
Ttl Evalu. Steps S     25            34            17.833
Ttl Evalu. Steps O     27            35            18.167
Ttl Evalu. Steps N     24            33            16.500
Rate Improvement       11.11%        5.71%         9.18%
Ttl Node Searched S    24            33            22.667
Ttl Node Searched O    24            33            25.500
Ttl Node Searched N    21            31            21.167
Rate Change            12.50%        6.06%         16.99%
Ttl CPU Rounds O       325           563           230.83
Ttl CPU Rounds N       253           498           178.67
Rate Change            22.15%        11.55%        22.60%
Ttl Com Cost O         47            65            40.667
Ttl Com Cost N         41            61            35.500
Static Overhead        2             2             1.333
Rate Change            12.77%        6.15%         12.71%
Ttl Wt Com Cst O       1,133.5       2,689.5       836.17
Ttl Wt Com Cst N       979.5         2,500.5       748.42
Static Wt Overhead     33.0          93.0          29.000
Rate Change            13.59%        7.03%         10.49%
Ratio of Locality O    24.117        41.377        15.791
Ratio of Locality N    23.890        40.992        15.716
Rate Improvement       0.94%         0.93%         0.47%
Degree of Paral. O     12.000        16.057        9.903
Degree of Paral. N     11.500        16.697        9.202
Rate Improvement       -4.17%        3.99%         -7.08%

Table 7.14: TEST RESULTS FOR DEEP_JUMPING_RATE = 0.1 (b)
                       mh=3 xh=5    mh=6 xh=9    mh=8 xh=13   mh=15 xh=20
                       mw=6 xw=10   mw=5 xw=7    mw=3 xw=5    mw=1 xw=3
Ttl Evalu. Steps S     5            10           10           23
Ttl Evalu. Steps O     7            10           11           19
Ttl Evalu. Steps N     7            10           9            11
Rate Improvement       —            —            18.18%       42.11%
Ttl Node Searched S    11           22           20           26
Ttl Node Searched O    4            14           12           66
Ttl Node Searched N    4            13           7            29
Rate Change            —            7.14%        41.67%       56.06%
Ttl CPU Rounds O       15           49           50           383
Ttl CPU Rounds N       15           40           30           99
Rate Change            —            18.37%       40.00%       74.15%
Ttl Com Cost O         7            24           20           81
Ttl Com Cost N         7            24           18           42
Static Overhead        0            2            4            12
Rate Change            —            —            10.00%       48.15%
Ttl Wt Com Cst O       28.5         124.0        190.0        851.5
Ttl Wt Com Cst N       26.5         120.0        160.0        451.0
Static Wt Overhead     0.0          11.0         34.0         150.0
Rate Change            7.02%        3.23%        15.79%       47.03%
Ratio of Locality O    4.071        5.167        9.500        10.512
Ratio of Locality N    3.786        5.000        8.889        10.738
Rate Improvement       7.00%        3.23%        6.43%        -2.15%
Degree of Paral. O     2.000        4.800        4.455        20.105
Degree of Paral. N     2.000        5.200        4.889        15.636
Rate Improvement       —            8.33%        9.74%        -22.23%

Table 7.15: TEST RESULTS FOR DEEP_JUMPING_RATE = 0.5 (a)
                       mh=20 xh=30   mh=30 xh=50   Avg.
                       mw=1 xw=2     mw=1 xw=2
Ttl Evalu. Steps S     25            34            17.833
Ttl Evalu. Steps O     27            35            18.167
Ttl Evalu. Steps N     16            20            12.167
Rate Improvement       40.74%        42.86%        33.03%
Ttl Node Searched S    24            33            22.667
Ttl Node Searched O    24            33            25.500
Ttl Node Searched N    13            18            14.000
Rate Change            45.83%        45.45%        45.10%
Ttl CPU Rounds O       325           563           230.83
Ttl CPU Rounds N       105           173           77.000
Rate Change            67.69%        69.27%        66.64%
Ttl Com Cost O         47            65            40.667
Ttl Com Cost N         26            36            25.500
Static Overhead        10            12            6.667
Rate Change            44.68%        44.62%        37.30%
Ttl Wt Com Cst O       1,133.5       2,689.5       836.17
Ttl Wt Com Cst N       597.5         1,461.0       469.33
Static Wt Overhead     185.0         414.0         132.33
Rate Change            47.29%        45.68%        43.87%
Ratio of Locality O    24.117        41.377        15.791
Ratio of Locality N    22.981        40.583        15.329
Rate Improvement       4.71%         1.92%         2.93%
Degree of Paral. O     12.000        16.057        9.903
Degree of Paral. N     11.125        15.450        9.050
Rate Improvement       -7.29%        -3.78%        -8.61%

Table 7.16: TEST RESULTS FOR DEEP_JUMPING_RATE = 0.5 (b)
                       mh=3 xh=5    mh=6 xh=9    mh=8 xh=13   mh=15 xh=20
                       mw=6 xw=10   mw=5 xw=7    mw=3 xw=5    mw=1 xw=3
Ttl Evalu. Steps S     5            10           10           23
Ttl Evalu. Steps O     7            10           11           19
Ttl Evalu. Steps N     6            7            7            8
Rate Improvement       14.29%       30.00%       36.36%       57.89%
Ttl Node Searched S    11           22           20           26
Ttl Node Searched O    4            14           12           66
Ttl Node Searched N    4            5            8            15
Rate Change            —            64.29%       33.33%       77.27%
Ttl CPU Rounds O       15           49           50           383
Ttl CPU Rounds N       11           17           20           39
Rate Change            26.67%       65.31%       60.00%       89.92%
Ttl Com Cost O         7            24           20           81
Ttl Com Cost N         6            16           16           27
Static Overhead        2            6            4            16
Rate Change            14.29%       33.33%       20.00%       66.67%
Ttl Wt Com Cst O       28.5         124.0        190.0        851.5
Ttl Wt Com Cst N       22.0         75.0         139.0        287.5
Static Wt Overhead     7.0          29.0         38.0         176.0
Rate Change            22.81%       39.52%       26.84%       66.24%
Ratio of Locality O    4.071        5.167        9.500        10.512
Ratio of Locality N    3.667        4.688        8.688        10.648
Rate Improvement       9.92%        9.27%        8.55%        -1.29%
Degree of Paral. O     2.000        4.800        4.455        20.105
Degree of Paral. N     1.833        4.714        3.857        11.875
Rate Improvement       -8.35%       -1.79%       -13.42%      -40.94%

Table 7.17: TEST RESULTS FOR DEEP_JUMPING_RATE = 0.9 (a)
                       mh=20 xh=30   mh=30 xh=50   Avg.
                       mw=1 xw=2     mw=1 xw=2
Ttl Evalu. Steps S     25            34            17.833
Ttl Evalu. Steps O     27            35            18.167
Ttl Evalu. Steps N     15            14            9.500
Rate Improvement       44.44%        60.00%        47.71%
Ttl Node Searched S    24            33            22.667
Ttl Node Searched O    24            33            25.500
Ttl Node Searched N    13            12            9.500
Rate Change            45.83%        63.64%        62.75%
Ttl CPU Rounds O       325           563           230.83
Ttl CPU Rounds N       92            80            43.167
Rate Change            71.69%        85.79%        81.30%
Ttl Com Cost O         47            65            40.667
Ttl Com Cost N         25            24            19.000
Static Overhead        10            16            9.000
Rate Change            46.81%        63.08%        53.28%
Ttl Wt Com Cst O       1,133.5       2,689.5       836.17
Ttl Wt Com Cst N       576.0         961.0         343.42
Static Wt Overhead     185.0         552.0         164.50
Rate Change            49.18%        64.27%        58.93%
Ratio of Locality O    24.117        41.377        15.791
Ratio of Locality N    23.040        40.042        15.129
Rate Improvement       4.47%         3.23%         4.19%
Degree of Paral. O     12.000        16.057        9.903
Degree of Paral. N     9.800         14.929        7.835
Rate Improvement       -18.33%       -7.02%        -20.88%

Table 7.18: TEST RESULTS FOR DEEP_JUMPING_RATE = 0.9 (b)
                       mh=3 xh=5    mh=6 xh=9    mh=8 xh=13   mh=15 xh=20
                       mw=6 xw=10   mw=5 xw=7    mw=2 xw=5    mw=2 xw=3
Ttl Evalu. Steps S     5            10           13           17
Ttl Evalu. Steps O     7            9            11           19
Ttl Evalu. Steps N     6            7            7            12
Rate Improvement       14.29%       22.22%       36.36%       36.84%
Ttl Node Searched S    11           22           20           23
Ttl Node Searched O    18           224          154          668
Ttl Node Searched N    11           74           33           131
Rate Change            38.89%       66.96%       78.57%       80.39%
Ttl CPU Rounds O       43           356          342          2,374
Ttl CPU Rounds N       18           98           49           338
Rate Change            58.14%       72.47%       85.67%       85.76%
Ttl Com Cost O         14           231          162          635
Ttl Com Cost N         13           83           50           145
Static Overhead        2            8            12           40
Rate Change            7.14%        64.07%       69.14%       77.17%
Ttl Wt Com Cst O       46.0         741.5        978.0        4,182.5
Ttl Wt Com Cst N       32.5         286.5        327.0        1,325.5
Static Wt Overhead     7.0          44.0         98.0         396.0
Rate Change            29.35%       61.36%       66.56%       68.31%
Ratio of Locality O    3.286        3.210        6.037        6.587
Ratio of Locality N    2.500        3.452        6.540        9.141
Rate Improvement       23.92%       -7.54%       -8.33%       -38.77%
Degree of Paral. O     6.000        39.444       31.000       124.90
Degree of Paral. N     5.333        20.429       13.857       42.750
Rate Improvement       -11.12%      -48.21%      -55.30%      -65.77%

Table 7.19: TEST RESULTS FOR VARIABLE_RELATED_RATE = 0.0 (a)
                       mh=20 xh=30   mh=30 xh=50   Avg.
                       mw=1 xw=2     mw=1 xw=2
Ttl Evalu. Steps S     25            34            17.333
Ttl Evalu. Steps O     27            35            18.000
Ttl Evalu. Steps N     16            20            11.333
Rate Improvement       40.74%        22.73%        37.04%
Ttl Node Searched S    24            33            22.167
Ttl Node Searched O    24            33            186.83
Ttl Node Searched N    13            18            46.667
Rate Change            45.83%        45.45%        75.02%
Ttl CPU Rounds O       325           563           667.17
Ttl CPU Rounds N       105           173           130.17
Rate Change            67.69%        56.61%        80.49%
Ttl Com Cost O         47            65            192.33
Ttl Com Cost N         26            36            58.833
Static Overhead        10            12            14.000
Rate Change            44.68%        44.62%        69.41%
Ttl Wt Com Cst O       1,133.5       2,689.5       1,628.5
Ttl Wt Com Cst N       597.5         1,461.0       671.67
Static Wt Overhead     185.0         414.0         190.67
Rate Change            47.29%        45.68%        58.76%
Ratio of Locality O    24.117        41.377        14.102
Ratio of Locality N    22.981        40.583        14.200
Rate Improvement       4.71%         1.92%         -0.69%
Degree of Paral. O     12.000        16.057        38.234
Degree of Paral. N     11.125        15.450        18.157
Rate Improvement       -7.29%        -3.78%        -52.51%

Table 7.20: TEST RESULTS FOR VARIABLE_RELATED_RATE = 0.0 (b)
                       mh=3 xh=5    mh=6 xh=9    mh=8 xh=13   mh=15 xh=20
                       mw=6 xw=10   mw=5 xw=7    mw=2 xw=5    mw=2 xw=3
Ttl Evalu. Steps S     5            10           13           17
Ttl Evalu. Steps O     7            10           14           19
Ttl Evalu. Steps N     7            10           9            11
Rate Improvement       —            —            35.71%       42.11%
Ttl Node Searched S    11           22           20           23
Ttl Node Searched O    4            14           51           183
Ttl Node Searched N    4            13           20           61
Rate Change            —            7.14%        60.78%       66.67%
Ttl CPU Rounds O       15           49           242          911
Ttl CPU Rounds N       15           40           56           171
Rate Change            —            18.37%       76.86%       81.23%
Ttl Com Cost O         7            24           58           185
Ttl Com Cost N         7            24           31           81
Static Overhead        0            2            10           40
Rate Change            —            —            46.55%       56.22%
Ttl Wt Com Cst O       28.5         124.0        429.0        1,629.5
Ttl Wt Com Cst N       26.5         120.0        231.0        685.5
Static Wt Overhead     0.0          11.0         83.0         428.0
Rate Change            7.02%        3.23%        46.15%       57.93%
Ratio of Locality O    4.071        5.167        7.397        8.808
Ratio of Locality N    3.786        5.000        7.452        8.463
Rate Improvement       7.00%        3.23%        -0.74%       3.92%
Degree of Paral. O     2.000        4.800        17.214       47.895
Degree of Paral. N     2.000        5.200        11.222       29.636
Rate Improvement       —            8.33%        -34.81%      -38.12%

Table 7.21: TEST RESULTS FOR VARIABLE_RELATED_RATE = 0.5 (a)
                       mh=20 xh=30   mh=30 xh=50   Avg.
                       mw=1 xw=2     mw=1 xw=2
Ttl Evalu. Steps S     25            34            17.333
Ttl Evalu. Steps O     27            35            18.667
Ttl Evalu. Steps N     16            20            12.167
Rate Improvement       40.74%        42.86%        34.82%
Ttl Node Searched S    24            33            22.167
Ttl Node Searched O    24            33            51.500
Ttl Node Searched N    13            18            21.500
Rate Change            45.83%        45.45%        58.25%
Ttl CPU Rounds O       325           563           350.83
Ttl CPU Rounds N       105           173           93.333
Rate Change            67.69%        69.27%        73.40%
Ttl Com Cost O         47            65            64.333
Ttl Com Cost N         26            36            34.167
Static Overhead        10            12            12.333
Rate Change            44.68%        44.62%        46.89%
Ttl Wt Com Cst O       1,133.5       2,689.5       1,005.7
Ttl Wt Com Cst N       597.5         1,461.0       520.25
Static Wt Overhead     185.0         414.0         186.83
Rate Change            47.29%        45.68%        48.27%
Ratio of Locality O    24.117        41.377        15.156
Ratio of Locality N    22.981        40.583        14.711
Rate Improvement       4.71%         1.92%         2.94%
Degree of Paral. O     12.000        16.057        16.661
Degree of Paral. N     11.125        15.450        12.439
Rate Improvement       -7.29%        -3.78%        -25.34%

Table 7.22: TEST RESULTS FOR VARIABLE_RELATED_RATE = 0.5 (b)
                       mh=3 xh=5    mh=6 xh=9    mh=8 xh=13   mh=15 xh=20
                       mw=6 xw=10   mw=5 xw=7    mw=2 xw=4    mw=2 xw=3
Ttl Evalu. Steps S     5            10           13           17
Ttl Evalu. Steps O     7            12           14           19
Ttl Evalu. Steps N     7            12           12           14
Rate Improvement       —            —            14.29%       26.32%
Ttl Node Searched S    11           22           19           23
Ttl Node Searched O    4            9            12           16
Ttl Node Searched N    4            9            10           12
Rate Change            —            —            16.67%       25.00%
Ttl CPU Rounds O       15           52           77           153
Ttl CPU Rounds N       15           52           54           79
Rate Change            —            —            29.87%       48.37%
Ttl Com Cost O         7            17           23           31
Ttl Com Cost N         7            17           21           24
Static Overhead        0            0            2            6
Rate Change            —            —            8.70%        22.58%
Ttl Wt Com Cst O       28.5         117.5        224.5        499.5
Ttl Wt Com Cst N       26.5         111.5        196.5        359.0
Static Wt Overhead     0.0          0.0          23.0         71.0
Rate Change            7.02%        5.11%        12.47%       28.13%
Ratio of Locality O    4.071        6.912        9.761        16.113
Ratio of Locality N    3.786        6.559        9.357        14.958
Rate Improvement       7.00%        5.11%        4.14%        7.17%
Degree of Paral. O     2.000        4.250        5.429        8.000
Degree of Paral. N     2.000        4.250        5.750        7.643
Rate Improvement       —            —            5.91%        -4.46%

Table 7.23: TEST RESULTS FOR VARIABLE_RELATED_RATE = 0.9 (a)
                       mh=20 xh=30   mh=30 xh=60   Avg.
                       mw=1 xw=2     mw=1 xw=2
Ttl Evalu. Steps S     25            34            17.333
Ttl Evalu. Steps O     27            35            19.000
Ttl Evalu. Steps N     16            20            13.500
Rate Improvement       40.74%        42.86%        28.95%
Ttl Node Searched S    24            33            22.167
Ttl Node Searched O    24            33            16.333
Ttl Node Searched N    13            18            11.000
Rate Change            45.83%        45.45%        32.65%
Ttl CPU Rounds O       325           563           197.50
Ttl CPU Rounds N       105           173           79.667
Rate Change            67.69%        69.27%        59.66%
Ttl Com Cost O         47            65            31.667
Ttl Com Cost N         26            36            21.833
Static Overhead        10            12            5.000
Rate Change            44.68%        44.62%        31.05%
Ttl Wt Com Cst O       1,133.5       2,689.5       782.17
Ttl Wt Com Cst N       597.5         1,461.0       458.67
Static Wt Overhead     185.0         414.0         115.50
Rate Change            47.29%        45.68%        41.36%
Ratio of Locality O    24.117        41.377        17.059
Ratio of Locality N    22.981        40.583        16.371
Rate Improvement       4.71%         1.92%         4.03%
Degree of Paral. O     12.000        16.057        7.956
Degree of Paral. N     11.125        15.450        7.703
Rate Improvement       -7.29%        -3.78%        -3.18%

Table 7.24: TEST RESULTS FOR VARIABLE_RELATED_RATE = 0.9 (b)
                       mh=3 xh=4    mh=4 xh=5    mh=5 xh=6    mh=6 xh=9
                       mw=8 xw=12   mw=6 xw=8    mw=4 xw=6    mw=3 xw=4
Ttl Evalu. Steps S     9            6            9            12
Ttl Evalu. Steps O     7            7            9            9
Ttl Evalu. Steps N     6            5            6            7
Rate Improvement       14.29%       28.57%       33.33%       22.22%
Ttl Node Searched S    15           16           14           15
Ttl Node Searched O    8            5            16           20
Ttl Node Searched N    7            3            8            10
Rate Change            12.50%       40.00%       50.00%       50.00%
Ttl CPU Rounds O       23           17           53           59
Ttl CPU Rounds N       14           8            15           25
Rate Change            39.13%       52.94%       71.70%       57.63%
Ttl Com Cost O         7            9            16           24
Ttl Com Cost N         7            12           11           18
Static Overhead        2            2            4            4
Rate Change            —            -33.33%      31.25%       25.00%
Ttl Wt Com Cst O       21.5         30.5         61.0         133.0
Ttl Wt Com Cst N       18.0         22.0         28.0         96.0
Static Wt Overhead     5.0          7.0          14.0         22.0
Rate Change            16.28%       27.87%       54.10%       27.82%
Ratio of Locality O    3.071        3.389        3.812        5.542
Ratio of Locality N    2.571        1.833        2.545        5.333
Rate Improvement       16.28%       45.91%       33.24%       3.77%
Degree of Paral. O     3.143        2.286        5.778        6.444
Degree of Paral. N     2.833        1.800        4.500        4.714
Rate Improvement       -9.86%       -21.26%      -22.12%      -26.85%

Table 7.25: TEST RESULTS FOR NON_DETERMINISM_RATE = 1 (a)
                       mh=7 xh=12    mh=12 xh=15   Avg.
                       mw=1 xw=3     mw=1 xw=2
Ttl Evalu. Steps S     11            14            10.167
Ttl Evalu. Steps O     11            15            9.667
Ttl Evalu. Steps N     8             13            7.500
Rate Improvement       27.27%        13.33%        22.42%
Ttl Node Searched S    12            13            14.167
Ttl Node Searched O    16            13            13.000
Ttl Node Searched N    12            10            8.333
Rate Change            25.00%        23.08%        35.90%
Ttl CPU Rounds O       66            93            51.833
Ttl CPU Rounds N       33            67            27.000
Rate Change            50.00%        27.96%        47.91%
Ttl Com Cost O         23            25            17.333
Ttl Com Cost N         14            20            13.667
Static Overhead        6             4             3.667
Rate Change            39.13%        20.00%        21.15%
Ttl Wt Com Cst O       192.5         284.5         120.50
Ttl Wt Com Cst N       114.0         241.0         86.500
Static Wt Overhead     47.0          22.0          19.500
Rate Change            40.78%        15.29%        28.22%
Ratio of Locality O    8.370         11.380        5.927
Ratio of Locality N    8.143         12.050        5.412
Rate Improvement       2.71%         -5.89%        8.69%
Degree of Paral. O     5.909         6.133         4.949
Degree of Paral. N     6.375         5.615         4.306
Rate Improvement       7.89%         -8.45%        -12.99%

Table 7.26: TEST RESULTS FOR NON_DETERMINISM_RATE = 1 (b)
156
Efficiency Considerations and Experimental Results
                       mh=3 xh=4    mh=4 xh=5    mh=5 xh=6    mh=6 xh=9
                       mw=8 xw=12   mw=6 xw=8    mw=4 xw=6    mw=3 xw=4
Ttl Evalu. Steps S     18           15           23           43
Ttl Evalu. Steps O     7            7            9            10
Ttl Evalu. Steps N     6            5            9            7
Rate Improvement       14.25%       28.57%       —            30.00%
Ttl Node Searched S    32           46           33           64
Ttl Node Searched O    12           21           28           22
Ttl Node Searched N    11           10           32           17
Rate Change            8.33%        52.38%       -14.29%      22.73%
Ttl CPU Rounds O       32           46           84           69
Ttl CPU Rounds N       20           16           66           37
Rate Change            37.50%       65.22%       21.43%       46.38%
Ttl Com Cost O         9            25           22           34
Ttl Com Cost N         9            33           36           17
Static Overhead        4            6            6            8
Rate Change            —            -32.00%      -63.64%      50.00%
Ttl Wt Com Cst O       25.5         51.5         77.0         161.0
Ttl Wt Com Cst N       22.0         38.0         72.5         78.0
Static Wt Overhead     10.0         17.0         19.0         44.0
Rate Change            13.73%       26.21%       5.84%        51.55%
Ratio of Locality O    2.833        2.060        3.500        4.735
Ratio of Locality N    2.444        1.152        2.014        4.588
Rate Improvement       13.73%       44.08%       42.46%       3.10%
Degree of Paral. O     4.429        6.429        9.222        6.800
Degree of Paral. N     4.000        4.800        10.000       8.857
Rate Improvement       -9.69%       -25.34%      8.44%        30.25%

Table 7.27: TEST RESULTS FOR NON_DETERMINISM_RATE = 2 (a)
                       mh=7 xh=12    mh=12 xh=15   Avg.
                       mw=1 xw=3     mw=1 xw=2
Ttl Evalu. Steps S     117           61            46.167
Ttl Evalu. Steps O     15            22            11.667
Ttl Evalu. Steps N     15            17            9.833
Rate Improvement       —             22.73%        15.72%
Ttl Node Searched S    125           60            60.000
Ttl Node Searched O    47            60            31.667
Ttl Node Searched N    38            43            25.167
Rate Change            19.15%        28.33%        20.53%
Ttl CPU Rounds O       222           425           146.33
Ttl CPU Rounds N       149           193           80.167
Rate Change            32.88%        54.59%        45.21%
Ttl Com Cost O         45            64            33.167
Ttl Com Cost N         38            49            30.333
Static Overhead        18            20            10.333
Rate Change            15.56%        23.44%        8.54%
Ttl Wt Com Cst O       354.5         574.0         207.25
Ttl Wt Com Cst N       250.0         365.5         137.67
Static Wt Overhead     109.0         134.0         55.500
Rate Change            29.48%        36.32%        33.57%
Ratio of Locality O    7.878         8.969         4.996
Ratio of Locality N    6.579         7.459         4.039
Rate Improvement       16.49%        16.84%        19.16%
Degree of Paral. O     14.733        19.273        10.148
Degree of Paral. N     12.533        18.294        9.747
Rate Improvement       -14.93%       -5.08%        -3.95%

Table 7.28: TEST RESULTS FOR NON_DETERMINISM_RATE = 2 (b)
                       mh=3 xh=4    mh=4 xh=5    mh=5 xh=6    mh=6 xh=9
                       mw=8 xw=12   mw=6 xw=8    mw=4 xw=6    mw=3 xw=4
Ttl Evalu. Steps S     6            53           19           163
Ttl Evalu. Steps O     7            9            9            13
Ttl Evalu. Steps N     6            8            8            11
Rate Improvement       14.29%       11.11%       11.11%       15.38%
Ttl Node Searched S    14           138          25           218
Ttl Node Searched O    5            23           62           134
Ttl Node Searched N    5            13           49           82
Rate Change            —            43.48%       20.97%       38.81%
Ttl CPU Rounds O       16           53           149          341
Ttl CPU Rounds N       12           33           74           170
Rate Change            25.00%       37.74%       50.34%       50.15%
Ttl Com Cost O         7            30           36           132
Ttl Com Cost N         6            25           31           93
Static Overhead        2            6            8            40
Rate Change            14.29%       16.67%       13.89%       29.55%
Ttl Wt Com Cst O       21.5         58.0         101.0        473.0
Ttl Wt Com Cst N       16.0         45.5         46.0         193.5
Static Wt Overhead     5.0          13.0         24.0         164.0
Rate Change            25.58%       21.55%       54.46%       59.09%
Ratio of Locality O    3.071        1.933        2.806        3.583
Ratio of Locality N    2.667        1.820        1.484        2.081
Rate Improvement       13.16%       5.85%        47.11%       41.92%
Degree of Paral. O     2.143        5.778        16.444       26.154
Degree of Paral. N     2.000        5.750        15.500       28.273
Rate Improvement       -6.67%       -0.48%       -5.74%       8.10%

Table 7.29: TEST RESULTS FOR NON_DETERMINISM_RATE = 3 (a)
                       mh=7 xh=12    mh=8 xh=13    Avg.
                       mw=1 xw=3     mw=1 xw=3
Ttl Evalu. Steps S     53            93            64.500
Ttl Evalu. Steps O     16            19            12.167
Ttl Evalu. Steps N     13            18            10.667
Rate Improvement       18.75%        5.26%         12.33%
Ttl Node Searched S    53            96            90.667
Ttl Node Searched O    37            65            54.333
Ttl Node Searched N    33            54            39.333
Rate Change            10.81%        16.92%        27.61%
Ttl CPU Rounds O       149           263           161.83
Ttl CPU Rounds N       101           198           98.000
Rate Change            32.21%        24.71%        39.44%
Ttl Com Cost O         42            60            51.167
Ttl Com Cost N         33            51            39.833
Static Overhead        12            18            14.333
Rate Change            21.43%        15.00%        22.15%
Ttl Wt Com Cst O       324.0         440.0         236.25
Ttl Wt Com Cst N       217.5         371.5         148.33
Static Wt Overhead     74.0          99.0          63.167
Rate Change            32.87%        15.57%        37.21%
Ratio of Locality O    7.714         7.333         4.407
Ratio of Locality N    6.591         7.284         3.655
Rate Improvement       14.56%        0.62%         17.06%
Degree of Paral. O     9.250         13.789        12.260
Degree of Paral. N     10.462        12.889        12.479
Rate Improvement       13.10%        -6.53%        1.79%

Table 7.30: TEST RESULTS FOR NON_DETERMINISM_RATE = 3 (b)
Chapter 8
Mode Information Support for Automatic Transformation System

The results of the data flow and dependency matrix analysis can be used to support an automatic transformation system which transforms a FRORL logic-based specification into a procedural language program [98]. Mode information is used to adjust the execution sequence of a clause within a FRORL logic-based specification and helps to facilitate the intelligent backtracking mechanism. Logic-based languages have been advocated by many researchers as good tools for software development. In spite of the fact that logic-based languages are good candidates for prescribing the requirements of software systems, their declarative semantic interpretations are not suitable for sequential compiled execution. The nondeterminism of logic interpretations and the inefficient execution mechanisms clearly indicate that we cannot be satisfied with an operational interpretation of a logic-based specification language; rather, we need to transform the specification into the deterministic interpretation given by a conventional programming language. The transformation of high-level or logic-based languages into lower-level languages has been extensively studied from various points of view. Some researchers are interested in the transformation of a logic theory into another logic theory based upon criteria such as specialization or optimization [9, 26, 82, 91]. Some researchers are interested in the transformation of high-level specification languages into detailed implementations in different languages [5, 12, 35, 74, 75, 80]. Other researchers are interested in the transformation of logic languages into functional languages based upon the lambda calculus [4].
A significant survey of the approaches to realizing a transformation system can be found in [76]. However, the transformation of a nonmonotonic logic-based requirements specification into a procedural (imperative) language program has not been investigated. This book therefore also presents a logic-based transformation system which can transform a nonmonotonic logic-based specification language, FRORL, into procedural language programs. The main problem is how to use the new mode information to support intelligent backtracking.
8.1 Architecture of a Logic-based Specification Transformation System
The transformation of FRORL into a procedural language program begins by applying the non-monotonic inheritance handling to the underlying FRORL requirements specification. This step is followed by rewriting transformations. The logic-based specification is thus converted into a form suitable for the application of data flow and dependency analysis. Based on the analysis of data dependencies and data flow, the mode status is inferred. This mode information is essential for adjusting the execution sequence of a logic-based specification so that the resulting execution sequence can be expressed in a sequential, procedural language. Next, functional procedures are generated for each clause in the specification, and the backtracking control mechanism behind a logic-based specification is implemented by a revised WAM. In this revised WAM, intelligent backtracking based on mode information can be implemented. The WAM is included in a run-time system which selects the actual procedures and executes them based on the decisions made by the WAM. As another effect, this run-time system is also responsible for resolving the multiple executional directions of clauses in the logic-based specification. The transformation steps for rewriting and for data flow and dependency analysis have been discussed previously. In the following sections, details of execution sequence adjustment and intelligent backtracking are discussed.
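The stages just described form a fixed pipeline. The following C skeleton is only an illustration of that ordering; the stage functions are empty placeholders named after the steps in the text, not the actual FRORL implementation.

```c
#include <stdio.h>

typedef struct Spec { const char *text; } Spec;  /* hypothetical in-memory specification */

static void expand_inheritance(Spec *s) { (void)s; /* non-monotonic inheritance handling  */ }
static void rewrite(Spec *s)            { (void)s; /* canonicalizing rewriting transforms */ }
static void analyze_dataflow(Spec *s)   { (void)s; /* data flow and dependency matrices   */ }
static void infer_modes(Spec *s)        { (void)s; /* mode status from the matrices       */ }
static void adjust_sequence(Spec *s)    { (void)s; /* execution control sequence (8.2)    */ }
static void generate_procedures(Spec *s){ (void)s; /* one procedure per clause (8.3)      */ }

int main(void) {
    Spec s = { "FRORL specification" };
    expand_inheritance(&s);
    rewrite(&s);
    analyze_dataflow(&s);
    infer_modes(&s);
    adjust_sequence(&s);
    generate_procedures(&s);
    /* the revised WAM run-time then selects among the generated procedures */
    puts("transformation complete");
    return 0;
}
```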
8.2 Determination of Control Sequence
In this section the algorithm to determine the execution control sequence from the obtained mode matrix is presented. The algorithm yields the execution sequence of the clause, adjusted from the procedural point of view (i.e., the algorithm generates an allowable execution sequence for the rule clause).

Algorithm 6 (Execution Control Sequence)

 1  types args : list of variables
 2        lits : list of literals
 3  for each l ∈ Lit
 4      unmark l
 5  mark the clause head l
 6  lits ← {l}
 7  for each v ∈ Arg
 8      if MM(l, v) = "-"
 9          then args ← args ∪ {v}
10  for each v ∈ Arg
11      if MM(l, v) = "B"
12          then MM(l, v) ← "-"
13               for each l' ∈ Lit
14                   if MM(l', v) = "B"
15                       then MM(l', v) ← "+"
16  while there is still an unmarked literal l
17      if l is a constant ∨ (∀v ∈ Arg: MM(l, v) = "+" ⇒ v ∈ args)
18          then mark l
19               lits ← lits ∪ {l}
20               for each v ∈ Arg
21                   if MM(l, v) = "-"
22                       then args ← args ∪ {v}
23                   if MM(l, v) = "B"
24                       then args ← args ∪ {v}
25                            for each l' ∈ Lit
26                                if MM(l', v) = "B"
27                                    then MM(l', v) ← "+"
28  return lits    /* lits contains the correct execution sequence */

The idea behind this algorithm is to check each literal in a clause to see whether or not all of its arguments with input mode are ground. If so, the literal is "legal" as the next step in the execution sequence. In the algorithm, args stores all the ground arguments within the clause, and lits stores all the searched literals (lines 1-2). The process starts with the clause head predicate: the head is marked and put into lits (lines 3-6), and every argument with "-" mode is put into the args set (lines 7-9). For now we assume that the highest appearance of a "B" mode within a column corresponds to a producer and that all remaining "B" modes are consumers (lines 10-15); it is also possible to take the second or third "B" mode as the producer, and this is exactly where multiple solutions are reflected in the matrix. Lines 16-27 search each literal in the clause to decide its arguments' modes. A literal is considered evaluable if it is a constant literal or if all of its input arguments are included in args, which means they are ground (line 17). If a literal of this kind is found, it is first marked as searched (lines 18-19); then each of its output arguments is put into the args set, since it becomes bound by the execution of the current literal (lines 20-22). Arguments whose mode cannot be decided are also included in args: as in lines 10-15, we assume the highest "B" occurrence is a producer and all the remaining ones are consumers (lines 23-27). If we assume that the updating of entries in the matrix can be performed in constant time, the complexity of the above algorithm is O(n³).
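The following is a small, runnable sketch of Algorithm 6 in C, hard-coded for the clause r(A1,A2) ← s(A1,N,W) ∧ t(W,A2) of the running example. The mode matrix contents are illustrative, and the "B"-resolution simply takes the first occurrence found as the producer, as the algorithm assumes.

```c
#include <stdio.h>

#define NLIT 3   /* 0 = clause head r, 1 = s, 2 = t */
#define NVAR 4   /* A1, N, W, A2 */

static const char *lit_name[NLIT] = { "r(A1,A2)", "s(A1,N,W)", "t(W,A2)" };

/* mode matrix: '+' input, '-' output, 'B' undecided, ' ' variable absent */
static char MM[NLIT][NVAR] = {
    { '-', ' ', ' ', '+' },   /* head supplies A1 to the body, receives A2 */
    { '+', '-', '-', ' ' },   /* s consumes A1, produces N and W           */
    { ' ', ' ', '+', '-' },   /* t consumes W, produces A2                 */
};

int main(void) {
    int marked[NLIT] = {0}, ground[NVAR] = {0};
    int order[NLIT], n = 0;

    marked[0] = 1;                         /* mark the clause head (lines 3-6) */
    order[n++] = 0;
    for (int v = 0; v < NVAR; v++)         /* seed args with "-" arguments     */
        if (MM[0][v] == '-') ground[v] = 1;
    for (int v = 0; v < NVAR; v++)         /* resolve "B" columns (lines 10-15)*/
        if (MM[0][v] == 'B') {
            MM[0][v] = '-';
            for (int l = 1; l < NLIT; l++)
                if (MM[l][v] == 'B') MM[l][v] = '+';
        }
    while (n < NLIT) {                     /* main scheduling loop (lines 16-27)*/
        int progress = 0;
        for (int l = 1; l < NLIT; l++) {
            if (marked[l]) continue;
            int ok = 1;                    /* evaluability test (line 17)      */
            for (int v = 0; v < NVAR; v++)
                if (MM[l][v] == '+' && !ground[v]) ok = 0;
            if (!ok) continue;
            marked[l] = 1;
            order[n++] = l;
            progress = 1;
            for (int v = 0; v < NVAR; v++) {
                if (MM[l][v] == '-') ground[v] = 1;   /* outputs become ground */
                if (MM[l][v] == 'B') {                /* resolve remaining "B" */
                    ground[v] = 1;
                    for (int k = 0; k < NLIT; k++)
                        if (MM[k][v] == 'B') MM[k][v] = '+';
                }
            }
        }
        if (!progress) { puts("no admissible ordering"); return 1; }
    }
    for (int i = 0; i < NLIT; i++)
        printf("%d: %s\n", i + 1, lit_name[order[i]]);
    return 0;
}
```

Running the sketch prints the admissible order r, s, t, which matches the adjusted program in Example 23 below.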
Example 23 After applying the execution control sequence determination algorithm to the running example, we obtain the adjusted program as follows:

p(A1) ← r(A1, A1).
r(A1, A2) ← s(A1, N, W) ∧ t(W, A2).
s(A1, A2, A3) ← =(A1, []) ∧ =(A2, 0) ∧ =(A3, []).
s(A1, A2, A3) ← car(A1, X) ∧ cdr(A1, L1) ∧ s(L1, N, L2) ∧ "f"(N, f(N)) ∧ =(A2, f(N)) ∧ listcnstr(A2, [A2]) ∧ append([A2], L2, A3).
t(A1, A2) ← =(A1, []) ∧ =(A2, []).
t(A1, A2) ← car(A1, H) ∧ cdr(A1, L1) ∧ "g"(H, g(H)) ∧ car(A2, g(H)) ∧ cdr(A2, L2) ∧ t(L1, L2).
For the example of the LSSGR protocol, if we assume that both parameters are bound, i.e., that the first predicate is a test condition and the second one is a function call, then the execution sequence is unchanged, and the result looks like:
caller_id(A1) ← reverting_call(A1).
callee_id(A1) ← reverting_call(A1).
reverting_call(A1) ← two_party_call(A1).
caller_id(A1) ← two_party_call(A1).
callee_id(A1) ← two_party_call(A1).
party_line_type(A1, A2) ← "="(A2, two_party) ∧ two_party_call(A1).
reverting_call(A1) ← four_party_semisel_iddigit_call(A1).
caller_id(A1) ← four_party_semisel_iddigit_call(A1).
callee_id(A1) ← four_party_semisel_iddigit_call(A1).
party_line_type(A1, A2) ← "="(A2, four_party_semisel_iddigit) ∧ four_party_semisel_iddigit_call(A1).
reverting_call(A1) ← four_party_semisel_call(A1).
caller_id(A1) ← four_party_semisel_call(A1).
callee_id(A1) ← four_party_semisel_call(A1).
party_line_type(A1, A2) ← "="(A2, four_party_semisel) ∧ four_party_semisel_call(A1).
reverting_call(A1) ← four_party_fullysel_call(A1).
caller_id(A1) ← four_party_fullysel_call(A1).
callee_id(A1) ← four_party_fullysel_call(A1).
party_line_type(A1, A2) ← "="(A2, four_party_fullysel) ∧ four_party_fullysel_call(A1).
complete(A1) ← reverting_call(A1) ∧ initiate_busy_tone(A1) ∧ ringing(A1).
initiate_busy_tone(A1) ← reverting_call(A1) ∧ apply_busy_tone(A1.caller_id) ∧ wait_for_disconnect(A1).
initiate_busy_tone(A1) ← four_party_semisel_iddigit_call(A1) ∧ correct_identifying_digit(Digit) ∧ apply_busy_tone(A1.caller_id) ∧ wait_for_disconnect(A1).
initiate_busy_tone(A1) ← four_party_semisel_iddigit_call(A1) ∧ not(correct_identifying_digit(Digit)) ∧ failure_response(A1.caller_id) ∧ fail.
wait_for_disconnect(A1) ← reverting_call(A1) ∧ is_disconnected(A1) ∧ remove_busy_signal(A1.caller_id).
wait_for_disconnect(A1) ← reverting_call(A1) ∧ not(is_disconnected(A1)) ∧ wait_for_disconnect(A1).
ringing(A1) ← reverting_call(A1) ∧ is_off_hook(A1.caller_id) ∧ stop_ring(A1).
ringing(A1) ← reverting_call(A1) ∧ not(is_off_hook(A1.caller_id)) ∧ ring(A1).
ringing(A1) ← reverting_call(A1) ∧ is_off_hook(A1.callee_id) ∧ stop_ring(A1).
ringing(A1) ← reverting_call(A1) ∧ is_off_hook(A1.callee_id) ∧ ring(A1).
ring(A1) ← reverting_call(A1) ∧ apply_normal_ring(A1.caller_id) ∧ delay_for_0.5_seconds ∧ apply_normal_ring(A1.callee_id) ∧ delay_for_0.5_seconds ∧ ringing(A1).
ring(A1) ← four_party_semisel_iddigit_call(A1) ∧ parties_on_same_side_of_line(A1) ∧ apply_normal_ring(A1.callee_id) ∧ delay_for_0.5_seconds ∧ ringing(A1).
ring(A1) ← four_party_semisel_call(A1) ∧ apply_normal_ring(A1.caller_id) ∧ delay_for_0.5_seconds ∧ apply_reverting_ring(A1.callee_id) ∧ delay_for_0.5_seconds ∧ ringing(A1).
8.3 Data Type Generation and Procedural Function Formation
In transforming a logic-based requirements specification into a procedural language program, how to declare the appropriate data type for each variable in the specification has to be considered. In a logic-based specification such as the Horn-clause form adopted in the transformation system, there is no explicit declaration of data types. Furthermore, for a logic-based specification language like FRORL, even though it is possible to extract some data type information from the specification, there is no explicit concept of a data type in the specification. What we have are only "Frame" objects with attribute slots, where each attribute only describes either a set or collection of other objects or some "known" entities. However, to transform a Horn-clause logic-based specification into a procedural program, we have to generate structured record types in the target code from the original unstructured logic-based specification. The data type declarations of the attributes of a frame can be obtained by parsing all the activity frames that use the frame and deciding what data type is most appropriate. Another way to decide the data types of the variables in a logic-based specification is to wait until the Horn-clause logic-based specification has been generated from the original FRORL specification, and then to examine the operations in the corresponding procedures (clauses with the same predicate head) that share the thread of an argument, deciding from them what data type is appropriate for the argument in the target code. We will present a knowledge-based approach, combined with a data type propagation analysis, to generate a data type for each local and global variable of the procedures in the target code. From here we can see that data type generation must be considered throughout the whole transformation process, for example in the original FRORL specification, in the data flow analysis phase, or in the final procedure generation phase if the user's input is available. After the transformation system obtains the results generated by the backtracking control mechanism and the data type generation process, it finally generates the complete corresponding procedural language program. The data type information in this approach mainly comes from the FRORL specification.
8.3.1 Rules-based Data Type Inference
In this section, we present the rules for inferring data types from a given Horn-clause-like logic-based specification. These rules are closely related to the rules defined in the canonicalization and data flow analysis stage [98]. Which rules need to be defined is ultimately decided by the format used to denote the final products the canonicalization step may generate. For each typical representation of the data format, rules are developed to represent the possible variable declarations corresponding to that data representation.

RULE-25 (Data Type Analysis)
⊕(Op1, Op2, Result) ⟹ ⊕ is one of +, -, *
    Data_Type(Op1, Number)
    Data_Type(Op2, Number)
    Data_Type(Result, Number)

This rule says that if we have detected an arithmetic expression in the procedure, then we can declare that the two operands and the result of the operation expression all have the type "Number".

RULE-26 (Data Type Analysis)
..., Var, ... ⟹ the data type of Var is ambiguous when unifying data types
    Data_Type(Var, Number) ∧ Data_Type(Var, Integer) ⟹ Data_Type(Var, Integer)
    Data_Type(Var, Real) ∧ Data_Type(Var, Integer) ⟹ Data_Type(Var, Real)
    Data_Type(Var, Real) ∧ Data_Type(Var, Number) ⟹ Data_Type(Var, Real)

This rule is used to resolve the confusion when a variable has more than one data type attached to it. This confusion may occur when a variable appears in more than one place in a procedure and has been attached more than one possible data type in the different contexts. The data type unification should be applied according to the following order: [Number < Integer < Real or Float].

RULE-27 (Data Type Analysis)
Data_Type(Var, Number) ⟹ there is no operation attached to Var
    Data_Type(Var, Integer)

This rule emphasizes that if the data type of a variable is "Number", then by default the data type of the variable will be adjusted to "Integer" in the target code.
From here we can see that the data type "Number" is a temporary data type in the analysis process; sooner or later it will be adjusted to an appropriate data type. The adjustment is decided either by the previous rule, if an ambiguity occurs, or by this rule, if no other data type can be attached to the variable.

RULE-28 (Data Type Analysis)
/(Op1, Op2, Result) ⟹ / is division
    Data_Type(Op1, Float)
    Data_Type(Op2, Float)
    Data_Type(Result, Float)

This rule is built specifically to handle the operation of division. The operation embedded in this rule implies that the operands and the result of a division operation have the data type "Float".

RULE-29 (Data Type Analysis)
"="(Left_Term, Right_Term)
    Data_Type(Left_Term, Type) ∧ Data_Type(Right_Term, Type)

This rule says that if two variables are unified together, then the data types of these two variables should be the same.

RULE-30 (Data Type Analysis)
"mod"(Op1, Op2, Result)
    Data_Type(Op1, Integer)
    Data_Type(Op2, Integer)
    Data_Type(Result, Integer)

This rule is built to handle the mod operation. By definition, a mod operation requires that all of its operands be integers, and the data type of the result is also integer.

RULE-31 (Data Type Analysis)
"="(Var, [])
    Data_Type(Var, List)

This rule is used to define what the list data type is and under what circumstances a list data type may occur in a specification. The introduction of the list data type gives rise to the following rules, which are used to handle the various features of list processing.
RULE-32 (Data Type Analysis)
append(Part1, Part2, Concat)
    Data_Type(Part1, List)
    Data_Type(Part2, List)
    Data_Type(Concat, List)

RULE-33 (Data Type Analysis)
listconstr(Item, List) ⟹ the data type of List is known
    Data_Type(Item, ITEM) ⟹ Data_Type(List, LIST) ∧ Data_Element(LIST, ITEM)
    Data_Type(List, LIST) ⟹ Data_Element(LIST, ITEM) ∧ Data_Type(Item, ITEM)

The operation listconstr is defined in the canonicalization step. Its effect is to construct a list structure from a single element; it is used in the decomposition of the list structures appearing in the logic-based specification. This rule is built on the operation embedded in listconstr, and it restricts the data type of Item to be the same as the data type of the elements of the operand List.

RULE-34 (Data Type Analysis)
listconstr(Item, List) ⟹ the data type of Item is known
    Data_Type(Item, ITEM) ⟹ Data_Type(List, LIST) ∧ Data_Element(LIST, ITEM)
    Data_Type(List, LIST) ⟹ Data_Element(LIST, ITEM) ∧ Data_Type(Item, ITEM)

This is the complementary rule to the previous one.

RULE-35 (Data Type Analysis)
P(Arg1, Arg2, ..., Argn) :- ..., P(Var1, Var2, ..., Varn), ...
    OR
P(Arg1, Arg2, ..., Argn) :- ...
P(Para1, Para2, ..., Paran) :- ..., P(Var1, Var2, ..., Varn), ...
    ⟹ The data types of all the pairs Argj and Varj, or Argj and Paraj, for j = 1...n, are the same.

This rule is used to handle the case of a recursive call. The data types of the corresponding parameters in the head should be the same; this is ensured by the calling pattern implied in the logic-based specification.
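As an illustration of how the numeric rules interact, the following C fragment sketches that part of the rule base: arithmetic seeds "Number" (RULE-25), mod forces Integer (RULE-30), division forces Real/Float (RULE-28), and conflicts are resolved by the least-upper-bound order of RULE-26. The enum and driver are our own illustrative scaffolding, not code from the transformation system; Real and Float are treated as one type here.

```c
#include <stdio.h>

/* numeric lattice of RULE-26: Number < Integer < Real (Float) */
typedef enum { T_UNKNOWN = 0, T_NUMBER, T_INTEGER, T_REAL } Type;

static const char *name(Type t) {
    static const char *n[] = { "unknown", "Number", "Integer", "Real" };
    return n[t];
}

static Type unify(Type a, Type b) {
    return a > b ? a : b;   /* least upper bound: the enum order mirrors the lattice */
}

int main(void) {
    Type v = T_UNKNOWN;
    v = unify(v, T_NUMBER);    /* RULE-25: v occurs in +, -, or *  */
    v = unify(v, T_INTEGER);   /* RULE-30: v occurs in mod         */
    printf("after mod:      %s\n", name(v));
    v = unify(v, T_REAL);      /* RULE-28: v occurs in a division  */
    printf("after division: %s\n", name(v));

    Type w = T_NUMBER;                 /* RULE-27: a variable left at  */
    if (w == T_NUMBER) w = T_INTEGER;  /* Number defaults to Integer   */
    printf("default:        %s\n", name(w));
    return 0;
}
```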
8.3.2 Data Type Propagation Analysis and Generation

In this subsection we present an algorithm for data type analysis. The approach is an adaptation of the matrix approach used for data flow and dependency analysis; both are based upon the idea of a matrix-oriented "reference" technique for clauses in logic-based specifications. Consequently, most programming languages can implement the matrix approach with little effort, as long as they support two-dimensional arrays or list structures. To justify the use of the matrix-oriented approach, we point to the ease of manipulating the data type propagation analysis with a matrix, and to the efficiency of doing data type analysis this way, which makes it appropriate for our transformation system. This does not imply, however, that there is no better alternative for data type analysis. In the previous subsection, we presented useful transformation rules for inferring the data types to be generated in the target code. Here, we propose a matrix-oriented approach to data type propagation. The matrix approach shows its simplicity in generating proper data types for Prolog-like logic specification languages.

A data type propagation analysis matrix is a two-dimensional representation. A horizontal (row) entry represents a literal, which is either the underlying predicate name of the clause head or a literal name from the clause body. Note that for recursive rules, the same predicate head name will appear twice on the vertical entry, implicitly representing two different individuals. A vertical (column) entry represents either an argument of the underlying clause head or a variable of a literal or term in the clause body. These matrices have a structure similar to the data flow information matrix. The following is the strategy for applying the data type inference rules to generate the data type propagation analysis matrix.

1. Group together clauses with the same predicate head name and the same number of arguments. We call each such group a procedure.

2. For each group, we use a data type propagation analysis matrix to represent every individual clause. We place a special mark, say "?", in each cell at the intersection of a horizontal and a vertical entry.

3. Scan through the clause head and clause body and try to find any built-in operators such as "is", "+", and others. If such a built-in operator is found, apply the rule knowledge about the type for that operator and fill in the entry in the matrix. Once an entry is filled in for a cell of the data type matrix, we delete the special mark "?". Repeat this process until no more applicable rules are available.
4. Start the data type propagation analysis for the matrix. The propagation is done by searching for and applying suitable rules from the data type knowledge base. Each time we apply a data type rule, we immediately propagate the result in both the horizontal and the vertical direction; then we apply the rules and propagate again. Repeat this process until all the entries marked "?" are filled with data types.

5. Loop back through steps 1 to 4 until no new transformation result can be generated.

The matrix used in this algorithm adopts the format used for the data flow and dependency analysis, but it is much simpler than the matrix in the previous context. The only restriction is that at the end of the analysis, all the cells in a column must hold the same data type, and this data type must be the least upper bound of all the possible data types that the variable represented by the column may have in the clause being analyzed. RULE-26 says that if more than one possible data type may be attached to a variable in the clause being analyzed, then the data type of the variable must be adjusted to the least upper bound of all the possible data types according to the data type order defined in that rule. By default, all variables without explicit data type restrictions in the clause being analyzed will be assigned the data type integer by RULE-27. The data type information for each clause will be used later, in the procedural language program generation phase, to support the data type declarations.
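The following C fragment sketches steps 2 to 4 on a tiny, made-up clause p(A, X) :- X is A + 1, A mod 2: the rule base seeds a few cells, and the propagation loop fills the "?" cells with the least upper bound of each column. The matrix contents and the example clause are invented for illustration.

```c
#include <stdio.h>

typedef enum { T_UNKNOWN = 0, T_NUMBER, T_INTEGER, T_REAL } Type;
static const char *tname[] = { "?", "Number", "Integer", "Real" };

#define NLIT 3
#define NVAR 2

static Type lub(Type a, Type b) { return a > b ? a : b; }

int main(void) {
    /* rows: head p(A,X); body "X is A+1"; body "A mod 2" -- seeded by the rules */
    Type m[NLIT][NVAR] = {
        { T_UNKNOWN, T_UNKNOWN },   /* p(A, X): both cells start as "?"  */
        { T_NUMBER,  T_NUMBER  },   /* X is A + 1: RULE-25 seeds Number  */
        { T_INTEGER, T_UNKNOWN },   /* A mod 2:    RULE-30 seeds Integer */
    };
    int changed = 1;
    while (changed) {                       /* step 4: propagate to a fixpoint */
        changed = 0;
        for (int v = 0; v < NVAR; v++) {
            Type t = T_UNKNOWN;
            for (int l = 0; l < NLIT; l++) t = lub(t, m[l][v]);
            for (int l = 0; l < NLIT; l++)
                if (m[l][v] != t) { m[l][v] = t; changed = 1; }
        }
    }
    for (int l = 0; l < NLIT; l++)          /* RULE-27: Number defaults to Integer */
        for (int v = 0; v < NVAR; v++)
            if (m[l][v] == T_NUMBER) m[l][v] = T_INTEGER;
    printf("A: %s, X: %s\n", tname[m[0][0]], tname[m[0][1]]);
    return 0;
}
```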
8.3.3 Data Type Analysis in a FRORL Specification
Even though there is no explicit data type included in a FRORL specification, we can still extract some data type information from it. This information is mainly contained in the object frames of a specification. Since an activity frame is used only to describe the transformations which may occur in the real world, it has less relation to the description of object attributes, which are closely related to the data types of the objects in the real world. The data type information mainly comes from two sources in a FRORL specification: the first is the object frames, and the second is the activities defined in activity frames.

1. The attribute slots in an object frame. The possible slot values in an object frame's attribute slots may provide useful information for deciding the data type of the corresponding variable, which represents the actual instance of the slot in the specification.
2. The built-in predicates in a FRORL specification can provide auxiliary data type information for the variables in the specification. The built J n predicates in FRORL require special data type as its parameters. Whenever a variable occurs in a built J n predicate, its possible data type will include the one which is allowed by the specific built-in predicate. The formal definition of the attribute slot in an object frame and all built J n predicates can be found in [93]. In the following we will use an example in that paper to show the data type information retracted from attribute slot in an object frame: Example 24 Type generation for object frames.
object: reverting_call
    caller_id: ;
    callee_id: .
object: two_party_call
    a_kind_of: reverting_call;
    party_line_type: two_party.
object: four_party_semisel_iddigit_call
    a_kind_of: reverting_call;
    party_line_type: four_party_semisel_iddigit.
object: four_party_semisel_call
    a_kind_of: reverting_call;
    party_line_type: four_party_semisel.
object: four_party_fullysel_call
    a_kind_of: reverting_call;
    party_line_type: four_party_fullysel.
These object frame definitions are part of a protocol defining a telephone switching system. By common sense, every attribute slot named id should have the data type list (a string of characters) by our definition. Other data type information can be discovered in a similar manner.
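A sketch of that slot-name heuristic follows; the rule table and the matching logic are invented for illustration, and FRORL's actual knowledge base is richer than this.

```c
#include <stdio.h>
#include <string.h>

/* map an attribute slot name to a target-code type; illustrative rules only */
static const char *slot_type(const char *slot) {
    size_t n = strlen(slot);
    if (n >= 2 && strcmp(slot + n - 2, "id") == 0)
        return "List";                    /* "...id" slots: string of characters */
    if (strcmp(slot, "party_line_type") == 0)
        return "List";                    /* enumerated name, kept as a string   */
    return "Integer";                     /* RULE-27 default                     */
}

int main(void) {
    const char *slots[] = { "caller_id", "callee_id", "party_line_type" };
    for (int i = 0; i < 3; i++)
        printf("%-16s -> %s\n", slots[i], slot_type(slots[i]));
    return 0;
}
```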
8.3.4 Procedural Program Generation
After the backtracking control processing and the data type generation, the input requirements specification is changed into a group of sequential execution
statements. The execution sequence is decided by (1) the execution sequence determination and (2) the backtracking control mechanism. The first decides the basic executional behavior of the logic-based specification. This part of the information is decided solely by the structure of the logic-based specification, especially the data influence relations among the arguments of specification clauses, without considering any possible user input. The second kind of constraint comes into play when the user's input is actually available, and the system is responsible for generating a procedural language program that answers the specific question supplied. The final product of the backtracking step has the following characteristics:

1. There is no recursion in the program. All execution has been explored in the backtracking phase, so the information contained in the linked list is a sequentially executed control string.

2. The data types of all variables in the clauses being called can be decided from the information in the linked list; for variables bound in the specification, or variables that become bound in the process of execution, the actual values these variables hold are decided by the information in the linked list.

These characteristics, especially the second one, make the generation of the final procedural language program easy to achieve. All the information necessary to decide the executional behavior is stored in the linked list for each clause in the specification. The final procedural program generator only needs to trace along the path decided by the pointer fields of the cells in the linked list to find the correct execution sequence for the specific query from the user. Information is also available to decide the data types of the unbound variables in the clauses called in this execution. In the tracing process controlled by the WAM, the execution of all the clauses involved has been decided; thus we obtain not only the executional information from the WAM-controlled evaluation, but the actual execution of the specification in the transformational domain can also decide the possible data types for the unbound variables in the clauses involved in the execution.
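The cells of that linked list can be pictured as follows. This C sketch is only an illustration of the structure the text describes; the field names and the recorded bindings are invented.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct Cell {
    const char  *procedure;   /* which generated procedure to execute     */
    const char  *bindings;    /* value/type information for its variables */
    struct Cell *next;        /* pointer field: next step in the sequence */
} Cell;

static Cell *cons(const char *p, const char *b, Cell *next) {
    Cell *c = malloc(sizeof *c);
    c->procedure = p; c->bindings = b; c->next = next;
    return c;
}

int main(void) {
    /* hypothetical trace for the query p(A1), built back to front */
    Cell *trace = cons("procedure-2 (r)", "A1 ground",
                  cons("procedure-3, version-2 (s)", "N, W become bound",
                  cons("procedure-4, version-2 (t)", "A2 becomes bound", NULL)));
    for (Cell *c = trace; c != NULL; c = c->next)
        printf("call %s: %s\n", c->procedure, c->bindings);
    while (trace) { Cell *d = trace->next; free(trace); trace = d; }
    return 0;
}
```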
Another important characteristic of the final product is that the optimization of the generated procedural language program plays a significant role in the whole transformation process. A significant example is that the partial procedural language program contains no iterations; as a result, a simple program which computes the sum of an array of 1000 elements will repeat a chunk of statements 1000 times. Recent research has explored similar questions in great detail []. It points out that from the calling patterns of clauses
in the logic program, a trace tree can be built up. Theoretically this tree can be infinitely deep, but from the similarity of the calling patterns we can still adjust the representation of the tree into a finitely deep one. This provides an interesting hint: from the similarity of the patterns, either of the calling modes of each clause (exactly as in the research just mentioned) or of the repeated statements, we can deduce the information needed to adjust the program into a form that is more efficient than the original one. After applying the data type analysis, the final procedural language program can be generated as follows:

Example 25 The following is the complete set of procedures generated. The data type denotation adopts the format used in C.

procedure-1: p(A1)
List A1;
{
    r(A1, A1);                 /* This is a procedure call */
}

procedure-2: r(A1, A2)
List A1;
List A2;
{
    Integer N;
    List W;
    s(A1, N, W);               /* These are two procedure calls */
    t(W, A2);
}

procedure-3, version-1: s(A1, A2, A3)
List A1;
Integer A2;
List A3;
{
    A1 = [];
    A2 = 0;
    A3 = [];
}
procedure-3, version-2: s(A1, A2, A3)
List A1;
Integer A2;
List A3;
{
    Integer X;
    List L1;
    Integer N;
    Integer fN;
    List L2;
    car(A1, X);
    cdr(A1, L1);
    s(L1, N, L2);
    f(N, fN);                  /* This is a function call with two parameters */
    A2 = fN;
    listcnstr(A2, [A2]);
    append([A2], L2, A3);
}

procedure-4, version-1: t(A1, A2)
List A1;
List A2;
{
    A1 = [];
    A2 = [];
}

procedure-4, version-2: t(A1, A2)
List A1;
List A2;
{
    Integer H;
    List L1;
    Integer gH;
    List L2;
    car(A1, H);
    cdr(A1, L1);
    g(H, gH);                  /* This is a function call with two parameters */
    car(A2, gH);
    cdr(A2, L2);
    t(L1, L2);
}
The selection of which procedure should be used in the program is based on the result of backtracking control. The backtracking control mechanism makes the selection of the procedures depend on the result of the execution; in this process the intelligent backtracking scheme can be used. For the LSSGR protocol, the procedural program with data type information looks like:

caller_id(A1)
Integer A1;
{
    reverting_call(A1);
}

callee_id(A1)
Integer A1;
{
    reverting_call(A1);
}

reverting_call(A1)
Integer A1;
{
    two_party_call(A1);
}

caller_id(A1)
Integer A1;
{
    two_party_call(A1);
}

callee_id(A1)
Integer A1;
{
    two_party_call(A1);
}

party_line_type(A1, A2)
Integer A1;
List A2;
{
    A2 = two_party;
    two_party_call(A1);
}

reverting_call(A1)
Integer A1;
{
    four_party_semisel_iddigit_call(A1);
}

caller_id(A1)
Integer A1;
{
    four_party_semisel_iddigit_call(A1);
}

callee_id(A1)
Integer A1;
{
    four_party_semisel_iddigit_call(A1);
}

party_line_type(A1, A2)
Integer A1;
List A2;
{
    A2 = four_party_semisel_iddigit;
    four_party_semisel_iddigit_call(A1);
}

reverting_call(A1)
Integer A1;
{
    four_party_semisel_call(A1);
}

caller_id(A1)
Integer A1;
{
    four_party_semisel_call(A1);
}

callee_id(A1)
Integer A1;
{
    four_party_semisel_call(A1);
}

party_line_type(A1, A2)
Integer A1;
List A2;
{
    A2 = four_party_semisel;
    four_party_semisel_call(A1);
}

reverting_call(A1)
Integer A1;
{
    four_party_fullysel_call(A1);
}

caller_id(A1)
Integer A1;
{
    four_party_fullysel_call(A1);
}

callee_id(A1)
Integer A1;
{
    four_party_fullysel_call(A1);
}

party_line_type(A1, A2)
Integer A1;
List A2;
{
    A2 = four_party_fullysel;
    four_party_fullysel_call(A1);
}

complete(A1)
Integer A1;
{
    reverting_call(A1);
    initiate_busy_tone(A1);
    ringing(A1);
}

initiate_busy_tone(A1)
Integer A1;
{
    reverting_call(A1);
    apply_busy_tone(A1.caller_id);
    wait_for_disconnect(A1);
}

initiate_busy_tone(A1)
Integer A1;
{
    four_party_semisel_iddigit_call(A1);
    correct_identifying_digit(Digit);
    apply_busy_tone(A1.caller_id);
    wait_for_disconnect(A1);
}

initiate_busy_tone(A1)
Integer A1;
{
    four_party_semisel_iddigit_call(A1);
    not(correct_identifying_digit(Digit));
    failure_response(A1.caller_id);
    fail;
}

wait_for_disconnect(A1)
Integer A1;
{
    reverting_call(A1);
    is_disconnected(A1);
    remove_busy_signal(A1.caller_id);
}

wait_for_disconnect(A1)
Integer A1;
{
    reverting_call(A1);
    not(is_disconnected(A1));
    wait_for_disconnect(A1);
}

ringing(A1)
Integer A1;
{
    reverting_call(A1);
    is_off_hook(A1.caller_id);
    stop_ring(A1);
}

ringing(A1)
Integer A1;
{
    reverting_call(A1);
    not(is_off_hook(A1.caller_id));
    ring(A1);
}

ringing(A1)
Integer A1;
{
    reverting_call(A1);
    is_off_hook(A1.callee_id);
    stop_ring(A1);
}

ringing(A1)
Integer A1;
{
    reverting_call(A1);
    is_off_hook(A1.callee_id);
    ring(A1);
}

ring(A1)
Integer A1;
{
    reverting_call(A1);
    apply_normal_ring(A1.caller_id);
    delay_for_0.5_seconds;
    apply_normal_ring(A1.callee_id);
    delay_for_0.5_seconds;
    ringing(A1);
}

ring(A1)
Integer A1;
{
    four_party_semisel_iddigit_call(A1);
    parties_on_same_side_of_line(A1);
    apply_normal_ring(A1.callee_id);
    delay_for_0.5_seconds;
    ringing(A1);
}

ring(A1)
Integer A1;
{
    four_party_semisel_call(A1);
    apply_normal_ring(A1.caller_id);
    delay_for_0.5_seconds;
    apply_reverting_ring(A1.callee_id);
    delay_for_0.5_seconds;
    ringing(A1);
}
8.4 Intelligent Backtracking for the Transformation System
Using mode information to improve backtracking behavior has been addressed by many researchers, as mentioned in the previous sections. The mode information generated here can also serve the same purpose. Since the mode information generated here is much more detailed than in other approaches, the intelligent backtracking behavior should be better when the new mode combination is used. Even though intelligent backtracking is an interesting research trend, the handling of backtracking in our system currently adopts another approach: based on the mode information, it explores the search space of a logic-based specification in parallel and thus eliminates backtracking entirely.
Chapter 9
Describing Non-Functional Requirements in FRORL

There are two categories of requirements, namely functional and non-functional, that are involved in the requirements analysis process. The functional requirements state the behavior of the system and model what a system does. The non-functional requirements, on the other hand, focus on the quality of the system and define how well the system should work.
9.1 Functional Requirements vs. Non-functional Requirements
Most current techniques and tools focus on how to specify and analyze functional requirements. These are used to help analysts write a precise and correct requirements specification. The process these works are concerned with is usually sufficient for showing the expected behavior of the system, but it does not cover the non-functional side of the requirements, that is, the quality of the behaviors. This may not have any effect for a simple system which is only expected to produce some results from certain inputs, without caring about how the results are produced. However, if the system's performance is an issue of concern, the requirements should reflect its needs. Similarly, there are properties like accuracy or security to be stated in some applications. The importance of non-functional requirements has started to gain recognition [45, 48], but the published work is minimal compared to the literature on functional requirements. Among such works there is research done on information systems [71], real-time systems [36], and real-time distributed systems [2]. However, as pointed out by these researchers, their work is at
the starting stage and needs more work in the future. The work presented here is motivated by the work of Mylopoulos and others [71], who proposed a process-oriented framework for modeling non-functional requirements from the qualitative aspect. Their approach is followed here in stating non-functional requirements in our requirements language, FRORL. The non-functional requirements are organized independently from the functional requirements for flexibility. A method for referring to non-functional requirements from functional requirements has also been derived. With a correct functional requirements specification derived from the corresponding methodology [94], the parallel evaluation algorithms can be applied to the functional requirements to see if the properties associated with them are met. By observing the result, the non-functional requirements can be adjusted and modified. The contributions of this research to the area are: providing a formal language for representing non-functional requirements, establishing relations between functional and non-functional requirements models, and applying a parallel evaluation method in the refinement of non-functional requirements.
9.2 Issues in Non-functional Requirements
When modeling non-functional requirements, we need to consider both what we model and how we process the model. The attributes of a system are what we model, and we can use either the process-oriented approach or the product-oriented approach to deal with how we process the model. In this section, we discuss how non-functional requirements are modeled in current research works. In the next section, our approach to modeling and applying non-functional requirements is presented. As mentioned above, non-functional requirements deal with the quality issues in a software system, and they describe different attributes of the system. Attributes like performance, accuracy, security, or user-friendliness are qualities that need to be seen in a software system but are not fully defined in the functional requirements. Some attributes may be defined as part of the functional requirements, but it is better to separate the functional and the non-functional requirements; this provides better modifiability and readability of the requirements. The behavior of a system can be seen from the functional requirements, and the quality of the behavior is described in the non-functional requirements. Roman uses six kinds of constraints to define the non-functional requirements: interface constraints, performance constraints, operating constraints, life-cycle constraints, economic constraints, and political constraints [81]. Rome Air Development Center (RADC) decomposes the consumer-oriented attributes
of non-functional requirements into twenty-six general, technically oriented attributes, such as accuracy, anomaly management, generality, etc. [49]. REMAP [79] is a system which uses non-functional requirements, such as the reasons behind design decisions or design rationales, as reference material in the maintenance phase of the software life-cycle. The information can be used to understand why certain changes have taken place and to keep a project history recording the experience learned from the project. TARDIS [36] incorporates attributes like security and interface, together with physical architectural constraints like timeliness, dependability, and adaptability to change, as non-functional requirements for a real-time system. For different attributes, we have different types of models. One can be simply the description of what an attribute is; another can be the relation describing how an attribute is constructed from other attributes. These are modeled by the product-oriented approach and the process-oriented approach, respectively. When measuring non-functional requirements, we can use different approaches to processing a model, depending on the characteristics of the model. We can apply a qualitative approach, which uses reasoning to see if the model is correct, or a quantitative approach, which relies on software metrics to produce measurements of the model.

1. Process-oriented Approach

The process-oriented approach is concerned with how well the non-functional requirements are modeled. It checks to see if a certain piece of requirements can be derived from the given conditions. The idea can be seen as goal-driven reasoning from the artificial intelligence field, where a requirement is a goal to meet. The model can be incomplete, because reasoning on attributes can still produce results. This process of reasoning is a qualitative approach to verifying the model. NFR-Assistant is a prototype system for the NFR-Framework, which covers attributes like accuracy, security, performance, and user-friendliness. Non-functional requirements are organized as a goal tree, where the root of the tree is the attribute to model. In order for the attribute to be true, all the sub-trees must be satisfied. There are different links in the tree that show how a sub-tree can be satisfied. A link can represent an AND-relation, so that all the siblings need to be satisfied in order to make their parent true, or an OR-relation, in which only one child needs to be satisfied. There are other link types for relations, such as equal, sub, and super. Different trees are built for different attributes, but each tree can have links between its internal nodes or even between one of its own nodes and a node from another tree. These links show similarities and differences, and they provide the possibility of trading off different attributes. Following is an example which shows how NFR-Assistant states the
security aspect of a credit card account. The non-functional requirement is to define a security system.

Example 26 The original representation of the non-functional requirements was a graphical model, as shown in Figure 9.1. This graphical model can be stated as follows:

The ultimate goal: Secure accounts
1. Max integrity ∧
   (a) Accurate account ∧
   (b) Complete account
2. Max confidentiality ∧
   (a) Authorize access to account information
       i. Identify users ∧
       ii. Validate access against eligibility rules ∧
       iii. Authenticate user access
            • Compare signature ∨
            • Use P.I.N. ∨
            • Require additional ID
3. Max availability
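A minimal sketch of how such a goal tree is evaluated qualitatively: an AND node needs all of its children satisfied, an OR node needs one. The node layout follows the example above; the leaf satisfaction flags are invented for the demonstration.

```c
#include <stdio.h>

typedef enum { LEAF, AND, OR } Kind;

typedef struct Goal {
    const char   *name;
    Kind          kind;
    int           sat;            /* used only by leaves */
    struct Goal **kids;
    int           nkids;
} Goal;

static int satisfied(const Goal *g) {
    if (g->kind == LEAF) return g->sat;
    int ok = (g->kind == AND);    /* AND starts true, OR starts false */
    for (int i = 0; i < g->nkids; i++) {
        int s = satisfied(g->kids[i]);
        ok = (g->kind == AND) ? (ok && s) : (ok || s);
    }
    return ok;
}

int main(void) {
    Goal sig = { "Compare signature", LEAF, 0, NULL, 0 };
    Goal pin = { "Use P.I.N.",        LEAF, 1, NULL, 0 };
    Goal xid = { "Additional ID",     LEAF, 0, NULL, 0 };
    Goal *auth_kids[] = { &sig, &pin, &xid };
    Goal auth = { "Authenticate user access", OR, 0, auth_kids, 3 };

    Goal ident = { "Identify users",  LEAF, 1, NULL, 0 };
    Goal valid = { "Validate access", LEAF, 1, NULL, 0 };
    Goal *conf_kids[] = { &ident, &valid, &auth };
    Goal conf = { "Authorize access to account information", AND, 0, conf_kids, 3 };

    printf("%s: %s\n", conf.name, satisfied(&conf) ? "satisfied" : "not satisfied");
    return 0;
}
```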
2. Product-oriented Approach

The product-oriented approach basically focuses on the outcome of the model as a whole. It is as though we measure a final product using a software metric technique to see how well the performance criteria are met. Naturally, the non-functional requirements need to be stated completely. The measurement is based on the quantitative approach, because the major concern is to obtain a number which can be analyzed against some standards. Non-functional requirements are measured according to criteria set by the analyst. Each attribute is checked to see how well it satisfies the criteria. Weights are given to the attributes, and the results based on the weights are analyzed to review how well the requirements reflect the client's needs. In addition, the software reliability field also deals with the quality of a software system, and techniques have been derived for estimating reliability measures; all these methods belong to the quantitative approach. RADC has a guidebook on how to specify software goals and construct software quality metric requirements [49]. The process goes through
Figure 9.1: Non-functional requirements for security attributes
these steps: ranking factors and criteria, revising rankings based on interrelationships, examining cost considerations, quantifying factors and criteria, comparing goals with project data, and evaluating all data. The entire set of requirements can be decomposed into subsections, and each subsection, in turn, can be decomposed further for measurement at a more detailed level. An example from [49], showing how Maintainability is defined, is given below.

Example 27 The definition of Maintainability uses a metric-aggregation method to combine the results from many metrics.

• Maintainability
• Sub-Attributes:
  1. Consistency: 2 metrics
  2. Modularity: 2 metrics
  3. Self-descriptiveness: 3 metrics
  4. Simplicity: 6 metrics
  5. Visibility: 3 metrics

Software performance engineering uses quantitative methods to guide software development in meeting performance objectives. Starting from the design phase of development, it uses models throughout the development stages. Performance requirements are defined and incorporated into the software execution model and the system execution model [86]. GRAPE [2] is such a system, which can model performance attributes for real-time, distributed domains.
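To make the metric-aggregation idea concrete, the following C fragment computes an attribute-level figure as a weighted average of sub-attribute scores. The weights and scores are invented for the illustration and are not taken from the RADC guidebook.

```c
#include <stdio.h>

int main(void) {
    const char *sub[] = { "Consistency", "Modularity", "Self-descriptiveness",
                          "Simplicity", "Visibility" };
    double score[]    = { 0.80, 0.90, 0.60, 0.75, 0.70 };  /* from the metrics  */
    double weight[]   = { 0.20, 0.25, 0.15, 0.25, 0.15 };  /* analyst-chosen    */
    double total = 0.0;
    for (int i = 0; i < 5; i++) {
        printf("%-22s %.2f (weight %.2f)\n", sub[i], score[i], weight[i]);
        total += score[i] * weight[i];                     /* weighted average  */
    }
    printf("Maintainability: %.3f\n", total);
    return 0;
}
```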
9.3 Non-functional Requirements Modeling in FRORL
An important issue in modeling non-functional requirements is to identify the attributes which are significant to the application. Here a process-oriented method for modeling non-functional requirements using FRORL is presented. The steps needed to construct a model are discussed, and examples are given for further explanation.
Object Secure_accounts
Object Max_integrity
    a_part_of: Secure_accounts
Object Max_confidentiality
    a_part_of: Secure_accounts
Object Max_availability
    a_part_of: Secure_accounts
Object Accurate_account
    a_part_of: Max_integrity
Object Complete_account
    a_part_of: Max_integrity
Object Authorize_access_to_account_information
    a_kind_of: Max_confidentiality
Object Identify_users
    a_part_of: Authorize_access_to_account_information
Object Validate_access_against_eligibility_rules
    a_part_of: Authorize_access_to_account_information
Object Authenticate_user_access
    a_part_of: Authorize_access_to_account_information
Object Compare_signature
    a_kind_of: Authenticate_user_access
Object Use_PIN
    a_kind_of: Authenticate_user_access
Object Require_additional_ID
    a_kind_of: Authenticate_user_access

We have spent much effort on how to derive a correct functional requirements specification [94]. However, it is sometimes more difficult to deal with non-functional requirements. Non-functional requirements are usually incorporated
into the frames as constraints. However, functional and non-functional requirements should be separated, so that there is more flexibility in the modification and maintenance of a requirements specification. Currently, object frames and activity frames are used to model non-functional requirements and the relations among them, respectively. Modeling non-functional requirements in FRORL adopts FRORL's non-monotonic inheritance mechanism. Various non-functional requirements can be attached to non-functional requirements frames at different levels. Non-functional requirements attached to a frame will be inherited by its child frames unless defined otherwise by the child frames. The non-functional requirements are modeled as object frames in FRORL. The root frame (node) of an object tree (or sub-tree) is satisfied when its child frames are satisfied. The abstract relations between the objects in the example above have the following meanings:

an_instance_of    binding values to variables
a_kind_of         OR relation among siblings
a_part_of         AND relation among siblings

We can model the security requirements shown previously as the FRORL objects listed above; this definition can be represented by the hierarchical object model of Figure 9.2. Non-functional requirements are tightly related to each other: changing one requirement may affect another. For example, higher accuracy requirements may result in lower response time and lower system throughput. A system which realizes non-functional requirements has to take this aspect into account. This relation is modeled as an activity frame. The parts of the activity frame are the non-functional requirements involved in the relation; the action taken is the adjustment applied to one non-functional requirements frame when another changes. A non-functional requirement will be adjusted by the activity frames which define the relations between it and others. An example activity frame defining the relation between Max_integrity and Max_availability is given in Example 28:

Example 28 Relationship between Max_integrity and Max_availability.

Activity Relate_non-functional_requirements(Max_integrity, Max_availability)
    Parts:
        Max_integrity: non-functional requirements
        Max_availability: non-functional requirements
    Action:
        Decrease(Max_availability)
        Increase(Max_integrity)
After we have built the non-functional requirements, we need to associate them with the functional requirements. Let us view non-functional requirements as a modifier in a sentence.
Figure 9.2: Hierarchical model of the object frames for security
Then non-functional requirements act as an adjective to an object, or as an adverb to an activity (verb). We can simply state which non-functional requirements are associated with an object by putting the names of those requirements in an attribute slot of the object. The non-functional requirements referred to by object frames form a non-monotonic inheritance hierarchy similar to that of the functional components. A non-functional requirement referred to by an object frame will be inherited by the child object frames generated from it. A child frame can also override the non-functional requirements specified in its antecedent frames by providing its own definition of the same non-functional requirements; this definition overrides the one defined above it. This enables the non-functional requirements at the root to represent some general and abstract specification, refined along the frame hierarchy. The refinements along the hierarchy can either strengthen or weaken the non-functional requirements specified at the antecedent frames. The non-functional requirements of an activity are implicitly specified. An activity has a parts slot which contains object names. When the objects in the parts slot are associated with some non-functional requirements, the activity will assume those requirements. For example, if an activity, deposit, has two parts, amount and account, and if the object account has the non-functional requirement accurate, then the result of the activity deposit must be accurate. This can be represented as follows:

Object amount
    a_kind_of: figures
Object account
    a_kind_of: record
    property: accurate    /* a non-functional requirements object */
    account_number:
Activity deposit(X, Y)
    Parts:
        X: amount
        Y: account
    Precond: ...
    Action: ...
    Alt_action: ...
Figure 9.3 illustrates a possible attachment of non-functional requirements to an object frame hierarchy. This example is adopted from [71], where it is used to illustrate the accuracy non-functional requirement of a management system. The management system is used by a company with various types of employees. The information about the employees is gathered
and analyzed in the management system.
Figure 9.3: Illustration of a FRORL specification with non-functional requirements attached

First, a FRORL specification corresponding to the organization is presented. Each frame in the hierarchy represents an entity in the company. In order to achieve the goal of accuracy in the object frame hierarchy, only the root of the hierarchy needs to be assigned the non-functional requirement accurate. As a result, none of the remaining frames in the hierarchy needs to describe its non-functional requirements as accurate explicitly; instead, all of them inherit the attribute from the root frame. Compared with the structure presented in [71], the representation is much simpler in our case. The accuracy attribute inherited along the object frame hierarchy is overridden in the publication (of a researcher) frame, where the accuracy of a researcher's publication is defined as follows: the publication is periodically audited if it has been included in an existing database, or it is validated if it is directly fed from
the user. If a researcher's degree may include some uncertainty while the accuracy of the researcher is still preserved, then a more specific requirement declaring that the researcher's degree may have a certain inaccuracy can be attached to the frame degree.
9.4 Adjusting Non-functional Requirements
Along with the evaluation of a FRORL specification, the attached non-functional requirements may fail to meet the original requirements of the users. To solve this problem, it is necessary to develop a mechanism to adjust the value of a non-functional requirement for a FRORL specification frame. In [71], because the goal graph structure lacks a theoretical foundation, it is difficult to develop a formal non-functional requirements adjustment mechanism. Even though the interrelations among the non-functional requirements are extensively studied there, there is still no clear mechanism for how to apply these adjustments to reach a satisfactory non-functional requirements state. In our approach, non-monotonic inheritance is applied to the non-functional requirements in object frames. For activity frames, a reference is made to the non-functional requirements structure if the frame includes as a part an object which has attached non-functional requirements. Based on this unified non-functional specification, it is feasible to define a mechanism to adjust the non-functional requirements and their relations based on the logical foundation of a FRORL specification. In the following, four rules are discussed along with corresponding examples. For the four rules, we assume that NFR represents a FRORL specification defining the non-functional requirement NFR, and that an object frame hierarchy refers to NFR, where frame A has child frames B1, ..., Bn. The purpose of these rules is to adjust the value of NFR at frame A after it has been detected as not meeting the users' requirements.

RULE-37 (Non-functional Adjustment)
⟹ The value of NFR is explicitly defined in frame A;
   all of A's children B1, ..., Bn have explicit definitions of the value of NFR.
Adjust the value of NFR in frame A.
This rule indicates that if the value of the non-functional requirement NFR is explicitly provided at frame A and at all its children, then the update can be made directly to frame A. Since all the child frames have included the value of NFR, whatever change is made at A will not
be propagated to its children and cause unnecessary side effects. Figure 9.4 illustrates this rule.
Figure 9.4: Illustration of rule 37
RULE-38 (Non-functional Adjustment)
⟹ The value of NFR is explicitly defined in frame A;
   not all of A's children B1, ..., Bn have explicit definitions of NFR.
Create an explicit NFR value in each child frame without a definition of NFR;
assign the value of NFR in A to the newly created NFR in those child frames;
adjust NFR in frame A.

This rule indicates that if not all of A's children have explicitly defined the value of NFR, then, in order to prevent the adjustment made in A from affecting the NFR value in those child frames, an explicit definition is necessary for them. Figure 9.5 illustrates this rule.

RULE-39 (Non-functional Adjustment)
⟹ The value of NFR is not explicitly defined in frame A;
   all of A's children B1, ..., Bn have explicit definitions of the value of NFR.
Create an explicit value of NFR in A;
adjust the value of NFR in frame A.
Figure 9.5: Illustration of rule 38
Figure 9.6: Illustration of rule 39
This rule is similar to Rule 37 except that NFR is not explicitly defined in A. In this case, an NFR value has to be defined in A, but this definition will not affect the value of NFR at the child frames, because all of them have explicit NFR value definitions. Figure 9.6 illustrates this rule.

RULE-40 (Non-functional Adjustment)
⟹ The value of NFR is not explicitly defined in frame A;
   not all of A's children B1, ..., Bn have explicit definitions of NFR.
Create an explicit NFR value in each child frame without a definition of NFR;
assign the value of NFR inherited by A to the newly created NFR in those child frames;
adjust NFR in frame A.
This rule is similar to Rule 38 except that the value of NFR in frame A is inherited rather than explicitly defined in it. By the same token, an explicit definition has to be provided in each child frame to override the possible impact on it of changing the NFR value in A. Figure 9.7 illustrates this rule.

Figure 9.7: Illustration of rule 40
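Taken together, the four rules reduce to one procedure: freeze, in every child that has no definition of its own, the value it currently inherits, and only then change A. The following C sketch shows that procedure on a two-child frame; the frame layout and the numeric values are invented for illustration.

```c
#include <stdio.h>

#define MAXKIDS 4

typedef struct Frame {
    const char   *name;
    int           explicit_nfr;  /* has its own NFR definition?    */
    double        nfr;           /* value of the NFR (if explicit) */
    struct Frame *kids[MAXKIDS];
    int           nkids;
} Frame;

/* the value frame f currently sees: its own if explicit, else inherited */
static double effective(const Frame *f, double inherited) {
    return f->explicit_nfr ? f->nfr : inherited;
}

static void adjust(Frame *a, double inherited, double new_value) {
    double old = effective(a, inherited);
    for (int i = 0; i < a->nkids; i++)      /* rules 38 and 40: freeze children */
        if (!a->kids[i]->explicit_nfr) {
            a->kids[i]->explicit_nfr = 1;   /* give the child an explicit copy  */
            a->kids[i]->nfr = old;          /* of what it inherited before      */
        }
    a->explicit_nfr = 1;                    /* rules 39 and 40: A gets explicit */
    a->nfr = new_value;                     /* value, then is adjusted          */
}

int main(void) {
    Frame b1 = { "B1", 1, 0.9, {0}, 0 };    /* explicit definition: untouched   */
    Frame b2 = { "B2", 0, 0.0, {0}, 0 };    /* inherits from A: will be frozen  */
    Frame a  = { "A",  1, 0.8, { &b1, &b2 }, 2 };
    adjust(&a, 0.0, 0.5);
    printf("A=%.1f B1=%.1f B2=%.1f\n", a.nfr, b1.nfr, b2.nfr);
    return 0;
}
```

Running the sketch prints A=0.5 B1=0.9 B2=0.8: the change at A does not leak into either child, which is exactly the side-effect-freedom the four rules are designed to guarantee.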
Chapter 10
Summary

The main topic of this book is a parallel execution model for Horn-clause logic-based requirements specifications. The model handles AND-parallelism and OR-parallelism, and eliminates the backtracking related to a logic-based specification. The significant points of this model are:

• It is the first model to discuss the parallel execution of a logic-based requirements specification. Current approaches all focus on the parallelism of various logic programs.

• The new model uses a static data dependency approach to generate mode information for each of the variables in the predicates of a clause. This static mode information is the basis for the whole parallel evaluation process.

• The new model is guaranteed to find the maximum degree of AND- and OR-parallelism by adopting a new hybrid execution model. This model finds a parallel executable set of predicates at each step when a new variable binding is created, based on the static mode information.

• Because the static data dependency analysis is guaranteed to find all possible execution paths and all solutions of a logic-based specification, the model can eliminate the backtracking structure from the execution control structure.

• At each step of the execution, the current processes are divided into several clusters by the clauses. These clauses have predicates which are currently executable or have become active because of the static analysis. As a result, the total idle time of the communication channels needed by the parallel processes is substantially lower than the idle time of current
approaches, where it is possible for a communication channel to be idle for a long time before a message is collected from a deep predicate of one branch and sent to the deep predicate of another branch.
• During the AND-parallel execution, we do not generate all intermediate processes from the initial calling predicate to the current predicates; if those intermediate nodes do not contribute to the binding of the related variables, they need not be generated at all, which reduces the overhead of the parallelizing scheme. During the OR-parallel execution, the creation of the binding structure can likewise be delayed until the variable bindings are actually generated from the branches of the OR-parallel points.

• Since at each step the search goes into the AND-OR tree of a logic-based specification and the evaluation is applied to the clauses whose predicates are currently executable, the classical short-cut algorithm can be applied earlier than other approaches allow, thereby reducing the space of the AND-OR tree searched to generate the complete set of multiple solutions. As a consequence, the time and space saved by the short-cut algorithm are maximized.

• The hybrid execution of OR-parallelism reduces the complexity of the synchronization data structures used to manage the different bindings generated from different branches. This is reflected in the possibility of delaying the creation of the binding data structures and in the one-layered synchronization of those structures. Because the bottom-up execution searches the AND-OR tree hierarchy first, a data structure at an OR-parallel merging point will not be created until all the variable bindings have actually been generated from the evaluation of the OR-branches below the OR-parallel point. Under the bottom-up execution model, all the bindings generated from different branches of an OR-parallel predicate are already available before the merging algorithm at that OR-branch point is called, so there is no need to maintain a synchronization data structure for more than one level below the OR-parallel branch point. This sharply reduces the complexity of creating and maintaining the synchronization data structures that most current approaches share, without any limitation on the level of information a binding list needs to maintain. The only overhead is a tag which distinguishes the solutions generated one level deep in one OR-parallel sub-branch from those of the others. A sketch of this tagged, delayed merge appears below.
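As an illustration of the last point, the following is a minimal sketch, under assumed data structures, of the delayed, one-level-tagged OR-parallel merge just described. The merge_or_point function and the branch thunks are hypothetical names, and Python threads merely stand in for the parallel processes of the model.

from concurrent.futures import ThreadPoolExecutor

def merge_or_point(branches):
    # branches: mapping from a branch tag to a zero-argument function that
    # returns the list of variable bindings produced by that OR-branch.
    with ThreadPoolExecutor() as pool:
        futures = {tag: pool.submit(run) for tag, run in branches.items()}
        # Bottom-up: every branch finishes before any merged structure exists.
        results = {tag: f.result() for tag, f in futures.items()}
    # One-level synchronization: a flat list of (tag, binding) pairs is the
    # only bookkeeping kept at the OR-parallel merging point.
    return [(tag, binding)
            for tag, bindings in results.items()
            for binding in bindings]

# Hypothetical example: two OR-branches binding X in different ways.
solutions = merge_or_point({
    "b1": lambda: [{"X": 1}, {"X": 2}],
    "b2": lambda: [{"X": 3}],
})
print(solutions)  # [('b1', {'X': 1}), ('b1', {'X': 2}), ('b2', {'X': 3})]

The single tag per solution is the only extra bookkeeping, mirroring the claim that no synchronization structure deeper than one level below the OR-parallel branch point is needed.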
Series on Software Engineering and Knowledge Engineering - Vol. 10
HYBRID PARALLEL EXECUTION MODEL FOR LOGIC-BASED SPECIFICATION LANGUAGES
by Jeffrey J. P. Tsai & Bing Li (University of Illinois, Chicago)
Parallel processing is a very important technique for improving the performance of various software development and maintenance activities. The purpose of this book is to introduce important techniques for the parallel execution of high-level specifications of software systems. These techniques are very useful for the construction, analysis, and transformation of reliable large-scale and complex software systems.