DEPENDENCE ANALYSIS
A Book Series On
LOOP TRANSFORMATIONS FOR RESTRUCTURING COMPILERS
Utpal Banerjee
Series Titles:
Loop Transformationsfor Restructuring Compilers: The Foundations Loop Parallelization Dependence Analysis
DEPENDENCE ANALYSIS
Utpal Banerjee Intel Corporation
A B o o k Series on
Loop Transformations for Restructuring Compilers
Kluwer Academic Publishers Boston / D o r d r e c h t / L o n d o n
Distributors for North America: Kluwer Academic Publishers 101 Philip Drive Assinippi Park Norwell, Massachusetts 02061 USA Distributors for all other countries: Kluwer Academic Publishers Group Distribution Centre Post Office Box 322 3300 AH Dordrecht, THE NETHERLANDS
Library of Congress Cataloging-in-Publication Data A C.I.P. Catalogue record for this book is available from the Library of Congress.
Copyright © 1997 by Kluwer Academic Publishers All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher, Kluwer Academic Publishers, 101 Philip Drive, Assinippi Park, Norwell, Massachusetts 02061 Printed on acid-free paper.
Printed in the United States of America
To m y parents:
Late Santosh K u m a r Banerjee Santi Rani Banerjee
Contents Preface
xv
Acknowledgments
xvii
1 Introduction
1
Single Loops
15
2.1 2.2 2.3 2.4
Introduction ............................ Index and Iteration Spaces . . . . . . . . . . . . . . . . . . . Dependence Concepts ..................... Dependence Problem ......................
15 15 19 25
2.5 2.6
S o l u t i o n t o t h e Linear P r o b l e m . . . . . . . . . . . . . . . . Method of Bounds ........................
30 46
2
3
Double Loops
57
3.1
Introduction ............................
57
3.2 3.3 3.4
I n d e x a n d I t e r a t i o n Spaces . . . . . . . . . . . . . . . . . . . Dependence Concepts ..................... Dependence Problems .....................
58 64 71
81
4 Perfect Loop Nests 4.1
Introduction ............................
81
4.2 4.3 4.4 4.5
I n d e x a n d I t e r a t i o n Spaces . . . . . . . . . . . . . . . . . . . Dependence Concepts ..................... Subscript Representation .................... Dependence Problem ...................... Special Cases . . . . . . . . . . . . . . . . . . . . . . . . . . .
82 94 100 102 108
4.6
vii
, , o
Vlll
General Program
121
5.1 5.2 5.3 5.4
121 122 129 134
Introduction ............................ Dependence Concepts ..................... Dependence Problem ...................... G e n e r a l i z e d gcd T e s t . . . . . . . . . . . . . . . . . . . . . .
6 Method of Bounds 6.1 6.2 6.3 6.4 6.5 7
Introduction ............................ P e r f e c t Nest, O n e - D i m e n s i o n a l A r r a y . . . . . . . . . . . . R e c t a n g u l a r Loops, O n e - D i m e n s i o n a l A r r a y . . . . . . . . R e c t a n g u l a r Loops, M u l t i - D i m e n s i o n a l A r r a y . . . . . . . General Method . . . . . . . . . . . . . . . . . . . . . . . . .
139 139 141 151 158 165
Method of Elimination
171
7.1 7.2 7.3 7.4
171 172 176 180
Introduction ............................ Dependence Testing by Elimination . . . . . . . . . . . . . Two-variable Problems . . . . . . . . . . . . . . . . . . . . . Other Methods ..........................
8 Conclusions
189
A Linear Equations on Polytopes
191
A.1 A.2 A.3 A.4 A.5 A.6
Introduction ............................ Polytopes ............................. Real S o l u t i o n s to a Single E q u a t i o n . . . . . . . . . . . . . . Lagrangean Relaxation ..................... Real S o l u t i o n s to a S y s t e m o f E q u a t i o n s . . . . . . . . . . I n t e g e r S o l u t i o n s t o Linear E q u a t i o n s . . . . . . . . . . . .
191 192 193 196 200 204
Bibliography
207
Index
213
List of Figures 2.1 Statement d e p e n d e n c e graph for Example 2.2 . . . . . . . 2.2 The triangle P of T h e o r e m 2.11 . . . . . . . . . . . . . . . . 2.3 Loop nest of Example 2.7 after unrolling . . . . . . . . . . .
24 49 54
3.1 Index and iteration spaces for Example 3.1 . . . . . . . . .
63
List of Tables 2.1
Steps of A l g o r i t h m 2.1 for a = 21 a n d b = 34 . . . . . . .
33
3.1 I n d e x values for l o o p s o f Example 3.1 . . . . . . . . . . . . 3.2 I t e r a t i o n v a l u e s for l o o p s of Example 3.1 . . . . . . . . . . 3.3 Some i t e r a t i o n s of (Lx,L2) in Example 3.2 . . . . . . . . . .
62 62 69
4.1 4.2
97 98
Loop n e s t of Example 4.2 a f t e r u n r o l l i n g . . . . . . . . . . . D e p e n d e n c e s t r u c t u r e of (Lx,L2, L3) in Example 4.2.
5.1 Three s t a t e m e n t i n s t a n c e s for Example 5.1 . . . . . . . . . 5.2 P a r a m e t e r s for the D e p e n d e n c e P r o b l e m . . . . . . . . . . .
. .
127 130
List of Notations In t h e f o l l o w i n g , i = ( i l , i 2 , . . . , i m ) a n d j = ( j l , j 2 , . . . , j m ) m - v e c t o r s ( i n t e g e r o r real), a n d 1 < ~ < m .
a/b R Rm Z Zm
S
For Set Set Set Set
are two
i n t e g e r s a a n d b ( ~ 0), t h e r a t i o n a l n u m b e r ~ of real numbers ..................... of real m-vectors ..................... of integers ......................... of integer m-vectors ..................
2 10 10 15 15
S-6 T
Statement S lexically precedes statement T ..... S
19 19 22 23 23 23 23 23
i>_j i_<j
i r >_ j r f o r l <_ r <_ m i r <- j r f o r l < r <_ m
44 83
S6 a T S~OT S~ i T
1
-<eJ
~-<j l_j 1
>'eJ
lYj
_>j
.................... ....................
il = j l , i2 = j 2 , . . . , i # - I = J # - l , i# < j~ . . . . . . . . i -<~ j f o r s o m e # . . . . . . . . . . . . . . . . . . . . . . . . i<jori=j ........................... J "
Xlll
96 60 104 96 65 67
xiv
sig(i)
sig(i) lev(i)
det(P) diag(v) p(r)
~.~ rank(P) (i;j)
(A; B)
Sign of a n u m b e r i . . . . . . . . . . . . . . . . . . . . . . ( s i g ( i l ) , s i g ( ~ 2 ) , . . . , sig(~m)) . . . . . . . . . . . . . . . If i = 0, t h e n m + 1; otherwise, g w h e r e ~1 = i 2 . . . . . i£_ 1 = O, i e ~ 0 . . . . . . . . . . . . . .
66 95 66
Determinant of square matrix P . . . . . . . . . . . . . Diagonal m a t r i x w i t h m a i n d i a g o n a l v . . . . . . . . . Column r of matrix P. . . . . . . . . . . . . . . . . . . . . The m × m i d e n t i t y m a t r i x . . . . . . . . . . . . . . . . . Rank of matrix P . . . . . . . . . . . . . . . . . . . . . . . .
83 83 84 85 111
The 2 m - v e c t o r o b t a i n e d b y c o n c a t e n a t i n g the e l e m e n t s o f i a n d j . . . . . . . . . . . . . . . . . . .
135
The m a t r i x o b t a i n e d b y c o n c a t e n a t i n g the rows of matrices A and B . . . . . . . . . . . . . . . The m a t r i x o b t a i n e d b y c o n c a t e n a t i n g t h e c o l u m n s o f m a t r i c e s A a n d B. . . . . . . . . . . . .
135 205
Preface The b o o k series Loop Transformations for Restructuring Compilers has b e e n d e s i g n e d to provide a c o m p l e t e m a t h e m a t i c a l t h e o r y of t r a n s f o r m a t i o n s , that can be u s e d to automatically change a sequential p r o g r a m containing FORTRAN-like d o loops into an equivalent parallel form. Volume I of this series, subtitled The Foundations, provides the n e e d e d m a t h e m a t i c a l background, a n d i n t r o d u c e s the c o n c e p t of d e p e n d e n c e a n d some of the major t r a n s f o r m a t i o n s . The m o d e l p r o g r a m u s e d there is a perfect n e s t of loops with unit stride, w h o s e b o d y is a s e q u e n c e of a s s i g n m e n t s t a t e m e n t s . V o l u m e II, Loop Parallelization, uses the same m o d e l a n d gives a detailed t h e o r y of iteration-level t r a n s f o r m a t i o n s . This is Volume III in the s a m e series. In this book, we e x t e n d the m o d e l to a p r o g r a m consisting of d o loops and a s s i g n m e n t statements, where the loops n e e d n o t be sequentially n e s t e d a n d are allowed to have arbitrary strides. In the context of s u c h a p r o g r a m , we study, in detail, d e p e n d e n c e b e t w e e n s t a t e m e n t s of the program, c a u s e d by p r o g r a m variables that are ele m e n t s of arrays. In the future, we will s t u d y t r a n s f o r m a t i o n s in this p r o g r a m model, a n d t h e n extend our t h e o r y to a m o d e l that is generalized f u r t h e r to include conditional s t a t e m e n t s . This b o o k m a y be c o n s i d e r e d to be the s e c o n d edition of the author's 1988 b o o k Dependence Analysis for Supercomputing. It is, however, a c o m p l e t e l y n e w w o r k that s u b s u m e s the material of the 1988 publication. The b o o k s in this series are directed t o w a r d s graduate a n d adv a n c e d u n d e r g r a d u a t e s t u d e n t s , a n d professional writers of restructuring compilers. The r e c o m m e n d e d b a c k g r o u n d for V o l u m e I is s o m e k n o w l e d g e of p r o g r a m m i n g languages, a n d familiarity w i t h calculus a n d g r a p h theory. We need a similar b a c k g r o u n d for V o l u m e III,
a n d it is u n d e r s t o o d that the r e a d e r has read V o l u m e I. We have a d o p t e d a cyclical a p p r o a c h for coverage of material in this series, a n d m a n y c o n c e p t s that were first i n t r o d u c e d in V o l u m e I are again d i s c u s s e d here in m o r e generality. However, we do n e e d several results a n d a l g o r i t h m s that have already b e e n covered in great detail in V o l u m e I, a n d there is no n e e d to d i s c u s s t h e m again. This b o o k is basically i n d e p e n d e n t of V o l u m e II. Volume III is m o r e m a t h e m a t i c a l in n a t u r e t h a n the previous two volumes. That is to be e x p e c t e d since d e p e n d e n c e analysis of array variables (with linear subscripts) is a special type of linear optimization problem. Readers w h o have s t u d i e d linear p r o g r a m m i n g will have an easier time with this b o o k t h a n those w h o have not. However, a k n o w l e d g e of linear p r o g r a m m i n g is n o t a prerequisite. We have tried to explain the r e q u i r e d c o n c e p t s a n d results f r o m that discipline, as n e e d e d in this book. We have t a k e n great pains to s h o w the evolution of a c o n c e p t or result as we pass f r o m a single loop to a d o u b l e loop to a perfect n e s t to a general p r o g r a m containing arbitrarily n e s t e d loops. It is o u r h o p e that the a m o u n t of r e d u n d a n c y i n t r o d u c e d here is j u s t right to m a k e the material easily accessible to the b e g i n n i n g reader, while avoiding the risk of causing b o r e d o m to the experienced one. As in V o l u m e II, we refer to algorithms, t h e o r e m s , etc. in Volume I (i.e., in [Bane 93]) by prefixing an "I" in front of their n u m b e r . For example, A l g o r i t h m 1.2.1 is A l g o r i t h m 2.1 in V o l u m e I. Also, an exercise in the c u r r e n t b o o k m a y be r e f e r e n c e d in two different ways d e p e n d ing on the location of that reference. For example, Exercise 3 is the third exercise at the e n d of the c u r r e n t section, while Exercise 2.1.3 is the third exercise at the e n d of Section 2.1. As in the previous two volumes, a c o m m e n t like (explain) or a q u e s t i o n like (why?), enclosed in p a r e n t h e s e s , is m e a n t for the reader a n d n o t a c o m m e n t for the a u t h o r h i m s e l f left over f r o m a draft version of the m a n u s c r i p t . In Chapter 1, the n a t u r e a n d the s t r u c t u r e of the b o o k are explained. Due to space a n d time constraints, we could n o t cover several i m p o r t a n t topics in this v o l u m e (e.g., p o i n t e r analysis, symbolic analysis, run-time d e p e n d e n c e analysis, a n d n o n l i n e a r problems); we expect to r e t u r n to t h e m s o m e t i m e in the future. U.B.
Santa Clara, California
Acknowledgments The present structure of this book owes much to Constantine D. Polychronopoulos who read an early version of the manuscript and suggested major changes. His comments have led to a book that is much more readable than it otherwise would have been. He also provided a number of motivated reviewers from the ranks of his present and former students. The author would like to thank Constantine for his extremely valuable help. Several versions of the manuscript were read by the following reviewers: Mohammad Haghighat, Hideki Saito, Nicholas Stavrakos, David Craig, George Dimitriou, Ramesh Subramonian, Milind Girkar, David Sehr, and Pohua Chang. They found many errors and suggested a large number of changes. The author would like to thank them all for their devoted work that did a lot to improve the quality of the manuscript. Special thanks are due to Mohammad Haghighat who took a personal interest in this project. He was always available for consultation. He let the author use his unpublished paper for the preparation of Section 7.4. He also acted as the local BTEX expert providing valuable advice on complicated typesetting problems. The author is grateful to his colleague Kent Fielden for his help in setting up the system on which the book was written. Finally, the author wishes to thank Dr. Richard Wirt and the Intel management for giving him the opportunity to write this series of books.
xvii
Chapter 1
Introduction D e p e n d e n c e analysis is the m o s t i m p o r t a n t part of any p r o g r a m restructuring process. When a c o m p i l e r tries to r e s t r u c t u r e a given prog r a m to satisfy a certain goal (e.g., parallelization or i m p r o v e d cache performance) w i t h o u t changing the m e a n i n g of the program, it m u s t h o n o r certain dependence constraints. These c o n s t r a i n t s are determ i n e d by the order in which each p r o g r a m variable (see the first footnote on page 22) is defined a n d u s e d d u r i n g the course of execution of the program. When we rearrange a program, the order of m e m o r y references can be c h a n g e d only to the extent that w r o n g values are n o t u s e d or s t o r e d w h e n the r e a r r a n g e d p r o g r a m is executed. The m a i n d e p e n d e n c e p r o b l e m is as follows: given two p r o g r a m variables, decide if they have instances that reference the s a m e m e m o r y location d u r i n g execution of the p r o g r a m . We m a y also w a n t to k n o w the exact set of s u c h instance pairs a n d / o r certain characteristic p r o p e r t i e s of that set. The aim of d e p e n d e n c e analysis is to gather u s e f u l inform a t i o n about the u n d e r l y i n g d e p e n d e n c e s t r u c t u r e of a p r o g r a m a n d p r e s e n t it in a suitable form. This analysis can be p e r f o r m e d at various levels: we m a y s t u d y d e p e n d e n c e s b e t w e e n p r o g r a m statements, iterations of a loop nest, s u b r o u t i n e s in the p r o g r a m , etc. It m a y be very difficult or even impossible for a compiler to solve a particular d e p e n d e n c e problem, if all the n e c e s s a r y i n f o r m a t i o n is n o t available at compile-time. This could h a p p e n , for example, w h e n there are conditional s t a t e m e n t s in the program, or certain variables are to be assigned values at run-time, or the value of a loop b o u n d can-
2
CHAPTER 1. INTRODUCTION
not be d e t e r m i n e d at compile-time. Also, variables of certain kinds m a y be difficult to deal with; pointer variables fall in this category. In this book, we take as our model a sequential p r o g r a m consisting of d o loops with arbitrary strides and assignment statements, and s t u d y d e p e n d e n c e between those statements. We will only consider p r o g r a m variables that are elements of arrays, and a s s u m e that loop limits and array subscripts are affine functions of index variables of loops. (An affine function is a linear function plus a constant.) Dependence problems involving array elements are i m p o r t a n t for numerical programs, and they have b e e n widely studied for twenty years. Non-unit strides lead to holes in the index space of a loop nest. In the basic model, the index variable of a loop m o v e d f r o m one integer to the next larger integer as the loop iterations unfolded. That is no longer the case w h e n the loop has a non-unit stride. To remedy the situation, we use an iteration variable for such a loop, that e n u m e r a t e s the iterations 0, 1, 2,.o.. The iteration variable of a loop with a non-unit stride behaves like the index variable of a loop with unit stride. When non-unit strides are present, d e p e n d e n c e analysis and loop transformations are easier to handle in t e r m s of iteration variables rather t h a n in t e r m s of index variables. The passage f r o m a perfect to an imperfect loop nest poses a m o r e serious challenge. We handle that situation by utilizing two i m p o r t a n t points. First, an imperfect nest can be thought of as a perfect nest whose body is a sequence of other loops and a s s i g n m e n t statements. Second, even in an imperfect loop nest, the loops containing a given s t a t e m e n t are still sequentially nested. Thus, the sequence of loops d e t e r m i n e d by a s t a t e m e n t behaves like a perfect nest. If a and b denote two integers such that b is not zero, t h e n we use the notation a / b to represent the rational n u m b e r ~. If a / b is an integer (i.e., if a m o d b = 0), t h e n b (evenly) divides a. We a s s u m e exact rational arithmetic w h e n we deal with these expressions. Consider the p r o g r a m segment d o Ii = 10, 100, 3 d o I2 = 50, 5, - 2
L1 : L2 :
S: T:
X(211 - 1,I1 +I2) . . . . ....... X(312 + 1,211 + 2) • • • enddo enddo
We explain the n a t u r e of the d e p e n d e n c e p r o b l e m by m e a n s of this example. This p r o g r a m is a n e s t of two loops containing two assignm e n t statements. We s h o w only the o u t p u t variable of S a n d one i n p u t variable of T, a n d they are b o t h e l e m e n t s of a two-dimensional array X. We focus on the m e m o r y references (reads and writes) c o m i n g f r o m the instances of these two variables only. The variables s h o w n cause a d e p e n d e n c e b e t w e e n S a n d T, only if there is at least one m e m o r y location that b o t h s t a t e m e n t s reference d u r i n g the sequential execution of the p r o g r a m . In o r d e r to test for d e p e n d e n c e , we first f o r m u l a t e the p r o b l e m m a t h e m a t i c a l l y and t h e n try to solve it. Consider an iteration of the double loop (L~, L2) defined by a set of values I~ = i l , I2 - - ~ 2 o f the index variables. In this iteration, s t a t e m e n t S writes the m e m o r y location defined by X ( 2 i ~ - 1, i l + i2). Consider a n o t h e r iteration defined by a set of values I1 = j l , 12 - j2. In that iteration, s t a t e m e n t T reads the m e m o r y location defined by X(3j2 + 1, 2 j l + 2). These two locations are identical iff 2il - 1 = 3j2 + 1 gl -b i2 = 2j1 + 2, that is, iff 2il il
--
3j2 = 2
(1.1)
2 j l + i2 --= 2.
(1.2)
-
Thus, to decide if there exists a m e m o r y location that is referenced by b o t h S a n d T, we n e e d to decide if there exist values il, jx of I1 and i2, j2 of 12 that satisfy b o t h e q u a t i o n s (1.1) a n d (1.2). Now, i 1 and j l are integers characterized by the following conditions: 1 0 _ < i 1 < 100,
10 < j l < 100,
(il - 10)/3 is an integer,
(jl - 10)/3 is an integer.
Similarly, i2 a n d j2 are integers satisfying the conditions: 5 _< i2 -< 50,
5 <j~_< 50,
(i2 - 5 0 ) / ( - 2 ) is an integer,
(j2 - 5 0 ) / ( - 2 ) is an integer.
4
CHAPTER 1. INTRODUCTION
Thus, we are l o o k i n g for a s i m u l t a n e o u s integer s o l u t i o n to e q u a t i o n s (1.1)-(1.2) t h a t s a t i s f y all t h e s e c o n d i t i o n s . This p r o b l e m c a n b e simplified to a large e x t e n t if we c a n eliminate c o n d i t i o n s of the t y p e "(il - 1 0 ) / 3 is an integer." This c o u l d b e d o n e b y labeling t h e l o o p s differently as we shall n o w see. Note that the index variable ir1 t a k e s t h e s e q u e n c e of values: 10, 1 3 , . . . , 100, a n d the i n d e x variable I2 t a k e s the s e q u e n c e o f values: 50, 4 8 , . . . , 6. The first s e q u e n c e is increasing, the s e c o n d is decreasing, a n d n e i t h e r is a s e q u e n c e of contiguous integers. To s t a n d a r d i z e t h e labeling of l o o p iterations, w e agree to n u m b e r the i t e r a t i o n s of e a c h l o o p 0, 1, 2 . . . . . in t h e o r d e r of s e q u e n t i a l execution. This p r o c e s s , called loop normalization, is a c h i e v e d b y i n t r o d u c i n g a n e w variable for e a c h loop. Since (I1 - 1 0 ) / 3 m u s t b e a n integer, w e m a y define a n i n t e g e r variable fl, called the iteration variable of L1, b y 1 Ix = 10 + 3fl.
(1.3)
Likewise, the iteration variable o f L2 is an integer variable d e n o t e d b y ~2 a n d d e f i n e d b y I2 = 50 - 2~2.
(1.4)
The s e q u e n c e of i t e r a t i o n s of L1 c o r r e s p o n d s to t h e v a l u e s 0, 1 . . . . . 30 of ~1, a n d t h e s e q u e n c e o f iterations of L2 c o r r e s p o n d s to the values 0, 1 . . . . ,22 of ~2. Each iteration variable t a k e s an i n c r e a s i n g s e q u e n c e of c o n t i g u o u s i n t e g e r values. 2 Let El, 31 d e n o t e t h e v a l u e s o f fl c o r r e s p o n d i n g to t h e v a l u e s i1, jx of I1, a n d ~2, 32 t h e v a l u e s of ~2 c o r r e s p o n d i n g to the v a l u e s i2, j2 of I2. F r o m e q u a t i o n s (1.3) a n d (1.4), we get il = 10 + 3~1
i2 = 50 - 2~2
(1.5)
j l = 10 + 331
j2 = 50 - 232.
(1.6)
1The "hat" in ~ does not indicate a vector. We denote vectors and matrices by bold letters. 2Changing the description of a loop from the given one in terms of its index variable to the one in terms of its iteration variable may produce complicated array subscripts. It is not necessary for dependence analysis to completely rewrite a program in terms of iteration variables. We need to use the iteration variable of a loop only when the values of its index variable do not form an increasing sequence of contiguous integers (i.e., when the stride is not one). Whether the initial limit of the loop is zero or not is not important for our purpose.
Also, it follows from the ranges of ~1, ~2 given above that 0<21 <30
0<22 <22
(1.7)
0 < 31 < 30
0 < 32 < 22.
(1.8)
Using (1.5) and (1.6), we rewrite equations (1.1) and (1.2) in terms of ^
^
^
^
~l,J1, ~2,J2: 621 + 632 = 132
(1.9)
321 - 631 - 222 = - 3 8 .
(1.10)
Thus, there is d e p e n d e n c e between S and T (i.e., there is a memory location referenced by both statements), iff there exists an integer solution (21,31,22,32) to equations (1.9)-(1.10) satisfying the constraints (1.7)-(1.8). If we want the complete set of all instance 3 pairs (S(i1, i2), T ( j l , j2)) involved in this dependence, then we need to find the set of all such solutions. If we want to know about a more specific dependence, then additional constraints will arise. For example, statement T d e p e n d s on statement S and that d e p e n d e n c e is carried by the outer loop L1, iff there exists an integer solution to (1.9)-(1.10) that satisfies (1.7)-(1.8) and the condition 21 < 31. (Note that since 21 and 31 are integers, the inequality 21 < 31 is equivalent to 21 ~ 31 -- 1.) We have d e m o n s t r a t e d that a typical d e p e n d e n c e problem (when loop b o u n d s and subscripts are affine functions of index variables) leads to a system of linear diophantine equations and a system of linear inequalities. First, we have to decide if this combined system has an integer solution, and then find the set of all solutions in case it does. This ends the problem formulation stage. Next, we look at various approaches to solving the problem. The general approach to a d e p e n d e n c e problem consists of a few basic steps. They are stated below in the form of an informal algorithm. A l g o r i t h m 1.1 Given a d e p e n d e n c e problem with a system of linear equations and a system of linear inequalities (each with rational coefficients), this algorithm decides if the particular d e p e n d e n c e exists, and finds relevant information in case it does. 3We will label instances of a statement by values of index variables.
Once
~1, ~2, ]1, ]2 are found, the corresponding index values il, i2, jl, j2 can be computed from (1.5) and (1.6).
6
CHAPTER 1. INTRODUCTION
1. Decide if there is an integer solution to the system of equations (Algorithm 1.2.1 and Theorem 1.3.6). 2. If there is no solution, then terminate; the particular d e p e n d e n c e does not exist. . Otherwise, find the general integer solution to the system of equations in terms of certain u n d e t e r m i n e d integer parameters (Theorem 1.3.6). 4. Substitute the general solution into the inequalities to derive a new set of inequalities involving those integer parameters. 5. Decide if this new system of inequalities has an integer solution. 6. If there is no solution, then terminate; the particular d e p e n d e n c e does not exist. . Otherwise, the d e p e n d e n c e exists. Find the set of all integer solutions to the new inequalities, and c o m p u t e information relevant to the dependence. A system of linear diophantine equations can be solved in polynomial time. As pointed out above, we know how to test if such a system has an integer solution, and how to find the general solution in case it does. The problem of deciding if a system of linear inequalities has an integer solution is NP-complete [Schr 87]. There are several integer programming algorithms (e.g., the cutting plane method) available to solve this problem. However, there is no convenient way to represent the general integer solution to a system of linear inequalities (see [Fior 72] and [Will 83]). For our example, we find that equations (1.9)-(1.10) have integer solutions. Indeed, starting with the coefficient matrix A for the system of equations
(~1,)1,~2,)2)
6
3
0 0 6
--6 -2 0
= (132,-38),
we use A l g o r i t h m 1.2.1 to find matrices
U =
0
0
0
1
0
1
2 0
0 1
3 -3
1 -1
and
-2 0
S=
/°°) 0 0 0
1 0 0
'
such that U is u n i m o d u l a r , S is echelon, a n d UA = S. Next, we apply T h e o r e m 1.3.6. Any integer vector of the f o r m ( 2 2 , - 3 8 , t3, t4) will satisfy the e q u a t i o n (tl, t2, t3, t4)S = (132, - 3 8 ) . Thus, equations (1.9)-(1.10) have integer solutions, a n d the general solution is given by (~1,31, ~2,32) = ( 2 2 , - 3 8 , t3, t4)U = (-38+2t3,t4,-38+3t3-3t4,60-2t3),
(1.11)
where t3 a n d t4 are u n k n o w n integer p a r a m e t e r s . Substituting f r o m (1.11) into the c o n d i t i o n s (1.7)-(1.8), we get the following s y s t e m of inequalities (after rearrangement): -2t3 2t3
< -38 < 68 -
t4 <
t4 - 3 t 3 + 3t4 3t3-3t4 2t3 -2t3
0
< 30 < -38 < 60 < 60 < -38.
(1.12)
Note that while the n u m b e r of inequalities is the s a m e as the n u m b e r in (1.7)-(1.8), the n u m b e r of variables has been r e d u c e d f r o m 4 to 2. A l t h o u g h we can decide if this s y s t e m has an integer solution by any integer p r o g r a m m i n g m e t h o d , it is c o m m o n l y believed that using a general integer p r o g r a m m i n g m e t h o d to solve each and every d e p e n d e n c e p r o b l e m is n o t a g o o d idea. The reasons usually cited are 1. A general integer p r o g r a m m i n g m e t h o d is expensive;
8
CHAPTER 1. INTRODUCTION
2. The s y s t e m arising f r o m a typical d e p e n d e n c e p r o b l e m is too small to justify the cost of s u c h a m e t h o d ; 3. The compiler typically n e e d s to solve a large n u m b e r of dependence p r o b l e m s in the course of processing a typical p r o g r a m , so that the integer p r o g r a m m i n g m e t h o d will have to be u s e d a large n u m b e r of times. This forces u s to seek a hierarchical a p p r o a c h to solve the d e p e n d e n c e problem. We will u s e several different techniques to h a n d l e different classes of p r o b l e m s . These classes m a y overlap a n d their b o u n d a r i e s are n o t strictly defined. First, t h e r e is a class of simple problems. It turns o u t that in real p r o g r a m s , there is a large n u m b e r of d e p e n d e n c e p r o b l e m s where the s y s t e m of n e w inequalities o b t a i n e d in Step 4 of A l g o r i t h m 1.1, involving the u n k n o w n integer p a r a m e t e r s , is trivial to solve. (This is syst e m (1.12) for o u r example.) In this case, the d e p e n d e n c e p r o b l e m is c o m p l e t e l y settled: we can easily decide if the particular d e p e n d e n c e is present, a n d c o m p u t e all relevant i n f o r m a t i o n in case it does. A simple p r o b l e m m a y arise in several different ways: A. The s y s t e m of n e w inequalities has no variables; B. It has exactly one variable; C. It can be b r o k e n u p into s u b s y s t e m s each of which has exactly one variable. See Example 1.5.6 for a d e p e n d e n c e p r o b l e m that falls in Case A. The d e p e n d e n c e p r o b l e m for a o n e - d i m e n s i o n a l array in a single loop belongs to Case B. A typical example is the following program: do I = 1,100, 2 X(2I) . . . . ....... X(3I + 1)- • • enddo See also Example 1.5.7 for a p r o b l e m in Case B. There are p r o b l e m s in a perfect n e s t of loops with c o n s t a n t loop limits 4 that can be b r o k e n u p into m u t u a l l y i n d e p e n d e n t single-loop problems. They fall in Case C. A typical example is the p r o g r a m 4Here, "constant" means loop-invariant.
d o I1 = 0, 100, 1 d o I2 = 0, 50, 1
X(311,312) . . . . .......
X(211 +
1,I2 + 2 ) . - -
enddo enddo Note that the range of I2 is i n d e p e n d e n t of I1. Also, each subscript of each p r o g r a m variable has exactly one of I1 and 12, and the index variables in a pair of corresponding subscripts match. Next, we describe an approximate algorithm for the solution to the d e p e n d e n c e problem, that checks if there is an integer solution to the system of equations, and a r e a l solution to the combined system of equations and inequalities. A l g o r i t h m 1.2 Given a d e p e n d e n c e problem with a system of linear equations and a system of linear inequalities (each with rational coefficients), this algorithm either decides that the particular d e p e n d e n c e does not exist, or terminates with the a s s u m p t i o n that it may exist. 1. Decide if there is an integer solution to the system of equations. (We may use Algorithm 1.2.1 and T h e o r e m 1.3.6.) 2. If there is no solution, then terminate; the particular d e p e n d e n c e does not exist. 3. Decide if there is a real solution to the equations that satisfies the inequalities. 4. If there is no solution, then terminate; the particular d e p e n d e n c e does not exist. 5. Otherwise, assume that the d e p e n d e n c e exists. (The algorithm is inconclusive in this case.) It is not very c o m m o n to see a d e p e n d e n c e problem in real programs where this algorithm predicts a false dependence. In fact, it takes some effort to construct a meaningful example where there are an integer solution to the equations and a real solution to the equations satisfying the inequalities, but no integer solution satisfying the
10
CHAPTER 1. INTRODUCTION
inequalities. Also, there are cases where the existence of such a real solution guarantees the existence of an integer solution. The important new step in this algorithm is Step 3. Different ways to execute this step lead to different d e p e n d e n c e testing methods; we describe two of these m e t h o d s below. The approximate method, which we call the method of bounds, has been successfully i m p l e m e n t e d in s o m e f o r m or another in m a n y restructuring compilers, both in research and commercial environments. This m e t h o d is an i m p l e m e n t a t i o n of Algorithm 1.2, where Step 3 is based on finding extreme values of real-valued linear functions in certain sets. To get an idea about this method, consider Equation (1.10). Define a linear function f : R 4 --* R by f ( x l , Yl,
X2,
Y2) = 3X1 - 6 y l - 2X2.
Let P denote the subset of R 4 defined by the inequalities (1.7)-(1.8), where El, j1, ~2, and j2 are replaced by x l , y l , x2, and 22, respectively. The m i n i m u m and m a x i m u m values of f in P are as follows: n-nn~r = 3 x 0 - 6 x 3O - 2 x 22 = P
m a x f = 3 x 3 0 - 6 x O - 2 x 0 = 90. P
It is clear t h e n that for each point ( x I , Y l , X 2 , Y 2 )
E
P, we m u s t have
- 2 2 4 < f(xl,y~,x2,2Y2) <- 90. In other words, if - 2 2 4 < c < 90 fails to hold, t h e n the equation f(XI,Yl,X2,Y2)
= C
has no (real) solution in P. The converse also h a p p e n s to be true: if - 2 2 4 < c < 90 holds, t h e n the above equation does have a real solution in P. Thus, (1.10) has a solution in P since - 2 2 4 < - 3 8 < 90 is true. This technique can be e x t e n d e d to decide if the s y s t e m of equations (1.9)-(1.10) has a simultaneous (real) solution in P (i.e., a real solution satisfying the inequalities (1.7)-(1.8)). Although the m e t h o d of b o u n d s is applicable, in principle, to any d e p e n d e n c e problem, it is practical for only those problems where
11
the region P is simple e n o u g h to allow easy c o m p u t a t i o n of the ext r e m e values. For s u c h cases, it is n o r m a l l y a p o l y n o m i a l time m e t h o d . This m e t h o d is usually applied to a p r o b l e m that involves a oned i m e n s i o n a l array in a loop nest with c o n s t a n t loop limits. It r e m a i n s practical if we m o v e to a t w o - d i m e n s i o n a l array in s u c h a loop nest; an example is the double loop we have b e e n working o n in this chapter. C o m p u t a t i o n of e x t r e m e values b e c o m e s h a r d e r as we m o v e to higher d i m e n s i o n a l arrays a n d / o r n o n - c o n s t a n t loop limits. For multid i m e n s i o n a l arrays, a relaxed version of the m e t h o d of b o u n d s m a y be used. When it is h a r d to apply the m e t h o d of b o u n d s , we m a y apply a n o t h e r m e t h o d w h i c h we call the method of elimination. This m e t h o d is the following i m p l e m e n t a t i o n of A l g o r i t h m 1.2. A l g o r i t h m 1.3 Given a d e p e n d e n c e p r o b l e m with a s y s t e m of linear e q u a t i o n s a n d a s y s t e m of linear inequalities (each with rational coefficients), this a l g o r i t h m either decides that the particular d e p e n d e n c e does n o t exist, or t e r m i n a t e s with the a s s u m p t i o n that it m a y exist. . Decide if there is an integer solution to the s y s t e m of equations (Algorithm 1.2.1 and T h e o r e m 1.3.6). . If there is no solution, t h e n terminate; the particular d e p e n d e n c e does n o t exist. . Otherwise, find the general integer s o l u t i o n to the s y s t e m of e q u a t i o n s in t e r m s of certain u n k n o w n integer p a r a m e t e r s (Theo r e m 1.3.6). .
.
.
Substitute the general s o l u t i o n into the inequalities to derive a new set of inequalities involving those integer p a r a m e t e r s . Decide if this new s y s t e m of inequalities has a real solution by Fourier's m e t h o d of elimination (Algorithm 1.3.2). If there is no solution, t h e n terminate; the particular d e p e n d e n c e does n o t exist.
7. Otherwise, a s s u m e that the d e p e n d e n c e exists. (The algorithm is inconclusive in this case.)
12
CHAPTER 1. INTRODUCTION
Since Algorithm 1.3.2 (Fourier) is not polynomial [Schr 87], the m e t h o d of elimination is not polynomial either. However, the number of u n k n o w n parameters is typically rather small, and an efficient implementation of Fourier elimination should not be too time consuming. An advantage of this m e t h o d is that Algorithm 1.3.2 gives the ranges of the variables in a hierarchical way. (See the c o m m e n t s after Algorithm 1.3.2.) From that we get a description of all real solutions to the c o m b i n e d system of equations and inequalities. Enumeration may be used then, if desired, to decide if there is also an integer solution and find all integer solutions w h e n they exist. For example, applying it to the system (1.12), we see that there are real solutions, and the general integer solution can be r e p r e s e n t e d by the following set of inequalities: 19 < t3 < 30 ]max (0, t3 - 20)] < t4 < l m i n (30, t3 - 38/3)].
(1.13) (1.14)
It is not immediately clear that there are integer vectors (t3, t4) satisfying these constraints. To establish that, we take t3 = 19 and get 0 < t4 < 6. Thus, (19,0), (19, 1 ) , . . . , (19,6) are integer solutions to (1.12). Now, going back to our original d e p e n d e n c e problem, we can say that there is d e p e n d e n c e be~ween statements S and T. The set of all pairs of instances (S(il, i2), T(jx, j2)) of S and T involved in this dependence can be found as follows: 1. For each integral value of t3 in (1.13), find all integral values of t4 that satisfy (1.14). 2. For each integer vector (t3, t4) satisfying (1.13)-(1.14), find the corresponding value of (~1,31, ~2, 32 ) from (1.11). 3. For each vector (El, J1, ~2, j2 ) found in the previous step, find the corresponding index values (~1, ~2) and (jl, j2) f r o m (1.5)-(1.6). There could be a variation of the m e t h o d of elimination where Algorithm 1.3.2 is applied to the original equations and inequalities; we will not discuss it here. Finally, established integer p r o g r a m m i n g m e t h o d s are the last resort. It is quite possible that for a very large and complicated dependence problem, the cost of an efficient integer p r o g r a m m i n g m e t h o d
13
will be less than the cost of the m e t h o d of elimination (especially if we p e r f o r m e n u m e r a t i o n after elimination). In this book, we give a precise formulation of the (linear) dependence problem, and discuss the solution of simple problems, the m e t h o d of bounds, and the m e t h o d of elimination. For integer programming methods, see [Schr 87] or any standard text on the subject. The rest of the book is organized as follows. In Chapter 2, we study the d e p e n d e n c e problem for a one-dimensional array in a single loop. Many of the important d e p e n d e n c e concepts and techniques are introduced here, and we provide a large amount of detail to give the reader easy access to the subject. Chapter 3 deals with double loops; it acts as a bridge between Chapter 2 and the chapters after 3. The matrix notation introduced in Volume I is extended to perfect loop nests with arbitrary strides in Chapter 4. Chapter 5 deals with a general p r o g r a m where the loops are legally, but not necessarily perfectly, nested. We have tried to show the evolution of ideas and concepts as we move from a single loop to a general program. The emphasis in chapters 2 through 5 is on problem formulation. Simple problems are covered in these chapters as they occur naturally. The m e t h o d of b o u n d s is discussed in Chapter 6; the mathematical results n e e d e d there are described in the appendix. The m e t h o d of elimination is covered in Chapter 7, and some conclusions are given in Chapter 8.
Chapter 2
Single Loops 2.1
Introduction
In this book, we introduce m a n y important d e p e n d e n c e concepts and techniques in terms of a single loop containing assignment statements, then generalize first to a double loop, t h e n to a perfect loop nest, and finally to a program consisting of arbitrarily nested loops. This way the reader is exposed to complicated ideas and notation in steps that makes u n d e r s t a n d i n g of the material easier. This chapter is devoted to a study of the d e p e n d e n c e problem in a single loop. In Section 2.2, we discuss in detail the index and iteration spaces of a single loop. Dependence concepts are introduced in Section 2.3, and the d e p e n d e n c e problem is stated in Section 2.4. In Section 2.5, we give a complete solution to the linear d e p e n d e n c e problem for a onedimensional array. This is one of the simple problems m e n t i o n e d in Chapter 1. Finally, in Section 2.6, we introduce the m e t h o d of b o u n d s and apply it to solve the same problem.
2.2
Index and Iteration Spaces
If the loops in a perfect nest do not all have unit strides, it is important to distinguish between the index and iteration spaces of the nest. The two spaces are b o t h finite subsets of Zm (where m is the n u m b e r of loops in the nest) with the same n u m b e r of points, but the index space is more spread out than the iteration space. In this section, we 15
16
CHAPTER 2. SINGLE LOOPS
introduce these concepts in their simplest form, that is, in terms of a single loop. The model p r o g r a m for this chapter is a single do loop L of the form L:
do l = p, tt, O H(I) enddo
where I is the index variable of the loop, p the initial limit, ~l the final limit, and 0 the stride. The loop limits are integer-valued and the stride is a n o n z e r o integer. The loop body, H(I), is a sequence of assignment statements. The index values (or index points) of L are the values of the index variable I. The index space of the loop is the set of all its index points. The following t h e o r e m characterizes the index values. T h e o r e m 2.1 A n integer I is an index value o f the loop L iff (a) (I - p ) / O is an integer, and (b) 0 < ( I - p ) / O < (el - p)/O. PROOF. We have 0 ~: 0. The index variable I of the loop L is characterized by the following two conditions: 1. I takes the sequence of values: p, p + 0, p + 20 . . . . . 2. I lies between p and ~/: p
I>~l
if0>0 if0<0.
}
The two cases of Condition 2 can be combined into the statement p / O < I/O < el~O,
that holds for all n o n z e r o values of 0. From this, we get 0 <_ (I - p ) l O <- (q - p)iO.
(2.1)
2.2. INDEX AND ITERATION SPACES
17
Thus, I takes precisely t h o s e integral values for which (I - p ) / 0 is a nonnegative integer b e t w e e n 0 a n d (~/- p ) / 0 . [] We define an integer variable ~ by the e q u a t i o n so that
~ = ( I - p)/O,
(2.2)
I = p + ~0.
(2.3)
We call ~ the iteration variable of the loop L. The iteration values (or iteration points) of L are the values of ~, and the iteration space of the loop is the set of all its iteration points. The following t h e o r e m characterizes the iteration values. T h e o r e m 2.2 An integer ~ is an iteration value o f the loop L iff
0 <_~ <_ (q - p)/O.
(2.4)
PROOF. The p r o o f follows f r o m the definition of ~ in (2.2) a n d the inequalities in (2.1). [] T h e o r e m 2.3 The iteration space of L is the set of integers {0, 1 . . . . . ~} where ~ = [(q - p)/OJ. The index space of L is the set of integers { p + f,O : 0 <_~ < ~l}. In the sequential execution of L, the index space is traversed in the direction of increasing ~. PROOF. It follows f r o m T h e o r e m 2.2 that the iteration values are 0, 1 . . . . , ~. The index values are given by (2.3) where ~ d e n o t e s an iteration point. In the sequential execution of L, the iterations of L are t a k e n in the order H ( p ) , H ( p + O),H(p + 20) . . . . , H ( p + dlO). Since the corres p o n d i n g values of ~ are 0, 1, 2 , . . . , ~, it follows that the index space is traversed in the direction of increasing ~. [] Corollary 1 A single loop with an arbitrary stride can be written as an equivalent loop with unit stride. PROOF. Consider the following loop with unit stride:
18
L:
CHAPTER 2. SINGLELOOPS
doJ~ = 0,~,1 H(p + ~0) enddo
The iterations of L are H ( p ) , H(p + 0), H(p + 2 0 ) , . . . , H(p + ~0), a n d they are executed in this order. Hence, L is equivalent to L. [] We refer to ~ as the normalized final limit of L. This loop is n o t e x e c u t e d (i.e., the index a n d iteration spaces are empty) if ~ < 0, which h a p p e n s if either 0 > 0 a n d p > q, or 0 < 0 a n d p < q. Otherwise, the total n u m b e r of iterations (i.e., the n u m b e r of index or iteration points) is ~ + 1, a n d the value of I in the last iteration is p + ~0. There is a one-to-one c o r r e s p o n d e n c e b e t w e e n the index a n d iteration spaces of L. The m a p p i n g I ~ ~ is called loop normalization. The iteration p o i n t s are c o n t i g u o u s while the index p o i n t s generally are not. The iteration variable [ always increases f r o m zero in steps of one as the loop is e x e c u t e d sequentially. The index variable I, on the o t h e r hand, m a y increase or decrease d e p e n d i n g on the sign of 0, a n d will n o t c h a n g e in s t e p s of one if 101 > 1. For this reason, it is easier to w o r k with ~ rather t h a n with I. E x a m p l e 2.1 Consider a single loop of the form: L:
do I = 120, 10, - 3
H(I) enddo
The index variable I decreases f r o m 120 in steps of 3, a n d we write I = 120 - 3~
(2.5)
where ~ is an integer. Also, I m u s t lie b e t w e e n 120 and 10. From the inequalities 120 > 120 - 3~? > 10, we get 0 > - 3 ~ > - 1 1 0 , so that 0 < ~ < 110/3. Here, ~ = L l 1 0 / 3 ] = 36, a n d the iteration p o i n t s of the loop are 0, 1, 2, . . . , 36. The iteration space is the set of these 37 integers. The index p o i n t s of L are f o u n d f r o m (2.5) by plugging in the values of ~: 120,120-3(1),120-
3(2),...,120-
3(36).
2.3. DEPENDENCECONCEPTS
19
Thus, the i n d e x space is t h e set {120, 117, 1 1 4 , . . . , 12}.
EXERCISES 2.2
1. What are the smallest and the largest values of I in the model loop L of this section? 2. Find a closed-form expression for the number of iterations of L, that holds for all values of p, q, and 0 (including cases where the loop fails to execute). 3. Find the index and iteration spaces of the loop L, when (a) p = 0 , q = 9 , 0 = 1 ; (b) p = 17,q = 39,0 = 5; (c) p = -15,q = 20,0 = 2; (d) p = 10,q = -13,0 = -3. 4. Find the number of iterations of L and the value of I in the last iteration, when (a)
p = 0 , q = 1 0 0 0 , 0 --- 1;
(b)
p = -17,
(c)
p = -15,q
(d)
p = 1001,q
q = 390,0 = -200,0 = -137,0
= 4; = 3; = -7.
2.3 Dependence Concepts C o n s i d e r two, n o t n e c e s s a r i l y distinct, a s s i g n m e n t s t a t e m e n t s S a n d T in o u r m o d e l loop L:
p,q,O H(I)
do/=
enddo If S lexically p r e c e d e s T in t h e p r o g r a m , we write S < T. The m e a n i n g of the n o t a t i o n S < T is t h e n clear, a n d it is o b v i o u s t h a t < is a t o t a l o r d e r in the set of a s s i g n m e n t s t a t e m e n t s in the p r o g r a m . An i n d e x p o i n t (i.e., a value o f t h e i n d e x variable I) d e t e r m i n e s a n i n s t a n c e of e a c h a s s i g n m e n t s t a t e m e n t in t h e loop. Since, by h y p o t h esis, t h e r e are no c o n d i t i o n a l s in the p r o g r a m , all s u c h i n s t a n c e s are executed. Let S(i) d e n o t e t h e i n s t a n c e of s t a t e m e n t S d e t e r m i n e d by an i n d e x p o i n t i, a n d T(j) t h e i n s t a n c e of s t a t e m e n t T d e t e r m i n e d b y
20
CHAPTER 2. SINGLELOOPS
an index p o i n t j. The distance f r o m S(~) to T(j) is defined to be the integer (3 - ~), where ~ and 3 are the iteration p o i n t s c o r r e s p o n d i n g to i a n d j, respectively. 1 We leave to the reader the p r o o f of the following result. L e m m a 2.4 Consider two assignment statements S and T in the single loop L. Let d denote the distance from an instance S(~) of S to an instance T (j) ofT. In the sequential execution of the program, S ( ~) is executed before T(j) iff either d > O, or d = 0 a n d s < T. The c o n c e p t of d e p e n d e n c e can be i n t r o d u c e d in m a n y different contexts. We define d e p e n d e n c e first b e t w e e n s t a t e m e n t instances, a n d t h e n b e t w e e n statements. An instance T(j) of a s t a t e m e n t T depends on an instance S(~) of a s t a t e m e n t S, if there exists a m e m o r y location .~M s u c h that 1. Both S(i) and T(j) reference (read or write) _W/; 2. S(i) is executed before T(j) in the sequential execution of the program; 3. During sequential execution, the location M is n o t written in the time period f r o m the e n d of execution of S(i) to the b e g i n n i n g of execution of T(j). The s t a t e m e n t s S and T n e e d n o t be distinct, b u t C o n d i t i o n 2 requires that the instances S(i) a n d T(j) be distinct. Let ~ and 3 d e n o t e the iteration values c o r r e s p o n d i n g to i and j, respectively. This dependence is loop-carried if ~ < 3; it is loop-independent if ~ = 3 (i.e., if i = j). In the l o o p - i n d e p e n d e n t case, we necessarily have S < T since S(~) is to be executed before T(i). Since a m e m o r y reference is either a "read" or a "write," a pair of s t a t e m e n t instances can reference the s a m e m e m o r y location in four different ways. This leads to four different types of d e p e n d e n c e :
1. T(j) is flow dependent on S(~), if S(~) writes ~4 a n d T(j) reads it (the value c o m p u t e d by S(~) is u s e d by T(j)); 1See [Pugh 92b] for concerns expressed about the proper definition of dependence distance, and [Wolf 94] for related comments.
2.3. DEPENDENCECONCEPTS
21
2. T(j) is anti-dependent o n S(~), if S(~) reads ~4 a n d T(j) writes it (S(~) u s e s the value in ~M before it is c h a n g e d by T(j)); . T(j) is output dependent o n S(~), if S(~) a n d T(j) b o t h write ~v/ (the value c o m p u t e d by T(j) is s t o r e d after the value c o m p u t e d by S(i) is stored);
4. T(j) is input dependent o n S(i), if b o t h S(~) a n d T(j) read ~M (the "read" by T(j) c o m e s after the "read" by S(~)). Now, we consider d e p e n d e n c e b e t w e e n two s t a t e m e n t s . A statem e n t T depends o n a s t a t e m e n t S, if there is at least one instance S(i) of S a n d one instance T(j) of T, s u c h that T(j) d e p e n d s o n S(i). We can be m o r e specific. For example, T is flow dependent on S if there is an instance pair (S(~), T(j)) s u c h that T(j) is flow d e p e n d e n t on S(~). One m a y similarly define w h a t is m e a n t by T is anti-dependent, output dependent, or input dependent o n S. The dependence of T o n S can be formally defined to be the set of all instance pairs (S(i), T(j)) s u c h that T(j) d e p e n d s o n S(~). Thus, T d e p e n d s o n S iff the d e p e n d e n c e of T o n S is n o n e m p t y . Sometimes, it is c o n v e n i e n t to say that there is d e p e n d e n c e between S a n d T if either T d e p e n d s o n S, or S o n T (or both). The d e p e n d e n c e of T on S can be s e p a r a t e d into two parts: the loop-carried part, consisting of all instance pairs (S(~), T(j)) s u c h that T(j) d e p e n d s o n S(~) a n d ~ < j; a n d the loop-independent part, consisting of all instance pairs (S(~), T(i)) s u c h that T(~) d e p e n d s on S(~). The l o o p - i n d e p e n d e n t part is necessarily e m p t y if T < S. We can also break u p the d e p e n d e n c e of T o n S by the type of d e p e n d e n c e b e t w e e n a pair of instances. This way we get four subsets: . The flow dependence of T on S consists of all instance pairs (S(i), T(j)) s u c h that T(j) is flow d e p e n d e n t on S(~); . The anti-dependence of T o n S consists of all instance pairs (S(i), T(j)) s u c h that T(j) is a n t i - d e p e n d e n t on S(~); . The output dependence of T o n S consists of all instance pairs (S(~), T(j)) s u c h that T(j) is o u t p u t d e p e n d e n t o n S(~); . The input dependence of T on S consists of all instance pairs (S(i), T(j)) s u c h that T(j) is i n p u t d e p e n d e n t o n S(i).
22
CHAPTER 2. SINGLE LOOPS
These subsets are not necessarily pairwise disjoint (e.g., T ( j ) m a y be both flow d e p e n d e n t and a n t i - d e p e n d e n t on S(i)). The d e p e n d e n c e b e t w e e n two s t a t e m e n t s can be described in t e r m s of the p r o g r a m variables 2 involved. (See Section Io4.3.) The instances of the o u t p u t variable of a s t a t e m e n t S d e t e r m i n e the m e m o r y locations written by the instances of S. On the other hand, the instances of the input variables of S d e t e r m i n e the m e m o r y locations read by the instances of S. Let u d e n o t e a variable of s t a t e m e n t S and v a variable of s t a t e m e n t To The pair (u, ~ ) causes a d e p e n d e n c e of T on S, if there are s t a t e m e n t instances S(i) and T(j), such that T ( j ) d e p e n d s on S(i), and the c o r r e s p o n d i n g m e m o r y location ~4 is r e p r e s e n t e d by both the instance of u for I = ~ and the instance of v for I = j. If u is the o u t p u t variable of S and ~ an input variable of T, t h e n (u, v) can cause a flow d e p e n d e n c e of T on S, or an anti-dependence of S on T, or both. The o u t p u t variables of S and T can cause an o u t p u t d e p e n d e n c e of T on S, a n d / o r of S on T. Two input variables of the s t a t e m e n t s can cause only an input dependence. A distance for the d e p e n d e n c e of T on S is the distance ( 3 - ~) f r o m an instance S(i) of S to an instance T ( j ) of T, where T ( j ) d e p e n d s on S(~). A given d e p e n d e n c e has at least one, but usually m a n y distances. Since S(i) m u s t be executed before T ( j ) by the definition of dependence, we have the following result as a direct c o n s e q u e n c e of Lemma 2.4.
Theorem 2.5 If a statement T depends on a s t a t e m e n t S in the loop L, then each dependence distance satisfies d > O. The equality d = 0 is possible only if S < T. The relation of d e p e n d e n c e b e t w e e n s t a t e m e n t s is d e n o t e d by ~, and we write S ~ T to indicate that T d e p e n d s on S. 3 The statement dependence graph of the given p r o g r a m is the directed graph that represents the relation ~. (In other words, it is the graph of ~; see Section 1.1.4.) We d e n o t e flow dependence, anti-dependence, output
2Weoften use the term "variable" somewhat loosely, although the exact meaning should always be clear from the context. In the current context, for "program variables," we may take the output variable X(I) of S and the input variable X (I-2) of T in Example 2.2 on the next page. 3Often, ~ is used as a generic symbol for dependence. For example, one may write S(i) ~ T(j) to denote dependence between statement instances.
2.3. DEPENDENCE CONCEPTS
23
d e p e n d e n c e , a n d i n p u t d e p e n d e n c e b y t h e s y m b o l s 6 f, 6 a, 6 °, a n d 6 i, respectively. (Thus, for example, S 6 f T m e a n s T is flow d e p e n d e n t on S.) T h e s e r e l a t i o n s m a y overlap, a n d t h e y c o n s t i t u t e the m a i n relation of d e p e n d e n c e : 6 = 6f ~J 6a ~j 6 ° u 6 i. F r o m this point, d e p e n d e n c e will n o t i n c l u d e i n p u t d e p e n d e n c e , u n l e s s o t h e r w i s e stated, The transitive c l o s u r e ~ o f t h e relation 6 is the relation o f indirect d e p e n d e n c e . Thus, a s t a t e m e n t T is indirectly d e p e n d e n t o n a s t a t e m e n t S if t h e r e is a (directed) p a t h f r o- -m S to T in the s t a t e m e n t
d e p e n d e n c e graph. In symbols, we have S 6 T if there is a n o n e m p t y sequence of statements Sl, S 2 , . . . , SN such that S = S1, Sl 6 S 2 . . . . . SN-1 6 S N , SN = T.
Indirect d e p e n d e n c e b e t w e e n s t a t e m e n t i n s t a n c e s is d e f i n e d similarly. E x a m p l e 2.2 C o n s i d e r the l o o p L. S
,
T" U"
do I = 4, 100, 2 X ( I ) --- X ( I ) + 1 Y ( 2 I - 4 ) = X ( I ) + X ( I + I) + X ( I + 2) + X ( I - 2) Y(I) = Z(I) enddo
The i n d e x v a l u e s o f t h e l o o p are 4, 6, 8 , . . . , 100, a n d the c o r r e s p o n d i n g i t e r a t i o n v a l u e s are 0, 1, 2 . . . . ,48. The first three iterations of L along with the last o n e are s h o w n below: S(4)" T(4) : U(4)"
X(4) = X(4) + 1 Y(4) = X(4) + X(5) + X ( 6 ) + X(2) Y(4) = Z(4)
S(6)T(6) • U(6)"
X(6) = X(6) + 1 Y(8) = X(6) + X(7) + X(8) + X(4) Y(6) = Z(6)
S(8) : T(8) :
X(8) = X(8) + 1 Y(12) -- X(8) + X(9) + X(10) + X(6)
u(8) :
Y(8) = z(8)
;
S(100) : T(100) : U(100) :
•
X(100) = X(100) + 1 Y(196) = X(100) + X(101) + X(102) + X(98) Y(100) = Z(100)
24
CHAPTER 2. SINGLELOOPS
Figure 2.1: S t a t e m e n t d e p e n d e n c e g r a p h for Example 2.2. (Statement i n s t a n c e s are always labeled by index values.) F r o m this pattern, it is easy to figure o u t the d e p e n d e n c e s t r u c t u r e of the program. Note the following facts: 1. S does n o t d e p e n d o n itself. 2. Fix the o u t p u t variable X(I) of S a n d take the i n p u t variables of T, one at a time, in the o r d e r of their appearance. We can m a k e these observations: (a) X(I) of S a n d X(I) of T cause a flow d e p e n d e n c e of T on S. This d e p e n d e n c e is l o o p - i n d e p e n d e n t ; it has no loopcarried part. The only d e p e n d e n c e distance is 0. (b) X(I) of S and X ( I + 1) of T do n o t cause a d e p e n d e n c e b e t w e e n S and To (c) X(I) of S a n d X(I + 2) of T cause an a n t i - d e p e n d e n c e of S o n T. This d e p e n d e n c e is loop-carried; it has no loopi n d e p e n d e n t part. The only d e p e n d e n c e distance is 1. (d) X(I) of S a n d X ( I - 2) of T cause a flow d e p e n d e n c e of T on S that is also loop-carried, w i t h o u t any l o o p - i n d e p e n d e n t part. Again, the only d e p e n d e n c e distance is 1. 3. There is an o u t p u t d e p e n d e n c e of U on T. It has a loop-carried p a r t a n d a l o o p - i n d e p e n d e n t part. There are several d e p e n d e n c e distances: 0, 1, 2 . . . . . 4. U d o e s n o t d e p e n d o n S, b u t U d e p e n d s on S indirectly.
2.4.
DEPENDENCE
PROBLEM
25
The s t a t e m e n t d e p e n d e n c e g r a p h for the p r o g r a m is given in Figure 2.1. Strictly speaking, this figure s h o w s the s u p e r p o s i t i o n of the g r a p h s of the r e l a t i o n s 8 f, 8 a, 8 °. (We have i g n o r e d i n p u t d e p e n d e n c e s b e t w e e n s t a t e m e n t s , a n d will u s u a l l y do so in f u t u r e examples.) Note the d i f f e r e n t m a r k i n g s o n the edges: the triangle r e p r e s e n t s flow dep e n d e n c e , the d a s h a n t i - d e p e n d e n c e , a n d the circle o u t p u t dependence. If we w a n t to s h o w m o r e d e t a i l e d d e p e n d e n c e i n f o r m a t i o n , t h e n a m o r e d e t a i l e d g r a p h will be necessary. For example, to emp h a s i z e t h a t t h e r e are two pairs of p r o g r a m variables e a c h of w h i c h c a u s e s a flow d e p e n d e n c e of T o n S, we w o u l d create a g r a p h w i t h two flow d e p e n d e n c e e d g e s f r o m S to T, e a c h labeled w i t h t h e a p p r o p r i a t e variables. EXERCISES 2.3
1. Take two iteration points ~ and ~ of the model loop L. Let i and j denote the corresponding index points. Let d = ) - [ and d' = j - ~. How is d' related to d? Explain the difficulties that may arise if the distance from a statement instance S(~) to a statement instance T ( j ) is defined to be d'. 2. Write down the set of all pairs of statement instances that constitute the flow dependence of T on S in Example 2.2. Show the different subsets that represent the flow dependences caused by different pairs of variables. 3. Write down the set of all pairs of statement instances that constitute the output dependence of U on T in Example 2.2. Give all the distances for this dependence. 4. Are there any input dependences between statements in Example 2.2? 5. Give a complete description of the dependence structure of the loop in Example 2.2 (including a dependence graph) after changing the loop-header to
2.4
(a) L:
d o I = 4,100,1
(b) L:
doi=4,100,3
(c) L :
d o I = 100, 4, -2
Dependence Problem
C o n s i d e r a g a i n t h e m o d e l loop L a n d two s t a t e m e n t s S a n d T in its body. Let u d e n o t e a variable of S a n d v a variable o f T. Now, we a d d r e s s the practical p r o b l e m o f d e t e r m i n i n g w h e t h e r u a n d v cause a d e p e n d e n c e b e t w e e n the s t a t e m e n t s : a d e p e n d e n c e of T o n S, or
26
CHAPTER 2. SINGLE LOOPS
of S on T, or both. In this context, we are not interested in the type (flow, anti-, output, or input) of dependence. This book focuses on array elements, and here we consider the case where u and v are b o t h elements of a given array X. Throughout this chapter, we a s s u m e that X is one-dimensional, unless otherwise specified. Let u = X ( f ( I ) ) and v = X ( g ( I ) ) , where f and ~ are realvalued functions that r e t u r n integer values for integer values of I. When S < T, u is the o u t p u t variable of S, and v an input variable of T, we m a y partially r e p r e s e n t the model p r o g r a m as follows: L.
d o / = p,~l,O :
S.
X(f(I)) .... :
T:
.......
X(g(I) ) . • •
:
enddo T h e o r e m 2.6 gives necessary conditions u n d e r which u and v will cause a d e p e n d e n c e between S and T. T h e o r e m 2.6 Consider a n y two statements S a n d T in the loop L. Let X ( f ( I ) ) denote a variable o f S and X ( g ( I ) ) a variable o f T , where X is a one-dimensional array. I f these variables cause a dependence between S a n d T, then the equation f (p + 0~) - e ( p + 03) = 0
(2.6)
has an integer solution (~, 3) such that O<~
and
O<j
(2.7)
where ~ = / ( ¢ / - p ) / O]. In each o f the following special cases, the solution satisfies an additional condition: (a) I f S < T a n d S c5T, then ~ < 3; (b) I f S = T a n d S ~ S, then ~ < j; (c) I f S < T a n d T ~ S, then ~ > 3.
2.4. DEPENDENCE PROBLEM
27
PROOF. Suppose that the variables in q u e s t i o n do cause a d e p e n d e n c e b e t w e e n S a n d T. Then, there is an instance S(i) of S a n d an instance T(j) of T, such that either T(j) d e p e n d s o n S(i), or S(i) d e p e n d s o n T(j). In either case, there is a m e m o r y location M that is referenced by b o t h S (i) and T (j), a n d that is r e p r e s e n t e d by the instance X ( f ( i ) ) of X ( f ( I ) ) in S (i) and also by the instance X ( g (j)) of X ( g (I)) in T(j). This implies that we m u s t have f(i) = g(j).
Next, we restate the p r o b l e m in t e r m s of iteration values. Let ~ and 3 d e n o t e the iteration values c o r r e s p o n d i n g to i and j, respectively. The index a n d iteration values are c o n n e c t e d by the e q u a t i o n s (see (2.3)): i = p + 0~ and j = p + 03, where ~ a n d 3 satisfy (2.7) ( T h e o r e m 2.2). If we switch over to iteration values, the e q u a t i o n f ( i ) = ~ (j) b e c o m e s f ( p + 0~) = a ( p + 03)
which is (2.6). The additional c o n d i t i o n (a), (b), or (c) in the t h e o r e m follows f r o m the part of the definition of d e p e n d e n c e that p o s t u l a t e s the ordering of the s t a t e m e n t instances. If T d e p e n d s on S, t h e n the instance S(i) m u s t be executed before the instance T (j) in the sequential execution of the loop. In that case, if S < T, we m u s t have ~ < 3; b u t if S = T, t h e n the strict inequality ~ < 3 has to hold (since S(i) c a n n o t d e p e n d on itself). If S d e p e n d s on T, t h e n the instance S(i) is executed after the instance T(j). In that case, if S < T, we m u s t have ~ > 3. [] The equation f ( i ) = ~ ( j ) is the dependence equation for the prog r a m variables X ( f ( I ) ) a n d X(~(I) ) in terms of index values. The index values i a n d j satisfy the c o n d i t i o n s ( T h e o r e m 2.1): 1. (i - p) / 0 is an integer, 2. 0 <_ ( i - p ) / O <_ (q - p)/O,
3. (j - p) / O is an integer, 4. 0 < ( j - p)/O <_ (q - p)/O.
CHAPTER 2. SINGLE LOOPS
28
These c o n d i t i o n s are called the dependence constraints for X ( f ( I ) ) a n d X (t~ (I) ) in terms of index values. Equation (2.6) is the dependence equation for the p r o g r a m variables X (J: ( I ) ) and X (y ( I ) ) in terms of iteration values. The inequalities in (2.7) are the dependence constraints for the same variables in terms of iteration values. In this book, dependence e q u a t i o n s a n d constraints are e x p r e s s e d in t e r m s of iteration values, u n l e s s otherwise stated. Note that we c a n n o t expect to have a strict converse to Theor e m 2.6, since its p r o o f does n o t take into account the third c o n d i t i o n of d e p e n d e n c e that p r e c l u d e s writing of the m e m o r y location M in t h e time period b e t w e e n execution of instances S(i) a n d T(j). Supp o s e there is a solution ([, 3) to Equation (2.6) that satisfies (2.7). This tells us that there are instances S(i) of S and T ( j ) of T, b o t h of which reference a certain m e m o r y location 0Vl. The ordering of ~ a n d ~ gives the order of execution of the two instances. However, we do n o t k n o w if .w/will be w r i t t e n in the period b e t w e e n the two executions. T h e o r e m 2.7 is a weak converse to T h e o r e m 2.6; we omit the proof. T h e o r e m 2.7 Suppose that Equation (2. 6)has an integer solution ([, 3)
that satisfies the conditions in (2. 7). (a) If ~ < 3, then S -~ T. -
-
(b) If ~ > 3, then T ~ S. -
-
(c) If [ = ~ and S < T, then S ~ T. When checking for d e p e n d e n c e b e t w e e n two s t a t e m e n t s c a u s e d by a pair of p r o g r a m variables, we ignore all other variables and statem e n t s . Thus, we o f t e n abuse language a n d say "T d e p e n d s o n S" w h e n we k n o w m e r e l y that T d e p e n d s on S indirectly. (Note that m o s t loop t r a n s f o r m a t i o n s are effectively guided by indirect a n d n o t ordinary dependence.) The dependence problem for the variable X(J:(I)) of S a n d the variable X ( g ( I ) ) of T consists of finding all integer solutions (~, 3) to the d e p e n d e n c e equation (2.6) that satisfy the d e p e n d e n c e c o n s t r a i n t s (2.7), a n d t h e n partitioning the solution set into three subsets b a s e d o n w h e t h e r ~ < 3, or ~ > 3, or ~ = 3. The following example shows h o w T h e o r e m 2.7 is applied to check for d e p e n d e n c e b e t w e e n s t a t e m e n t s in a single loop, c a u s e d by elem e n t s of a o n e - d i m e n s i o n a l array.
2.4.
29
DEPENDENCE PROBLEM
E x a m p l e 2.3 C o n s i d e r a p r o g r a m s e g m e n t o f the form: L"
do/=
1,100,2 .
S"
X(2I) . . . . .
T"
• ......
X ( 3 I + 1) • • •
.
enddo Let u s t e s t for d e p e n d e n c e b e t w e e n the s t a t e m e n t s S a n d T c a u s e d by the two variables s h o w n . Here, we have p = 1, q = 100, 0 = 2, f ( I ) = 2I a n d ~ ( I ) = 3I + 1. Since f(p
+ 0~) - ~ ( p + 0 j ) = 2(p + 0~) - 3 ( p + 0 j ) - 1 = 2(1 + 2~) - 3(1 + 2~) - 1 = 4 [ - 6 ~ - 2,
the d e p e n d e n c e e q u a t i o n (2.6) b e c o m e s 4~ - 6~ = 2.
(2.8)
Since c~ = [ (100 - 1 ) / 2 ] = 49, the c o n s t r a i n t s in (2.7) b e c o m e 0 < [ < 49
and
0 < ~ < 49.
Thus, we are s e e k i n g i n t e g e r s o l u t i o n s (~,j) to (2.8) s u c h t h a t ~ a n d ~ are non_negative i n t e g e r s n o t larger t h a n 49. One s u c h s o l u t i o n is clearly (2, 1), a n d t h e r e are m a n y o t h e r s . However, a n y s u c h s o l u t i o n m u s t n e c e s s a r i l y s a t i s f y ~ > ~ (Exercise 3). Thus, we c o n c l u d e (by T h e o r e m 2.7) t h a t T 6 S h o l d s a n d t h a t S 6 T d o e s not. W h e t h e r T 6 S is t r u e w o u l d d e p e n d o n t h e p a r t o f t h e loop b o d y n o t s h o w n . -
-
_
_
EXERCISES 2.4
1. In Theorem 2.6, provide the additional condition in the special case: (a) T has a loop-carried dependence on S; (b) S has a loop-carried dependence on T;
CHAPTER 2. SINGLE LOOPS
30
(c) S < T and T has a loop-independent dependence on S; (d) T < S and S has a loop-independent dependence on T. 2. Prove Theorem 2.7. 3. Explain why each nonnegative solution (~,j) to (2.8) must satisfy ~ > j. 4. Find the dependence equation and dependence constraints in Example 2.3 after changing the loop-header to (a) L :
d o I = 1,100, - 2
(b) L:
doi=5,100,3
In each case, decide if T depends indirectly on S, or S on T, or both. 5. Let X denote a two-dimensional array. Extend the concepts introduced so far to give a description of the dependence problem posed by the two variables in the following loop: L: do I = 0,100,1 S: X(2I, 3I + 1) . . . . T: ....... X(3I+1,4I+3)... enddo Can you solve this problem? What are your conclusions?
2.5
Solution to the Linear P r o b l e m
In a single loop, t h e d e p e n d e n c e e q u a t i o n for t w o a r r a y e l e m e n t s o f t h e f o r m X ( a I + ao) a n d X ( b I + bo) t u r n s o u t t o be a l i near diop h a n t i n e e q u a t i o n in two variables. We saw this in E xam pl e 2.3. We s t a r t this s e c t i o n w i t h a d i s c u s s i o n o n t w o-vari abl e linear d i o p h a n t i n e e q u a t i o n s , a n d t h e n we will r e t u r n to t he d e p e n d e n c e p r o b l e m . A lin ear d i o p h a n t i n e e q u a t i o n in a n y n u m b e r o f v a r i a b l e s c a n be s o l v ed b y T h e o r e m 1.3.5 a n d A l g o r i t h m 1.2.1. We d e s c r i b e h e r e a m e t h o d t h a t is c u s t o m i z e d f o r t he two-variable case. C o n s i d e r first t h e e x t e n d e d Euclid's a l g o r i t h m (see [Knut 73] a n d [Knut 81]).
A l g o r i t h m 2.1 (Extended Euclid's algorithm). Given two i n t e g e r s a a n d b, this a l g o r i t h m finds g = g c d ( a , b), a n d two i n t e g e r s xo, Y0 s u c h t h a t a x o - b y o = g. T h e a l g o r i t h m u s e s six auxiliary i n t e g e r variables: u , v , h, q, ~, ft. ( u , v ) -- ( 1 , 0 ) (g,/~) ~ ( lal, Ibl)
2.5. SOLUTION TO THE LINEAR PROBLEM
31
do while h > 0 q "- L 0 1 h J (~x,/7) .- (ix - q v , ~ o - q h )
(u, v) -
(v, a)
(~, h) -- (h, ~) enddo xo ~ sig(a) • u ifb =0 t h e n Yo ~ 0 e l s e Yo ~- ( a x o - y ) / b .
The e x t e n d e d Euclid's a l g o r i t h m is a special case of A l g o r i t h m 1.2.1. Note that we have t a k e n the m i n u s sign in a x o - b y o = g only to conf o r m to the n o t a t i o n of o u r d e p e n d e n c e p r o b l e m . A l t h o u g h the gcd g is u n i q u e , the i n t e g e r s xo a n d )'o in the exp r e s s i o n g = a x o - b y o are not. However, any t w o integers w i t h this p r o p e r t y will w o r k w h e n we try to solve an e q u a t i o n of the f o r m a x - b y = c, as we see below. 2.8 Let a, b, c denote integers such that a and b are not both zero, and let g = g c d ( a , b ). The linear diophantine equation
Theorem
ax - by = c
(2.9)
has a solution iff g divides c. When a solution exists, the general solution is given by (x,y)
= (Cxo/g + bt/g, cyo/g + at/g),
where t is an arbitrary integer parameter, and (x0,)'o) is a pair of integers such that ~ = axo - byo. PROOF. (axo (s, t)U, integer
Y0 The i n t e g e r m a t r i x U = ( b/gX°a/g ) is u n i m o d u l a r since d e t ( U ) = b)'o)/~l = 1. A n y 2-vector ( x , y ) can b e w r i t t e n as ( x , y ) = w h e r e (s, t) = ( x , y ) U -~. Since U is u n i m o d u l a r , ( x , y ) is an v e c t o r iff (s, t) is a n integer vector. Now, ( x , ) , ) is a s o l u t i o n
CHAPTER 2. SINGLELOOPS
32
to E q u a t i o n (2.9) iff
c = ax - by = (x, y )
-b
= (s,t)U
-b
= (s,t)
b/g
a/g
-b
= ( s ' t ) ( a x ° -)b y0°
= ,.q~. If B d o e s n o t divide c, t h e n s is n o t an i n t e g e r a n d h e n c e (x, y ) c a n n o t b e an i n t e g e r vector. Therefore, Equation (2.9) h a s no integer s o l u t i o n in this case. If B d o e s divide c, t h e n all i n t e g e r s o l u t i o n s to (2.9) are given b y t h e f o r m u l a ( x , y ) = (c/B, t)U, w h e r e t is a n a r b i t r a r y integer p a r a m e t e r . This f o r m u l a gives t h e e x p r e s s i o n s for x a n d y in the t h e o r e m . [] This t h e o r e m is a b a s i c r e s u l t of e l e m e n t a r y n u m b e r theory. It is a v a r i a t i o n o f Corollary 1 to T h e o r e m 1.3.5; here we gave a direct p r o o f for the sake of c o m p l e t e n e s s . E x a m p l e 2.4 Let u s solve the d i o p h a n t i n e e q u a t i o n 21x-
3 4 y = 2.
(2.10)
Here w e have a = 21, b = 3 4 , c = 2. First, a p p l y A l g o r i t h m 2.1 to a a n d b. The a s s i g n m e n t s in the w h i l e l o o p are s h o w n in Table 2.1. Near the e n d of the algorithm, we get g c d ( 2 1 , 3 4 ) = ~ = 1, xo = 13, Y0 = (21 x 13 - 1 ) / 3 4 = 8,
2.5. SOLUTION TO THE LINEAR PROBLEM
33
Table 2.1: Steps o f A l g o r i t h m 2.1 for a = 21 a n d b = 34. so that a x o - b~yo = 2 1 ( 1 3 ) - 34(8) = 1 = 9. By T h e o r e m 2.8, Equation (2.10) h a s a solution, a n d the general s o l u t i o n is given b y ( x , y ) = (26 + 34t, 16 + 21t), w h e r e t is a n i n t e g e r p a r a m e t e r . Taking t = 0, 1, - 1 , w e get t h e particular solutions: (26, 16), (60, 37), ( - 8 , - 5 ) .
We n o w r e t u r n to the d e p e n d e n c e p r o b l e m b e t w e e n t w o statem e n t s S a n d T in a l o o p of the f o r m L:
do/=
p,q,O
H(I) enddo
p o s e d b y a variable X ( a I + ao) of S a n d a variable X ( b I + bo) o f T, w h e r e a, ao, b, a n d bo are i n t e g e r c o n s t a n t s . Since h e r e we have f ( I ) = a I + ao
and
9 ( I ) = b I + bo,
the d e p e n d e n c e e q u a t i o n (2.6), n a m e l y f ( p + 0~) - g ( p + 0.~) = O, becomes [ a ( p + 0~) + ao] - [ b ( p + 0.~) + b0] = 0
34
CHAPTER 2. SINGLE LOOPS
or
(aO)~ - ( b O ) ] = bo - ao + (b - a ) p .
(2.11)
We rewrite the d e p e n d e n c e constraints (2.7):
t
(2.12)
where t~ = [ (q - p)/OJ. Here, the d e p e n d e n c e problem consists of finding all integer solutions (~,j) to Equation (2.11) subject to the constraints (2.12), and partitioning the solution set into three subsets based on whether ~ is less than, greater than, or equal to j. A solution (~,~) indicates d e p e n d e n c e of T on S if ~ < 2, d e p e n d e n c e of S on T i f ~ > ), and d e p e n d e n c e o f T o n S i f ~ = ~ i n c a s e S < T. (See T h e o r e m 2.7 and the c o m m e n t following it about not distinguishing between d e p e n d e n c e and indirect dependence.) The following t h e o r e m gives a necessary condition for dependence, that is not always sufficient. T h e o r e m 2.9 (gcd test). I f a variable X ( a I + ao) o f a s t a t e m e n t S a n d a variable X ( b I + bo) o f a s t a t e m e n t T cause a d e p e n d e n c e b e t w e e n S a n d T in the loop L, then the integer (bo - ao + b p - a p ) is an integral multiple o f the integer O . gcd(a, b).
PROOF. We consider only the nontrivial case where a and b are not both zero. As we just saw, the d e p e n d e n c e equation in this case is (2.11). By Theorem 2.6, the existence of d e p e n d e n c e between statements S and T implies the existence of an integer solution to (2.11). By Theorem 2.8, that is possible iff g c d ( a 0 , bO) divides the right-handside of (2.11). Since gcd(a0, bO) = 10] .gcd(a, b) (Lemma 1.3.3(e)), it follows that 0 . g c d ( a , b) divides the integer (bo - ao + b p - a p ) . [] W. L. Cohagan [Coha 73] first recognized the usefulness of the following result in d e p e n d e n c e analysis: Consider t w o m a p p i n g s f a n d g o f Z into Z, defined by f ( I ) = a I + ao a n d g ( I ) = b I + bo, w h e r e a, a0, b, a n d bo are integers. The r a n g e s o f f a n d ~ are disjoint iff gcd(a, b) does not divide ( bo - ao ). It follows immediately from The-
o r e m 2.8, since these ranges intersect iff there is an integer solution
2.5. SOLUTION TO THE LINEAR PROBLEM
35
to the e q u a t i o n a~c - b y = b0 - ao. Application of this result to d e p e n d e n c e analysis is c o n t a i n e d in T h e o r e m 2.9 (see Example 2.4). Algorithm 2.2 given below solves the linear d e p e n d e n c e p r o b l e m in a single loop. It originated in [Bane 79]. (See also [WoBa 87].) Its current f o r m is a d a p t e d f r o m A l g o r i t h m 1.3.1; see Section 1.3o5 for m o r e details. The basic idea of the a l g o r i t h m is as follows. Check to see if the d e p e n d e n c e e q u a t i o n (2.11) has an integer solution. If not, t h e n there is no d e p e n d e n c e . Otherwise, find the set of all solutions and partition it into three subsets, s u c h that o n e subset consists of all solutions (~, 3) where ~ < 3, a n o t h e r consists of all solutions (~, 3) where ~ > 3, a n d the third consists of all s o l u t i o n s of the f o r m (~, ~). To check the existence of a solution to (2.11), we apply the gcd test in two stages. First, see if 0 divides the integer [b0 - ao + (b - a ) p ] . If it does, t h e n test w h e t h e r [b0 - ao + (b - a ) p ] / O is an integral multiple of g c d ( a , b). If it is (i.e., if the gcd test passes), t h e n find all solutions to (2.11). The s e c o n d stage of the gcd test a n d c o m p u t a t i o n of the solution set are d o n e separately for the three cases: a = b = 0, a -- b ~= 0, and a ~= b. For the first two cases, it is trivial to find g c d ( a , b) a n d t h e n solutions to (2.11) if they exist. Euclid's algorithm a n d T h e o r e m 2.8 n e e d to be u s e d explicitly only for the case a ~ b. A l g o r i t h m 2.2 Given integers p, ~/, 0 ( ~ 0), a, ao, b, and b0, s u c h that we have a single loop of the f o r m L:
d o / - - p, cl, O U(I)
enddo two a s s i g n m e n t s t a t e m e n t s S and T in the loop b o d y H ( I ) with S < T, a variable X ( a I + ao) of S a n d a variable X ( b I + bo) of T, where X is a o n e - d i m e n s i o n a l array, this algorithm • Decides if the variables X ( a I + ao) of S a n d X ( b I + bo) of T cause a d e p e n d e n c e of T on S, or a d e p e n d e n c e of S on T; • Finds the set ~1 of all iteration value pairs ([,3), s u c h that [ < 3 a n d the instance T ( j ) of T d e p e n d s on the instance S(i) of S (the index values c o r r e s p o n d i n g to [ and 3 are i a n d j); • Finds the set ~-1 of all iteration value pairs (~, 3), s u c h that 3 < [ a n d the instance S(i) d e p e n d s o n the instance T(j);
CHAPTER 2. SINGLELOOPS
36
• In case S < T, finds the set ~0 of all iteration value pairs (~, ~), s u c h that the i n s t a n c e T(i) d e p e n d s on the i n s t a n c e S(i); • Finds the d i s t a n c e s for each loop-carried d e p e n d e n c e . 1. Set ~ ~ [(q - p)/O]. If ~ < 0, t h e n t e r m i n a t e the algorithm; the l o o p is n o t e x e c u t e d . 2. Set c o ~ b o - a o + ( b - a ) p . [The d e p e n d e n c e e q u a t i o n ( 2 . 1 1 ) b e c o m e s O(a~ - b3) = co.l If co m o d 0 ~ 0, t h e n t h e r e is no d e p e n d e n c e b e t w e e n statem e n t s S a n d T; t e r m i n a t e the algorithm. 3. [Here, 0 d o e s divide co.] Set c ~ c o ! 0 . [The d e p e n d e n c e e q u a t i o n is a~ - b3 = c.] 4. Initialize ~1, ~-1, ~o to the e m p t y set. 5. [There are t h r e e c a s e s b a s e d o n w h e t h e r a and b are equal to e a c h o t h e r or to zero.] Select t h e p r o p e r case: C a s e (a = b = 0): Go to Step 6. C a s e (a = b ~: 0): Go to Step 7. C a s e (a :~ b): Go to Step 9. 6. [Since h e r e a = b -- 0, the d e p e n d e n c e e q u a t i o n d e g e n e r a t e s to 0 = c.l If c :~ 0, t h e n t e r m i n a t e the algorithm; t h e r e is no d e p e n d e n c e b e t w e e n s t a t e m e n t s S a n d T. Otherwise, set xI~1
'--
{(Lj):O< }<j_<@}
v_, v0
-
{(L~):0_<)<£_<~}
-
{(LD:0_<~_<~}.
Since ~ >_ 0, the set ~0 is always n o n e m p t y . So, if S < T, t h e n T d e p e n d s o n S ( l o o p - i n d e p e n d e n t d e p e n d e n c e ) . If ~ > 0, t h e n the
37
2.5. S O L U T I O N TO THE LINEAR PROBLEM
sets ~1 a n d ~-1 are b o t h n o n e m p t y . In this case, T d e p e n d s on S a n d S d e p e n d s o n T. Also, t h e d i s t a n c e for e a c h d e p e n d e n c e is 1 (why?). T e r m i n a t e t h e algorithm. . [Here, a = b ~ 0. The d e p e n d e n c e e q u a t i o n is a ( [ - 3) = c.]
If c m o d a ~ 0, t h e n t h e r e is no d e p e n d e n c e b e t w e e n s t a t e m e n t s S a n d T; t e r m i n a t e t h e algorithm. Otherwise, set c l ~- c / a . .
[The d e p e n d e n c e e q u a t i o n is n o w [ - 3 = Cl w h o s e g e n e r a l solution is ([,3) = (t + Cl, t), w h e r e t is a n a r b i t r a r y i n t e g e r p a r a m e ter. The d e p e n d e n c e c o n s t r a i n t s (2.12) r e d u c e to the inequalities 0 < O<_t
t+Cl
< ~
~
T h e s e inequalities are c o n s i s t e n t iff ~ >_ ICl 1.1 If ~ < Ic~l, t h e n t e r m i n a t e the algorithm; t h e r e is no d e p e n d e n c e b e t w e e n s t a t e m e n t s S a n d T. Otherwise, select the p r o p e r case b a s e d o n t h e sign o f Cl:
Case (Cl < 0): Set g[1
~--
{ ( t + c l , t ) : lc~l <- t < gl}.
T d e p e n d s o n S w i t h a u n i q u e d i s t a n c e Ic~ I. Case (Cl > 0): Set • -1"- {(t+cx,t):O
S d e p e n d s o n T w i t h a u n i q u e d i s t a n c e c~. Case (c1 = 0): Set 'I'0 .- {(t, t) : 0 _< t _<'~}. If S < T, t h e n T d e p e n d s o n S. T e r m i n a t e t h e algorithm.
CHAPTER 2. SINGLE LOOPS
38
.
[Here, a * b.l By A l g o r i t h m 2.1, find t h e gcd ~ of a a n d b, a n d two integers [0 a n d jo s u c h t h a t a[o - b)o = ~. If c m o d y ~ 0, t h e n t h e r e is no d e p e n d e n c e b e t w e e n s t a t e m e n t s S a n d T; t e r m i n a t e the algorithm.
10. [Now, ~ d o e s divide c.] Set (ad, bx,c1) -- (a/~/,
b/.q,c/~).
11. [The g e n e r a l solution to a [ - b ) = c is [=cx~0+bltt Cxj0 + a l t
j=
(2.13) .~¢
w h e r e t is an a r b i t r a r y integer p a r a m e t e r ( T h e o r e m 2.8). Since it m u s t satisfy the c o n s t r a i n t s (2.12), the inequalities 0 < C1~0 + bxt < ~ ) 0 <- CliO + a l t <-- ~
hold. We will find the r a n g e -rt < t < T2 o f t.] Select t h e p r o p e r case b a s e d o n the sign o f hi: Case (bl = 0): If 0 < c~[o -< ~ d o e s n o t hold, t h e n t h e r e is no d e p e n d e n c e b e t w e e n s t a t e m e n t s $ a n d T; t e r m i n a t e the algorithm. Otherwise, set T~ ~ - ~ , T2 "- oo. Case (b~ > 0): Set T1 "- [-cl~o/bl| a n d T2 "- [ ( ~ - - c ~ o ) / b l ] . Case (bl < 0): Set T1 -- r ( ~ - Cl~o)lbl] a n d T2 "- [-Cl[o/bl]. Select t h e p r o p e r case b a s e d o n the sign o f a l : Case (al = 0): If 0 < c l i o < ~ d o e s n o t hold, t h e n t h e r e is no d e p e n d e n c e b e t w e e n s t a t e m e n t s S a n d T; t e r m i n a t e the algorithm.
2.5. SOLUTION TO THE LINEAR PROBLEM
39
Case (al > 0): Set r l "- m a x ( - r l , I-CliO~all) a n d T2 "- m i n ( r 2 , [(~-Cl]O)/al]). C a s e (al < 0): Set r l - m a x ( r l , [(~-CljO)/al])
a n d r2 .- m i n ( r 2 , I - c l i o ~ a l l ) .
If T1 > T2, t h e n t h e r e is no d e p e n d e n c e b e t w e e n s t a t e m e n t s S a n d T; t e r m i n a t e the algorithm. 12. [Find t h e set ~o. F r o m (2.13) we get j-
~ = ( a l - bl)t - Cl(~O - ]0).
(2.14)
Since a ~: b, we have a l ~: bx. Let ~ d e n o t e the value of t for w h i c h ~ = j. If ~ is a n i n t e g e r b e t w e e n T1 a n d T2, t h e n it is a n e l e m e n t (the o n l y e l e m e n t in this case) of ~0.] Set ~ "- Cl(~O - j o ) / ( a l - h i ) . If ~ is a n i n t e g e r s u c h t h a t r l -< ~ -< r2, t h e n set
~0 "- {(Cl~0 + bl~, Cl.~0 + alE)}. 13. [Find t h e sets ~1 a n d ~_~. The d i f f e r e n c e (] - ~) given b y (2.14) is e i t h e r positive for all t < ~ a n d negative for all t > ~, or negative for all t < ~ a n d positive for all t > ~. We c o m p u t e t h e i n t e r s e c t i o n s [T1, T2] C~ (--0% ~) a n d [Tx,T2] (3 (~, oo). See A l g o r i t h m 1.3.1 a n d figures 1.3.4, 1.3.5 for details.] Set T3 -- [ ~ -- I I T4 -- [ ~ + I ]
T5 ~Inin(T2, T3) T6 ~ m a x ( T 1 , T 4 ) . Select t h e p r o p e r case b a s e d o n t h e sign o f ( a l - bl):
CHAPTER 2. SINGLE LOOPS
40
Case ( a l > bl): If T6 <
T2,
t h e n set
~1 " - {(C1~0 -t- b l t , e l j o + a l t ) : T6 < t < T2}. If T1 < ~-S, t h e n set • -1 ~- {(cliO + b l t , CljO+ a i r ) : T1 --< t ~< T 5 } .
Case ( a l < bl): If T1 < TS, t h e n set ~1 "-- {(C1~o + b i t , CliO + a~t) : T1 < t < TS}. If T6 < T2, t h e n set ~I/-1 "-- {(Cl~0 + b i t , c l i o + a l t ) : T6 < t < T 2 } . 14. If v/1 = O, t h e n t h e r e is n o loop-carried d e p e n d e n c e o f s t a t e m e n t T o n s t a t e m e n t S. Otherwise, t h e r e is s u c h a d e p e n d e n c e , a n d if ~F1 h a s the f o r m ~F~ = { ( c ~ o + b l t , C~jo + a ~ t ) :¢x < t < ]3} t h e n t h e c o r r e s p o n d i n g set o f d i s t a n c e s is { ( a l - b~)t + c i ( ) o -
20) : ~x < t < ~8}.
If ~-1 = O, t h e n t h e r e is n o loop-carried d e p e n d e n c e o f statem e n t S o n s t a t e m e n t T. Otherwise, t h e r e is s u c h a d e p e n d e n c e , a n d if ~-1 h a s the f o r m • -1 = {(C1~0 + b l t , c ~ o + a~t) : a < t < ~8} t h e n the c o r r e s p o n d i n g set of d i s t a n c e s is {(bl - a ~ ) t + c1([o - ~o) : a < t ~}. If S < T a n d ~0 :~ ~ , t h e n t h e r e is a l o o p - i n d e p e n d e n t d e p e n d e n c e o f T o n S. Otherwise, t h e r e is n o s u c h d e p e n d e n c e . 15. T e r m i n a t e the algorithm.
[]
2.5. SOLUTION TO THE LINEAR PROBLEM
41
We w o u l d like to s u m m a r i z e here the major facts gleaned f r o m the d e s c r i p t i o n of the above algorithm. First, note that the sets YI, ~ - 1, Y0 r e p r e s e n t the following three d e p e n d e n c e s , respectively: 1. loop-carried d e p e n d e n c e of T on S, 2. loop-carried d e p e n d e n c e of S o n T, 3. l o o p - i n d e p e n d e n t d e p e n d e n c e of T on S (provided S < T). We cannot have a l o o p - i n d e p e n d e n t d e p e n d e n c e of S on T, since S < T by hypothesis. A s s u m e n o w that a, b are n o t b o t h zero. The values of t defining a n o n e m p t y Y-set f o r m a finite sequence of c o n t i g u o u s integers (steps 8, 12, 13). An e l e m e n t of a Y-set is a pair of iteration points ([,)) of the loop L. We can c o m p u t e the c o r r e s p o n d i n g pair of index p o i n t s (i, j ) by the f o r m u l a
( i , j ) = (p + Oi, p + Oj) obtained f r o m the relation I = p + 0~. This allows us to find the set of all instance pairs (S(i), T ( j ) ) that are involved in the particular d e p e n d e n c e . If t has the range ot < t ~, where tx and ~ are integers such that tx < fl, t h e n the n u m b e r of s u c h instance pairs (i.e., the size of the ~-set) is (~ - tx + 1). Next, c o n s i d e r the case a = b ~ 0. Let ~ > ]Cl] (Step 8). Then, exactly two of the sets ~1, ~-1, and Y0 are empty, a n d one is n o n e m p t y . Thus, exactly one of the following is true: 1. T has a loop-carried d e p e n d e n c e o n S; 2. S has a loop-carried d e p e n d e n c e o n T; 3. T has a l o o p - i n d e p e n d e n t d e p e n d e n c e on S (provided S < T). In each case, the d e p e n d e n c e distance is u n i q u e a n d i n d e p e n d e n t of iteration points. It has the value Icl I. This means, for example, that if T d e p e n d s on S, t h e n an instance T ( j ) of T always d e p e n d s on an instance S(i) of S, as long as the c o r r e s p o n d i n g iteration p o i n t s j and ~ satisfy j - ~ = Ic I [. We characterize this situation by saying that the d e p e n d e n c e is uniform, a n d that the distance is a uniform d e p e n d e n c e distance. (See Example 1.3.6.)
42
C H A P T E R 2.
SINGLE LOOPS
Finally, consider the case a * b. Any c o m b i n a t i o n of the sets ~/1,~/_1, and ~/o m a y be n o n e m p t y . Thus, one or m o r e of the three d e p e n d e n c e s listed above m a y be p r e s e n t at the same time. The loopi n d e p e n d e n t d e p e n d e n c e , if present, holds only for one iteration (~o is either e m p t y or a singleton). There m a y be m a n y distances for a loopcarried d e p e n d e n c e , and they are not uniform. (See Example 1.3.5.) Example 2.5 Let us test if there is a d e p e n d e n c e b e t w e e n s t a t e m e n t s S and T in the following p r o g r a m (caused by the variables shown): L: S : T :
d o I = 10,200,5 X ( 7 I + 2) . . . . ....... X ( 3 I + 17) • • •
enddo The steps of Algorithm 2.2 applied to this p r o b l e m are shown below. Input to the algorithm: p=10,q=200, O=5, a = 7,ao = 2,b = 3, bo = 17. 1. ~ "- L ( q - p ) l O J = 38. Since ~ > O, continue. 2. co *- b o - a o
+ (b-a)p=
-25.
Since co m o d 0 = 0, continue. 3. c .- c o / O = - 5 . 'tq -- ~,~I~_1
4.
- -
~,~I~o
- -
~.
5. Since a ~: b, go to Step 9. .
[By Algorithm 2.1, find gcd (7, 3) and two integers ~o, 30 such that 7~o - 330 = gcd(7, 3).] 9 "- 1, (~o,,3o) -- (1, 2). Since c m o d 9 = 0, continue.
10. ( ~ 2 1 , ~ 1 , ¢ 1) "--
(alg, blg, c/9)
= (7,3,-5).
2.5. SOLUTION TO THE LINEAR PROBLEM
11. Go to Case (bl > 0). Vl -- [ - c l [ o / b l ] = 2, T 2 '-- [ ( 3 Cl[o)/bl] = 14. Go to Case (al > 0). T 1 '-- m a x (T1, [ - - C 1 , ~ 0 / / 2 1 ] ) = 2, T 2 "-- m i n (T2, [(t~ - C l ) 0 ) / ~ / 1 ] ) = Since T1 --< T 2 , continue. 12.
~ ~ Cl([0
- 3o)/(al
- bl)
43
6.
= 5/4.
[Since ~ is n e i t h e r an integer, n o r d o e s it lie b e t w e e n T1 a n d ~-2, the set 't~0 r e m a i n s empty.] 13. T3 "-- [ ~ - - 11 = 1, ,--
+ I J = 2,
T5 "" m i n ( T 2 , T 3 ) -- max
= 1,
('rl, V4) = 2.
Go to Case (t~l > bl). Since -/-6 --< "/-2, s e t xI~1 ~ {(--5 q- 3 t , - - 1 0 + 7t) : 2 < t < 6}. 14. Since xI~1 #: ~ , there is a loop-carried d e p e n d e n c e of s t a t e m e n t T on s t a t e m e n t S. The c o r r e s p o n d i n g set of distances is {4t - 5 : 2 < t < 6}. Since the sets ~-1 a n d ~/0 are empty, there is no loop-carried d e p e n d e n c e of S on T, nor any l o o p - i n d e p e n d e n t d e p e n d e n c e b e t w e e n S a n d T. We see that the pairs of iteration p o i n t s that cause the d e p e n d e n c e of T o n S f o r m the set xI~1 ---- {(1,4),
(4, 11), (7,18), (10,25), (13,32)}.
The c o r r e s p o n d i n g set of pairs of index p o i n t s is obtained by r e p e a t e d applications of the relation I = p + ~?0: {(15, 30), (30,65), (45,100), (60, 135), (75,170)}. Thus, T(30) d e p e n d s on S(15), T(65) d e p e n d s o n S(30), etc. There are five s u c h pairs of s t a t e m e n t instances. The set of d e p e n d e n c e
44
CHAPTER
2.
d i s t a n c e s is {3, 7, 11, 15, 19}. ( N o t e t h a t d e p e n d e n c e computed from iteration points.)
SINGLE LOOPS
d i s t a n c e s are
EXERCISES 2.5
1.
(a) By Algorithm 2.1, find ~/ = gcd(10, 14) and two integers x0 and Yo, such that 10x0 - 14y0 = kt. Show all steps. Apply Theorem 2.8 to find the general solution to the equation 10x - 14y = 6, using these values of )Co and Y0. (b) By trial and error, find a pair of integers ( x l , y l ) ~ (x0,Yo) with the same property: 1 0 X x - - 14yt = 9. Solve 10x - 14y = 6 again, this time using (Xl, Yl) instead of (x0, Yo) in Theorem 2.8. Show that the general solution obtained here generates the same set of ordered pairs as generated by the general solution in the previous problem. (c) Find a formula that will generate all integer pairs (x, y ) such that 10x-14y=g.
2. In Theorem 2.8, show that irrespective of whether ~q divides c or not, the set of all real solutions to Equation (2.9) is given by x = (c/9)Xo + (b/9)t Y = (c/9)2o + (a/~)t,
where t is a real parameter. 3. Write an algorithm such that given three integers a, b, and c, it decides if there is an integer solution (x, 2 ) > (0, 0) (i.e., x _> 0 and 2 > 0) to the diophantine equation a x - b y = c, and gives a formula to generate all such solutions when they exist. Apply your algorithm to three examples to demonstrate that the number of nonnegative solutions could be infinite, finite and positive, or zero. 4. In the context of Theorem 2.9, consider these three conditions: (a) X ( a I + ao) and X ( b I + bo) cause a dependence between S and T; (b) (bo - ao + b p - a p ) is an integral multiple of O . g c d ( a , b ) ; (c) (b0 - a0) is an integral multiple of gcd(a, b). Show that (a) ~ (b) ~ (c). (We already know about the first implication.) Give an example where (c) does not imply (b). Find conditions on loop parameters such that (b) ** (c) for all integers a, ao, b, and b0. While (b) is the gcd test for the dependence equation in terms of iteration values, we can think of (c) as the god test for the equation a i - b j = bo - ao, that is, for the dependence equation in terms of index values. Which is the stronger test for dependence in general: the gcd test in terms of iteration values, or the gcd test in terms of index values? When are they equivalent? (Note that (b) does not involve the final loop limit q, and (c) does not involve any loop parameters.)
2.5.
SOLUTION
TO
THE LINEAR
45
PROBLEM
5. A p p l y A l g o r i t h m 2.2 t o t h e f o l l o w i n g e x a m p l e s : (a) L : S : T :
doi=3,100,1 X(9I + 22) .... .......
17) • • •
X(6I-
enddo (b) L : S : T : (c) L : S : T :
doi=0,100,2 X(3I) .... ....... X(36) • • • enddo d o I = 100, - 1 0 , - 3 X ( 4 I + 16) . . . . .......
4) • • •
X(4I-
enddo (d) L : S : T :
d o I = 100, - 1 0 , - 2 X ( 4 I + 16) . . . . .......
4) • • •
X(4I-
enddo (e) L : S : T :
d o I = 100, - 1 0 , - 1 X ( 4 I + 16) . . . . .......
4) • • •
X(4I-
enddo (f) L : S : T :
d o I = - 5 0 , 100, 1 X(-2I + 25) .... .......
X ( 3 I + 4) • • •
enddo (g) L : S : T :
do I = -300, 200, 3 X ( 6 I - 17) . . . . .......
x(gI
+ 22). • •
enddo (h) L :
doi=0,100,1 S: T :
X(21,2I+1) .... ....... X ( 3 I + 1 , 3 I + 2) • • •
enddo 6. I n A l g o r i t h m 2.2, l e t a # b. S h o w t h a t if ~l'0 is n o n e m p t y , t h e s i n g l e p o i n t (c ! ( a - b ) , c ! ( a - b ) ) . 7. I n t h e c o n t e x t o f A l g o r i t h m 2.2, g i v e e x a m p l e s ~I~1,wit-1, a n d ~/0, (a) o n l y o n e is n o n e m p t y
t h e n it c o n t a i n s
such that of the three sets
( t h r e e e x a m p l e s , o n e f o r e a c h set);
(b) o n l y t w o a r e n o n e m p t y
(three examples, one for each combination);
(c) all t h r e e a r e n o n e m p t y
(one example).
8. W r i t e a s i m p l e r v e r s i o n o f A l g o r i t h m 2.2 f o r t h e c a s e w h e r e b = - a
~ 0.
46
C H A P T E R 2. SINGLE L O O P S
9. Consider the program L: S: T:
do I = p , q , O X .... ....... x..enddo
where x is a scalar. Can Algorithm 2.2 handle this d e p e n d e n c e problem?
2.6
Method of Bounds
As we shall see, the d e p e n d e n c e p r o b l e m b e c o m e s m o r e difficult w h e n we m o v e f r o m a single loop to a m o r e complicated p r o g r a m . The m a j o r difficulty arises f r o m the fact that the general solution to the d e p e n d e n c e e q u a t i o n usually contains m o r e t h a n one integer parameter, and t h e r e f o r e we get a s y s t e m of inequalities with m o r e t h a n one integer variable w h e n the solution is s u b s t i t u t e d in the d e p e n d e n c e c o n s t r a i n t s (Step 4, A l g o r i t h m 1.1). Such a s y s t e m of inequalities is n o t trivial to solve. We m e n t i o n e d in Chapter 1 that for this reason, an a p p r o x i m a t e a l g o r i t h m (Algorthm 1.2) is o f t e n used, where we test if there is an integer solution to the d e p e n d e n c e e q u a t i o n w i t h o u t any constraints, a n d a real solution to the equation w i t h the d e p e n d e n c e constraints. We also d i s c u s s e d two i m p l e m e n t a t i o n s of that algorithm: the m e t h o d of b o u n d s a n d the m e t h o d of elimination. Since the d e p e n d e n c e p r o b l e m involving a o n e - d i m e n s i o n a l array in a single loop is a simple problem, an application of the m e t h o d of elimination (Algorithm 1.3) to it is u n i n t e r e s t i n g (why?). On the other hand, by applying the m e t h o d of b o u n d s to this problem, we get results that can be applied to a m o r e general program. In this section, we introduce the m e t h o d of b o u n d s in t e r m s of a single loop. In Chapter 6, this m e t h o d will be s t u d i e d in detail in the context of a n e s t of several loops, where it is n o r m a l l y u s e d in practice. The m e t h o d of b o u n d s originated in [Bane 76]. Consider again the d e p e n d e n c e p r o b l e m b e t w e e n two s t a t e m e n t s S a n d T in a loop of the f o r m L:
do/= p,q,O H(~) enddo
47
2.6. METHOD OF BOUNDS
posed by a variable X ( a I + ao) of S and a variable X ( b I + bo) of T. Write the d e p e n d e n c e equation (2.11) (a0)~ - (bO).~ = bo - ao + (b - a ) p in the f o r m a ~ - b3 = c, where c = [b0 - ao + (b - a ) p ] / O . constraints (2.12):
(2.15)
Also, rewrite the d e p e n d e n c e
o-<j-<4
}
(2.16)
where ~ = [ ( q - p ) / O l . We focus o n Step 3 of Algorithm 1.2. In the m e t h o d of bounds, this step is i m p l e m e n t e d by computing the extreme values of a particular linear f u n c t i o n in a particular set. To motivate the following discussion, s u p p o s e T < S a n d that we want to test if T d e p e n d s on S. Then, we want to k n o w if t h e r e is a real solution to (2.15), that satisfies (2.16) and the condition ~ < 3 - 1. Note that (2.16) and this additional condition t o g e t h e r define a right-angled triangle P b o u n d e d by the lines: x = 0, y = ~, x < y - 1. (This will be the triangle in Figure 2.2 if we take p = 0,q = ~, and ~ = 1.) Also, the left-hand-side of (2.15) defines a linear function f : R 2 -~ R such that f ( x , y ) = a x - b~y. Our p r o b l e m t h e n is to decide if there is a real solution to the equation f ( x , y ) = c in the triangle P. By T h e o r e m A.6, there is such a solution iff min f(x,y)
CHAPTER 2. SINGLE LOOPS
48
Thus, we have 3 ÷ = 3, 3- = 0, ( - 3 ) + = 0, a n d ( - 3 ) - = 3. It is clear that there are several e q u i v a l e n t e x p r e s s i o n s for a ÷ a n d a - : a +=
max(O,a)=(lal+a)/2=
a-=-nfln(O,a)=(la[-a)/2=
a 0
ifa > 0 ira
f O la[
ifa_>O ifa
A n u m b e r o f e l e m e n t a r y f a c t s o n positive a n d negative p a r t s are s t a t e d in L e m m a 1.3.1. The following simple r e s u l t s are u s e d in the p r o o f of T h e o r e m 2.11 o n w h i c h the m e t h o d of b o u n d s is b a s e d . L e m m a 2.10 For a n y two real numbers a and b, we have
m i n (a, b) = b - (a - b ) max(a,b)
= b + (a-
b) ÷.
PROOF. If a > b, t h e n we have b-(a-b)-
=b-0=b=min(a,b),
b+(a-b)
+ =b+(a-b)
=a=max(a,b).
If a < b, t h e n we have b-(a-b)b + (a-
=b-(b-a)=a=min(a,b), []
b) + = b + 0 = b = m a x ( a , b ) .
T h e o r e m 2.11 Let a, b, p, q, and E denote real n u m b e r s such that p < q a n d O < E < q - p. The m i n i m u m a n d m a x i m u m values o f the function f : (x, y ) ~ a x + b y in the right-angled triangle P, defined by the inequalities: p<x
are given by min
(x,y)~P
f(x,y)
max f(x,y)
(x,y) ~P
= (a + b ) p + b e - ( q - p = (a+b)p+be+
E)(a- - b) ÷,
(q-p-E)(a
÷ + b ) ÷.
2.6.
METHOD
49
OF BOUNDS
Y
x+
P+~
'
o
i,
,
,
x
q-e
~
~
X
Figure 2.2: The triangle P of Theorem 2.11. PROOF. In Figure 2.2, denote the three vertices of the triangle u n d e r consideration by u, v, and w such that u = (p,q),
v=
(q-e,q),
w= (p,p+e).
It follows f r o m a well-known result in the theory of linear programming (Theorem A. 5) that the extreme values of f are among the values f ( u ) , f ( v ) , and f ( w ) . We can write f(u)=ap+bq
= (a + b)p + be + (q - p - e) . b,
f(v)=a(q-e)+bq
= (a + b)p + be + (q-
f(w)=ap+b(p+~)
= ( a + b ) p + b e + ( q - p - e) . O.
p - e) . (a + b),
Hence, we have min f ( x , y ) =
(x,y)~P
(a+b)p+be+(q-p-e)min(b,a+b,O),
m a x f ( x , y ) = (a + b ) p + be + ( q - p - e) m a x (b, a + b, 0). (x,y)~P
The expressions in the t h e o r e m follow since mAn(b,a
+ b,0) = rain (min ( a + b, b),0) = min(b - a-,0)
by Lemma 2.10
= -(b-a-)-
by Lemma 2.10
= -(a-
- b) ÷
by Lemma 1.3.1(c),
50
CHAPTER 2. SINGLE LOOPS
and m a x (b, a + b, 0) = m a x (max (a + b, b), 0) = m a x (b + a +, 0)
by L e m m a 2.10
= (b + a+) ÷.
by L e m m a 2.10.
[]
The m i n i m u m a n d m a x i m u m values of the f u n c t i o n J~ in P, s t u d i e d in the above t h e o r e m , are f r e q u e n t l y n e e d e d in the m e t h o d of b o u n d s . We d e n o t e t h e m b y / J ( a , b, p, ~, ~) a n d v ( a , b, p, q, ~), respectively. The e x p r e s s i o n s defining/J a n d v are stated below. /J(a, b, p , ~ , ~) = (a+b)p+b~+(~-p-~)min(b,a+b,O)
(2.18)
= ( a + b ) p + b~ - (~ - p - ~ ) ( a - - b) +,
(2.19)
v(a, b,p,~,~) = ( a + b ) p + b~ + ( ~ - p -
~)max(b,a
= ( a + b ) p + b~ + ( ~ - p - ~ ) ( a + + b) +
+ b,O)
(2.20) (2.21)
In C h a p t e r 6, we will s h o w h o w to express e x t r e m e values of f u n c t i o n s of several variables in t e r m s of/J a n d v. C o r o l l a r y 1 The m i n i m u m a n d m a x i m u m v a l u e s o f a f u n c t i o n o f the f o r m x ~ a~c in a closed interval p < x < q a r e / J ( a , 0 , p , q , 0 ) a n d v ( a , 0, p, ~, 0), respectively. We can n o w derive n e c e s s a r y c o n d i t i o n s for the existence of various kinds of d e p e n d e n c e b e t w e e n two s t a t e m e n t s S a n d T. The imp o r t a n t cases are listed in T h e o r e m 2.12. T h e o r e m 2.12 (Bounds test) S u p p o s e t h a t a variable X ( a I + ao) o f a s t a t e m e n t S a n d a var i abl e X ( b I + bo) o f a s t a t e m e n t T cause a d e p e n d e n c e b e t w e e n S a n d T in the loop L. Let ~ = [ ( ~ - p ) ] O I a n d c = (bo - ao + b p - a p ) ] O .
(a) I f S < T a n d S ~ T ,
thenlJ(a,-b,O,~,O)
< c <_ v ( a , - b , O , ~ , O ) ;
(b) I f S = T a n d S ~ S ,
thenlJ(a,-b,O,~,l)
< c < v(a,-b,O,~,l);
(c) I f S < T a n d T ~ S ,
thenlJ(-b,a,O,~,l)
< c < v(-b,a,O,~,l).
2.6. METHOD OF BOUNDS
51
PROOF. We prove only Case (a) a n d leave the o t h e r two cases to the reader. A s s u m e that S < T, a n d that the variables X ( a I + ao) of S and X ( b I + bo) of T cause a d e p e n d e n c e of T on S. As we saw at the b e g i n n i n g of this section, the d e p e n d e n c e e q u a t i o n can be written as (2.15) which we restate: a i - b~ = c. Since T d e p e n d s o n S, there exists a solution to this e q u a t i o n that satisfies the c o n d i t i o n s ( T h e o r e m 2.6(a)):
i_<j. Let f d e n o t e the real f u n c t i o n ( x , y ) ~ a x - b y , a n d P the rightangled triangle defined by the lines: x = 0, y = ~, and x < y . Since c = f(x,y) for s o m e p o i n t ( x , y ) E P, it follows that re_in f ( x , y ) <_ c <_ m a x f ( x , y ) . (x,y)EP (x,y)EP
By definition, g ( a , b, p, q, ~) a n d v ( a , b, p, q, ~) are the m i n i m u m and m a x i m u m values, respectively, of the f u n c t i o n ( x , y ) ~ a x + b y in the triangle b o u n d e d by t h e lines: x = p , y = q , a n d x < y - ~. Hence, we have rn~n f ( x , y ) = u ( a , - b , O, ~, O) (x,y)~P max f(x,y) (x,y)~P
= v(a,-b,O,~,O),
so that the following inequalities hold: g(a,-b,0,~,0)
< c < v(a,-b,O,~,O).
When we apply T h e o r e m 2.12 to a given problem, the values of the corresponding/.1 a n d v f u n c t i o n s are c o m p u t e d by (2.18) a n d (2.20), or by (2.19) a n d (2.21). By T h e o r e m A.6, in each case of the theorem, the set of inequalities is n e c e s s a r y a n d sufficient for the existence of a real solution to the d e p e n d e n c e e q u a t i o n satisfying the d e p e n d e n c e c o n s t r a i n t s (0 < i < ~ a n d 0 < j < ¢~) a n d the additional c o n d i t i o n (e.g., i < j).
52
C H A P T E R 2.
SINGLE LOOPS
The m a j o r steps of the m e t h o d of b o u n d s (as applied to the model p r o g r a m of this chapter) are T h e o r e m 2.9 (the gcd test) and Theor e m 2.12 (the b o u n d s test). We illustrate an application of this m e t h o d by an example given below. Example 2.6 Consider again the loop of Example 2.5: L: S : T :
do I = 10, 200, 5 X ( 7 I + 2) . . . . .......
X ( 3 I + 17) • • •
enddo This time we use the gcd and b o u n d s tests (theorems 2.9 and 2.12) to s t u d y possible d e p e n d e n c e b e t w e e n s t a t e m e n t s S and T. The par a m e t e r s for this loop are p = 10,c/ = 200, and 0 = 5. Therefore, ~ = [ ( q - p ) l O J = 38. For the variable X ( 7 I + 2) of S and the variable X ( 3 I + 17) of T, we have a = 7, ao = 2, b = 3, and b0 = 17. Using Algorithm 2.1, we find that gcd(7, 3) = 1, and hence 0 - g c d ( a , b) = 5. Compute co = b o - a o + ( b - a ) p
= -25.
Since co is an integral multiple of 0 - g c d ( a , b), the gcd test passes. Thus, d e p e n d e n c e between S and T is not ruled out. Consider now the possibility of d e p e n d e n c e of T on S. Compute c = co / 0 = - 5. The inequalities ~u(a,-b,0,~,0) < c < v(a,-b,O,~,O) of T h e o r e m 2.12(a) become /~(7, - 3 , 0, 38, 0) < c < v ( 7 , - 3 , 0 , 3 8 , 0 ) . We c o m p u t e the values of the ~ and v functions using definitions (2.19) and (2.21), and get the inequalities - 1 1 4 < - 5 < 152 which obviously hold. According to T h e o r e m 2.12, a d e p e n d e n c e of T on S m a y exist. We know f r o m our work in Example 2.5 that this is indeed the case. Next, s t u d y the possible d e p e n d e n c e of S on T. The inequalities of T h e o r e m 2.12(c) become /~(-3, 7,0,38, 1) < - 5 < v ( - 3 , 7,0,38, 1),
2.6.
METHOD
OF BOUNDS
53
or
7 < - 5 < 266, one of which is false. Thus, the b o u n d s test implies that there is no d e p e n d e n c e of S on T (reconfirming what we already f o u n d in Example 2.5). Example 2.7 Suppose we are checking d e p e n d e n c e by the m e t h o d of bounds. If either one of the two tests (gcd and bounds) fails, t h e n there is definitely no dependence. If both tests pass, then we a s s u m e there is d e p e n d e n c e . The situation does arise, albeit rarely, where both tests pass and d e p e n d e n c e does not exist. We explore this below in the context of a particular example. Consider the loop L(7): S : T :
d o I = 0,7,1 X ( 7 I + 2) . . . . .......
X ( 3 I + 17) • • •
enddo Here, the index and iteration spaces are identical and we have ~ = q = 7. The d e p e n d e n c e equation is 7 i - 3j = 15,
(2.22)
where i and j denote index (or iteration) points. We s t u d y possible d e p e n d e n c e of T on S. The gcd test (Theorem 2.9) is not conclusive, since gcd(7, 3) is 1. Consider the b o u n d s test. The inequalities of T h e o r e m 2.12(a) are / ~ ( 7 , - 3 , 0 , 7 , 0 ) < 15 < v ( 7 , - 3 , 0 , 7,0), or
- 2 1 < 15 < 28, which clearly hold. Thus, the b o u n d s test is also inconclusive. If we rely o n these two tests alone, t h e n we m u s t a s s u m e d e p e n d e n c e to be on the safe side. However, by unrolling the given loop, we get the sequence of s t a t e m e n t instances shown in Figure 2.3. It is clear that there is no d e p e n d e n c e of s t a t e m e n t T on s t a t e m e n t S. Let us now s t u d y the role of the size of the iteration space in this context. Consider the loop
54
C H A P T E R 2.
S(0) : T(O) •
x(2) = • • ....... x(17)
S(1): T(1):
x(9) = • • ....... x(20)
S(2) : T(2) :
x(16) = • ....... x(23)
S(3): T(3) :
x(23) = • ....... x(26)
S(4) : T(4) :
x(30) = • ....... x(29)
S(5) : T(5) :
x(37) = • ....... x(32)
S(6) : T(6) :
x(44) = • ....... x(35)
S(7) : T(7) :
x(51) = • .
.
.
.
.
.
.
SINGLE LOOPS
X(38)
F i g u r e 2.3: L o o p n e s t o f E x a m p l e 2 . 7 a f t e r u n r o l l i n g .
doI = 0,q, 1 X ( 7 I + 2) . . . . ....... X ( 3 I + 17) • • •
L(q) " S : T :
enddo w h e r e q > 0 is u n s p e c i f i e d f o r t h e m o m e n t . The solution t i o n ( 2 . 2 2 ) ( o b t a i n e d b y A l g o r i t h m 2.1 a n d T h e o r e m 2.8) is (i,j)
= (15 + 3t,30 + 7t),
w h e r e t is a n i n t e g e r p a r a m e t e r . bounds
Since i and j must
and the inequality i < j for dependence
system of inequalities 0<15+3t
to Equa-
to -5
< (q-15)/3] < (q 30)/7
}
satisfy the loop
o f T o n S, w e g e t t h e
2.6. METHOD OF BOUNDS
55
w h i c h is e q u i v a l e n t t o - 3 _< t _< [ m i n ( ( q -
15)/3,(q-
30)/7)/.
T h e r e is at l e a s t o n e i n t e g e r t w i t h i n t h e s e b o u n d s f o r e a c h i n t e g r a l v a l u e o f q g r e a t e r t h a n o r e q u a l t o 9, a n d t h e r e is n o s u c h t f o r q = 1 , 2 . . . . ,8. T h e i n e q u a l i t i e s o f T h e o r e m 2.12(a) f o r t h e l o o p L ( q ) a r e / ~ ( 7 , - 3 , 0 , q , 0 ) < 15 < v ( 7 , - 3 , 0 , q,O), or -3q_<15_<4q. T h e y h o l d f o r all v a l u e s o f q g r e a t e r t h a n o r e q u a l t o 4, a n d fail t o h o l d f o r q = 1, 2, 3. T h u s , if t h e l o o p L ( q ) h a s m o r e t h a n 9 i t e r a t i o n s (q _> 9) o r f e w e r t h a n 5 i t e r a t i o n s (q _< 3), t h e n t h e b o u n d s t e s t c o m b i n e d w i t h t h e g c d t e s t will a c c u r a t e l y p r e d i c t d e p e n d e n c e o r t h e l a c k t h e r e o f . O n l y f o r t h e v a l u e s q = 4, 5, 6, 7, 8, t h e c o m b i n a t i o n o f t h e s e t w o t e s t s will m a k e a w r o n g p r e d i c t i o n . See E x e r c i s e 16. EXERCISES 2.6
1. Show that for any real a (a) (ea) + = e a + and ( c a ) - = ~a-, when e _> 0; (b) (ea) + = lela- and (ea)- = lela +, when e < 0. 2. Prove that (a + b) + _< a + + b +. Under what conditions, do we get equality? State and prove similar results for (a + b)-. 3. Show that the identity (a- +b) ÷ = b ÷ + (a+ b-)holds for any two real numbers a and b. 4. Express a ÷ and a - in terms of the/~ and v functions. 5. Prove that v(a,b,p,q,~) = - p ( - a , - b , p , q , e ) . 6. Prove that/~(a, b, p, q, e) = v(a, b, p, q, e) iff either q = p + e, or a = b = 0. 7. Instead of the functions/~ and v of five variables, we can use similar funct-ions of three variables. Let/~0(a, b, q) and vo(a, b, q) denote the minimum and maximum values of the function ax + b y in the triangle defined by the inequalities 0 _<x _< y _< q. Express p and v in terms of P0 and v0.
56
C H A P T E R 2.
SINGLE LOOPS
8. In T h e o r e m 2.11, give a direct p r o o f (without using T h e o r e m A.5) t h a t the e x t r e m e values of f in P are a t t a i n e d at vertices of P. 9. Find the m i n i m u m a n d m a x i m u m values of the functions: 2 x + 3),, 2 x - 32", - 2 x + 32', a n d - 2 x - 32" in each of the following sets: (a) { ( x , y ) : l
<_x <_y - 2
(b) { ( x , y ) : l
<_x <_9,1<_ y <_9},
<_7},
(c) { ( x , y ) : l _ < y _ < x _ < 9 } , (d) { ( x , y ) : l <_x<_9,1<_ y < _ 9 , x > _ y - 2 } . 10. Derive Lemma L3.2 f r o m Corollary 1 to T h e o r e m 2.11. 11. Prove cases (b) a n d (c) of T h e o r e m 2.12. 12. As in T h e o r e m 2.12, find a set of n e c e s s a r y c o n d i t i o n s for the following cases: (a) T has a loop-carried d e p e n d e n c e on S; (b) S has a l o o p - c a r r i e d d e p e n d e n c e o n T; (c) S < T a n d T has a l o o p - i n d e p e n d e n t d e p e n d e n c e on S; (d) T < S a n d S has a l o o p - i n d e p e n d e n t d e p e n d e n c e on T. 13. In T h e o r e m 2.12, a s s u m e 0 > 0. Show t h a t the c o n d i t i o n s in the three cases can be w r i t t e n as follows: (a) If S < T a n d S 6 T, t h e n / a ( a , - b , p , p + 04,0) < c < v ( a , - b , p , p (b) I f S = T a n d S ~ S , then la(a,-b,p,p + 04,0) < c < v(a,-b,p,p (c) I f S < T a n d T ~ S , then la(-b,a,p,p + 04,0) < c < v(-b,a,p,p
+ 04,0); + 04,0); + 04,0).
Write d o w n the conditions when 0 = 1. Next, a s s u m e 0 < 0. How do the conditions change? 14. A p p l y the m e t h o d of b o u n d s (gcd a n d b o u n d s tests) to check for d e p e n d e n c e o f T o n S, a n d of S on T in l o o p s (a)-(g) of Exercise 2.5.5. 15. Solve the d e p e n d e n c e p r o b l e m in the loop L: S : T:
dol = 0,10,1 X ( 2 I + 34) . . . . ....... X ( - 3 I + 35) • • • enddo
in two different ways: b y A l g o r i t h m 2.2, and by the m e t h o d of b o u n d s . Comp a r e y o u r findings. 16. In Example 2.7, find the values of q for which the m e t h o d of b o u n d s m a k e s a wrong p r e d i c t i o n for the following d e p e n d e n c e s : (a) Loop-carried d e p e n d e n c e of T on S, (b) Loop-carried d e p e n d e n c e of S on T.
Chapter 3
Double Loops 3.1
Introduction
In Chapter 2, we i n t r o d u c e d the m a i n c o n c e p t s a n d t e c h n i q u e s of dep e n d e n c e analysis in t e r m s of a single loop. In Chapter 4, we develop a formal s t r u c t u r e for d o i n g d e p e n d e n c e analysis in a perfect loop nest, a n d t h e n go on to s t u d y general p r o g r a m s a n d general m e t h o d s in later chapters. The c u r r e n t c h a p t e r f o r m s a bridge b e t w e e n Chapter 2 a n d the rest of the book. Here, we s t u d y the m a j o r c h a n g e s that h a p p e n as we m o v e f r o m a single loop to a m o r e general program: index a n d iteration variables b e c o m e index and iteration vectors; dep e n d e n c e distances b e c o m e d e p e n d e n c e (distance) vectors; a n d there is usually m o r e t h a n one u n k n o w n integer p a r a m e t e r in the general solution of the d e p e n d e n c e equation(s). The m a i n difficulty we face now is that we are r e q u i r e d to seek integer solutions to a s y s t e m of linear inequalities in m o r e t h a n one variable (Step 5, A l g o r i t h m 1.1). In Section 3.2, we s t u d y the index and iteration spaces of a double loop. In Section 3.3, we see h o w d e p e n d e n c e concepts are generalized f r o m a single to a double loop. Finally, d e p e n d e n c e p r o b l e m s p o s e d by e l e m e n t s ol~ a one- or t w o - d i m e n s i o n a l array are s t u d i e d in Section 3.4. We do n o t p r e s e n t any formal results there, b u t t h r o u g h a series of examples, show h o w s o m e of the general d e p e n d e n c e testing m e t h o d s w o r k in a double loop. Concepts and results of this c h a p t e r will be s u b s u m e d in the next c h a p t e r on perfect loop nests. However, in Chapter 4, we use a c o m p a c t matrix notation, while the t r e a t m e n t 57
CHAPTER 3. DOUBLE LOOPS
58
in this c h a p t e r is still m o s t l y in t e r m s of scalars. The idea here is to p o s t p o n e the n e w notation, to allow the reader s o m e time to unders t a n d the difficulties that arise as we generalize our p r o g r a m m o d e l f r o m a single loop.
3.2
Index and Iteration Spaces
The m o d e l p r o g r a m for this c h a p t e r is a double loop (L~,L2) of the form L1 : L2 :
do I~ = Pl, ql, 01 d o I2 = P2, q2, 02
H(II,I2) enddo enddo
where H(II,I2) is a s e q u e n c e of a s s i g n m e n t s t a t e m e n t s . The index vector of (L1,L2) is (I1,I2), a n d the values of (I1,I2) are the index points or index values of the loop nest. The set of all index p o i n t s is the index space. Let ~1 a n d ~2 d e n o t e the iteration variables of the loops L1 a n d L2, respectively. Then, (~1, ~2) is the iteration vector of (L1,L2). The values of (~1, ~2 ) are the iteration points or iteration values, a n d the set of all iteration p o i n t s is the iteration space. The index a n d iteration spaces are finite s u b s e t s of 7.2 with the same n u m b e r of points. We a s s u m e that the loop p a r a m e t e r s are integer-valued; p~, ql, 01, a n d 02 are constants; 01 a n d 02 are nonzero; a n d P2 a n d 6/2 are affine f u n c t i o n s of I1. Let P2 = P20 4" P2111
(3.1)
6/2 = 6/20 4" 6/2111,
(3.2)
where the coefficients are integer constants. The following t h e o r e m describes the s t r u c t u r e s of the index and iteration spaces. T h e o r e m 3.1 The iteration space of the double loop (L1,L2) consists of all integer vectors (~1, ~f2) such that 0 ~ ~1 ~ ~1
(3.3)
0 --< ~2 --< ~20 4" ~ 2 1 h ,
(3.4)
3.2. INDF~ AND ITERATION SPACES
59
where ~1
= [ ( q l -- P l ) / O I J
(3.5)
~2o = [(q20 - P2o) + (q21 - P21)Pl]/02 ~21 = (q21 --
(3.6)
P21)01[02.
(3.7)
The index space of (L1,L2) is the set of all integer vectors (I1,I2) given by I1 = Px + 01~1
(3.8)
12 = (P20 + P 2 1 P l ) + ( P 2 1 0 1 ) ~ l + 0212,
(3.9)
where (~1, ~2 ) satisfies (3.3)-(3.4). In the sequential execution of(L1, L2 ), the index space is traversed in the direction of lexicographically increasing (~1, ~2). PROOF. T h e p r o o f is d e r i v e d f r o m r e s u l t s o n single l o o p s d e s c r i b e d in C h a p t e r 2. T h e i n d e x variables of L1 a n d L2 are given in t e r m s of the c o r r e s p o n d i n g i t e r a t i o n variables by (2.3): I1 = PX + 01~1
(3.10)
12 = P2 + 02~2.
(3.11)
0 < ~1 < (qx - pl)/O1
(3.12)
0 < ~2 < (q2 - p2)/02.
(3.13)
By T h e o r e m 2.2, we h a v e
Since 42 - P2 = (420 + 42111) -- (P20 + P2111)
by (3.1)-(3.2)
= (420 -- P20) + (421 -- P 2 1 ) I 1
= (420 - P20) + (421 - P21)(Pl + 01~1)
by (3.10)
= [ ( 4 2 0 -- P 2 0 ) + (421 -- P 2 1 ) P l ] + (421 -- P21)01~1,
we c a n write (3.13) as o
4zo ÷
CHAPTER 3. DOUBLELOOPS
60
These inequalities give the range of ~2 for a given value of ~1, and the range of ~ is given by (3.12). This proves the first part of the theorem. For each value of (I1,I2), there is a unique value of (~1,~2), and conversely. We can write I2 = P2 + 02~2
by (3.11)
= (P20 + P2111) + 02i~2
by (3.1)
= P20 + P21(Pl + 01~1) + 02~2
by (3.10)
= (P20 + P21Pl) + (P21 01)i~1 + 02~2. This result and (3.10) give the required description of the index space in terms of ~1 and ~2. The last part of the theorem on the traversal order of the index space of (L1, L2) follows from the fact that the index space of a single loop is traversed in the direction of increasing iteration value (Theorem 2.3). Consider any two distinct index points (i~, i2) and (jx, j2) of (L1, L2), and let (~, ~2) and (31, ~2) denote the corresponding iteration points. In the sequential execution of the program, the iteration of (L1, L2) defined by (I~, I2) = (il, i2) can come before the iteration defined by ( I 1 , I 2 ) = ( j x , j 2 ) in exactly one of two ways: 1. The iteration of L1 defined by I1 = il comes before the iteration defined by I1 = j l , that is, ~1 < ~1 (Theorem 2.3). 2. The iterations of L1 defined by I1 = il and I1 = j l are identical, and the iteration of L2 defined by I2 = i2 comes before the iteration defined by I2 = j2; that is, ~1 = ~1 and ~2 < ~2. Combining the two sets of conditions, we can say that in the sequential execution of (L 1, L2), one moves from an index point (il, i2) to an index point (jl, j2) iff (~1, ~2) "~ (jX, j2). This completes the proof of the theorem. [] It should be noted that the coefficients ~20 and ~21 are fractions, in general. As pointed out in Chapter 1, we are assuming exact rational arithmetic, so that the question of roundoff errors does not arise. The double loop ( L 1 , L 2 ) is regular if P21 = q21; it is rectangular if P21 --- q21 : O. The following corollary deals with the description of the loop nest w h e n both loops are normalized.
3.2. INDEX AND ITERATION SPACES
61
Corollary 1 A double loop (L1, L2) with arbitrary strides can be written as an equivalent double loop (L~, L2) with unit strides. If (L1, L2) is regular, then (L1,L2) is rectangular. PROOF. Let/~(~1, ~2) denote the image of H(I1, I2) w h e n 11 and I2 are replaced by their equivalent expressions in ~1 and [2 from (3.8) and (3.9). Consider the double loop with unit strides LI: L2 :
doh = 0,~,1 do ~2 = 0, 6~20 q- ~21/~1,1
/:/(h, ~2) enddo enddo
The bodies of the loop nests (L1, L2) and (L1, L2) are labeled by different sets of variables. However, w h e n all loops are unrolled, the two nests give exactly the same sequence of instances of H(I~,I2). We leave the details to the reader. (See Exercise 1.) When (L1,L2) is regular, we have P21 = (/21, by definition. Then, c~2~ = 0 by (3.7), so that the loop nest (L~, L2) is rectangular. [] Example 3.1 Consider the double loop L1 : L2 :
do/1 = 20,11,-2 d o I2 = I1 + 2,211 + 1, 3
H(II,I2) enddo enddo
It is clear that I1 takes the values: 20, 18, 16, 14, 12, in that order. The values of I2 for each value of I1 are shown in Table 3.1. We can easily write down the corresponding table of iteration values, since the set of iteration values of any loop is at a one-to-one correspondence with the set of its, index values, and the iteration values always come in the sequence: 0, 1, 2 , . . . . This is Table 3.2 given next. In Figure 3.1, we show the index space of (L1,L2) on the left and the iteration space on the right. 1 The index space is b o u n d e d by the lines: I1 = 12,I1 = 20, I2 = I1 + 2, I2 = 2I~ + 1. 1Different scales are used in the two diagrams.
62
CHAPTER 3. DOUBLE LOOPS
I1 20 18 16 14 12
22 20 18 16 14
25 23 21 19 17
28 26 24 22 20
12
~
31 29 27 25 23
34 32 30 28
37 35 33
40
Table 3.1: Index values for loops of Example 3.1.
0 0 0 0 0
0 1 2 3 4
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4
5 5 5
6
Table 3.2: Iteration values for loops of Example 3.1. It is traversed in the direction: (20, 2 2 ) , . . . , (20,40), (18, 2 0 ) , . . . , (18, 35) . . . . . (12, 1 4 ) , . . . , (12, 23). The c o r r e s p o n d i n g sequence of iteration p o i n t s is ( 0 , O) . . . .
, ( 0 , 6 ) , ( 1, 0 ) , . . .
, ( 1, 5 ) . . . .
, (4, 0), . . . , (4, 3).
Next, apply T h e o r e m 3.1 a n d its corollary to this example. We have Pl = 20,
P20 = 2,
~/20 = 1,
ql
P21 = 1,
~/21 = 2,
=
11,
01 = - 2 ,
02 = 3.
Using (3.5)-(3.7), we c o m p u t e : ~1 = 4,
~20 = 19~3,
~21 =
-2~3.
By T h e o r e m 3.1, the iteration space of (L1,L2) is the set of p o i n t s (~1, ~2) such that
0 < ~2 < 19/3 - 2~1/3.
3.2. INDEX AND ITERATION SPACES
63
i r°
40
14
~
%
: : : • • • •
0
f2
2'0
"I1
0
= : :,~
"il
F i g u r e 3.1: I n d e x a n d i t e r a t i o n s p a c e s f o r E x a m p l e 3.1. T h u s , t h e i t e r a t i o n s p a c e is b o u n d e d b y t h e lines: h = 0,ill = 4,,f2 = 0,211 + 312 = 19. T h e i n d e x a n d i t e r a t i o n s p a c e s are r e l a t e d b y e q u a t i o n s (3.8)-(3.9): It = 20 - 2~1 12 = 22 - 211 + 3172. W h e n t h e l o o p s are n o r m a l i z e d , we g e t t h e e q u i v a l e n t p r o g r a m LI: L2 :
doh =0,4,1 d o i72 = 0, (19 - 2]1)/3, 1 H ( 2 0 - 2 8 , 22 - 21?1 + 3~2) enddo enddo
EXERCISES 3.2
1. Consider a double loop of the form L1 : L2 :
do 11 = 10, 100,5 do I2 = 10, I1, 10 H(11,I2) enddo enddo
CHAPTER 3. DOUBLE LOOPS
64
In the sequential execution of the program, show the order of the instances of H (I1, I2), and the corresponding sequences of values of (I1, I2 ) and (*~1,~2). Write the equivalent program after loop normalization. Show that in its sequential execution, the same sequence of instances of H (I1, I2 ) is executed. 2. In Example 3.1, express ~1 and [2 in terms of I1 a n d I2, and find the total number of iteration points. 3. For the double loop (L1, L2) in each of the following programs, find descriptions of the iteration and index spaces (as in Theorem 3.1), Q
the total number of iterations, the first five index values and the last in sequential execution,
•
the corresponding iteration values,
•
expressions of the iteration variables in terms of the index variables,
•
an equivalent double loop with unit strides:
(a) L1 : L~:
do I1 = 1,100, 2 do I2 = 100, 1, - 3
H(II,I2) enddo enddo (b) L1 : L2:
do 11 = 10, 100,1 doI2 = 2 1 1 + 1 , I 1 + 1 3 , 2 H(I1, I2) enddo enddo
(C) L1 : L2 :
do I1 = -110, -200, - 5 do I2 = 311,20 - I1, 10
H(II,I2) enddo enddo
(d) L1 : L2 :
do I1 = 10,100, 2 do I2 = 411 - 10,411 + 93,3
H(il,I2) enddo enddo
3.3 Dependence Concepts Consider two, not necessarily distinct, assignment statements S and T in t h e d o u b l e l o o p ( L 1 , L 2 ) o f t h e p r e v i o u s s e c t i o n . Let S ( i l , i 2 ) d e n o t e t h e i n s t a n c e o f S f o r a n i n d e x p o i n t ( i l , ~2), a n d T ( j x , j 2 ) t h e
3.3. DEPENDENCE CONCEPTS
65
instance of T for an index point (jl, j2). The distance from S (il, i2 ) to T (jl, j2) is defined to be the vector (31 -- El, 32 - ~2), where (~1, ~2) and (3~,32) are the iteration points corresponding to (~1, i2) and ( j l , j 2 ) , respectively. Lemma 2.4 (on the execution ordering of statement instances for a single loop) is generalized to a double loop by the following result, where the concept of a positive scalar distance is replaced by that of a lexicographically positive distance vector. L e m m a 3.2 Consider two assignment statements S and T in the loop nest (L~, L2). Let (dl, d2) denote the distance from an instance S (~1, ~2) of S to an instance T(j~,j2) of T. In the sequential execution of the program, S(~I, i2) is executed before T(jl, j2) iff one of the following two conditions holds: (a) (all, d 2 ) >- ( 0 , 0 ) ,
(b) (dl, d2) = (0, O) and S < T. PROOF. Let ( ~ , [2) and (3~, 32) denote the iteration points corresponding to the index points (i1,i2) and ( j l , j 2 ) , respectively. Then, the distance (d~, d2) from S(il, i2) to T(jl ,j2) is given by ( d l , d 2 ) = (31 - [1,32 - ~2). Since in the sequential execution of (L~,L2), the index space is traversed in the direction of increasing iteration value (Theorem 3.1), H(il, i2) is executed before H(j~,j2) iff (~1, ~2) < (31,32), that is, iff (dx,d2) >- (0,0). To prove the only if part, assume that S(i~, i2) is executed before T(j~,j2). The instances S(i~,i2) and T(j~,j2) are in H(i1,i2) and H(jx, j2), respectively. If (il, i2) :~ ( j l , j 2 ) , then H(il, i2) is executed before H(j~, j2), and therefore (d~, d2) >- (0, 0). If (il, i2) = (j~, j2), then ( a l l , d 2 ) = ( 0 , 0 ) , and we m u s t have S < T for the instance of S to precede the instance of T in the same iteration of the double loop. The proof of the if part is similar and is omitted. [] The relation of dependence between statements is denoted by ~, and is defined as follows: For two statements S and T, we have S ~ T if there is an instance S(i~, i2) of S, an instance T ( j l , j 2 ) of T, and a m e m o r y location 5Vl, such that
CHAPTER 3. DOUBLE LOOPS
66
1. Both S(il,
i2)
and T ( j l , j 2 ) reference (read or write) 5~/;
2. S(~1, ~2) is executed before T(j~, J2) in the sequential execution of the program; 3. During sequential execution, the location 5v/is not written in the time period from the end of execution of S(il, i2) to the beginning of the execution of T ( j l , j2). The statements S and T need not be distinct, but Condition 2 requires that the instances S(i1,i2) and T(jx,j2) be distinct. The notation S ~ T is read as "T d e p e n d s on S." The dependence of T on S is the set of all pairs (S (/.1, ~2 ), T (jl, j2)) that satisfy tile above conditions. Thus, we have S ~ T i f f the d e p e n d e n c e of T on S is nonempty. The graph of the relation ~ is called the statement dependence graph of the given program. Suppose that a statement T d e p e n d s on a statement So Let S(il, i2) and T(j~, j2) denote a pair of instances, such that they (together with some m e m o r y location 5v/) satisfy the above conditions. (There is at least one such pair.) We say that the instance T ( j l , j 2 ) depends on the instance S ( i l , ~2). Let (d~, d2) denote the distance from S(~I, i2) to T (jl, j2). Let (~rx, 0-2 ) = (sig(dx), sig(d2) ) and ~ = lev(dl, d2). (For the definitions of "sig" and "lev," see Section 1.1.3.) In other words, 0-1
f =
02 =
g=
--
f
1 if d l > 0 1 if d l < 0 0 if dl = 0,
1 ifd2 > 0 -- 1 if d2 < 0 0 if d2 = 0, 1 ifdl >0 2 if dl = 0 a n d d 2 > 0 3 ifd~=0andd2=0.
Then, (d~, d2) is a distance vector, (0-1,0-2) a direction vector, and g a dependence level for the d e p e n d e n c e of T on S. It is sometimes convenient to say that T d e p e n d s on S with a distance vector (dl, d2) or a direction vector (o-1,0-2), and at a level ~. When we do not want to specify the exact value of o-1 or 0-2, we use the symbol " , . " For
3.3. DEPENDENCE CONCEPTS
67
example, "there is a direction vector of the f o r m (1, • )" m e a n s there is at least one direction vector f r o m the set {(1, 1), (1, 0), ( 1 , - 1 ) } . Since S(il, i2) m u s t be executed before T ( j I , j 2 ) if T ( j l , j 2 ) dep e n d s on S(i~, i2), we have the following t h e o r e m as a direct consequence of L e m m a 3.2. The next two corollaries follow f r o m the definitions j u s t given. T h e o r e m 3.3 If ( d l , d 2 ) is a distance vector for the dependence of statement T on statement S, then (d~,d2) --- (0,0). The equality ( d l , d 2 ) = (0,0) is possible only ifS < T. Corollary 1 If(O-l, 0-2 ) is a direction vector for the dependence of statement T on statementS, then (0-1,0-2) --- (0, 0). The equality (0-1,0-2) = (0, 0) is possible only if S < T. Corollary 2 If g is a dependence level for the dependence o f statement T on statement S, then 1 < ~ < 3. The value g = 3 is possible only if S 0; 4. There is a d e p e n d e n c e direction vector of the f o r m (1, #); 5. T d e p e n d s o n S at level 1.
68
CHAPTER
3.
DOUBLE
LOOPS
There are similar sets of equivalent s t a t e m e n t s for a d e p e n d e n c e carried by L2 a n d a l o o p - i n d e p e n d e n t d e p e n d e n c e (Exercise 1). As in Section 2.3, we can define flow, anti-, output, and i n p u t dep e n d e n c e s , indirect d e p e n d e n c e , d e p e n d e n c e c a u s e d by a pair of variables, etc. When needed, we also associate a distance or a direction vector, or a level of d e p e n d e n c e with a particular type of d e p e n d e n c e , or even with a particular pair of variables. E x a m p l e 3.2 In this example, we illustrate some of the concepts int r o d u c e d in the c u r r e n t section. Consider the p r o g r a m L1 :
do I1 = 10, 100, 2
L2 :
S : T : U:
V :
d o I2 = 16, 11, - 2 X ( I 1 + 2, I2) . . . . ....... .......
X ( I ~ , I 2 - 4) • • • X(I1+3,I2+2)...
X(I1,12) ....
enddo enddo A beginning part of the s e q u e n c e of s t a t e m e n t instances obtained by unrolling the two loops is s h o w n in Table 3.3, where the first c o l u m n gives iteration points. It is clear that s t a t e m e n t T is flow d e p e n d e n t on s t a t e m e n t S. For example, the instance T(12, 16) d e p e n d s o n the instance S(10, 12). The iteration p o i n t s c o r r e s p o n d i n g to the index p o i n t s (12, 16) a n d (10, 12), are (1, 0) a n d (0, 2), respectively. Hence, the distance vector arising f r o m these two instances is (dl, d2) = ( 1 , - 2 ) , a n d the direction vector is (~rl,~r2) = ( 1 , - 1 ) . It implies that T d e p e n d s on S at level 1, since d l > 0. This d e p e n d e n c e is carried by the outer loop L1, it is n o t carried by the inner loop L2, and there is no l o o p - i n d e p e n d e n t part. The p a t t e r n is r e p e a t e d several times. For example, T(14, 16) d e p e n d s o n S(12, 12), T(16, 16) d e p e n d s on S(14, 12), and so on. We do n o t get any n e w distance or direction vectors. S t a t e m e n t V is a n t i - d e p e n d e n t o n s t a t e m e n t T. Indeed, V(10, 12) d e p e n d s o n T(10, 16), V(12, 12) d e p e n d s on T(12, 16), etc. For this d e p e n d e n c e , the u n i q u e distance vector, direction vector, and level are (0, 2), (0, 1), a n d 2, respectively. This d e p e n d e n c e is carried by the inner loop L2, b u t n o t by L1.
3.3. DEPENDENCE CONCEPTS
69
Iteration Point
(o,o)
Iteration of loop Nest S ( 1 0 , 16) :
X ( 1 2 , 16) . . . .
T ( I O , 16) :
.......
U ( 1 0 , 16) :
(0,1)
(0,2)
X ( 1 3 , 18) • • •
V ( 1 0 , 16) :
X ( 1 0 , 16) . . . .
S(10, 14) :
X ( 1 2 , 14) . . . .
T ( 1 0 , 14) :
X ( 1 0 , 10) • • •
U(10, 14) :
X ( 1 3 , 16) • • •
V ( 1 0 , 14) :
X ( 1 0 , 14) . . . .
S ( 1 0 , 12) :
X ( 1 2 , 12) . . . .
T ( 1 0 , 12) :
X(10,8)
U ( 1 0 , 12) :
(1,0)
(1,1)
X ( 1 0 , 12) . . . .
S(12,16)
X ( 1 4 , 16) . . . .
:
T ( 1 2 , 16) :
X ( 1 2 , 12) • • •
U ( 1 2 , 16) :
X ( 1 5 , 18) • • •
V ( 1 2 , 16) :
X ( 1 2 , 16) . . . .
S(12, 14) :
X ( 1 4 , 14) . . . .
T ( 1 2 , 14) :
X ( 1 2 , 10) • • • X ( 1 5 , 16) • • •
V ( 1 2 , 14) :
X ( 1 2 , 14) . . . .
S ( 1 2 , 12) :
X ( 1 4 , 12) . . . .
T ( 1 2 , 12) : U(12,12)
(2,0)
(2,1)
X ( 1 2 , 8) • • •
:
X ( 1 5 , 14) • • •
V ( 1 2 , 12) :
X ( 1 2 , 12) . . . .
S ( 1 4 , 16) :
X ( 1 6 , 16) . . . .
T ( 1 4 , 16) :
X ( 1 4 , 12) • • •
U(14, 16) :
X ( 1 7 , 18) • • •
V ( 1 4 , 16) :
X ( 1 4 , 16) . . . .
S ( 1 4 , 14) :
X ( 1 6 , 14) . . . .
T ( 1 4 , 14) :
.......
U ( 1 4 , 1 4 ) ."
(2,2)
X ( 1 7 , 16) • • • X ( 1 4 , 14) . . . .
s ( 1 4 , 12) :
x ( 1 6 , 12) . . . .
U(14,12)
:
V ( 1 4 , 12) :
3.3:
X ( 1 4 , 10) • • •
V ( 1 4 , 14) :
T ( 1 4 , 12) :
Table
•••
X ( 1 3 , 14) • • •
V ( 1 0 , 12) :
U ( 1 2 , 14) :
(1,2)
X ( 1 0 , 12) • • •
Some
iterations
X ( 1 4 , 8) • • • .......
X(17,14)
•••
X ( 1 4 , 12) . . . .
of (L1,L2)
in Example
3.2.
70
CHAPTER
3.
DOUBLE
LOOPS
S t a t e m e n t V is o u t p u t d e p e n d e n t o n s t a t e m e n t S, s i n c e V ( 1 2 , 16) d e p e n d s o n S ( 1 0 , 16), V ( 1 2 , 14) d e p e n d s o n S ( 1 0 , 14), etc. In t h i s case, t h e d i s t a n c e v e c t o r is ( 1 , 0 ) , t h e d i r e c t i o n v e c t o r is a l s o (1, 0), a n d t h e level o f d e p e n d e n c e is 1. T h i s d e p e n d e n c e is c a r r i e d b y L1, b u t n o t b y L2. T h e r e is n o d e p e n d e n c e b e t w e e n s t a t e m e n t U a n d a n y o t h e r s t a t e ment in the loop nest. The subscripts for this problem were chosen so that we get simple r e p e a t i n g p a t t e r n s t h a t a r e c l e a r l y visible f r o m a n i n s p e c t i o n o f t h e first f e w s t a t e m e n t i n s t a n c e s . We d o n o t a l w a y s h a v e s u c h r e g u l a r i t y , and the distance vectors, direction vectors, or levels of dependence need not be unique. EXERCISES 3.3 1. Suppose that T depends on S in the model program and the dependence is
carried by the loop L2. Write a set of equivalent statements describing this fact in terms of statement instances, distance vectors, direction vectors, and dependence levels. Repeat for a loop-independent dependence of T on S. 2. Give the complete dependence picture for the program of Example 3.2, after changing the loop headers to: (a) L1 : L2 : (b) LI : L2 : (C) L1 : L2 : (d) L1 : L2 :
do I~ = 10,100, 3 do I2 = 16,11, - 2 do I1 = 10, 100, 1 do 12 = 16,11,-1 do 11 -- 100, 10, - 2 doI2 = 11,16,2 do Ii = 10,100, 2 do 12 = 11,100, 1
3. Study the dependence structure of the following programs (by inspection): (a) L1 : L2 :
do I1 = O, 10, 1 doI2 = 0,10,1 S: T:
X ( I 1 + 1 2 + 2) . . . . ....... X(I1 -I2 + 1 1 ) . . .
enddo enddo
(b) L1 : L2 :
do 11 = 4, 10, 1 do I2 = 4, I1, l S :
X ( I I , I 2 ) = X ( I 1 - 1 , I 2 - 1) +X(I1 + 1,I2 + 1)
enddo enddo
71
3.4. DEPENDENCE PROBLEMS
3.4
Dependence Problems
The goal of this section is to give a taste of the complexity of dependence p r o b l e m s in double loops, and provide an informal introduction, t h r o u g h examples, to the various ways of solving s u c h problems. These s o l u t i o n t e c h n i q u e s are derived f r o m general m e t h o d s which will be d i s c u s s e d in detail in chapters to follow. E x a m p l e 3.3 Consider the rectangular d o u b l e loop d o I1 = 0, 100, 1
L1 : L2 :
S: T:
doI2 = 0,50,1 X (311,312) . . . . .......
X(211 + 1,I2 + 2) • • •
enddo enddo
where the two p r o g r a m variables are e l e m e n t s of a two-dimensional array, each of their subscripts has only one index variable, a n d the c o r r e s p o n d i n g subscripts have the same index variable. We s t u d y the d e p e n d e n c e b e t w e e n s t a t e m e n t s S a n d T c a u s e d by X(3I~, 312) and X(211 + 1, I2 + 2). Note that the index a n d iteration spaces are identical in this case (why?). The instance of the variable X(311,312) for an index point (i~, i2) is X ( 3 i l , 3i2). The instance of the variable X(211 + 1,I2 + 2) for an index p o i n t (Jl, j2 ) is X( 2jl + 1, j2 + 2). These two i n s t a n c e s r e p r e s e n t the s a m e m e m o r y location iff 3i~ = 2jl + 1 3i2 = j2 + 2, that is, iff 3i~
- 2jr = 1
3i2 - j2 = 2.
(3.14)
(3.15)
These are the dependence equations for the problem. Since (il, i2) and (jl, j2) are index p o i n t s for the given loop nest, they m u s t satisfy
72
CHAPTER 3. DOUBLE LOOPS
the following inequalities: 0 _< il -< 100"~ 0 < j~ < 1 0 0 )
(3.16)
0 _< i2 < 50 ) 0 < j2 < 50. ~
(3.17)
These are the dependence constraints for the problem. 2 If the variables of S and T cause a d e p e n d e n c e between those two statements, then there m u s t exist integers il, j l , i2, and j2 that satisfy the equations (3.14)-(3.15) and the inequalities (3.16)-(3.17). This particular double-loop problem can be partitioned into two disjoint single-loop problems: one with d e p e n d e n c e equation (3.14) and d e p e n d e n c e constraints (3.16), the other with d e p e n d e n c e equation (3.15) and d e p e n d e n c e constraints (3.17). It is an example of what we called a simple problem in Chapter 1. We can solve each subproblem by Algorithm 2.2, and t h e n combine the results to get the solution to the m a i n problem. Instead of mechanically carrying out the steps of Algorithm 2.2, we show here the main steps. Solving Equation (3.14) by Algorithm 2.1 and Theorem 2.8, we get il = 1 + 2tl ) j l = 1 + 3tl,
(3.18)
where tl is an u n k n o w n integer parameter. Substituting for il and j l in (3.16), we get the inequalities 0
(3.19)
2Since the stride of each loop is 1, conditions of the form "(il - 0) ! 1 is an integer" are automatically satisfied.
3.4. DEPENDENCE PROBLEMS
73
Solving Equation (3.15) by Algorithm 2.1 and Theorem 2.8, we get i2 = t2 ~ j2 = - 2 + 3t2, )
(3.20)
where t2 is an u n k n o w h integer parameter. Substituting for i2, j2 in (3.17), we get the inequalities 0 _< t2 -< 50 0 < - 2 + 3t2 < 50, which simplify to 0 < t 2 < 50 2/3 < t2 < 52/3. Since t2 is an integer, it will satisfy both sets of inequalities iff 1 < t2 < 17.
(3.21)
If we take any integral value of tl from (3.19) and any integral value of t2 from (3.21), t h e n the corresponding integral values of il, j l , i2, and j2, c o m p u t e d from (3.18) and (3.20), will satisfy the equations (3.14)-(3.15) and the constraints (3.16)-(3.17). Thus, there are integers that satisfy the d e p e n d e n c e equations and the d e p e n d e n c e constraints. Therefore, there is a d e p e n d e n c e between S and T. We can restrict the d e p e n d e n c e s for which we test by requiring a specific direction vector, and still treat the d e p e n d e n c e problem as a simple problem. Suppose we want to know if T depends on S with the direction vector (1, 1). Then, we are looking for a solution ( il, jx, ~2, j2 ) to equations (3.14)-(3.15) that satisfies (3.16)-(3.17) a n d the conditions: il < j l , i2 < j2. From (3.18) we see that i~ < j~ is possible iff tx > 0. The range of t~ as given by (3.19) is then further restricted to 1 < tl < 33. Similarly, it is clear from (3.20) that ~2 < j2 holds iff t2 > 1. The range of t2 given by (3.21) is then reduced to 2 < t2 < 17. Since the ranges of tl and t2 u n d e r the additional conditions are nonempty, statement T does d e p e n d on statement S with the direction vector (1, 1). This implies that T depends on S at level 1. The set of instance pairs ( S ( il , ~2 ), T (jl, j2 ) ), such that T (j~ , j2 ) d e p e n d s on S ( i l , i2) and il < ix, i2 < j2, is given by (3.18) and (3.20),
74
CHAPTER 3. DOUBLE LOOPS
w h e r e 1 < t~ < 33 a n d 2 < t2 < 17. The c o r r e s p o n d i n g set of d e p e n d e n c e d i s t a n c e s is {(tl, 2t2 - 2) : 1 < tl < 33, 2 < t2 < 17}. Next, s u p p o s e we w a n t to find o u t if s t a t e m e n t S d e p e n d s o n statem e n t T with t h e d i r e c t i o n v e c t o r ( 1, - 1). Then, since the i n d e x p o i n t s (i1, i2) a n d ( j l , j2) are a s s o c i a t e d with S a n d T, respectively, t h e additional c o n d i t i o n s are jx < /-1 a n d j2 > i2. F r o m (3.18) we see that jx < /-1 h o l d s iff tl < 0. However, as (3.19) shows, negative values of tl are n o t acceptable. Hence, a l t h o u g h t h e r e are s o l u t i o n s to (3.15) satisfying (3.17) a n d j2 > i2 (as we s a w a b o v e ) , t h e r e are n o solutions to the s y s t e m of e q u a t i o n s (3.14)-(3.15) t h a t satisfy (3.16)-(3.17) a n d the extra conditions. So, S d o e s n o t d e p e n d o n T w i t h t h e d i r e c t i o n v e c t o r ( 1, - 1). E x a m p l e 3.4 C o n s i d e r t h e r e c t a n g u l a r d o u b l e loop L1 : L2: S : T:
d o I1 = 0, 100, 1 doI2 =0,50,1 X(311,312) . . . . ....... X(212+1,I1
+2)''"
enddo enddo o b t a i n e d f r o m the p r o g r a m o f Example 3.3, by i n t e r c h a n g i n g the i n d e x variables in t h e s u b s c r i p t s o f the p r o g r a m variable of s t a t e m e n t T. The d e p e n d e n c e e q u a t i o n s in this case are 3i1 - 2j2 = 1
(3.22)
3i2 - j l = 2.
(3.23)
The d e p e n d e n c e c o n s t r a i n t s are u n c h a n g e d : 0 _< i~ 0 _< j l 0 _< i2 0 _< j2
_< 100 ~ -< 100 -< 5 0 < 50.
(3.24)
Solving Equation (3.22) b y A l g o r i t h m 2.1 a n d T h e o r e m 2.8, we get /,1 -=- 1 4- 2t1 ~ j2 = 1 + 3t1, )
(3.25)
75
3.4. DEPENDENCE PROBLEMS
where tl is an u n k n o w n integer parameter. Substituting for il and j2 in the corresponding inequalities of (3.24), we get 0< 1+2t1<
100
0< 1+3t1<
50,
which simplify to - 1 / 2 _< tl -< 99/2 - 1 / 3 _< tl < 49/3. Since tl is an integer, it will satisfy both sets of inequalities iff 0 < tl < 16.
(3.26)
Solving equation (3.23) by Algorithm 2.1 and Theorem 2.8, we get i2 = t2 t j l = - 2 + 3t2,
(3.27)
./
where t2 is an u n k n o w n integer parameter. Substituting for i2 and j l in the corresponding inequalities of (3.24), we get 0 <
t2 < 50
0<-2+3t2<
100,
which give 1 < t2 < 34.
(3.28)
Since the ranges of tl and t2 are nonempty, there are integers that satisfy the d e p e n d e n c e equations and the dependence constraints. Therefore, there is a d e p e n d e n c e between S and T. The d e p e n d e n c e problem we just solved was again a simple problem. However, w h e n we try to be more specific, the problem gets more complicated. As in Example 3.3, suppose we want to know if statement T d e p e n d s on statement S with the direction vector (1, 1). Then, we are looking for a solution (il, j l , i2, j2) to equations (3.22)(3.23) that satisfies (3.24) and the conditions: il < j l and i2 < j2. Note that the last set of conditions is equivalent to il < j l - 1 and
CHAPTER 3. DOUBLE LOOPS
76
i2 --< j2 - 1, since we are dealing w i t h integers only. Using the expressions for i l , j l , i 2 a n d J2 in (3.25) a n d (3.27), we see t h a t i~ < j l - 1 h o l d s iff 4 / 3 + 2t~/3 < t2,
(3.29)
t2 -< 3tl.
(3.30)
a n d i2 < j2 - 1 h o l d s iff
Thus, s t a t e m e n t T d e p e n d s o n s t a t e m e n t S w i t h t h e direction v e c t o r (1, 1), iff t h e r e are integers tx a n d t2 s u c h t h a t (3.26), (3.28), (3.29), a n d (3.30) all h o l d s i m u l t a n e o u s l y . Using Fourier e l i m i n a t i o n (Algor i t h m 1.3.2) a n d r e m e m b e r i n g that t~ a n d t2 are integer variables, we get 1 < tl < 16
[ m a x ( l , (4 + 2 t l ) / 3 ) ) ] < t2 < [ m i n ( 3 4 , 3 t ~ ) ] .
(3.31)
(3.32)
This s h o w s t h a t t h e r e are real values of tl a n d t2 t h a t satisfy the req u i r e d conditions. To see if t h e r e are i n t e g e r s satisfying t h e s a m e conditions, we m a y u s e e n u m e r a t i o n . Indeed, t h e r e are. Taking t~ = 1, we get 2 < t2 < 3, so that (1, 2) a n d (1, 3) are a m o n g the admissible values of (t~, t2). The set of i n s t a n c e pairs (S(il, i2), T(j~, j2)), s u c h t h a t T ( j ~ , j 2 ) d e p e n d s o n S ( i l , i2), il < jl, a n d i2 ( j2, can be described by (3.25) a n d (3.27), w h e r e t~ a n d t2 are integers satisfying (3.31) a n d (3.32). This e x a m p l e was an illustration of the m e t h o d of e l i m i n a t i o n for solving d e p e n d e n c e p r o b l e m s (Algorithm 1.3). E x a m p l e 3.5 Consider the d e p e n d e n c e p r o b l e m in the d o u b l e loop LI: L2 : S: T:
do I1 = 10, 100, 2 do I2 = 100, 10, - 3 X(3It + 312) . . . . ....... X ( I 1 + 212 + 3) • • • enddo enddo
involving a o n e - d i m e n s i o n a l array. The i n s t a n c e of X(311 + 312) for an index p o i n t (il, i2) is X ( 3 i l + 3i2). The i n s t a n c e of X(I1 + 212 + 3)
77
3.4. DEPENDENCE PROBLEMS
for an i n d e x p o i n t ( j l , j 2 ) is X ( j l + 2j2 + 3). T h e s e two i n s t a n c e s r e p r e s e n t the s a m e m e m o r y l o c a t i o n iff 3il - j l + 3i2 - 2j2 = 3.
(3.33)
This e q u a t i o n is t h e d e p e n d e n c e e q u a t i o n in t e r m s of i n d e x values. T h e variables il, j l , i2, a n d j2 m u s t satisfy the following c o n d i t i o n s : 10 < il < 100,
10 _< i2 < 100,
(il - 1 0 ) / 2 is a n integer,
(i2 - 1 0 0 ) / ( - 3 ) is an integer,
10<ji_<
10 _< j2 < 100, (j2 - 1 0 0 ) / ( - 3 ) is a n integer.
100,
(jl - 1 0 ) / 2 is a n integer,
T h e s e inequalities are the d e p e n d e n c e c o n s t r a i n t s in t e r m s of i n d e x values. We r e s t a t e the p r o b l e m in t e r m s of iteration v a l u e s u s i n g Theor e m 3.1. T h e i t e r a t i o n variables ~1 a n d ]2 of L1 a n d L2 are given by 2]1
I1 = 1 0 + I2
=
lO0
-
a n d t h e y satisfy the c o n d i t i o n s
0_
<45
0_
i2 = 100 - 3~2,
j l = 10 + 231,
j2 = 1 0 0 - 332.
S u b s t i t u t i n g for il, j l , i2, a n d j2 in E q u a t i o n 3.33 a n d the a s s o c i a t e d c o n s t r a i n t s , we get the e q u a t i o n 6~1 - 2 j l - 9~2 + 632 = - 1 1 7 ,
(3.34)
CHAPTER 3. DOUBLELOOPS
78
where 0 < ~1 < 4 5 , 0 _< )1 < 45,
0 < ~2 < 30, } 0 < j2 < 30.
(3.35)
In terms of iteration variables, the dependence equation is (3.34) and the dependence constraints are given by (3.35). Suppose we want to test if statement T d e p e n d s on statement S with the direction vector (1, - 1 ) . Then, we have to decide if there is an integer solution (~1, jl, [2, ~2 ) to Equation (3.34) that satisfies (3.35) and the conditions
}
~'1< Jl - 1 22 -> j2 + i.
(3.36)
We solve this problem by the method of bounds (Chapter 1). First, we check to see if (3.34) has any integer solutions at all By Theorem 1.3.5, there are integer solutions to this equation iff gcd(6, 2, 9, 6) divides 117. (This is called the gcd test for this problem.) This gcd can be found by Corollary I to Theorem 1.3.4. Given the matrix
A=
2 9
'
6 by Algorithm 1.2.2, we can find a 4 x 4 u n i m o d u l a r matrix V and a 4 x 1 echelon matrix
$=
0 0
'
such t h a t A = VS. Then, god(6, 2, 9, 6) is given by ISlll which happens to equal 1. For an alternative way, note that gcd(6, 2, 9, 6) = gcd(6, god(2, 9, 6)) = gcd(6, god(2, god(9, 6))), and we can evaluate the last expression by repeated applications of the classical Euclid's algorithm (see [Knut 73]). Algorithm 2.1 may also be used, but it c o m p u t e s more than what we need. In any case, since the gcd here is 1, Equation (3.34) does have integer solutions.
79
3.4. DEPENDENCE PROBLEMS
The next step is to see if there is a real solution to (3.34) that satisfies (3.35) a n d (3.36). Define a f u n c t i o n ~f : R 4 ~ R by f(x1,23"1, X2, Y2) = 6X1 - 2yl
- 9X2 +
6y2.
Let P d e n o t e the p o l y t o p e in 1/4 defined by the inequalities (3.35)(3.36) stated in t e r m s of real variables: 0 < X1 <
45,
0
_< X 2 <
30,
Xl
0 _< Y l -<
45,
0 _< Y 2 N
30,
Y 2 N X2 -- 1.
< Y l --
1,
(A polytope is a b o u n d e d set described by linear inequalities; see Section A.2.) By T h e o r e m A.6, there is a real solution to the e q u a t i o n f ( x l , Y l , x2, Y2 )
= --
117
in P, iff <- - 1 1 7 _< m a x f ( x l , Y l , X 2 , Y 2 ) . P
rmn f ( x l , Y l , X 2 , Y 2 ) P
(This is the bounds test for this problem.) We can c o m p u t e these e x t r e m e values of f by T h e o r e m 2.11. We have minf(xl,Yl,X2,Y2) P
=
min
0<x1_<45 0 < y l -<45 Xl -
(6Xl - 2yl)
-
+
min
0_
(6y2 - 9X2)
= ~ ( 6 , - 2 , 0 , 4 5 , 1) + ~(6, - 9 , 0, 30, 1) = -360
by (2.19),
and similarly, maxf(xl,yl,x2,Y2) P
= v ( 6 , - 2 , 0 , 4 5 , 1) + v(6, - 9 , 0, 30, 1) = 165
by (2.21).
Since the c o n d i t i o n - 3 6 0 _< - 1 1 7 _< 165 holds, there is a real solution to the equation f ( x l , Y l , x2, Y2) = - 117 in P, that is, a solution to Equation (3.34) satisfying the conditions (3.35)-(3.36), w h e n the four variables are a s s u m e d to be real variables. At this point, we a s s u m e that there is probably an integer solution to the d e p e n d e n c e e q u a t i o n satisfying all conditions, a n d therefore, there is probably a dependence of T o n S with the direction vector (1, - 1).
CHAPTER 3. DOUBLE LOOPS
80
EXERCISES 3.4 1. In the program of (a) Example 3.3, (b) Example 3.4, (c) Example 3.5, check for dependence of T on S and of S on T, with each possible direction vector not already covered. Find the dependence levels and distance vectors in the first two programs.
Chapter 4
Perfect Loop Nests 4.1
Introduction
In C h a p t e r 2, we gave a detailed i n t r o d u c t i o n to the concepts and t e c h n i q u e s of d e p e n d e n c e analysis u s i n g a single loop as the model. In C h a p t e r 3, we s h o w e d h o w things can b e c o m e complicated very quickly w h e n there are two loops to deal with. The m o d e l for this c h a p t e r is a perfect n e s t of an arbitrary n u m b e r of loops. Our goal is to p r o p e r l y f o r m u l a t e the linear d e p e n d e n c e p r o b l e m in the context of s u c h a program. We will n o t discuss m e t h o d s of s o l u t i o n in this chapter except in s o m e special cases (Section 4.6) a n d examples. Matrix n o t a t i o n b e c o m e s a l m o s t a necessity at this point; w i t h o u t that, we r u n the risk of d r o w n i n g in a sea of subscripts. The reader s h o u l d keep o n h a n d a double loop of the f o r m L1 : L2 :
do I1 = Pl0, ql0, 01 do I2 = P20 + P2111,q20 + g/2111, 02 H(II,I2) enddo enddo
where Plo, qlo, P20, P21, q20, a n d q21 are integer constants, and try to u n d e r s t a n d the results and n o t a t i o n of this c h a p t e r first in t e r m s of that particular perfect nest. Note that in C h a p t e r 3, we wrote Pl for Pl0, a n d 6/1 for ql0. In Section 4.2, we s t u d y the index a n d iteration spaces of a perfect loop nest. The matrix n o t a t i o n m e n t i o n e d above is i n t r o d u c e d 81
82
CHAPTER 4. PERFECT LOOP NESTS
in this section. D e p e n d e n c e c o n c e p t s are e x p l a i n e d in Section 4.3; r e p r e s e n t a t i o n of linear a r r a y s u b s c r i p t s in t e r m s of m a t r i c e s is pres e n t e d in Section 4.4; a n d t h e linear d e p e n d e n c e p r o b l e m is d e s c r i b e d in Section 4.5. T h r e e special cases of the p r o b l e m in p e r f e c t n e s t s are s t u d i e d in Section 4.6: u n i f o r m d e p e n d e n c e , two-variable p r o b l e m s , a n d a general case that i n c l u d e s the first two.
4.2
Index and Iteration Spaces
In this chapter, we c o n s i d e r a p e r f e c t n e s t of loops L = (L1, L 2 , . . . , Lm) of the f o r m LI" L2"
do I~ = Pl,ql,01 do I2 = P2, q2, 02 :
Lm"
do I m = Pro, qm,
Om
H(I1,12 . . . . . Ira) enddo :
enddo enddo w h e r e H ( I 1 , 1 2 , . . . ,Ira) is a s e q u e n c e of a s s i g n m e n t s t a t e m e n t s . We write I = (I1, I2 . . . . . Ira) a n d call I the index vector o f L. The values of I are the index points or index values of L, a n d the set of all i n d e x p o i n t s is the index space. Let ]r d e n o t e the iteration variable of the loop Lr, 1 < r < rrr. We write ~ = (~1, ]2 . . . . , ~m) a n d call ~ the iteration vector of L. The values of ~ are the iteration points or iteration values o f L, a n d t h e set of all iteration p o i n t s is the iteration space. The i n d e x a n d iteration spaces o f a loop n e s t o f l e n g t h rn are s u b s e t s of Z m, with the s a m e n u m b e r of points. For 1 < r < rn, the initial limit Pr a n d the final limit qr are ass u m e d to be integer-valued affine f u n c t i o n s of I1, I 2 , . . . , Ir-~. Let p r ( I I , I 2 , . . . , / r - 1 ) = Pro + prxI1 + Pr212 + " " " + Pr(r-1)Ir-1 clr(II,I2 . . . . , / r - I ) = ClrO + qrlIx
+
~r212 +
" " " + qr(r-1)Ir-1,
w h e r e the coefficients are all integer c o n s t a n t s . We define two m vectors Po a n d qo, a n d two r n x r n u p p e r triangular m a t r i c e s P a n d Q
83
4.2. INDEX A N D ITERATION SPACES
as follows: PO = (PlO, P20,--., PraO), qo = (ql0, q 2 0 , . . . , qrnO),
p =
Q _-
1 0 0
-p21 1 0
-p31 --P32 1
.... .... ....
0
0
0
...
1 0 0
-q21 1 0
-q31 -q32 1
.... .... ....
0
0
0
.--
Prnl Prn2 Pra3 1 qrrtl / qrn2 qrn3 . 1
The v e c t o r s r e p r e s e n t the c o n s t a n t p a r t s of the loop limits, a n d the m a t r i c e s r e p r e s e n t the variable parts. Note that P a n d Q are u n i m o d ular m a t r i c e s w i t h det(P) = d e t ( Q ) = 1. For the loop n e s t L, the initial vector is Po, the initial m a t r i x is P, the final vector is q0, a n d the final matrix is Q. The strides 0r are a s s u m e d to be n o n z e r o integer c o n s t a n t s . The stride m a t r i x o f I. is d e n o t e d b y O a n d is d e f i n e d b y 01
O = diag(01,02,...,0m) =
0 . 0
0
02 .
. 0
--"
0
... . ..
0
""
Om
Since d e t ( O ) = 1-[r=~ n~ Or ~ 0 b y h y p o t h e s i s , O is nonsingular. T h e o r e m 4.1 A point I E Z ra is a n index point f o r the loop nest L iff (a) (IP - p o ) O -1 is an integer vector, a n d (b) I satisfies the inequalities: P o O -1 _< I P O -1
"~
IQO_ 1 < qoO_X. ~
(4.1)
84
C H A P T E R 4. P E R F E C T L O O P N E S T S
PROOF. By T h e o r e m 2.1, a n integer Ir is a n i n d e x value o f the l o o p Lr iff the following t w o c o n d i t i o n s hold: 1. ( I t - P r ) / O r is an integer, a n d 2. I r satisfies the inequalities: 0 <_ ( I t - p r ) / O r
(4.2)
< (air - p r ) / O r .
For 1 < r < m , w e have a s s u m e d the following f o r m for the initial limit P r =- P r (I1, I2 . . . . , I r - 1 ) of Lr: P r = Pro + P r l I 1 + Pr212 + " " " + P r ( r - 1 ) I r - 1 .
Therefore, we c a n write Ir - Pr = Ir -
(Pro + PrlI1
= (-PrlI1 = (I1,12 ....
+ Pr212 + • • • + Pr(r-1)Ir(r-1))
- Pr212 ..... , Ira) " (-Prl,
Pr(r-1)Ir(r-1) -Pr2,
...,
+ I t ) - PrO
-Pr(r-1),
1, 0 , . . . ,
O) -
PrO,
or
Ir - P r = I. p(r) _ PrO
(4.3)
w h e r e p(r) d e n o t e s c o l u m n r of the initial m a t r i x P. Using (4.3), we s e e that t h e first c o n d i t i o n o n Ir is e q u i v a l e n t to requiring t h a t (I. p(r) _ P r o ) l O t b e an integer. Since this c o n d i t i o n h a s to h o l d f o r e a c h r in 1 < r < m , an i n d e x p o i n t I m u s t b e s u c h that (IP - p 0 ) O -1 is an i n t e g e r vector. Again, s u b s t i t u t i n g for Ir - P r f r o m (4.3) in t h e l e f t - h a n d - s i d e ine q u a l i t y of (4.2), we get 0 _< ( I . P (r) - P r o ) l O t or
PrO~Or <- l ' p ( r ) /Or.
For e a c h r in I _< r < m , we have an inequality o f this form. In m a t r i x notation, t h e entire s e t of m inequalities c a n b e w r i t t e n as PoO -~ < IPO -~.
4.2. INDF~ A N D ITERATION SPACES
85
Next, the right-hand-side inequality of (4.2) is equivalent to
(It - q r ) / O r < O. We leave it to the r e a d e r to s h o w that in matrix notation, the set of m inequalities o f this f o r m can be w r i t t e n as I Q O -1 < q 0 0 -1 That will c o m p l e t e the p r o o f o f the t h e o r e m .
[]
U n d e r certain conditions, the index s p a c e is c o m p l e t e l y characterized b y t h e inequalities (4.1), as we see below. C o r o l l a r y 1 A s s u m e that IOrl = 1 for each loop Lr. Then, a point I ~ Z m is an index point f o r L iffit satisfies the inequalities (4.1). PROOF. C o n d i t i o n (a) of the t h e o r e m is r e d u n d a n t in this case. Since 0 -1 is n o w an integer matrix, (IP - p 0 ) O -1 is n e c e s s a r i l y an integer vector. [] Note that w h e n t h e s t r i d e s Or are all positive, (4.1) can b e simplified to the s y s t e m :
Po < IP
~
IQ _< qo. We d e r i v e d t h e s e inequalities in Section 1.5.3 for t h e case w h e r e e a c h loop h a d a unit stride. We s a y t h a t L is a regular n e s t if P = Q. A regular n e s t I, is said to be rectangular if the limits have no variable parts, that is, if Pl, P2,. • •, Pro, ~/1, q 2 , - . . , qm are c o n s t a n t s . For a r e c t a n g u l a r nest, we have ( r e m e m b e r t h a t ~/m d e n o t e s the m x m identity matrix) Po = ( P l , p 2 . . . . ,Pro),
qo = ( q l , t / 2 , . . . , c / m ) ,
P=Q=~/m.
The s y s t e m of inequalities (4.1) simplifies to p o O -1 _< IPO -1 < q 0 0 -1
(4.4)
in the r e g u l a r case, a n d f u r t h e r to p o O -1 _< IO -1 _< q 0 0 -1 in t h e r e c t a n g u l a r case.
(4.5)
86
CHAPTER 4. PERFECT LOOP NESTS
Corollary 2 The index vector I a n d the iteration vector ~ o f the loop nest L are related by the equation IP = ~O + Po.
(4.6)
PROOF. It is easy to see that the integer vector in the first c o n d i t i o n of T h e o r e m 4.1 is really the iteration vector of the loop nest. In fact, the index and iteration variables of the loop Lr are c o n n e c t e d by Equation (2.3): Ir = Pr + ~r Or. Using (4.3), we can write I • p(r) _ PrO = ~rOr or
I . p(r) = ~irOr + Pro. The matrix f o r m of this set of m scalar e q u a t i o n s is (4.6). Since P and O are nonsingular, we can write I = ([O + p o ) P -1
(4.7)
~ = (IP - p o ) O - 1 .
(4.8)
and
The m a p p i n g I ~ ~ of the index space into the iteration space is clearly bijective; this t r a n s f o r m a t i o n is called loop normalization. The following t h e o r e m gives a d e s c r i p t i o n of the iteration space: T h e o r e m 4.2 A point ~ E Z m is an iteration point for the loop nest I.
iff i(~ < £1o,
t
(4.9)
where 1~ = O p - 1 Q O -1 ~ ~10 (qo - P o p - 1 Q ) O -1.
(4.10)
4.2. INDEX AND ITERATION SPACES
87
PROOF. The iteration p o i n t s are precisely the integer m - v e c t o r s ~ that satisfy an e q u a t i o n of the f o r m IP = ~O + P0, where I is an integer m - v e c t o r s u c h that p 0 0 -1 < IPO -1 ) IQO -1 < qoO -1. Eliminating I, we get the set of inequalities PoO -1 _< (iO + po)O -1 ) (~0 + p o ) P - I Q O -1 < qoO -1. Further simplification yields the set of inequalities
~(OP-1QO -1) -< ( q o - P 0 P - 1 Q ) O -1, which are exactly the inequalities in (4.9).
} []
The u p p e r b o u n d for ~1 can be taken to be an integer to c o n f o r m with the n o t a t i o n in the single a n d double-loop cases. Note that while the iteration space is equal to the set of integer p o i n t s in the polytope in R m defined by (4.9), the index space is a subset of the set of integer p o i n t s in the p o l y t o p e defined by (4.1). An extra c o n d i t i o n is n e e d e d to exactly identify the index space, n a m e l y that (IP - p0)O -1 is an integer vector. For this reason, it is easier to deal with the iteration space rather t h a n with the index space in d e p e n d e n c e analysis, even t h o u g h they are b o t h finite sets with the same n u m b e r of points. The matrix (~ is an m x m upper-triangular rational matrix with (1, 1 , . . . , 1) as the m a i n diagonal (Exercise 2), a n d ~lo is a rational m vector. We s o m e t i m e s refer to ~10 a n d (~ as the normalized final vector and matrix, respectively. T h e o r e m 4.3 In the sequential execution of the loop nest L, the index space is traversed in the direction of (lexicographically) increasing ~. Loop normalization produces a loop nest equivalent to I. where the loops have zero initial limits and unit strides.
88
CHAPTER 4. PERFECT L O O P N E S T S
PROOF. The first p a r t states t h a t an iteration H ( i ) of the loop n e s t is e x e c u t e d b e f o r e a n o t h e r i t e r a t i o n H (j), iff the c o r r e s p o n d i n g iteration values satisfy i -< ). We can prove it in the s a m e way we p r o v e d the double-loop v e r s i o n o f this fact in t h e p r o o f of T h e o r e m 3.1. For t h e s e c o n d part, let ~10 = (~1o, t ~ 2 0 . . . . , ~mO) and
O. =
1 0 0
;
-~21 --~32 -~31 .... --~m2 ~ml1 1 0
1
-~m3
;
;
i
.
Then, the given loop n e s t is equivalent to the p r o g r a m L1 : L2 : :
do i~1 = 0, ~10, 1 do ~2 = 0, ~2o + ~21~1, 1 :
Lm :
d o ~m = O, ~rnO + ~mll~l + " "" + ~m(rn-1)~m-1, 1 H((~O + p0)P -1)
enddo :
enddo enddo since the s e q u e n c e of iterations o f (L1, L2 . . . . . Lm) is the s a m e as that of (L1, L2 . . . . . Lm) in the sequential e x e c u t i o n of the two loop nests. We leave the details to the reader, t3 C o r o l l a r y 1 L o o p n o r m a l i z a t i o n c o n v e r t s a r e g u l a r loop n e s t into a r e c t a n g u l a r loop nest.
PROOF. We s h o w that the loop n e s t (L~,L2,... ,Lm) is r e c t a n g u l a r if t h e loop n e s t L is regular. Indeed, if L is regular, t h e n we have P = Q. That implies (~ = O p - 1 Q O -1 = O P - ~ P O -~ = O:/mO -~ = :/m, and ~o = (qo - P o p - 1 p ) O -~ = (qo - po)O -1 -- ( ( q l o - p l o ) / O1, (q20 - P20) / O2, . . . , (qmo - Pmo) / Om) . (4.11)
4.2. I N D ~ A N D ITERATION SPACES
89
The inequalities (4.9) t h e n b e c o m e 0 <-- ~ --< ( ( q l 0
--
PlO)/01,
( q 2 0 --
P20)[02 . . . . . (qrnO -- Prno)/Orn),
or
0 <_ ~r <-- (qrO - Pro)/Or
(1 < r _< m ) .
(4.12)
Thus, the loop Lr h a s c o n s t a n t limits 0 a n d (~/ro - Pro)/Or.
[]
E x a m p l e 4.1 C o n s i d e r the p r o g r a m L1 •
L~ : L3 "
do I1 = 6, 101, 2 do I2 = I1 + 10, I1 + 217, 3 do I3 = I1 + 312 - 2,211 - I2 + 1, - 2 H ( I 1 , I2, I3 )
enddo enddo enddo where H d e n o t e s a s e q u e n c e o f a s s i g n m e n t s t a t e m e n t s . Here we have Pl = PlO = 6, ql -- qxo = 101, P2 -= P20 + P2111 = 10 + 1 x I1, q 2 -----6/20 + q 2 1 1 1
=
217 + 1 x I 1 ,
P3 = P30 + P3111 + P3212 = - 2 + 1 × I~ + 312, 6/3 ---- q 3 0 + q 3 1 1 1 + q 3 2 1 2
=
1 + 2I~ - 1 × I 2 ,
01 = 2 , 0 2 = 3 , 0 3 = - 2 . Thus, for this loop nest, the initial a n d final v e c t o r s are P0 = (Pl0, P20, P30)
--
(6, 10, - 2 ) ,
(qlO, q20, q30)
=
(101,217, 1);
q 0 --
a n d t h e initial, final, a n d s t r i d e m a t r i c e s are p =
0 0
1 0
-P32 1
--
0 0
1 0
-3 1
,
90
CHAPTER 4. PERFECT LOOP NESTS
Q= o° ol - 132)= 0
0 0
02 0
0 03
=
0 0
ol 1),1 3 0
• -
We compute the inverses of O and P:
0 -1=
0 0
1/3 0
0 -1/2
,
P-I=
0 0
1 3 0 1
,
the normalized final vector ~lo = (qo - Pop-1Q)O -1 = (95/2,207/3, 55/2), and the normalized final matrix ~=Op-1QO
-] =
1 0 -3) 0 1 -6 0 0 1
.
By T h e o r e m 4.1, the index space of (L1,L2,L3) is included in the set of integer points of the polytope defined by the system poO -1 < IPO -1 IQO -1 < qoO -1,
]
I
that is, of the polytope defined by the system
(3, 10/3, 1) < I
1!2 -1/3 1/2~ 1/3 3/2| 0 -1/2,/ 1!2 -1/3 1) 1/3 -1/2 __<(i0112, 217/3,-I/2). 0 -1/2
91
4.2. INDEX AND ITERATION SPACES
After simplification and regrouping, these inequalities may be written
as 6 < I1 -< 101 ] I~ +10_
}
i h -<
It is the set of all integer points ~ > 0 such that
1 0 -3) ~
0
1 -6
0 0
< (95/2,207/3, 55/2),
1
that is, the set of all integer points that satisfy
0<171 < 4 7 t 0<~2 < 6 9 0--<1?3 -< 3171+6172 + 27.
(4.13)
By Corollary 2 to Theorem 4.1, the mapping between the index and iteration spaces is defined by the equation
IP = ~O + Po, that is, by the equation
I
0
1 -3
0
0
1
=~
0o)
0 3 0 0
0 -2
+ (6, 1 0 , - 2 ) .
92
CHAPTER 4. PERFECT LOOP NESTS
We can express I in terms of ~: I = ~ O P -1 + p 0 P -1
=~
2 2 0 3 0 0
8) 9 -2
+(6,16,52)
= ( 2 h + 6,2~1 + 3 8 + 1 6 , 8 h + 9~2 - 2 8 + 52),
(4.14)
and express ] in terms of I: ~ = iPO -1 _ poO -1
=I
1/2 0 0
-1/3 1/3 0
1/2 / 3/2 - (3, 10/3, 1) -1/2
= ( I 1 / 2 - 3, (-I1 + I 2 - 10)/3, (I1 + 3 1 2 - I 3 - 2)/2). It is not necessary to rewrite the program explicitly in terms of iteration variables for dependence analysis. However, we will do so to show how its form changes. To switch over from the index to the iteration variables, we replace I1, I2, and I3 by the corresponding functions of ~fl,~2, and ~3 from (4.14), and note the bounds on ~1, ~2, and ~3 given in (4.13). The given program is then equivalent to the loop nest (Theorem 4.3) L1 :
L2 : L3
:
dO~l = 0,47, 1 do~2 = 0,69, 1 do~3 = 0,3~1 + 6~2 + 27,1 H(2~1 + 6, 2~1 + 3~2 + 16, 8~1 + 9~2 - 28 + 52) enddo enddo enddo
Note that the regular double loop (L1, L2) has changed into the rectangular double loop (L1, L2) (Corollary 1 to Theorem 4.3). EXERCISES 4.2
1. What can you say in general about the matrices p-1 and Q-l?
4.2.
IND~
AND ITERATION SPACES
93
2. Show t h a t (~ is an u p p e r triangular rational matrix with the m a i n diagonal (1, 1 . . . . . 1) a n d det((~) = 1. Show also t h a t (~ is equal to the identity matrix iff the loop n e s t L is regular. 3o Let f. d e n o t e the loop n e s t o b t a i n e d by n o r m a l i z i n g the loops of L. Prove t h a t ~ is r e c t a n g u l a r iff L is regular.
4. Find P0, q0, P, Q, ~10, a n d (~ for the loop n e s t s of Exercise 3.2.3. Using (4.7) a n d (4o8), e x p r e s s the index variables in t e r m s of the iteration variables and vice versa. Find a d e s c r i p t i o n of the iteration s p a c e from (4.9). C o m p a r e with results y o u a l r e a d y got from working on this exercise in Chapter 3. 5. In Example 4.1, find the total n u m b e r of iteration (index) points of the loop nest, a n d the last iteration a n d index values in the sequential execution of the nest. 6. For the perfect loop n e s t s in the following p r o g r a m s , find • P0,q0, P,Q,~10, a n d S ; • the index a n d iteration spaces (theorems 4.1 a n d 4.2); • the total n u m b e r of iterations; • the last index a n d iteration values in the sequential execution of the program; • e x p r e s s i o n s of the i t e r a t i o n variables in t e r m s of the index variables; • an equivalent loop n e s t with unit strides: (a) L1 : L2 : L3 "-
d o I1 = 3,100, 2 d o 12 = 311 - 30, 311 + 400, 5 d o 13 = 511 - 212 + 106, 511 - 212 + 1, - 2 H(II,I2,I3) enddo enddo enddo
(b) L1 : L2 : L3 :
d o I1 = 3 , 1 0 0 , 1 d o I2 = 3,I1,1 d o 13 = 12,100, 1 H (I1, I2, I3) enddo enddo enddo
(c) L1 L2 L3 L4
d o I1 = 10, 200, 5 d o I2 = 211,11, - 1 d o 13 = 11 + 12,11, - 1 d o I4 = 312 + I3, 1000, 2 [-I(I1,I2,I3) enddo enddo enddo enddo
: " : :
94
CHAPTER 4. PERFECT LOOP NESTS
7. Show how you will compute the total number of iteration (index) points of the model loop nest L, and its last iteration and index values in sequential execution. 8. Find a matrix A such that the second vector inequality of (4.9) is equivalent to
| a A _<40A, where all coefficients are integral.
4.3 Dependence Concepts Consider two arbitrary a s s i g n m e n t s t a t e m e n t s S and T in the given loop nest L. The n o t a t i o n s S < T and S < T are defined as before, and it is obvious that < is a total order in the set of a s s i g n m e n t s t a t e m e n t s in the p r o g r a m . Let S(i) a n d T(j) d e n o t e the instances of S a n d T c o r r e s p o n d i n g to two index p o i n t s i a n d j, respectively. The distance f r o m S(i) to T(j) is defined to be the m - v e c t o r ) - ~, where ~ a n d ) are the iteration p o i n t s c o r r e s p o n d i n g to i and j. The following l e m m a is a direct generalization of L e m m a 3.2. It can easily be established by induction; we omit the proof. L e m m a 4.4 Consider two assignment statements S and T in the loop nest L. Let d denote the distance from an instance S(i) o f S to an instance T(j) o f T. In the sequential execution o f the program, S(i) is executed before T(j) iff one o f the following two conditions holds: (a) d >- 0, (b) d = 0 a n d S <
T.
The relation of dependence b e t w e e n s t a t e m e n t s is d e n o t e d by t~ a n d is defined as follows: For two s t a t e m e n t s S a n d T, we have S ~i T if there is an instance S(i) of S, an instance T(j) of T, a n d a m e m o r y location LM, such that 1. Both S(i) and T(j) reference (read or write) LM; 2. S(i) is e x e c u t e d before T(j) in the sequential execution of the program;
4.3. DEPENDENCE CONCEPTS
95
3. During sequential execution, the location 9V~is n o t written in the time p e r i o d f r o m the e n d of execution of S(i) to the beginning of execution of T(j). The s t a t e m e n t s S and T n e e d n o t be distinct, b u t Condition 2 requires that the instances S(i) a n d T(j) be distinct. The n o t a t i o n S 6 T is read as "T d e p e n d s on S." The dependence of T on S is the set of all pairs (S(i), T(j)) that satisfy the above conditions. Thus, we have S 6 T i f f the d e p e n d e n c e of T on S is n o n e m p t y . The g r a p h of the relation 6 is called the statement dependence graph of the given program. S u p p o s e that a s t a t e m e n t T d e p e n d s on a s t a t e m e n t S. Let S (i) and T(j) d e n o t e a pair of instances, such that they (together with s o m e m e m o r y location 9~4) satisfy the above conditions. There is at least one s u c h pair. We say that the instance T(j) depends on the instance S(i). Let d d e n o t e the distance f r o m S(i) to T(j)o Let ~ = sig(d) a n d g = lev(d). (For the definitions of "sig" and "lev," see Section 1.1.3.) Then, d is a distance vector, ~ a direction vector, and ~ a dependence level for the d e p e n d e n c e of T on S. It is s o m e t i m e s convenient to say that T d e p e n d s on S with a distance vector d or a direction vector ~r, a n d at a level ~?. Since S (i) m u s t be executed before T(j), we have the following theo r e m as a direct c o n s e q u e n c e of Lemma 4.4. The next two corollaries follow f r o m the definitions j u s t given. T h e o r e m 4:5 I f d is a distance vector for the dependence o f statement T on statement S, then d >- O. The equality d = 0 is possible only if S- O. The equality ~r = 0 is possible only if S
96
C H A P T E R 4. PERFECT L O O P N E S T S
such that ~ < r ) and T(j) d e p e n d s o n S ( i ) , 1 < r < rn. The loopi n d e p e n d e n t part consists of all instance pairs (S(i), T(i)) such that T(i) d e p e n d s on S(i). The following s t a t e m e n t s are equivalent in the context of d e p e n d e n c e of a s t a t e m e n t T on a s t a t e m e n t S: 1. The d e p e n d e n c e of T on S is carried by the loop Lr; 2. There is a pair of instances S(i) and TO), such that ~ -r O; 4. There is a d e p e n d e n c e direction vector ¢r with ~ >-r O; 5. T d e p e n d s on S at level r . As in Section 2.3, we can define flow, anti-, output, and input dependences; indirect dependence; d e p e n d e n c e caused by a pair of variables; etc. When needed, we also associate a distance or a direction vector, or a level of d e p e n d e n c e with a particular type of dependence, or even with a particular pair of variables. Example 4.2 In this example, we study d e p e n d e n c e between statem e n t s and their instances in the triple loop doI~ = 1,4,2 do I2 = 10, 5, - 3 doI3 = 1,3,1
LI : L2 : L3 : S:
T:
X(I1 + I2 + 13) . . . . ....... X(I2 + I3) • • •
enddo enddo enddo
In Table 4.1, we have shown the iterations of the loop nest obtained by unrolling the loops. From this table, the d e p e n d e n c e structure between s t a t e m e n t instances b e c o m e s clear; that structure is p r e s e n t e d in Table 4.2. (This table does not include input dependence.) Note that the d e p e n d e n c e distance f r o m one s t a t e m e n t instance to another m u s t be c o m p u t e d f r o m the corresponding pair of iteration points shown in Table 4.1. From Table 4.2, we can easily derive the dependence information b e t w e e n s t a t e m e n t s themselves. We see that
4.3. DEPENDENCE CONCEPTS
Iteration
97
Point
(0,0,0)
Iteration S(1, 10,1)
( 0 , 0 , 1)
of loop Nest
:
X(12)
T ( 1 , 1 0 , 1) :
.......
S(1,10,2)
:
X(13)
T(1, 10,2)
:
-
(0,0,2)
S(1,10,3)
:
X(14)
T(1, 10,3)
:
( 0 , 1, 0 )
S ( 1 , 7, 1) :
(0, 1, 1)
s ( 1 , 7, 2) :
(0,1,2)
S(1,7,3)
:
T(1,7,3)
:
S(3, 10,1)
.... X(12)... .... X(13).--
....
.......
X(8)
• ••
....... X(ll)
X(9)
• • •
.... X(10)
:
X(14)
(1,0,2) (1, 1,0)
S(3, 10,2)
:
X(15)
T(3, 10,2)
:
.......
S(3, 10,3)
:
X(16)
T(3, 10, 3) :
.......
S(3,7,
1) :
T ( 3 , 7, 1) :
• • •
X(12)
• • •
X(13)
• • •
.... .... ....
.......
X(8)
• ••
X(9)
• ••
S(3,7,2)
T ( 3 , 7, 2 ) :
.......
( 1 , 1, 2 )
S ( 3 , 7, 3 ) :
X(13)
T ( 3 , 7, 3) :
.......
X(10)
4.2 after
unrolling.
4.1:
Loop
nest
of Example
X(12)
X(ll)
( 1 , 1, 1)
Table
:
X(ll)
• ••
....
T ( 3 , 1 0 , 1) : (1,0,1)
• • •
X(lO) ....
T ( 1 , 7, 2 ) :
(1,0,0)
X(ll)
X(9)
T ( 1 , 7, 1) :
....
.... .... •• •
CHAPTER 4. PERFECT LOOP NESTS
98
From Instance
To Instance
Type
Distance
S(1, 10, 1) S(1, 10, 1) S(1, 10, 1)
T(1,10,2) T(3,10,2) S(3,7,2)
Flow Flow Output
(o,o, 1) (1,o, 1) (1,1,1)
T(1, 10, 1) S(1, 7,3) T(3, 10, 1)
S(1,7,3) T(3, 10, 1) S(3,7,1)
AntiFlow Anti-
S(1, 7,3) S(1, 10,2) T(1, 10,3)
S(3, 7, 1) T(1, 10,3) S(3, 7,3)
S(1, 10,2) S(1, 10,3) S(1, 7,1) S(1,7,1) S(1, 7,2) S(1, 7,2)
Direction I Level (0,0,1) 3 (1,0,1) (1,1,1)
1 1
(0,1,2) (1,-1,-2) (0,1,0)
(0,1, 1)
2
Output Flow Anti-
(1,0,-2) (0,0,1) (1,1,0)
(1,0, - 1 )
1
(0,0,1)
3
(1,1,0)
1
S(3,7,3) S(3, 10, 1) T(1,7,2)
Output Output Flow
(1,1,1) (1,0,-2) (0,0,1)
(1,1,1)
1
(1,0, -1) (0,0,1)
1 3
T(3, 7,2) T(1,7,3) T(3, 7, 3)
Flow Flow Flow
(1,o,1) (o,o,1) (1,o,1)
(1,0,1) (0,0,1) (1,0,1)
1 3 1
(1,-1, -1)
1
(0,1, o)
2
Table 4.2: D e p e n d e n c e s t r u c t u r e of (L1, L2, L3) in Example 4.2.
4.3.
DEPENDENCE
99
CONCEPTS
.
T h e r e is a f l o w d e p e n d e n c e o f T o n So T h e d i s t a n c e v e c t o r s o f t h i s d e p e n d e n c e are ( 0 , 0 , 1), (1,0, 1), a n d ( 1 , - 1 , - 2 ) . The d i r e c t i o n v e c t o r s o f t h i s d e p e n d e n c e are (0, 0, 1), (1, 0, 1), a n d (1, - 1 , - 1 ) . T h e d e p e n d e n c e levels are 1 a n d 3. No p a r t o f t h i s d e p e n d e n c e is c a r r i e d b y t h e l o o p L2.
.
T h e r e is a n a n t i - d e p e n d e n c e o f S o n T. T h e d i s t a n c e v e c t o r s o f t h i s d e p e n d e n c e are (0, 1,2), (0, 1,0), a n d (1, 1,0). T h e direct i o n v e c t o r s are (0, 1, 1), (0, 1, 0), a n d (1, 1, 0). T h e d e p e n d e n c e levels are 1 a n d 2. No p a r t o f t h i s d e p e n d e n c e is c a r r i e d b y L3.
. T h e r e is a n o u t p u t - d e p e n d e n c e o f S o n itself. T h e d i s t a n c e vect o r s o f t h i s d e p e n d e n c e are ( 1, 1, 1) a n d (1, 0, - 2). T h e d i r e c t i o n v e c t o r s are ( 1, 1, 1) a n d (1, 0, - 1). T h e o n l y d e p e n d e n c e level is 1. No p a r t o f t h i s d e p e n d e n c e is c a r r i e d b y e i t h e r L2 or L3. 4. T h e r e is n o l o o p - i n d e p e n d e n t d e p e n d e n c e in t h i s e x a m p l e . EXERCISES 4.3
1. Study the dependence structure of the following programs: (a) LI:
doll =0,8,2 do I2 = 0,8,2 doI3 = 0,8,2
L2 :
L3 : S : T:
X(II,I2,I3) .... ....... X(Ix -
1,I2 + 1,I3) • • •
enddo enddo enddo doI1 = 1,4,1 do I2 = 10, I1, -3 doI3 = I1,I2, 1
(b) LI: L2 : L3: S : T :
(c) L1 : L2 : L3 : S : T :
X ( I I + 12 + 13) . . . . ....... X ( I 1 + 12 +
13) • - •
enddo enddo enddo do I~ = 0, 10,3 do I2 = ]1, 10, 4 do I3 = I2,I1, -1 X ( I ~ , I 2 + 1,I3) . . . . .......
enddo enddo enddo
X(I3,I2,I~
+
I3) • • •
1O0
CHAPTER 4. PERFECT LOOP NESTS
4.4 Subscript Representation We have a s s u m e d that the input and output variables of s t a t e m e n t s in the given p r o g r a m are array-elements with subscripts affine in the index variables of appropriate loops. (Note that this description includes scalars.) Consider a variable of a s t a t e m e n t S in our model p r o g r a m L, that is an element of an n-dimensional array X. It has the form X(allI1
+
• • • +
a m l I m + aOl,... ,alnI1 + • • • + a m n l m + aOn),
where the coefficients are all integer constants. In matrix notation, this variable can be written as X(IA + ao), where
A =
all ~21 .
a12 ~22 .
a~rnl
a~m2
"'"
aln
• • • "..
~2n :
• • •
~rnn
I
and ao = (aol, ao2,... , aon). The coefficient matrix of this element of X is A, and its coefficient vector is ao. We can write by (4.7)
IA + ao = (|O + po)P-1A + ao = ~(Op-1A) + (pop-1A + ao) ^ ^
= IA + rio, where /~ = Op-1A t rio = PoP -1A + ao.
(4.15)
We refer to A. and rio as the normalized coefficient matrix and the normalized coefficient vector, respectively, and call X ( ~ , + ~io) the normalized f o r m of the program variable X(IA + ao). Example 4.3 For the loop nest of Example 4.1, we already f o u n d that p-l=
013 001
and
0 =
03 00-2
0
Consider now the same nest w~th one particular statement:
.
4.4. SUBSCRIPT REPRESENTATION
L1 " L2 : L3 :
101
6, 101, 2 d o I 2 = 11 + 10, I1 + 217,3 d o I 3 = I 1 + 312 - 2 , 2 1 1 - I 2 + 1 , - 2 S: X(211 +13 + 1,-I1 + 312 + I3 - 2,-412) = 1 enddo enddo enddo do
I1 =
Since (211 + I3 + 1, -11 + 312 + 13 - 2, -412) =
(I1, I2, I3)
0
3
-4
1
1
0
+ (1,-2,0),
the coefficient m a t r i x A a n d v e c t o r ao for the a r r a y - e l e m e n t in this s t a t e m e n t are given b y 2 0 1
A =
-1 3 1
0) -4 0
and
ao = ( 1 , - 2 , 0 ) .
We c o m p u t e the n o r m a l i z e d coefficient m a t r i x a n d the n o r m a l i z e d coefficient vector: ~ = Op-1A =
=
0 0
3 0
9 -2
0 -2 18 -2
0 0 -12 0
1 0
3
0
3
-4
1
1
1
0
,
and rio = PoP - 1 A + ao : (6, 1 0 , - 2 ) = (65, 92, - 6 4 ) .
0 0
1 0
3 1
0 1
3 1
-4 0
+(1,-2,0)
CHAPTER 4. PERFECTLOOP NESTS
102
Thus, in t e r m s of iteration variables, the array-element of S b e c o m e s X(12~1 + 9~2 - 2i~3 + 65, 12h + 18~2 - 2 8 + 9 2 , - 8 h - 128 - 64). This is the normalized f o r m of the p r o g r a m variable. EXERCISES 4.4
1. Find the normalized forms of the program variables in the loop nests of Example 3.2, Exercise 3.3.3, Example 3.5, Exercise 4.3.1.
4.5
Dependence Problem
Consider two s t a t e m e n t s S and T in the model p r o g r a m L. Suppose S has a p r o g r a m variable of the f o r m X(IA + ao) and T a p r o g r a m variable of the f o r m X(IB + bo), w h e r e X is an n-dimensional array, A and B are t n x n integer matrices, and ao and bo are integer n-vectors. First, we describe the d e p e n d e n c e problem in terms of index variables. The instance of the variable X(IA + ao) for an index point i of the loop nest L is X(iA + a0). The instance of X(IB + bo) for an index point j is X(jB + bo). These two instances r e p r e s e n t the same m e m o r y location iff iA + a0 = jB + bo, that is, iff iA - j B = bo - ao.
(4.16)
This matrix equation is the dependence equation for X(IA + ao) and X(IB + bo) in terms of index values. It consists of n scalar equations, and there are 2rn integer variables: the t n elements of i and the t n elements of j. Since i and j are index points of L, they m u s t satisfy the following conditions (Theorem 4.1): p 0 0 -1 _< i p o -1
iQO -1 < qoO -1 poO -1 < j p o -1 j Q O -1 < qoO -1,
(4.17)
and (iP p0)O -1 is an integer vector ~ (jP - Po)O -1 is an integer vector. -
(4.18)
103
4.5. DEPENDENCE PROBLEM
These conditions are the dependence constraints in terms o f index values. Next, we state the d e p e n d e n c e problem in terms of iteration variables. Let X ( ~ + ;io) and X ( ~ + l~o) denote the normalized forms of the program variables X(IA + a0) and X(IB + bo), respectively (see Section 4.4). The matrices ~ and ]~, and the vectors ~0 and l~o are defined by equations (4.15): ,~ = o p - x A ~o = P o P - 1 A + ao 1~ = O p - 1 B i~o = p o p - 1 B + bo.
The instance of the variable X ( ~ , + ~o) for an iteration point i of the loop nest L is X(L~ + ~o). The instance of X(~I~ + i~o) for an iteration point ) is X()l~+l~o). These two instances represent the same memory location iff i.~ + ~o = ~]~ + I~o, that is, iff i,~ - )]~ = 1~o - ~o.
(4.19)
The matrix equation (4.19) is the dependence equation for the variables X(~.h, + ~o) and X(~I~ + I~o) in terms o f iteration values. 1 It consists of n scalar equations, and there are 2 m integer variables: the m elements of i and the m elements of ). Since ~ and ~ are iteration points of L, they must satisfy the inequalities (Theorem 4.2): O_
}
x 80
(4.20)
) 6 -< 80, where the matrix (~ and the vector ~lo are defined by (4.10). These inequalities are the dependence constraints for the variables X ( ~ + f i o ) and X(~]~ + I~o) in terms o f iteration values. The main purpose of introducing iteration variables was to eliminate conditions like (4.18). When the stride of each loop is one, the lIn chapters 1-3, we first derived the dependence equation in index values, and then expressed the index values in terms of iteration values to get the equation in iteration values. Here also we can derive (4.19) from (4.16) by using (4.7); see Exercise 1.
104
CHAPTER 4. PERFECT LOOP NESTS
stride matrix O is equal to the identity matrix, a n d (4.18) is satisfied automatically. In that case, we can a p p r o a c h the d e p e n d e n c e p r o b l e m t h r o u g h its d e s c r i p t i o n in t e r m s of index variables. The c o n d i t i o n s in (4.18) d i s a p p e a r also w h e n the stride of each loop is + 1. However, we still p r e f e r to w o r k with iteration variables in this case, to avoid the fact that w h e n the stride of a loop is - 1 , its index values f o r m a decreasing s e q u e n c e in the sequential execution. It is possible to do the analysis in a f r a m e w o r k where we use index variables for loops with unit strides a n d iteration variables for all other loops. For the m o s t part, we use either index variables or iteration variables for all loops in a given p r o b l e m , a n d u s e iteration variables in formal statements. The generalizations of t h e o r e m s 2.6 a n d 2.7 to a perfect loop n e s t are stated below as t h e o r e m s 4.6 a n d 4.7. We also a d d here two corollaries to T h e o r e m 4.6, dealing with direction vectors a n d d e p e n d e n c e levels. (Direction vectors a n d d e p e n d e n c e levels were n o t d i s c u s s e d in Chapter 2, since they are n o t very significant for single loops.) The p r o o f s are simple a n d left to the reader. T h e o r e m 4.6 Consider two statements S and T in the loop nest L. Let X ( I A + a 0 ) denote a variable o f S a n d X ( IB + bo ) a variable o f T , where X is an n-dimensional array. If these variables cause a dependence between S a n d T, then Equation (4.19) has an integer solution (~,)) that satisfies the constraints (4.20). In each o f the following special cases, the solution satisfies an additional condition: (a) I f S < T a n d S ~ T, then ~ <_~; (b) If S = T a n d S 6 S, then ~ < ~; (c) If S < T a n d T 6 S, then ~ >- ~. Corollary 1 If s t a t e m e n t T depends on s t a t e m e n t S with a direction vector ~r = (o-1, ~r2 . . . . . ~m), then Equation (4.19) has an integer solution ({,)) that satisfies (4.20) and the additional condition sig(3r - fr) = tit w h e r e ~ = ( ~1, ~2 . . . . . ~rn) a n d )
(1 -< r < rrt),
= ( 3 1 , 3 2 , . . . , 3m).
(4.21)
4.5. DEPENDENCE PROBLEM
105
If (~r = 0, t h e n t h e extra e q u a t i o n ~r = ) r gets a d d e d to the s y s t e m (4.19). We c a n use it to r e d u c e the n u m b e r of variables by one. If O'r = 1, we get a n extra inequality of the f o r m ~r < ) r , w h i c h is equivalent to ~r < 3r - 1 since we are dealing w i t h integer variables. If O-r = - 1 , t h e n t h e extra inequality is 2r > 3r + 1. By Corollary 1 to T h e o r e m 4.5, a s t a t e m e n t m a y d e p e n d o n itself only w i t h a positive ()- O) d i r e c t i o n vector. Next, take two s t a t e m e n t s S a n d T s u c h t h a t S < T. A d i r e c t i o n v e c t o r for a d e p e n d e n c e o f S on T m u s t be positive, while a d i r e c t i o n v e c t o r for a d e p e n d e n c e of T o n S c a n be n o n n e g a t i v e (_> 0). If we always associate the notations i a n d i with S, a n d j a n d ) with T, t h e n a d i r e c t i o n v e c t o r for a d e p e n d e n c e o f T o n S has the f o r m (sig()l - ~1),..., sig()ra - 2m)), a n d a d i r e c t i o n v e c t o r for a d e p e n d e n c e of S o n T has the f o r m (sig(fx - . ~ 1 ) . . . . , sig(fm - j m ) ) . The following r e s u l t is a n i m m e d i a t e c o n s e q u e n c e o f Corollary 1, since d e p e n d e n c e at a level 4? is equivalent to d e p e n d e n c e w i t h a dir e c t i o n v e c t o r o f t h e f o r m (,0, 0 . . . . ,0,, 1 , . , . , . . . , *). ~'-1 C o r o l l a r y 2 If statement T depends on statement S at a level 4?, then Equation (4.19) has an integer solution ( i , ) ) that satisfies (4.20) and the additional conditions: 21 = ) 1 , 2 2 = ) 2 . . . . .
2~-1 = ) ~ - 1 , ~
~ ) ~ -- 1.
By Corollary 2 to T h e o r e m 4.5, the possible levels are 1, 2 , . . . , m + 1 if S < T, a n d 1, 2 , . . . , vrt otherwise. T h e o r e m 4.7 Suppose that Equation (4.19) has an integer solution (i, 3) that satisfies the constraints (4.20). -
-
(a) If ~ -< 3, then S ~ T. (b) If ~ >- 3, then T ~ S. -
-
(c) If ~ = ) and S < T, then S ~ T. T h e o r e m s 4.6 a n d 4.7 c a n be r e s t a t e d to deal with d e p e n d e n c e b e t w e e n s t a t e m e n t instances. We will n o t d i s t i n g u i s h b e t w e e n dep e n d e n c e a n d i n d i r e c t d e p e n d e n c e . If t h e r e is a n integer s o l u t i o n to
106
C H A P T E R 4. P E R F E C T L O O P N E S T S
(4.19) satisfying (4.20) a n d a d d i t i o n a l conditions, if any, t h e n we will a s s u m e that the c o r r e s p o n d i n g d e p e n d e n c e exists. Given two s t a t e m e n t s S a n d T in the n e s t L, a variable X ( I A + ao) of S, a n d a variable X(IB + b0) o f T, the b a s i c d e p e n d e n c e p r o b l e m is to decide if t h e r e is a d e p e n d e n c e b e t w e e n S a n d T, t h a t is, if t h e r e is an integer s o l u t i o n to E q u a t i o n (4.19) t h a t satisfies t h e c o n s t r a i n t s (4.20). The e x t e n d e d d e p e n d e n c e p r o b l e m is to find for e a c h d i r e c t i o n v e c t o r tr, the set of all i n t e g e r s o l u t i o n s to (4.19) satisfying (4.20) a n d t h e additional c o n d i t i o n s (4.21). We also w a n t to k n o w the set of all distance v e c t o r s c o r r e s p o n d i n g to e a c h particular d i r e c t i o n v e c t o r (the c o r r e s p o n d i n g level is easily found). This i n f o r m a t i o n can be s u m m a r i z e d a n d u s e d to derive qualitative s t a t e m e n t s , as n e e d e d . E x a m p l e 4.4 Consider the d o u b l e loop do I1 = 5 , 2 0 0 , 3 do I2 = I1,200, 1 X(211 - 1,I2) . . . . ....... X (I1, 3I~ ) • • • enddo enddo
L1 " L2 : S:
T:
For this loop nest, the initial a n d final v e c t o r s are P0 = (5,0),
qo = (200,200),
a n d the initial, final, a n d stride m a t r i c e s are P=
(1-I) 0
1
'
(i O)
Q=
0
1
3 0
O=
'
We c o m p u t e the inverses o f O a n d P:
=
0
I
'
=
0
1
a n d the n o r m a l i z e d final v e c t o r a n d m a t r i x by (4.10): ~10 = (q0 - P 0 p - 1 Q ) O-1 = (65,195), ~ = Op-1QO-1 =
0
1
"
'
0
107
4.5. DEPENDENCE PROBLEM
The coefficient matrix A and the coefficient vector ao for the array element of s t a t e m e n t S are given by A=
0
1
and
ao=(-1,O).
The coefficient matrix B and the coefficient vector bo for the array element of s t a t e m e n t T are given by B=
0
0
and
bo=(O,O).
We c o m p u t e the normalized coefficient matrices and the normalized coefficient vectors by (4.15): "~ = O p - 1 A =
0
1
'
~=OP-1B=
0
0
'
~o = PoP -1A + ao = (9, 5), !~o = p o P - 1 B + b o = (5, 15). The d e p e n d e n c e equation (4.19) is
(~1,~2)
0
1
--(.~1,.~2)
0 0
=(-4,10),
which is equivalent to the system: 6il - 3j1 = - 4 3i~ + i2 - 9j~ = 10.
(4.22) (4.23)
This system has two equations in three variables. The d e p e n d e n c e constraints (4.20) become (0, O) ~ (~1, i2) (f~'~) (0,0) --< ()~,)~)
0 1
~ (65,195)
~ (65,195),
CHAPTER 4. PERFECT LOOP NESTS
108
w h i c h r e d u c e to the s y s t e m : f 0O _< ~ -< < 65 < ~-2 195 - 3~.~ 0<,~1 _<65 O<jz-<195-3j~. It is clear that t h e r e is n o integer s o l u t i o n to the single equation (4.22), since gcd(6, 3) d o e s n o t divide 4 ( T h e o r e m 2.8). Hence, t h e r e is no i n t e g e r s o l u t i o n to the s y s t e m o f e q u a t i o n s (4.22)-(4.23). Therefore, t h e r e is no d e p e n d e n c e b e t w e e n s t a t e m e n t s S a n d T (Theo r e m 4.6). EXERCISES 4.5
1. Derive (4.19) from (4.16) by expressing i and j in terms of i and ). 2. Prove theorems 4.6 and 4.7. Restate them in terms of dependence between S(i) and T()). 3. In Example 4.4, state the dependence equation and constraints in terms of index values. 4. Rework Example 4.4 after changing the loop headers to L~ :
L2 :
d o 11 = 100, 1, - 1
do I2 = 211 + 4, 1011 - 12, 2
5. Find the dependence equation and constraints (in terms of iteration values) for the dependence problem between statements S and T, S and S, and T and T in the programs of Example 4.2, Exercise 4.3.1(a)-(c), and Example 4.3. (In Example 4.3, take T = S.)
4.6
Special Cases
In the last section, w e s a w t h a t the d e p e n d e n c e p r o b l e m b e t w e e n t w o s t a t e m e n t s S a n d T is d e s c r i b e d b y the e q u a t i o n L~ - ) ~ = !~o - rio
(4.24)
and the constraints
-< 4o o_<3 ) 0 < 40,
]
(4.25)
4.6. SPECIAL CASES
109
where X ( ~ + r0) a n d X(~]~ + i~0) are the n o r m a l i z e d f o r m s of the p r o g r a m variables of S a n d T that cause the d e p e n d e n c e . The norm a l i z e d final matrix (~ a n d vector ~10 are defined in (4.10). In this section, we c o n s i d e r three special cases where the d e p e n d e n c e problem b e c o m e s s o m e w h a t simpler. The first two cases are very c o m m o n in real p r o g r a m s ; they are i n c l u d e d in the third case that deals with a m o r e general situation.
Special Case 1. Equal Coefficient Matrices Let/~ = ~. By (4.15), we have ~ = O p - 1 A a n d l~ = Op-1B, so that this is the s a m e as requiring A = B. The variables X (311 + 12 + 5,11 + 212 + 3 ) and X(311 + 12 + 1,I1 + 2•2) in a double loop provide an example for this case, since the coefficient matrix for either variable is (13 ~1). In this case, any d e p e n d e n c e c a u s e d by the two p r o g r a m variables X ( ~ + rio) a n d X ( ~ + l ~ o ) is said to be uniform, a n d each d e p e n d e n c e distance of a u n i f o r m d e p e n d e n c e is said to be a uniform distance. The "uniformity" here refers to the fact that a distance vector of a u n i f o r m d e p e n d e n c e is i n d e p e n d e n t of iteration (index) points. This is s t a t e d m o r e formally in the following theorem.
Theorem 4.8 A s s u m e that a variable X ( ~ + to) o f a s t a t e m e n t S a n d a variable X(]~ + ~o) o f a s t a t e m e n t T cause a uniform dependence o f T on S in the loop nest L. Let do denote a distance vector o f this dependence. Ifio a n d 3o are two iteration points such that 3 o - io = do, then the instance T(j0) corresponding to 3o depends on the instance S(i0) corresponding to i0. 2 PROOF. Since the d e p e n d e n c e c a u s e d by the two variables is uniform, we have ~ = !~ by definition. The d e p e n d e n c e e q u a t i o n for the problem, Equation (4.24), b e c o m e s (3 - i)h
= r o - tSo,
or d~. = ro - l~o,
(4.26)
2Remember that we label statement instances by index values. Here, io and j0 denote the index points of L corresponding to the iteration points io and 30, respectively.
110
CHAPTER 4. PERFECT LOOP NESTS
w h e r e d = ) - i. Thus, all d e p e n d e n c e d i s t a n c e s in the u n i f o r m dep e n d e n c e case satisfy Equation (4.26). In particular, the d i s t a n c e do satisfies (4.26). This m e a n s the pair of iteration p o i n t s ( i o , ) o ) satisfies the d e p e n d e n c e equation, since (30 - io)~, = do~, = 80 - !~o. The s a m e pair obviously satisfies the d e p e n d e n c e c o n s t r a i n t s (4.25), since t h e y are i t e r a t i o n p o i n t s of the loop nest. F u r t h e r m o r e , we have io -< 3o (with io = ~o possible only if S < T), since do - 0 (with do = 0 possible o n l y if S < T), by T h e o r e m 4.5. Hence, the i n s t a n c e TOo) of T d e p e n d s o n the i n s t a n c e S(io) of S. (See T h e o r e m 4.7 a n d the c o m m e n t s following it.) ~3 The c o n v e r s e to T h e o r e m 4.8 is t r u e in the ideal situation w h e n t h e iteration space of I. is the entire set {~ ~ Z m : ~ > 0}. 4.9 Assume that the iteration space o f L consists o f all integer m-vectors ~ > O. Suppose that a variable X ( ~ , + ~o ) o f statement S and a variable X ( ~ + ~o) o f statement T cause a dependence o f T on S. Let d denote a distance vector o f the dependence. I f an instance T(j) o f T depends on an instance S(i) o f S w h e n e v e r 3 = ~ + d, then this dependence is uniform. Theorem
PROOF. Take a n y m - v e c t o r i > O. Let 3 = i + d. Then, i a n d 3 are
iteration points of the loop nest L. Let i and j denote the index points of L corresponding to the iteration points i and 3, respectively. By hypothesis, T(j) depends on S(i). Hence, the pair ( i , ) ) satisfies the d e p e n d e n c e e q u a t i o n (4.24), so that we have
iA - (i + d)~ = fi0 - fi0 or
i(~
- l~) = dl~ + ~ o - ~o.
This m e a n s i ( ~ - l~) has the s a m e value for all ~ >_ O. T h a t is possible only if ~, = 1~ (why?). Hence, the d e p e n d e n c e of T o n S is u n i f o r m . [] In the u n i f o r m d e p e n d e n c e case, the d e p e n d e n c e e q u a t i o n (4.24) r e d u c e s to Equation (4.26) w h e r e d = 3 - ~. This is a m a j o r simplification, since the original e q u a t i o n is a s y s t e m of n e q u a t i o n s in 2 m
4.6. SPECIAL CASES
111
variables, w h e r e a s (4.26) is a s y s t e m of n e q u a t i o n s in m variables. Letting ) = i + d in (4.25), we write the d e p e n d e n c e c o n s t r a i n t s as
0_
] < fl0
O_
(4.27)
If the
loop n e s t L is regular, t h e n (~ is equal to the i d e n t i t y m a t r i x (see Exercise 4.2.2), a n d (4.27) simplifies to (explain) -~10 < d < ~to ~ max(O, - d ) < i < min(fl0, ~10 - d). )
(4.28)
T h e r e is a u n i q u e real s o l u t i o n to (4.26) if rank(,~) = m . (See sections 1.5.4 a n d 1.5.5.) If n = r n a n d ~ = I~ is a n rn x m n o n s i n g u l a r matrix, t h e n this u n i q u e s o l u t i o n is d = (rio - ~ 0 ) ~ - 1 .
(4.29)
If, in addition, the loop n e s t I. is regular, t h e n the u n i f o r m d e p e n d e n c e p r o b l e m b e c o m e s a simple p r o b l e m , since t h e n the s y s t e m of inequalities (4.28) is trivial to solve. T h e r e will be a d e p e n d e n c e bet w e e n s t a t e m e n t s S a n d T, iff d given by (4.29) is an i n t e g e r v e c t o r satisfying t h e first set of inequalities of (4.28), a n d t h e r e is an integer v e c t o r ~ satisfying t h e s e c o n d set. If d >- 0, or if d = 0 a n d S < T, t h e n this d e p e n d e n c e is a d e p e n d e n c e of T o n S. If d -< O, t h e n it is a d e p e n d e n c e of S o n T. R e m e m b e r t h a t o n e m a y have a u n i f o r m d e p e n d e n c e w i t h m a n y d i s t a n c e v e c t o r s (Example 1.5.10), a n d a u n i q u e d i s t a n c e v e c t o r is not n e c e s s a r i l y u n i f o r m (Exercise 3). E x a m p l e 4.5 C o n s i d e r t h e possibility of d e p e n d e n c e b e t w e e n statem e n t s S a n d T in the loop nest:
do I1 = O, 200, 1 do I2 = O, 100 + I~, 1
L1 • L2 :
S: T:
X(311 +12 + 5,I1 + 212 + 3) . . . . ....... X(311 + I 2 + 1 , I t + 2 1 2 ) ' ' "
enddo enddo
112
CHAPTER 4. PERFECT LOOP NESTS
(Here, the index and iteration variables are identical.) The coefficient matrix A of the array element of S and the coefficient matrix B of the array element of T are given by 1 2
and
B=
1 2
"
Since A = B, if there is any d e p e n d e n c e it will be uniform. The dependence equation (4.26) is (d1,d2)(3 1
1 ) = (4,3).
2
There is a unique solution: (d~ 3d 2 ) = '( 4 ' 3 ) (
1 21) -1
= (4,3)
-1/5
?/5
= (1, 1). Since it is an integer vector, we see next if the d e p e n d e n c e constraints are satisfied. The constraints for this loop nest are 0 < il 0
< < < <
200 1 100+il 200 100 + j l .
t
Writing ( j l , j 2 ) = (i1,i2) + (all,d2) = (il + 1, i2 + 1), we get the system 0 _< i~ 0 < i2 0
_< 200 ] < 100 + il <200 < 101 + il,
t
which reduces to 0 < il < 199 O
~
4.6. SPECIAL CASES
113
Thus, T ( i l + 1, j l + 1) d e p e n d s o n S ( i l , j l ) for e a c h i n d e x point (il, i2) t h a t satisfies the last two inequalities, a n d this d e p e n d e n c e is u n i f o r m with a u n i q u e d i s t a n c e v e c t o r (1, 1).
Special Case 2. Diagonal (Normalized) Coefficient Matrices Let ~ a n d l~ be m x m diagonal matrices. (Thus, n = m a n d X is a n m - d i m e n s i o n a l array.) Since ~ = o p - x A a n d l~ = Op-1B, we get this case iff p - 1 A a n d p-1B are m x m diagonal matrices. In a r e c t a n g u l a r loop nest, t h e c o n d i t i o n is equivalent to requiring that A a n d B be diagonal (since t h e n P is t h e i d e n t i t y matrix). An e x a m p l e of this case is p r o v i d e d b y the variables X(211 - 1,312 - 1, I3) a n d X(211 + 1, I2 + 2, 5) in a r e c t a n g u l a r triple l ° ° P ' since their c°efficient m a t r i c e s are ( i o3°! ) a n d ( ~ 1° i ) o , respectively. Let ~ = E = diag(ex, e2 . . . . . era) l~ = F = d i a g ( f l , f 2 , . . . , f r o ) . We a s s u m e t h a t for e a c h r in 1 < r < m , at least one o f er a n d f r is n o n z e r o . 3 The d e p e n d e n c e e q u a t i o n (4.24) can be w r i t t e n as iE - ) F = I~o - rio.
(4.30)
It is equivalent to a s y s t e m of m m u t u a l l y i n d e p e n d e n t two-variable scalar equations: e r f r - f r 3 r = [~rO - dr0
(1 _< r _< m ) .
(4.31)
T h e s e e q u a t i o n s can be solved easily b y A l g o r i t h m 2.1 a n d Theor e m 2.8. For e a c h r , apply A l g o r i t h m 2.1 to find g r = g c d ( e r , f r ) , a n d two integers it0 a n d 3to s u c h t h a t er irO - f r ) r O = Y r .
By T h e o r e m 2.8, the g e n e r a l s o l u t i o n ( ~ r , ) r ) to (4.31) can be p u t in the f o r m ( $ r , j r ) = (arO + a r t r , fifo + f i r t r ) , 3This restriction is not really necessary; its only purpose is to avoid trivial equations of the form "0 = constant" in this discussion.
114
CHAPTER 4. PERFECT LOOP NESTS
where a r O = (/~rO -- t~rO) ~rO/gr,
ar = fr/gr,
f r O = (/~rO -- a r O ) ) r O / g r ,
fir = e r / g r ,
a n d tr is an integer p a r a m e t e r . Then, the s o l u t i o n to (4.30) is ~ = (OQO + ~ l t l ,
0~20 + ~x2t2,...,~mO + O~mtm)
.) = (]~10 + f l t l ,
f20 + f2t2,...,fmO
+ fmtm).
After s u b s t i t u t i o n in the d e p e n d e n c e c o n s t r a i n t s (4.25), we get the following s y s t e m of inequalities involving the p a r a m e t e r s tr: 0 --< (OQO + ~ l t l , 0~20 + ~ 2 t 2 . . . . ,~XmO + O~mtm), (~XlO + a l t l , 0~20 + 0~2t2 . . . . ,arnO + a m t m ) ( ~ 0 --< ( f l 0 + f X t l , f 2 0 (flO + fxtl,f20
< £10,
+ f2t2 ..... fmO + fmtm), + f2t2 ..... fmo + fmtm)(~
<-- ~lo.
This can be w r i t t e n as -arO < artr
(1 _< r _< m ) ,
(4.32)
-frO < frtr
(1 _< r < m ) ,
(4.33)
( a l t l . . . . , O ~ m t m ) ~ <- ~10 - ( a l O , . . . , a m O ) ( ~
(4.34)
( f l t l , . . . , f m t m ) ( ~ < £10 - (fl0, . - . , f m O ) ( ~ •
(4.35)
Each inequality of (4.32) a n d (4.33) involves at m o s t one variable, a n d provides a lower or u p p e r b o u n d for that variable in case its coefficient is n o n z e r o . Since (~ is an u p p e r triangular matrix, £1 is the only variable that m a y a p p e a r in the first inequality of (4.34), tl a n d t2 are the only variables that m a y a p p e a r in the s e c o n d inequality, a n d so on. The s y s t e m (4.35) also has similar properties. Thus, the s y s t e m of inequalities (4.32)-(4.35) is easier to solve t h a n a general s y s t e m since partial elimination has already b e e n done. This d e p e n d e n c e p r o b l e m b e c o m e s a simple p r o b l e m w h e n the loop n e s t L is regular. Since t h e n (~ is the identity matrix a n d ~0 = ( ( q l 0 -- P 1 0 ) / 0 1 , (q20 -- P 2 0 ) / 0 2 , " " " , (qmo -- P m o ) / O r n )
4.6. SPECIAL CASES
115
(see (4.11)), the above s y s t e m of inequalities simplifies to -OtrO <- tXrtr <- (ClrO - PrO)/Or - O t t o ) - f i f O <- f l r t r <- (~lrO PrO)/Or ~rO ~
(1 < r < m ) .
(4.36)
Such a s y s t e m is trivial to solve. A l g o r i t h m 1.5.3 c a n be easily m o d i f i e d to provide a c o m p l e t e solution to this simple p r o b l e m . E x a m p l e 4.6 C o n s i d e r the d e p e n d e n c e p r o b l e m p o s e d b y the two array e l e m e n t s s h o w n in the following loop nest: L1 ~ L2 : L3 :
S: T:
do I1 = 0, 100, 1 d o 12 = 0, 100, 1 do I3 = 0, 100, 1 X(211 - 1,312 - 1,I3) . . . . ....... X(211 + 1 , I 2 + 2 , 5 ) enddo enddo enddo
o..
As in Example 4.5, we will w o r k w i t h i n d e x variables, since t h e y are identical to t h e c o r r e s p o n d i n g i t e r a t i o n variables. The d e p e n d e n c e e q u a t i o n s are 2il - 2jl = 2
(4.37)
3i2 - j2 = 3
(4.38)
i 3 -- 5,
(4.39)
a n d t h e d e p e n d e n c e c o n s t r a i n t s are 0 < il < 100
and
0 < j l < 100,
(4.40)
0 _< i2 -< 100
and
0 < j2 -< 100,
(4.41)
0 _< ~3 -< 100
and
0 < j3 -< 100.
(4.42)
It is clear t h a t t h e m a i n d e p e n d e n c e p r o b l e m c a n be p a r t i t i o n e d into t h r e e smaller p r o b l e m s , since the variables il, j l a p p e a r only in (4.37) a n d (4.40), variables i2, j2 only in (4.38) a n d (4.41), a n d variables i3, j3 only in (4.39) a n d (4.42). We can use A l g o r i t h m 2.2 to solve e a c h of t h e t h r e e p r o b l e m s separately. For r = 1, 2, 3, we find t h e set
116
CHAPTER 4. PERFECT LOOP NESTS
~t'r c o n s i s t i n g of all s o l u t i o n s (it, j r ) to the c o r r e s p o n d i n g e q u a t i o n that s a t i s f y t h e c o r r e s p o n d i n g set of constraints. Each set ~r will be p a r t i t i o n e d into t h r e e s u b s e t s s h o w n below: ~r,1
=
{(it,jr)
~ ~r : ir < j r }
~gr,-i = { ( i t , j r ) ~ ~/r : ir > j r } = {(it,jr)
~r,0
E~r:ir
=jr}.
The general s o l u t i o n to (4.37) is ( i l , j l ) = (tl + 1, tl), w h e r e t~ is a n integer p a r a m e t e r . Using the c o n s t r a i n t s (4.40), we get
~fl = {(t~ + l , t l ) : O < t ~ ~g~,~ = ~
<99}
't~1,_1 = {(t~ + 1 , t l ) : 0 ~ tl ~ 99} ~1,0
= ~.
The general s o l u t i o n to (4.38) is (ie, j e ) = (re, 3re - 3), w h e r e te is an integer p a r a m e t e r . Using the c o n s t r a i n t s (4.41), we get •~
= { ( t z , 3 t z - 3) : 1 ~ tz ~ 34}
• ~,~ = { ( t e , 3 t e - 3 ) : 2 ~ t e ~ 3 4 } • ~,_~ = {(te,3tz - 3) : tz = 1} = {(1,0)}
• 2,o = ~ . The general solution to (4.39) is (i3, j3 ) = (5, t3 ), where t3 is an integer parameter. Using the constraints (4.42), we get ~3
= {(S, t3) : 0 ~ t3 ~ 100}
• 3,1 = {(S, t3) : 6 ~ t3 ~ 100} ~3,-I = {(S, t3) : 0 ~ t3 ~ 4} %,o
= {(5,~)~.
Suppose we want to ~ o w if statement T depends on statement S ~ t h the direction vector ( 1 , - 1 , 0). Then the question is whether there is an integer solution (i~, j~, i2, j2, i3, j3) to the system of equations (4.37)-(4.39) satisf~ng the constraints (4.40)-(4.42), such that il < j l , i2 > j2, i3 = j3.
4.6. SPECIAL CASES
117
The set of all s u c h s o l u t i o n s is
{(il,jl,i2,j2, i3,j3):(ii,ji)
~
~1,1, (i2,j2) e W2,-1, (i3,j3) ~ ~3,0}
w h i c h is clearly e m p t y . Hence, T d o e s n o t d e p e n d o n S w i t h the d i r e c t i o n v e c t o r ( 1, - 1,0). Next, s u p p o s e we w a n t to k n o w if s t a t e m e n t S d e p e n d s o n statem e n t T with the d i r e c t i o n v e c t o r ( 1 , - 1 , 0 ) . T h e n the q u e s t i o n is w h e t h e r t h e r e is a n i n t e g e r s o l u t i o n ( i l , j l , i2,j2, i3,j3) tO t h e syst e m of e q u a t i o n s (4.37)-(4.39) satisfying t h e c o n s t r a i n t s (4.40)-(4.42), s u c h that
jl < il,j2 > i2,j3 = i3. (Note that (it, i2, i3) goes w i t h S a n d ( j l , j 2 , j 3 ) w i t h T, so t h a t their roles are n o w reversed.) The set of all s u c h s o l u t i o n s is
{ ( i i , j l , i 2 , j 2 , i 3 , j 3 ) : ( i t , j ~ ) ~ ~Itt,_l, (i2,j2) e ~I~2,1, (i3,j3) e ~3,o}. This set is n o n e m p t y so t h a t S d e p e n d s o n T w i t h the d i r e c t i o n v e c t o r (1, - 1 , 0). In fact, all i n s t a n c e pairs ( S ( i t , i2, i3), T ( j l , j 2 , j 3 ) ) , s u c h t h a t it > jx, i2 < j2, i3 = j3, a n d S(il, i2, i3) d e p e n d s o n T ( j l , j 2 , j 3 ) , f o r m the set {(S(tl + 1, t2, 5 ) , T ( t l , 3 t 2 - 3, 5)) : 0 < tt < 99,2 < t2 < 34}. The n u m b e r of s u c h pairs is 100 x 33 = 3300. The set of d i s t a n c e v e c t o r s for this d e p e n d e n c e of S o n T is {(1,3 - 2t2,0) : 2 < t2 -< 34} = { ( 1 , - 1 , 0 ) , ( 1 , - 3 , 0 )
..... (1,-65,0)}.
S p e c i a l C a s e 3. A G e n e r a l i z a t i o n o f C a s e s 1 a n d 2 Let ~, = EC a n d ~ = FC, w h e r e E a n d F are r n x rrr diagonal matrices, a n d C is a n r n x rt matrix. It is clear t h a t this last case i n c l u d e s the first two. We get .~ = I~ w h e n E a n d F are equal to the i d e n t i t y matrix, w h i l e / ~ = E a n d i~ = F w h e n C is the i d e n t i t y matrix. A n e x a m p l e of this case is p r o v i d e d b y the variables X(I3, 2It - 1,312 - 1) a n d
118
CHAPTER 4. PERFECT LOOP NESTS
X(5,211 + 1, I2 + 2) in a r e c t a n g u l a r triple loop, since t h e i r coefficient m a t r i c e s have the forms:
= diag(2, 3, 1) • C
= diag(2, 1, 0) • C.
In general, the a n a l y s i s o f Special Case 2 can be e x t e n d e d to i n c l u d e variables w h o s e coefficient m a t r i c e s are d i a g o n a l m a t r i c e s p o s t m u l t i plied by a p e r m u t a t i o n matrix. As in Special Case 2, let E = d i a g ( e l , e 2 , . . . , end) F = d i a g ( f l , f 2 , . . . , fro), a n d a s s u m e t h a t for e a c h ~" in 1 < r < rn, at least one o f er a n d f r is n o n z e r o . Then, we m a y c h o o s e the m a t r i x C in s u c h a w a y t h a t gcd(er,fr) = 1 for e a c h r . The d e p e n d e n c e e q u a t i o n (4.24) h a s the form i(EC) - )(FC) = 60 - rio. It can be b r o k e n u p into two s e p a r a t e s y s t e m s : iE - ) F = k
(4.43)
kC = !~o - to.
(4.44)
For a given k = (kl, k2 . . . . , kin), t h e general s o l u t i o n to (4.43) c a n be w r i t t e n as i = (~Iokl + fit1, ~20k2 + f2t2,..., ~mokm + f m t m ) .~ = (310kl + eltl,)20k2 + e2t2,... ,)mokrn + emtrn), w h e r e tl, t2 . . . . . tm are i n t e g e r p a r a m e t e r s , a n d
er~rO - f r 3 r o = 1
(1 < r < m ) .
The c o n s t r a i n t s (4.25) can also be e x p r e s s e d in t e r m s of kl, k 2 , . . . , km a n d tl, t 2 , . . . , tin.
4.6.
SPECIAL CASES
119
T h i s k i n d o f p r o b l e m i n a r e g u l a r l o o p n e s t ( w h e r e (~ is t h e i d e n t i t y m a t r i x ) b e c o m e s a s i m p l e p r o b l e m , if t h e m a t r i x C is s u c h t h a t E q u a t i o n ( 4 . 4 4 ) h a s n o s o l u t i o n o r a u n i q u e s o l u t i o n i n k. W e o m i t t h e d e t a i l s ; s e e E x a m p l e 1.5.12. T o s e e h o w t o t e s t w h e t h e r t h e m a t r i c e s .3, a n d !~ s a t i s f y t h e c o n d i t i o n d e f i n i n g S p e c i a l C a s e 3, s e e E x e r c i s e 1.5.6.1. EXERCISES 4.6 1. Consider the d e p e n d e n c e equation i.~ - ) ~ = ~o - ~o.
If ,~ = g, then for any given solution (io,,~o) and any given m-vector h, (io + h , ) o + h) is also a solution. Prove the converse: if the equation has this property, then A = g. 2. Find conditions on the iteration space of L such that it is finite and Theorem 4.9 is still valid. 3. Give an example of a nonuniform dependence with a unique distance vector. 4. Find the distance vectors, if any, of the dependence of T on S, S on T, and S on S in the following programs: (a) L1 : L2:
do I~ = 0 , 2 0 0 , 2 doI2 = 100+I1,0,-3 S: X(311 + I 2 + 5 , I ~ +212 + 3 ) . . . . T:
.......
X ( 3 1 1 + I2 + 1 , I ~ + 212). . .
enddo enddo
(b) L1 : L2 :
d o I1 = O, 100, 3 d o 12 = I1, 50 + Ix, 2 S : X(211 + 312 + 12) . . . . T:
.......
X(2I~ + 3 1 2 - 5 ) . . .
enddo enddo
5. Decide if T depends on S at level 2 in the program L1 : L2 :
do 11 = O, 100, 1 do 12 = 0, 50 + I1, 1 S : X ( 2 I ~ + 312 + 12) . . . . T : ....... X(211 + 312 - 5) • • • enddo enddo
6. In the following programs, study the dependence of T on S, S on T, S on S, and T on T. Find all distance and direction vectors, and levels of dependence. For each valid direction vector, find the set of all pairs of statement instances that are involved in the dependence.
C H A P T E R 4. PERFECT L O O P N E S T S
120
(a) L1 : L2 : L3 :
doI1 = 10,100,2 d o I2 = 1 0 0 , 0 , - 2 d o 13 = 20, 100, 1 S: X(211 - 1,312 - 1,I3) . . . . T : ....... X(211 + 1,I2 + 2, 5) • • • enddo enddo enddo
(b) L1 : L2 :
d o I~ = 1 , 1 0 0 , 2 do12 = 100,0,-3 S : X(611 - 1,212 - 1) . . . . T: ....... X(411 + 9 , 2 1 2 + 9 ) . . enddo enddo
7. D i s c u s s t h e p o s s i b i l i t y o f e x t e n d i n g t h e c o n c e p t s a n d r e s u l t s o f t h i s s e c t i o n t o a p r o g r a m t h a t is n o t a p e r f e c t n e s t .
Chapter 5
General Program 5.1
Introduction
A sequence of loops f o r m s a nest if each loop (except the first) is totally included within the previous loop. So far in the book, we have s t u d i e d the d e p e n d e n c e p r o b l e m in a perfect loop nest, where the b o d y of each loop (except the last) is t h e next loop in the sequence. The p a r a m e t e r s we u s e d to characterize a perfect loop n e s t (initial and final vectors, initial a n d final matrices, a n d stride matrix) can also be u s e d to characterize an ordinary loop nest. The m o d e l for this c h a p t e r is a p r o g r a m that consists of loops and a s s i g n m e n t statements, where the loops are arbitrarily (but legally) nested. If necessary, we create a trivial loop with one iteration so that the whole p r o g r a m is a loop. This does n o t alter the c o n c e p t s of d e p e n d e n c e in any way, b u t helps simplify the notation. Each given s t a t e m e n t in the p r o g r a m determines a loop nest; it is the s e q u e n c e of all loops in the p r o g r a m containing that s t a t e m e n t (counted f r o m the o u t e r m o s t loop inward). If the p r o g r a m is a perfect loop nest, t h e n the nests d e t e r m i n e d by all s t a t e m e n t s are the same. When we c o m p a r e two s t a t e m e n t s in the c u r r e n t p r o g r a m m o d e l to test for d e p e n d e n c e b e t w e e n them, we have to deal with two possibly different loop nests. This is h o w the d e p e n d e n c e p r o b l e m is to be generalized as we m o v e f r o m a perfect n e s t of loops to an arbitrary program. Much of the n e e d e d f o r m a l i s m is already in place. We show in Section 5.2 h o w the d e p e n d e n c e concepts can be easily e x t e n d e d f r o m a 121
122
CHAPTER 5. GENERAL PROGRAM
perfect loop n e s t to a general program. The linear d e p e n d e n c e probl e m in the general setting is f o r m u l a t e d in Section 5.3. In Section 5.4, we discuss the generalized gcd test that is the basic test in dependence analysis.
5.2 Dependence Concepts First, take any single a s s i g n m e n t s t a t e m e n t S in the program. Let n i s d e n o t e the n u m b e r of loops containing S and Ls the loop nest d e t e r m i n e d by S. For Ls, let Is d e n o t e the index vector, ~s the iteration vector, Ps0 a n d qs0 the initial a n d final vectors, Ps and Os the initial a n d final matrices, and Os the stride matrix. The n o r m a l i z e d final vector a n d the n o r m a l i z e d final matrix of Ls are d e n o t e d by/Is0 and 0 s , respectively, and they are c o m p u t e d using (4.10): 0S = O s p ~ I Q s O ~ 1 /Is0
)
(qs0 - Ps0P~IQs) O~1. ~
(5.1)
Next, c o n s i d e r any two, n o t necessarily distinct, a s s i g n m e n t statem e n t s S a n d T in the given p r o g r a m . If S lexically precedes T in the program, we write S < T. The m e a n i n g of the n o t a t i o n S < T is t h e n clear, a n d it is obvious that < is a total order in the set of all assignment statements. Let ni = niST d e n o t e the n u m b e r of loops that contain b o t h statem e n t s S a n d T, a n d L the nest f o r m e d by these loops. We have m < m i n ( m s , niT). Since the o u t e r m o s t ni loops of Ls and LT are the same, the first n i e l e m e n t s of Pso a n d PT0 are identical, and so are the first n i e l e m e n t s of qso a n d qT0- Also, the n i x ni submatrices f o r m e d by the t o p m o s t n i rows a n d the l e f t m o s t n i c o l u m n s of the matrices Ps a n d PT are the same, and so are the similar s u b m a t r i c e s of the matrices Qs and QT. Let us label the loops of the p r o g r a m L1, L2,... ,Lms+mT-m, such that L = (L1,L2 . . . . . Lm) Ls = (L1,L2 . . . . , L m , L m + l , . . . , L m s ) LT = ( L 1 , L 2 , . . . , L m , L m s + l , . . . , L m s + m T - m ) . A value i = (il, f2 . . . . , ira, im+l . . . . , ires) of the index vector Is det e r m i n e s an instance of s t a t e m e n t S that is d e n o t e d by S(i), a n d a
5.2. DEPENDENCE CONCEPTS
123
value j = (jl, j2 . . . . , jm, Jms+~,... ,Jms+mr-m) of the index vector IT d e t e r m i n e s an instance of s t a t e m e n t T that is d e n o t e d by T(j). The distance f r o m the instance S(i) to the instance T(j) is defined to be the m - v e c t o r ()1 - ~1,.~2 - ~2 . . . . . jrn -- ~rn), where i -~ (~1 . . . .
, ~rn, ~ m + l . . . .
, ~ms)
.~ = ( J 1 , . . - , j r n , ) ~ n s + l , . . . , j m S + m T - r n )
are the iteration vectors c o r r e s p o n d i n g to i a n d j, respectively. Thus, the distance is defined in t e r m s of the first m e l e m e n t s of the iteration vectors of the loop nests Ls a n d LT. In o t h e r words, the distance is defined 1 by the iterations of the c o m m o n n e s t L. An extension of L e m m a 4.4 to the general case is s t a t e d below w i t h o u t proof. L e m m a 5.1 Consider two assignment statements S a n d T in the model program. Let d denote the distance from an instance S(i) o f S to an instance T(j) o f T . In the sequential execution o f the program, S(i) is executed before T(j) iff one o f the following two conditions holds: (a) d > 0, (b) d = O a n d S <
T.
The d e p e n d e n c e relation ~ b e t w e e n s t a t e m e n t s in the general prog r a m can be defined as in Section 4.3. Descriptions of other associated concepts can be e x t e n d e d a l m o s t w o r d for w o r d to the general case. We n e e d only r e m e m b e r , for example, that a distance vector d of d e p e n d e n c e of T on S, defined as the distance f r o m an instance S(i) to an instance T(j), is c o m p u t e d as the difference b e t w e e n those parts o f ) and i that relate to the loop n e s t I. c o m m o n to S and T. The c o r r e s p o n d i n g direction vector ~r is the sign of d, a n d the d e p e n d e n c e level ~ is the level of d (or of {r). We state here the generalizations of T h e o r e m 4.5 a n d the two corollaries, for the sake of c o m p l e t e n e s s . T h e o r e m 5.2 If d is a distance vector for the dependence o f statement T on s t a t e m e n t S in the general program, then d >- O. The equality d = 0 is possible only i f S < T. 1Note t h a t in a g e n e r a l p r o g r a m , t h e i t e r a t i o n p o i n t s ~ a n d ) n e e d n o t h a v e t h e s a m e n u m b e r o f e l e m e n t s , so t h a t t h e d i f f e r e n c e ) - ~ m a y n o t e v e n m a k e sense.
CHAPTER 5. GENERAL PROGRAM
124
C o r o l l a r y 1 I f ¢~ is a direction vector for the d e p e n d e n c e o f s t a t e m e n t
T on s t a t e m e n t S in the g e n e r a l p r o g r a m , then ~ >__O. The equality o" = 0 is possible only if S < T. C o r o l l a r y 2 I f ~ is a d e p e n d e n c e level for the d e p e n d e n c e o f s t a t e m e n t
T on s t a t e m e n t S in the g e n e r a l p r o g r a m , then i < e < m + 1 (where m is the n u m b e r o f loops that contain both S a n d T). The value ~ = m + 1 is possible only if S < T.
E x a m p l e 5.1 C o n s i d e r t h e p r o g r a m L1 : L2: L3 : S : L4 : Ls :
T:
d o I1 = 1 , 1 0 0 , 1 d o I 2 = 1,I1,2 d o 13 = / 1 , I1 + 12, 1 X(211 - 1,312 + 1,213) . . . . enddo d o I4 = 100, 0, - 3 d o I~ = I1, I4, 1 ....... X(211 + 1,412 + 6 , I 4 + I s ) ' ' ' enddo enddo enddo enddo
S t a t e m e n t S d e t e r m i n e s t h e l o o p n e s t Ls = (L1,L2,L3) w i t h m s = 3 loops. For this nest, we have: P l = P l o = 1, ql
=
qlO
=
100,
P2 : P20 + P2111 : 1 + 0 X 11, 6/2 = q20 +6/21/1 = 0 + 1 x [ 1 , P3 = P30 + P3111 + P3212 = 0 + 1 x I1 + 0 x I2, 6/3 = 6/30 + 6/31/1 + q32/2 = 0 + 1 X I1 + 1 x I2, 01=1,02=2,03=1. H e n c e , t h e initial a n d final v e c t o r s a r e PS0 = ( P l 0 , P20, P30) = (1, 1 , 0 ) , qs0 = (ql0, q20, q30) = (100, 0, 0);
125
5.2. DEPENDENCE CONCEPTS
and the initial, final, and stride matrices are ( 1 -P21 Ps = 0 1 0 0 ( 1 -q21 O.s = 0 1 0 0 Os =
0 0
-P31 ) -P32 1 -q3~ ) -q32 1
02 0 0 03
=
=
=
( 1 0 0 ( 1 0 0
0 1 0 -1 1 0
0 2 0 0 0 1
-1 ) 0 , 1 -1 ) -1 , 1 .
We compute the normalized final vector and matrix: £1so = (qso - PsoP~lO.s ) O~ 1 = (99, 0, 1), 0.3" = Osp~IQsO~ 1 =
(1-1/20) 0 1 0 0
-2 1
.
The iteration vector ~s = (,~1,~2, i3) satisfies the constraints:
O< ~s
}
sfis -<
Thus, the iteration space of the loop nest Ls is the set of all integer vectors (~, ~2, ~3) ~ (0, 0, 0) such that ~
~ 99 ]
-~1/2+
~
0
- 2~2 + ~3 ~
~2
1,
that is, the set of all integer 3-vectors (~, ~2, $3) such that 0 ~ ~ ~ 99 ] 0 ~ ~2 ~ i~/2 0 ~ ~3 ~ 2~2 + 1. Given an index point (i~, i2, i3) of Ls, the corresponding iteration point (i~, i2, ~3) is computed using the equation ~ = (IP - P o ) O - 1 ,
126
CHAPTER 5. GENERAL PROGRAM
so that (~1, ~2, ~3) = [(il, i2, i3)Ps - Pso] O~ 1
(i1,i2, i3)
0 0
1 0
0 1
-(1,1,0)
0 0
1/2 0
0 1
= (i~ - 1, (i2 - 1)/2, i3 - i l ) . Statement T d e t e r m i n e s the loop nest LT = (L1,L2,L4,Ls) with m T = 4 loops. For this nest, the initial and final vectors are DTO = (1,1,100,0),
qT0 = (100,0,0,0);
and the initial, final, and stride matrices are
PT =
1 0 0 0 1 0 001
0 0
0
1
0
0
-1 0 10 01-1 0 0
1
OT =
QT =
0
0 0
' 1 0 0 2 0 0 0 0
0 0 -3 0
0) 0 ' 1
0 0 0 1
Note that the first two elements of Ps0 and PTO are identical, and the same is true for qso and qT0. Also, the 2 x 2 submatrix of Pso at the n o r t h w e s t corner is identical to the similar submatrix of PTO, and the same holds for Qso and QTO. We c o m p u t e the normalized final vector and matrix:
)
~T0 = (qT0 -- PTop~IQT) 071 = (99,0, 1 0 0 / 3 , 9 9 ) ,
~ "-~rP~'I~T~T~ -"" =
1 0 0
-1/2 1 0
0 0 1
1 0 3
0
0
0
1
"
5.2. DEPENDENCE CONCEPTS
Statement Instance
S(i~)
=
S(31, 29, 58)
T(j~) = T(36, 5, 70, 70) S(i2) = S(41,25, 61)
127
Iteration Point
Iteration Point
Iteration Point
of Ls
of LT
of L
--
(30,14)
(35,2,10,34)
(35,2) ( 4 0 , 12)
(30, 14, 27) -
-
(40, 12, 20)
--
Table 5.1: Three s t a t e m e n t instances for Example 5.1. The iteration space of the loop nest LT is the set of all integer 4-vectors (31,32, 34, 35) such that 0 < 3~ 0 _< 32 0<34 0 < 35
< 99 -< 3x /2 <33 < 99 - 31 - 334.
The iteration value ( 3 1 , 3 2 , 3 4 , 35) corresponding to an index value (jl, j2, j4, js) of LT is given by (31,32,34,35) = [(Jl, j2, j4, js)PT - PTO] 0 7 1 = ( j l - 1, (j2 - 1)/2, (100 - j 4 ) / 3 , j 5
-jl).
We take three statement instances S(il ), S(i2), and T(j~ ), where i 1 = (31,29,58),
i2 = (41,25,61),
j~ = (36,5,70,70).
The iteration points of different loop nests for these instances are given in Table 5.1. The distance f r o m S(ix) to T(jl) is ( 5 , - 1 2 ) = (35, 2) - (30, 14), and the distance from T(jl) to S(i2) is (5, 10) = (40, 1 2 ) - (35, 2). Since ( 5 , - 1 2 ) and (5, 10) are both lexicographically positive, Lemma 5.1 implies that in the sequential execution of the given program, S (i1) is execute d before T (j i ), and T (j 1) before S (i2). That is, the instances S(il ), T(jx ), and S(i2) are executed in this order since (30, 14) -< (35, 2) < (40, 12). Next, note that the instance S(24, 19, 40) of S writes the m e m o r y location X(47, 58, 80), and the instance T(23, 13, 52, 28) of T reads the same location. Thus, there is a d e p e n d e n c e between S and T. Now,
128
CHAPTER
5.
GENERAL
PROGRAM
t h e i t e r a t i o n p o i n t o f t h e l o o p n e s t Ls f o r t h e i n d e x p o i n t (24, 19, 40) is (23, 9, 16). Also, t h e i t e r a t i o n p o i n t o f t h e l o o p n e s t LT f o r t h e i n d e x p o i n t (23, 13, 52, 28) is (22, 6, 16, 5). Since (22, 6) < (23, 9), T r e a d s X ( 4 7 , 58, 8 0 ) b e f o r e S w r i t e s it. H e n c e , S is a n t i - d e p e n d e n t o n T w i t h a d i s t a n c e v e c t o r (1, 3) = (23, 9) - (22, 6). T h e c o r r e s p o n d i n g d i r e c t i o n v e c t o r is (1, 1) a n d t h e l e v e l o f d e p e n d e n c e is 1. EXERCISES 5.2
1. In Example 5.1, express the index values (il, i2, i3) and (jl, j2, j4, j~) in terms of the corresponding iteration values. Find descriptions of the index spaces of the loop nests Ls and Lr. 2. In Example 5.1, list all possible direction vectors and levels that a dependence of T on S may theoretically have. Without doing any heavy calculations, can you rule out any items from either list? Repeat for a dependence of S on T. 3. In Example 5.1, let d = ( 3 1 - ~ 1 , ) 2 - ~ 2 ) and d' = (jl - i x , j 2 - i2). Express d' in terms of d. How does d' change if d is increased by (0, 1)? 4. Let i denote an iteration point of the loop nest Ls and ) an iteration point of the loop nest LT. Let i denote the index point of Ls corresponding to i, and j the index point of LT corresponding to 3. Let d = (jr - ~-~.,)2 - ~2. . . . . ira - ~-rn) d ' = (j~ - i ~ , j 2 - i2 . . . . . jrn -- int).
How is d' related to d? How does d' change if d is increased by (0,..., 0, 1)? 5. Study the dependence structure of the following program. For each depen-
dence between two statements, find all distance vectors, direction vectors, and dependence levels. Draw a statement dependence graph. L~:
doI~ =1,10,1
S~ : L~ : $2 : $3 :
L3 :
X(I~) . . . . d o I~ = 10, 1, - 1 X(I1 +/2) . . . . enddo X ( I 1 + 1) . . . .
do I3 = I1, 10, 1 $4 :
L4 : $5 :
36 -"
X ( I 1 + 13) . . . . do 14 = 1 3 , l l , - 1 X ( I 1 + I3 + 14) . . . . enddo
X(I3) . . . . enddo enddo
5.3. DEPENDENCEPROBLEM
5.3
Dependence
129
Problem
In this section, we state the d e p e n d e n c e problem posed by two arrayelements in a general program. The problem is described in terms of iteration variables; the description in terms of index variables is left to the reader. Consider two assignment statements S and T in the program. Let X denote an n-dimensional array. Suppose that each statement has a variable that is an element of X with linear subscripts. Since the index vector of the loop nest determined by S is denoted by Is, this variable of S can be written in the form X (IsA + ao), where A is an rns x n integer matrix and ao is an integer n-vector (see Section 4.4). Similarly, the variable of T can be written as X (ITB + bo), where B is an n i t X n integer matrix and bo is an integer n-vector. Let X 0s/~ + ~io) and X (~TI~ + !~0) denote the normalized forms of the two variables. The nis x n integer matrix ~,, the niT × n integer matrix 1~, and the integer n-vectors ~o and !~0 are defined by the equations in (4.15): ~ = Osp~IA
~
~io I~
Psop~IA + ao OTP~IB
!~o
PToP~IB + bo. J
(5.2)
The instance of the variable X (~s/~ + ao) for an iteration point ~ = (El,~2,...,~m,~m+l,...,~.ms)
of the loop nest Ls is X ( i / ~ . ~0). The instance of X (~TI~ + l~O) for an iteration point ) = (.~1,32 . . . . . .~m, 3 m s + l , . . . ,
3rnS+mT-m)
of the loop nest LT is X + I o). These two instances represent the same memory location iff i~, + ~o = .)1~+ !~o, that is, iff i/~ - ) ~ = l~o - ho.
(5.3)
The matrix equation (5.3) is the dependence equation for the program variables X (|s/~ + ~io) and X (~T~ + I~o). It consists of n scalar equations, and there are (ms + niT) integer variables: the nis elements of
130
CHAPTER 5. GENERAL PROGRAM
Parameter
~S0 ~lT0 r0
Bo
Type integer vector integer matrix integer vector integer matrix integer vector integer matrix
Size ms ms x ms ~/T ~rtT X ~t/T n
ms xn
integer vector integer matrix
n
mT×n
Table 5.2: Parameters for the Dependence Problem. i and the Tt/T elements of 3. Since i is a value of ~s, it must satisfy the inequalities (4.9) for Ls: O_
(5.4)
where 0d is the normalized final matrix and ~lso the normalized final vector of Ls. Since ) is a value of ~T, it must satisfy the inequalities (4.9) for LT:
o _<)
-< 4To,
t
(5.5)
where (~T is the normalized final matrix and ~lTO the normalized final vector of LT. The inequalities in (5.4) and (5.5) constitute the dependence constraints for the program variables X (~s,~ + rio) and ~
- -
k
X , (~T~ + !~0) of statements S and T. Descriptions of the different parameters for the dependence problem are collected in Table 5.2. The extensions of Theorem 4.6, its corollaries, and Theorem 4.7 to a general program are stated below without proof.
Theorem 5.3 Consider two statements S and T in the general program L. Let X (IsA + ao) denote a variable o f S a n d X (ITB + b0) a variable of T, where X is an n-dimensional array. If these variables cause a dependence between S and T, then Equation (5.3) has an integer solution (i, 3) that satisfies (5.4)-(5.5). In each of the following special cases, the solution satisfies an additional condition:
5.3. DEPENDENCE PROBLEM
(a) I f S < T a n d S t ~ T , then ( ~ 1 , [ 2 . . . . .
131
~m) ~ ( . ~ 1 , 3 2 , . - . , 3 m ) ;
(b) If S = T and S t~ S, then ([1, [2 . . . . . ira) < ( 3 1 , 3 2 , . . . , ) r n ) ; (c) I f S < T a n d T ~ S ,
then ([1,~2 . . . . . ~m) >- ( 3 1 , ) 2 , . . - , 3 m ) ;
where ~ = (El . . . . , ~m, ~ m + l , - - - ,
~rns)
.~ --- ( 3 1 , . . . , . ~ r r t , 3 m s + l , . . . , 3 r n s + m T - m ) .
C o r o l l a r y 1 If statement T depends on s t a t e m e n t S with a direction vector t~ = (0-1, 02 . . . . . 0"m ), then Equation (5. 3) has an in teger solution (~, j) that satisfies (5.4)-(5.5) a n d the additional condition sig()r - it) = ~r
(1 < r < m ) .
(5.6)
C o r o l l a r y 2 If s t a t e m e n t T depends on statement S at a level ~, then Equation (5.3) has an integer solution that satisfies (5.4)-(5.5) a n d the additional conditions: {1 = j l , ~2 = j2 . . . . . ~e-1 = J e - 1 , ~e --< Je - 1. By C o r o l l a r y 1 to T h e o r e m 5.2, w e h a v e tr >- 0 if S < T, a n d tr >- 0 o t h e r w i s e . Also, b y C o r o l l a r y 2 to t h e s a m e t h e o r e m , w e h a v e 1 < / ? < m + 1 if S < T, a n d 1 < 4? < m o t h e r w i s e . T h e o r e m 5.4 Suppose that Equation (5. 3) has an integer solution (~,)) that satisfies the constraints (5.4)-(5.5). Let ~ = (El, ~2,..., ~rns) a n d • = ( j l . . . . , )rrt, ) r n s + l . . . . .
)r~tS+rrtT-rtl).
(a) I f ( i i , $ 2 , . . . , i m ) < ( 3 1 , ) 2 , . . . , j r n ) ,
then S-~T. -
-
(b) I f ( i l , ~ 2 . . . . ,~m) >- (31,j2 . . . . , ) m ) , then Tt~S. (c) I f ( i x , ~ 2 . . . . . ~m) = (~1,~2 . . . . . j m ) a n d s < T, t h e n S - ~ T . As always, w e d o n o t d i s t i n g u i s h b e t w e e n d e p e n d e n c e a n d indirect d e p e n d e n c e . If t h e r e is a n i n t e g e r s o l u t i o n to (5.3), s a t i s f y i n g (5.4)-(5.5) a n d a d d i t i o n a l c o n d i t i o n s , if any, t h e n w e a s s u m e t h a t the
132
C H A P T E R 5.
GENERAL P R O G R A M
c o r r e s p o n d i n g d e p e n d e n c e exists. The d e p e n d e n c e p r o b l e m for t h e general p r o g r a m is d e f i n e d as in Section 4.5. E x a m p l e 5.2 In this example, w e state the d e p e n d e n c e p r o b l e m for t h e p r o g r a m of Example 5.1: do 11 = 1,100, 1
L1 : L2 : L3 :
d o 12 = 1,11, 2
S: L4 : Ls :
T:
do 13 = 11,11 + 12, 1 X(211 - 1,312 + 1, 2•3) . . . . enddo d o 14 = 100, 0, - 3 do Is = 11,14, 1 ....... X(211 + 1,412 +6,14 + I s ) . • • enddo enddo enddo enddo
Much of t h e c o m p u t a t i o n h a s b e e n d o n e already. The variable of s t a t e m e n t S c a n b e w r i t t e n as X ((I1,12, I3)A + a0), w h e r e
A =
3 0
Its n o r m a l i z e d f o r m is
h
and
a0 = ( - 1 , 1,0).
X ((~1, ~2, I3)A + rio), w h e r e = Osp~IA =
, 0
rio = P s o p ~ I A + ao = ( 1 , 4 , 2 ) . The variable of T c a n b e w r i t t e n as X ((I1, I2, I4, I5 )B + bo), w h e r e
B=
4 0 0
bo = ( 1 , 6 , 0 ) .
'
133
5.3. DEPENDENCE PROBLEM
2ol)
Its n o r m a l i z e d f o r m is X ( ( i l , ~2, ~4, i s ) ~ + !~03, w h e r e
fi
"~-t~TPT1B =
0
8
0
0
0
0
0
-3
'
1
l~0 = PTop~XB + bo = (3, 10, 101). The d e p e n d e n c e e q u a t i o n (5.3) for this p r o b l e m is
(~1, ~2, ~33
20 0 6 0 0
02
-- (31,32,34,35)
0 0 0
08 0
- 30 1
= (2,0,993. (5.7)
It c o n s i s t s of t h r e e scalar e q u a t i o n s in 7 i n t e g e r variables: 2Zl-231 =2 6Z2 - 832 = 6 2il + 2~3 - 3 1 + 3 3 4 - 3 5 = 9 9 .
]
In Example 5.1, we f o u n d t h e d e p e n d e n c e c o n s t r a i n t s (5.4)-(5.5) for this p r o b l e m : 0 < ~ 1 _< 99 0--< ~2 _< ~1/2 0<~3 < 2~2+1
0-<3~ < 99 0-<32 -< 31/2 0-~34 _<33 o-<3s < 9 9 -
31 -
334.
If we w a n t to test if t h e r e is a d e p e n d e n c e of T o n S w i t h a particular d i r e c t i o n vector, say (1, - 1), t h e n we will get two a d d i t i o n a l c o n d i t i o n s : il --< jx -- 1 a n d j2 < i2 - 1. Similarly, for d e p e n d e n c e at a given level, say 2, the a d d i t i o n a l c o n d i t i o n s are ~1 = j l a n d ~2 -< j2 - 1.
134
CHAPTER 5. GENERAL PROGRAM
EXERCISES 5.3 1. C o n s i d e r t h e p r o g r a m L~ : L2 :
d o I~ = 10, 1 0 0 , 2 d o 1 2 = 211,I1 + 1 , - 1 $1 : X ( 7 I ~ - I2 + 6,•2) . . . .
enddo $2 :
L3 : $3 :
L4 : $4 :
Y ( 3 I ~ + 1) . . . . d o 13 = I1 + 3 , 1 0 0 , 3 X ( 4 I ~ + 1213 - 5,I~ + I 3 + 3) . . . . d o I4 = I~ + 213, 5/1 - 13, 1 Y ( 2 I ~ - I3 + I4) . . . .
enddo enddo enddo Find the dependence equations and constraints for each possible depend e n c e b e t w e e n s t a t e m e n t s . ( I n c l u d e d e p e n d e n c e o f a s t a t e m e n t o n itself.)
5.4
Generalized gcd Test
After formalizing the d e p e n d e n c e p r o b l e m in a sequence of p r o g r a m s with increasing generality, we have reached a point where p r o b l e m f o r m u l a t i o n is c o m p l e t e a n d p r o b l e m solution is to begin. The basic idea is to find integer solutions to a c o m b i n e d s y s t e m of linear e q u a t i o n s a n d inequalities. Over the course of c h a p t e r s 1-4, we have p o i n t e d o u t and c o m p l e t e l y solved certain p r o b l e m s that we called simple p r o b l e m s in C h a p t e r 1. In c h a p t e r s 6 and 7, we will explain the two a p p r o x i m a t e m e t h o d s ( m e t h o d of b o u n d s and m e t h o d of elimination) that m a y be u s e d for p r o b l e m s n o t treatable as simple problems. The first m a j o r step of any d e p e n d e n c e testing a l g o r i t h m is to decide if the s y s t e m of equations alone has an integer solution. In C h a p t e r 1, we s h o w e d in detail h o w to solve a single e q u a t i o n in two variables. That t e c h n i q u e can be u s e d repeatedly to solve a s y s t e m of two-variable e q u a t i o n s s u c h that no two equations have a c o m m o n variable. In occasional examples, we have s h o w n h o w to decide the existence of a solution to a m o r e general system, or even to find the general solution. The basic m a c h i n e r y for solving s y s t e m s of linear d i o p h a n t i n e e q u a t i o n s is available in Chapter 1.3. We are adapting it here to the d e p e n d e n c e problem.
135
5.4. GENERALIZED gcd TEST
A single equation of the f o r m a l X l + a2x2 + • • • + araXrn = c
with integer coefficients, has an integer s o l u t i o n in x1, x2, . . . , x m , iff g c d ( a l , a 2 , . . . , a m ) evenly divides c ( T h e o r e m 1.3.5). W h e n this wellk n o w n result is u s e d in d e p e n d e n c e analysis, it is called the gcd test. The gcd test for a single loop, that is, the test for an e q u a t i o n in two variables, can be traced to [Coha 73]. The gcd test for a oned i m e n s i o n a l array in a n e s t of loops, that is, for a single e q u a t i o n in m a n y variables, a p p e a r e d in [Towl 76]. Consider n o w the d e p e n d e n c e p r o b l e m for an array with one or m o r e d i m e n s i o n s in a general program. In this case, we get a general s y s t e m of linear d i o p h a n t i n e equations. The test that is applicable to any linear d e p e n d e n c e p r o b l e m was i n t r o d u c e d in [Bane 88a]. It is called the generalized gcd test. As the n a m e indicates, it generalizes the gcd test for a o n e - d i m e n s i o n a l array. (See Section 6.4 of [Bane 88a], T h e o r e m 1.3.6, a n d Section 1.5.2.) It is s t a t e d below in the context of the d e p e n d e n c e p r o b l e m in a general p r o g r a m . Note that the d e p e n d e n c e e q u a t i o n (5.3) can be w r i t t e n in the f o r m
A ) = 6o -
(i;))
(5.8)
where (i;)) is the vector of size ( m s + tt/T) o b t a i n e d by c o n c a t e n a t i n g the e l e m e n t s of i a n d i, a n d the coefficient matrix of the e q u a t i o n is the ( m s + roT) X rr matrix obtained by c o n c a t e n a t i n g the rows of matrices ~, a n d -I~. G e n e r a l i z e d g c d test. Find an ( m s + roT) X ( m s + tilT) u n i m o d u l a r matrix U and an ( m s + rnT) x n echelon matrix S, such that u.
=s.
I f the variables X (~s~k + ¢1o) o f s t a t e m e n t S a n d X (iT~ + 60) o f statem e n t T cause a dependence between S a n d T, then there exists an integer vector t o f size ( m s + roT) that satisfies the equation tS = l~0 - ~o.
(5.9)
136
CHAPTER 5. GENERAL PROGRAM
The matrices U a n d $ can be f o u n d by A l g o r i t h m 1.2.1. The test follows f r o m T h e o r e m |.3.6. That t h e o r e m says that there is an integer s o l u t i o n to the d e p e n d e n c e e q u a t i o n (5.8) iff there is an integer s o l u t i o n t to Equation (5.9). The whole point here is that (5.9) is m u c h easier to solve t h a n (5.8), since S is an echelon matrix. If there is no integer solution to the d e p e n d e n c e equation, t h e n d e p e n d e n c e c a n n o t exist. On the other hand, if the d e p e n d e n c e e q u a t i o n does have an integer solution, t h e n that fact alone c a n n o t decide the existence of d e p e n d e n c e . We m u s t have an integer solution that also satisfies the constraints. Thus, the generalized gcd test provides only a n e c e s s a r y c o n d i t i o n for d e p e n d e n c e b e t w e e n s t a t e m e n t s S and T. E x a m p l e 5.3 Consider again the p r o g r a m of examples 5.1-5.2. In this example, we apply the generalized gcd test to the d e p e n d e n c e probl e m b e t w e e n s t a t e m e n t s S and T. The d e p e n d e n c e e q u a t i o n (5.7) in Example 5.2 can be written as (
2 0 0 -2 0 0 k 0
0 6 0 0 -8 0 0
!
(~, ~, h, J~, Jz, J4, Js)
2 0 2 -1 0 3 -1
= (2,6,99).
We apply A l g o r i t h m 1.2.1 to the coefficient matrix of this s y s t e m to get the d e c o m p o s i t i o n 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 1
0
0
1
0
0
0 4 0 0 3 0 0 0 0 1 0 0 0 2 0 0 0 0 0 1 3
2 0 0 1
-2
0 6 0 0
2 0 2 -1
0-8 0 0 0 3 0 0-1
=
-2 0 0 0 0 0 0
0 -2 0 0 0 0 0
-1 0 -1 0 0 0 0
where the 7 x 7 matrix on the left (the U matrix) is u n i m o d u l a r a n d the 7 x 3 matrix o n the right (the S matrix) is echelon. The e q u a t i o n (tl, t2, t3, t4, ts, t6, tT).S = (2, 6, 99)
5.4. GENERALIZED gcd TEST
137
is e q u i v a l e n t to the e q u a t i o n (-2tl,-2t2,-t1
- t3) = ( 2 , 6 , 9 9 )
that clearly h a s i n t e g e r solutions. In fact, a n y 7-vector of the f o r m ( - 1 , - 3 , - 9 8 , t4, ts, t6, t7) is an integer solution, w h e r e t 4 , . . . , t7 are a r b i t r a r y integers. Thus, the g e n e r a l i z e d g c d t e s t p a s s e s . This m e a n s for t h e c u r r e n t d e p e n d e n c e p r o b l e m , this test d o e s n o t rule o u t dep e n d e n c e , n o r d o e s it c o n f i r m it. It is clear that t h e r e is a v e r s i o n of the g e n e r a l i z e d gcd test for each special d e p e n d e n c e p r o b l e m w h e r e a d i r e c t i o n v e c t o r is specified. For t h e d e p e n d e n c e of T o n S w i t h a given d i r e c t i o n v e c t o r ( o 1 , 0 2 . . . . . o-~n), w e get n~ extra c o n d i t i o n s . The c o n d i t i o n that corr e s p o n d s to an e l e m e n t o¥ = 0 is an equality: ~r = j r . One a p p r o a c h is to a t t a c h all t h o s e equalities to the s y s t e m (5.8), a n d t h e n a p p l y the g e n e r a l i z e d g c d t e s t to the e x t e n d e d s y s t e m . A n o t h e r a p p r o a c h is to r e d u c e the n u m b e r of variables f r o m the e q u a t i o n s a n d c o n s t r a i n t s u s i n g t h e s e equalities, right at the beginning. Yet a n o t h e r a p p r o a c h is to p e r f o r m the g e n e r a l i z e d gcd test o n c e (as d e s c r i b e d here), a n d t h e n a p p l y the a d d i t i o n a l c o n d i t i o n s i m p o s e d b y the given d i r e c t i o n vector. We will r e t u r n to this p o i n t in Section 7.2. EXERCISES 5.4
1. Apply the generalized gcd test to each dependence problem (between two statements) of Exercise 5.3.1. In each case, apply the test also to the problem of checking dependence (of one statement on another) at level 2 (if such a level is permissible). 2. To apply the generalized gcd test, we need not know the final vectors qs0 and qT0, or the final matrices Qs0 and QT0. Is there a situation where we will be able to apply this test without knowing even the initial vectors and matrices?
Chapter 6
Method of Bounds 6.1
Introduction
Algorithm 1.2 is an approximate d e p e n d e n c e algorithm that checks if (1) the s y s t e m of d e p e n d e n c e equations has an integer solution, and (2) the combined s y s t e m of equations and inequalities has a real solution. The method of bounds is an i m p l e m e n t a t i o n of Algorithm 1.2 that consists of two major tests: the generalized gcd test to check for an integer solution to the equations, and the b o u n d s test to check for a real solution to the equations and the inequalities. We covered the first test in Section 5.4; this chapter is devoted to the second. For this chapter, we n e e d several m a t h e m a t i c a l definitions and results that have b e e n collected in the appendix. The bounds test works by testing if a certain real n u m b e r lies between the e x t r e m e values of a certain real-valued function in a certain set. For a linear d e p e n d e n c e p r o b l e m involving an n-dimensional array X, there are n scalar d e p e n d e n c e equations that can be written in the f o r m fk(X) = Ck
(1 < k < n),
(6.1)
where f l , f2, • • •, f , are real-valued linear functions on some Euclidean space R N, and c,, c2, . . . , c , are integers. The inequalities of the problem define a subset P of R N, w h e r e P is a polytope (see Section A.2). First, consider the case n = 1. Here, we have a single d e p e n d e n c e 139
140
CHAPTER 6. METHOD OF BOUNDS
equation; d e n o t e it by f ( x ) = c.
(6.2)
By T h e o r e m A.6, there is a real solution to Equation (6.2) in P iff m i n f ( x ) _< c _< m a x f ( x ) . x~P
(6.3)
x~P
The b o u n d s test for a one-dimensional array consists of testing this inequality. When n > 1, the situation gets m o r e complicated. By T h e o r e m A.11, the system of equations (6.1) has a real solution in P iff max
~Rn-1
rnin[fl(X)+~.Ak(fk(x)--Ck)]<_Cl x~P
k=2 I'-
< min ~eRn-1
__
max[fl(X)+~'hk(fk(X)--Ck)l. x~P
L
k=2
"-I
j
(6.4)
This is the general inequality for the b o u n d s test; it includes (6.3) as a special case (n = 1). If (6.3) or (6.4) fails to hold, t h e n there is no dependence. The b o u n d s test originated in [Bane 76]. The b o u n d s test is practical w h e n the extreme values in (6.3) or (6.4) can be c o m p u t e d easily. The extreme values of a linear function f in a polytope P are attained at extreme points of P (Theorem A.5). There are d e p e n d e n c e problems where the structure of P is so regular, that we can easily find the extreme values of our function by r e p e a t e d applications of T h e o r e m 2.11. Also, taking the m i n i m u m and m a x i m u m values over ;~ ~ R n - 1 in (6.4) m a y not be too hard if n is small. The b o u n d s test can be diluted in two ways. First, instead of using the m i n i m u m and m a x i m u m values of the function in P, we can use a lower b o u n d and an u p p e r b o u n d in P. Then, the condition we would get will be only a necessary condition for the existence of a real solution. This fact can be stated in the f o r m of the following l e m m a whose p r o o f is trivial. L e m m a 6.1 Consider a b o u n d e d real-valued function f : D -~ R defined on a n o n e m p t y set D. Let p denote a lower a n d v an u p p e r bound for f . I f the equation f (x) = c has a solution in D, then p < c < v.
6.2. PERFECT NEST, ONE-DIMENSIONAL A R R A Y
141
Next, in case of a multi-dimensional array, instead of seeking a s i m u l t a n e o u s real solution to the system of n equations (6.1) in P, we m a y simply test for n individual real solutions to the n individual equations in P. Then, we would test the following sequence of r~ inequalities:
xm~i~fk(x) _< ck _< maxf~(x) x~P
(1 < k < n).
(6.5)
If at least one of these inequalities fails to hold, t h e n there is no dependence. In Section 6.2, we start with results for a one-dimensional array in a perfect rectangular loop nest with unit strides, and t h e n generalize t h e m by allowing arbitrary nesting of loops in Section 6.3. In Section 6.4, we consider tests for multi-dimensional arrays in a nest of rectangular loops with unit strides: the subscript by subscript version of the b o u n d s test as stated in (6.5), and the general test (6.4). Finally, in Section 6.5, we discuss the general b o u n d s test (6.4) in a general program. We should note here that unit strides have been ass u m e d solely for the purpose of simplifying expressions. For a loop with non-unit stride, we can use its iteration variable instead of its index variable, and thus effectively change the stride to 1. Indeed, by the m e t h o d s of sections 6.2-6.5, we can handle r e g u l a r loops with arbitrary strides (see T h e o r e m 4.3 and its corollary). We r e t u r n to this point in some of the examples and exercises.
6.2
Perfect Nest, One-Dimensional Array
Consider a perfect loop nest L of the form: do I1 = Pl, q~, 1 do I2 = P2, q2, 1 :
LI" L2"
L m
"
do I m = Pro, qm, 1 H (II, I2 . . . . . Ira) enddo :
enddo enddo
CHAPTER 6. METHOD OF BOUNDS
142
where each loop has c o n s t a n t limits and a stride of 1, and H is a sequence of a s s i g n m e n t statements. Let S and T denote two s t a t e m e n t s in H. Suppose X ( ao + a x I1 + a 2 1 2 + • • • + a m I m ) i s a p r o g r a m variable of S and X(bo + blI1 + b212 +. • • + bmIm) a variable of T, where X is a one-dimensional array, and all coefficients are integer constants. We s t u d y the p r o b l e m of d e p e n d e n c e between S and T posed by these two variables. Since the loop strides are all equal to 1, we work with the loop index variables: I1, I2 . . . . , I r a . The first t h e o r e m gives a set of necessary conditions u n d e r which T d e p e n d s on S at a given level ~. T h e o r e m 6.2 I f in the loop nest L, a variable X (ao + ~.r=l arIr ) o f S and a variable X (bo + ~ . r = l brlr) o f T cause a dependence o f T on S at a level ~, then the following two conditions hold: (a) The gcd o f
a l - b~,a2 - b2 . . . . ,a¢_~ - b ~ _ l , a e , . . . , a m , b ¢ , . . . , b m divides ( bo - ao ) ; a n d (b) ~ < b o - a o < 7 ,
where M
~ P ( a r - br, 0, Pr, t/r, 0) + p(a~, - b e , Pe, ~/e, 1) + r=l m
~.
[p(ar,O, p r , q r , O) + p ( - b r , O , pr, qr, O)]
r=e+l
and
e-1 V---- ~ v ( a r - b r , 0 , pr,~/r,0) + v ( a ~ , - b ~ , p ~ , ~ l e , 1) + r=l ~rt
~.
[v(ar,O, pr, qr,O) + v ( - b r , O , pr,qr,O)].
r=~+l
PROOF. The d e p e n d e n c e equation for these two variables is m
~ ( a r i r - b r j r ) = bo - ao, r=l
(6.6)
6.2. PERFECT NEST, ONE-DIMENSIONAL ARRA Y
14 3
a n d the d e p e n d e n c e c o n s t r a i n t s are
Pr <- ir <- qr ~ Pr < jr < qr
)
(1 < r _< m ) .
(6.7)
Since there is a d e p e n d e n c e o f s t a t e m e n t T on s t a t e m e n t S at level ~, there are integer v e c t o r s (il, i 2 . . . . . ira) a n d ( j l , j 2 , . . . , jm) t h a t satisfy Equation (6.6), the c o n s t r a i n t s (6.7), a n d the following additional c o n d i t i o n s 1: il = j~, i2 = j2 . . . . , i t - I = J r - l , ie < J¢.
(6.8)
Using the ~? - 1 equalities in (6.8), we can r e d u c e t h e n u m b e r o f variables f r o m 2 m to ( 2 m - / ? + 1). Thus, t h e r e are i n t e g e r s i~, i2,. • •, i t _ l , it, j¢, i¢+1, J r + x , . . - , ira, j m t h a t satisfy the e q u a t i o n
~. ( a t - b r ) i r + (aeit - b t j t ) + r=l
~. ( a r i r - b r j r ) = bo - ao, r=t+l
(6.9)
a n d the conditions: (l_
P r -< i r -< q r
Pt
(6.10)
Note t h a t the first s u m in (6.9) is e m p t y if/? = 1, t h e t e r m ( a ¢ i t - b t j t ) is a b s e n t if ~ = m + 1, a n d the last s u m is e m p t y if ~ equals m or m + 1. (These a s s u m p t i o n s are implicit in t h e i n t e r p r e t a t i o n of similar e x p r e s s i o n s t h r o u g h o u t this chapter.) Since Equation (6.9) has a n i n t e g e r solution, the gcd of t h e coefficients o n the left-hand-side m u s t divide the r i g h t - h a n d - s i d e (Theor e m 1.3.5). T h a t gives C o n d i t i o n (a) o f the t h e o r e m . To derive C o n d i t i o n (b), c o n s i d e r the Euclidean space R 2 m - t + l a n d d e n o t e an arbitrary v e c t o r in it b y X = (X1 . . . . . X ¢ _ I , Xg, yg, X ¢ + I , Y~?+I . . . . . X m , Y m ) .
1See Corollary 2 to Theorem 4.5. If e = m + 1, the inequality ie < Je is absent.
144
CHAPTER 6. METHOD OF BOUNDS
(The symbols Xr, Yr for the real variables have been so chosen that their correspondence to the integer variables of our problem is immediately clear.) Define a real-valued linear function f on R 2m-e+1 by f ( x l . . . . , x e - ~ , x e , Ye, x e + l , Ye+l . . . . , Xm, Y m ) e-!
m
= ~. (at - br)xr + (aexe - beye) + r=l
~'. ( a r x r - b r Y r ) .
(6.11)
r=¢+l
Let P denote the polytope in R 2m-e+~ defined by the conditions in (6.10), w h e n the integer variables are replaced by their real counterparts. In the real domain, Equation (6.9) may be written as f ( x ) = bo - a0. Since it has a solution in P by hypothesis, Theorem A.6 implies minf(X)xep -< b 0 - a o
< xmeafff(x).
To complete the proof, we need to show that minxeP f ( x ) = ~ and maxxep f ( x ) = V. The function f defined by (6.11) and the polytope P defined by (6.10) are such that the problem of finding the extreme values of f in P can be broken up into a number of two-variable problems, so that we may use Theorem 2.11 and its corollary. Let Pe denote the polytope ~ R z • Pe < x < qe, Pe < Y < qe, x <- Y - 1}.
{(x,y)
Then, we have g-1
xmi~f(x) = ~.
rain
(m- - b~.)x~. +
r = 1 Pr <-Xr <-qr
rain
(aexe - beYe) +
( x g , Y e ) ~ P8
I-
if'. r=e+l
|
"1
min
LPr <-xr <-qr
arxr+
rain
Pr $ Y r < q r
(-br)yr)]
e-1 = ~. I~(ar - b r , O, P r , ~ r , O) + /'=1
u(ae, -be, Pe, tie,
1) +
m
~'. [l~(ar,O, p r , Clr, O) + I ~ ( - b r , O, p r , qr, O)] r=~O+l --
=/3,
6.2. PERFECT NEST, ONE-DIMENSION~ A R R A Y
maxf(x)
= ~.
xGP
~'=1
max
Pr <-Xr<-qr
(at - br)xr +
[
In
Pr <Xr
r=~O+l
max
(xg,yg) ~Pf
14 5
(a~ox¢ - bcy~) +
Pr
g-1 = ~. V ( a r - br,O, Pr, qr, O) + v ( a e, - b e, Pe, qe, 1) + r=l ?n
~.
[V(ar, O, pr, qr, O) + v ( - b r , O, pr, qr,O)]
r=g+l
--~.
[]
C o r o l l a r y 1 I f s t a t e m e n t T depends on s t a t e m e n t S in the loop nest L, then the conditions o f Theorem 6.2 hold for some dependence level ~ such that i <_ ~ < m + 1 (or i < ~ < m , i f T < S). T h e o r e m 6.2 c a n be easily g e n e r a l i z e d to test for d e p e n d e n c e w i t h a specified d i r e c t i o n vector. To simplify e x p r e s s i o n s , we agree t h a t w h e n the r a n g e of r is n o t explicitly stated, it is {1, 2 , . . . , m } . T h e o r e m 6.3 Let ~r = ( o-1, ~2, . . . , ~rm) denote a vector whose elements are m e m b e r s o f the set {0, 1, - 1 , • }. I f in the loop nest L, the variables m m X (dO q- ~.r=X g r i t ) o f S a n d X (bo + Y-r=l brIr) o f T cause a dependence o f T on S with the direction vector ~r, then the following two conditions hold: (a) The gcd o f all integers in the three lists: { ( a t - br) " fir = 0}, { a t " ~rr ~ 0}, {br " ~ r * 0} divides ( bo - do); a n d
(b) ~ < b 0 - a o < ~ , where -fi= ~ . p ( a r - b r , (rr=O
O, p r , q r , O)+ ~. p ( a r , - b r ,
Pr, qr, 1 ) +
~r=l
~.
~ ( - br, dr, Pr, qr, 1) +
ffr=-i
~. [fl(ar, O, Pr, qr, O) + P ( - b r , O, Pr, qr, 0)] O'r=#
CHAPTER 6. METHOD OF BOUNDS
146
and ~=
E
V ( a r - b r , O, pr, tlr, O) + ~. v ( a r , - b r , Pr, qr, 1) +
O'r=0
O'r=l
V ( - br, a r , Pr, tit, 1) +
~. O-r = - I
Z
[V(ar, O, p r , q r , O) + v ( - b r , O , pr, qr, O)].
O'r=~
PROOF. The proof is similar to the p r o o f of T h e o r e m 6.2. We have the same d e p e n d e n c e equation: m
~. ( a r i r - brjr) = bo - ao,
(6.12)
r=l
and the same set of d e p e n d e n c e constraints:
Pr <_ ir < qr t Pr < j r < qr
(1 _
(6.13)
Since T d e p e n d s on S with the direction vector (o-1,0.2, • • •, 0.m), there exist integers il, i 2 , . . . , ira, j l , j 2 , . . . , j m that satisfy Equation (6.12), the constraints (6.13), and the additional conditions: ir = j r
if 0.r = 0,
(6.14)
ir < j r - 1
if 0.r = 1,
(6.15)
j r < ir - i
if 0.r = - 1 .
(6.16)
Define an integer N such that the n u m b e r of zero elefnents of ~r is 2 m - N. Then, we can reduce the n u m b e r of variables f r o m 2 m to N by writing j r = ir for each r such that 0.r 0. After regrouping of terms, Equation (6.12) can be written as =
E O'r =O
(ar-br)ir+
~. ( a r i r - b r j r ) + ~r= l
~. (arir - brjr) + ~'. (arir - b r j r ) = b0 - a0. O'r = - 1
Or=*
Since this equation has an integer solution, the gcd of the coefficients on the left-hand-side m u s t divide the right-hand-side (Theorem 1.3.5). That gives Condition (a) of the theorem.
6.2. PERFECT NEST, ONE-DIMENSIONAL ARRAY
14 7
To prove C o n d i t i o n (b), we define a real-valued linear function f on R N by f ( x ) = ~. ( a t - b r ) x r + Z ( a r X r - brYr) + fir = 1
fir =0
~.
( a r X r - brYr) + ~. (arXr - brYr).
ffr=-I
O'r=*
Let P d e n o t e the polytope in R N defined by the inequalities in (6.13), (6.15), and (6.16), w h e n integer variables have b e e n replaced by the c o r r e s p o n d i n g real variables. Existence of d e p e n d e n c e implies that the e q u a t i o n f ( x ) = b0 - ao has a real solution in P. Hence, by Theor e m A.6, we have m i n f ( x ) < bo - ao -< m a x f ( x ) . x~P
x~P
To c o m p u t e these extreme values, we note that for ~rr = 1, the variables Xr a n d Yr satisfy (6.13) and (6.15) iff (Xr,Yr) E Pr, where Pr = {(x,y)
E R 2 : Pr < x < qr, Pr < Y < qr,X
< y
- 1}
is a polytope in R 2. Also, for O-r = - 1, the variables Xr and Yr satisfy (6.13) a n d (6.16) iff (Yr, Xr) ~ Pr. The extreme values are t h e n f o u n d by r e p e a t e d applications of T h e o r e m 2.11 and its corollary: minf(x) = ~ x~P
min
~ = 0 Pr <-Xr<-qr
(at - br)xr + 2 O'r=l
=
~.
min
~rr = - 1
(Yr,Xr)~Pr
~. O'r= ,
min
(arXr - brYr) +
(xr,Yr)~Pr
( - b r Y r + arXr) + [
min
Pr <-Xr<-qr
arXr+
min
Pr <-Yr
(-br)Yr]
= ~. P ( a r - b r , O, pr, qr,O)+ ~. ~ ( a r , - b r , Pr, qr,1) + O'r=0
O'r=l
~.
U ( - br, a r , Pr, qr, 1) +
~rr=-I
~. O',¢-= * B
[ P ( a r , 0 , pr, q r , 0 ) + u ( - b r , 0 , pr, q r , 0 ) ]
CHAPTER 6. METHOD OF BOUNDS
148
xm~a~fp( x )
=
max
•
0 pr<-xr<-clr
~.
_
_
(at - br)xr + ~
max
max
( a r X r - brYr) +
O'r:! (xr,Yr)ePr
(-brYr + arXr) +
fir=_ 1 ( Y r , X r ) ~ P r
~. ~r=,
~. v ( a r - b r , O ,
[
max
arXr+
Pr<-Xr<-qr
pr, qr, O)+ ~
O'r = 0
max
(-br)Yr]
pr<-.Yr<-qr
v ( a r , - b r , pr, qr, 1)+
O'r = 1
~.
V ( - br, a r , Pr, 6/r, 1) +
~r=-I
~. [V(ar,O, pr, qr, O) + v ( - b r , O , pr,qr,O)] O'r~, D
[]
Corollary 1 If statement T depends on statement S in the loop nest L, then the conditions of Theorem 6.3 hold for some direction vector tr >- 0 (or tr >- O, if T < S ). The first b o u n d s test, a version of the test in T h e o r e m 6.2(b), appeared in [Bane 76]. An extension of that inequality was given in [Kenn 80]. That extension is a special case of the test in Theor e m 6.3(b) for a t r of the f o r m ( 0 , . . . , 0 , 1 , - 1 , , , . . . , , ) . (Presence of d e p e n d e n c e with such a direction vector prevents loop interchange.) Bounds tests of various types were given in [Bane 88a]o We would like to note here that the loop nest I. n e e d not be perfect. The results of this section apply w h e n the loop nests d e t e r m i n e d by the s t a t e m e n t s S and T are the same. Example 6.1 Consider the double loop L1 : L2 :
d o I1 = 10, 100, 1 d o I2 = 2, 50, 1
S: T:
X(I~ + I2 - 10) . . . . ....... X(2I~ + I2 + 31) • • • enddo enddo
6.2. PERFECT NEST, ONE-DIMENSIONAL A R R A Y
149
We c h e c k for d e p e n d e n c e of T o n S with t h e d i r e c t i o n v e c t o r (1, - 1). I n s t e a d of a p p l y i n g T h e o r e m 6.3 directly, we w o r k o u t the p r o o f of that t h e o r e m for this particular example. The d e p e n d e n c e e q u a t i o n is (il - 2 j l ) + (i2 - j2) = 41. The g c d o f t h e coefficients is 1, so that this e q u a t i o n h a s i n t e g e r solutions. Thus, the g c d test is n o t effective in this case. The d e p e n d e n c e c o n s t r a i n t s are 10
< i l -<
100,
2
10 < j l < 100,
_< i 2 --< 5 0 ,
2 < j2 < 50.
The d i r e c t i o n v e c t o r (1, - 1) i m p o s e s the a d d i t i o n a l c o n d i t i o n s : il < jx - 1
and
j2 < i2 - 1.
All t h e s e inequalities t o g e t h e r define a p o l y t o p e P in R 4. In t e r m s of real variables, t h e d e p e n d e n c e e q u a t i o n can be w r i t t e n as f ( x ) = 41, w h e r e x = ( x l , Yl, x2, Y2), a n d f : R 4 - R is a linear f u n c t i o n d e f i n e d by f ( x ) = (Xl - 2 y l ) + (x2 - Y2). The b o u n d s t e s t s a y s that d e p e n d e n c e c a n n o t exist u n l e s s minf(x) x~P--
_< 41 _< m a x f ( x ) . xGP
Now, w e have rnin f ( x ) = xGP
min
lO_<x~_
( X 1 --
2yl) +
min
2_
( - y ~ + x2)
= p ( 1 , - 2 , 10, 100, 1) + / ~ ( - 1 , 1, 2, 50, 1) = -189
b y (2.19),
CHAPTER 6. METHOD OF BOUNDS
150
a n d similarly, maxf(x) = x~P
max
(x~ - 2 y l ) +
10_<x~ < 100 lO_
max
2
(-Y2 + x 2 )
= v ( 1 , - 2 , 10, 100, 1) + v ( - 1 , 1,2, 50, 1) = 36
by (2.21).
Since the c o n d i t i o n - 1 8 9 < 41 < 36 d o e s n o t hold, there is no depend e n c e of T o n S w i t h the d i r e c t i o n v e c t o r (1, - 1).
EXERCISES 6.2 1. In the n o t a t i o n of T h e o r e m 6.2, what are the necessary conditions for the d e p e n d e n c e of S on T at a level e? In the n o t a t i o n of Theorem 6.3, what are the necessary c o n d i t i o n s for the d e p e n d e n c e of S o n T with a direction vector ~r? 2. In the following programs, test for the d e p e n d e n c e of T on S, S on T, S on S, and T o n T at each possible level a n d with each possible direction vector (use theorems 6.2 a n d 6.3): (a) The program of Example 6.1; (b) The program of Example 6.1 after changing the loop headers to LI : L2:
d o I~ = 10, 100, 1
do 12 = 2 1 ~ + 2 , 2 I ~ + 5 0 , 3
(C) L1 : L2 : L3: S : T :
do I~ = O, 10, 1 d o 12 = O, 5, 1 dol3 =0,8,1 X ( 4 1 1 - 812 + 1) . . . . .......
X(-6II
+ 412 + 1613 + 89) • • •
enddo enddo enddo (d) L1 : L2 : L3 : S : T :
d o I1 = 0, 100, 1 do/2 =I~,I~ +25,2 d o 13 = I~ + I 2 , I 1 + 12 + 48, 1 X(4I~ 8/2 + 1) . . . . ....... X ( - 6 I ~ + 412 + 1613 + 89) • • • enddo enddo enddo - -
6.3.
RECTANGULAR
LOOPS, ONE-DIMENSIONAL ARRA Y
151
6.3 Rectangular Loops, One-Dimensional Array In this section, we consider a general p r o g r a m a n d u s e the n o t a t i o n d e v e l o p e d in Chapter 5. Let S a n d T d e n o t e any two s t a t e m e n t s in the program. Let m s d e n o t e the n u m b e r of loops in the n e s t Ls det e r m i n e d by S, ~¢/-Tthe n u m b e r of loops in the n e s t LT d e t e r m i n e d by T, and m the n u m b e r of loops in the nest d e t e r m i n e d by both. We label the loops in the p r o g r a m L1, L 2 , . . . , L m s ÷ m r - m , s u c h that I.
= (L1,L2,...,Lm)
LS = ( L 1 , L 2 , . . . , L m , L m + I , . . . , L m s )
LT = (L1, L2 . . . . , L m , L m s + l , . . . ,
Lms+rnT-m).
We a s s u m e that the limits of each loop in the p r o g r a m are c o n s t a n t integers, and that the stride of each loop is 1. However, as p o i n t e d out in Section 6.1, it w o u l d suffice simply to a s s u m e that the loop n e s t s d e t e r m i n e d by the two s t a t e m e n t s are regular. The index variable of the loop L r is d e n o t e d by I t , its initial limit by P r , and its final limit by air, where 1 <_ r <_ m s + m T - m . A s s u m e that b o t h a s s i g n m e n t s t a t e m e n t s have p r o g r a m variables that are e l e m e n t s of a o n e - d i m e n s i o n a l array X. Let and
X (ao + alI1 + • • • + amIm
+ am+lIm+l
+ • " " + amsIms)
X (bo + blI1 + • • • + bmIm
+ bm+lIms+l
+ • " " + bmrIms+mr-m)
d e n o t e the variables of S a n d T, respectively. The d e p e n d e n c e equation is m
~0+ E ~rir+ r=l
ms
mT
m
E
arir=bo+ ~. brjr+
r=m+l
r=l
E
]-%Jms+r-m
r=m+l
or
m
ms
E (arir -- brjr) + E r=l
mT arir -
r:m+l
~ r:m+l
brJms+r-m: bo - ao. (6.17)
The d e p e n d e n c e c o n s t r a i n t s are P r <- i r < q r pr<_jr<_qr
(1 _< r _< m s ) (l<_r<_m,
ms+l<_r<_ms+mT-m).
152
CHAPTER 6. METHOD OF BOUNDS
We can n o w generalize the results of the previous section for a perfect n e s t to results for a p r o g r a m consisting of arbitrarily n e s t e d loops. We state those results with j u s t hints for proofs, o m i t t i n g details that can be c o n s t r u c t e d by e m u l a t i n g the p r o o f s given for theo r e m s 6.2 a n d 6.3. The first t h e o r e m c o r r e s p o n d s to T h e o r e m 6.2. T h e o r e m 6.4 If in the model program o f this section, two variables X
I
ao+ ~.arlr r=l
sll
and X
bo+ ~ . b r l r + r=l
~.
brlms+r-m
r=m+l
I
o f statements S a n d T, respectively, cause a dependence o f T on S at a level ~, then the following two conditions hold: (a) The gcd o f a l - hi, a2 - be . . . . . ae-1 - be-i, a e , . . . , ares, b e , . . . , bmr divides ( bo - ao ) ; a n d (b) ~ < bo - ao -< v, where e-1 -fi = ~. P ( a r - br, O, p r , q r , O) + p(ae, - b e , Pe, qe, 1) + r=l m
ms
p(ar,O, pr, qr,O) +
~.
p ( - b r , O, pr, qr, O) +
r=e+l
r=e+l ~ttT
~.
p ( - b r , O, Pros +r-m, qms +r -re, O)
r=m+l
and ~ = ~. V(ar - br, O, pr, qr, O) + v ( a f , - b ¢ , P e , q~, 1) + r=l mS
y.
~
v(ar,O,p
r=e+l
,qr,o) + Y. r=e+l
trtT
Z v ( - b r , O, Pms+r-m, qms+r-m, 0). r=m+l
+
153
6.3. RECTANGULAR LOOPS, ONE-DIMENSIONAL A R R A Y
PROOF. The d e p e n d e n c e e q u a t i o n (6.17) can be written as
~-1
ms
~.(ar-br)ir+(aei~-bej~)+
~
r=l ~t
+
arir
r=e+l tttT
~.
(-Dr)jr +
r=e+l
~.
( - b r ) J m s + r - m = D o - ao.
r=m+l
We can n o w e m u l a t e the p r o o f of T h e o r e m 6.2 a n d create a detailed p r o o f for the c u r r e n t theorem. [] Corollary 1 I f s t a t e m e n t T d e p e n d s on s t a t e m e n t S in the m o d e l prog r a m , then the conditions o f T h e o r e m 6.4 hold f o r s o m e d e p e n d e n c e level ~ such that l < ~ < rn + l (or l < ~ < m , if T < S). The next result is the generalized version of T h e o r e m 6.3. T h e o r e m 6.5 Let tr = ( ¢rl , ~2, . . . , O-m) denote a vector w h o s e e l e m e n t s are m e m b e r s o f the set {0, 1 , - 1, • }. I f in the m o d e l p r o g r a m o f this section, two variables
X
(
ao + ~.. a r l r r=l
)
and X
I Do+ ~.~tt b r I r r=l
tttT X
+
brIms+r-m
I
r=m+l
o f s t a t e m e n t s S a n d T, respectively, cause a d e p e n d e n c e o f T on S with the direction vector ~, then the following t w o conditions hold:
(a) The g c d o f all integers in the three lists: { a t - br " 1 < r < rn a n d crr = 0}, { a r ' l < r < m a n d ~ r r v:O, o r m + {br'l
(b) ~ < b o - a o < ~ , where
l
154
CHAPTER 6. METHOD OF BOUNDS
~
- b r , O, pr, ttr, O) +
P(ar
t~r=O
l
~.
P ( a r , - b r , Pr,qr, 1) +
~rr=l
~.
lg(-br, ar, Pr,qr,1) +
trr=-I
l
l
~'. [t~(ar,O, pr,qr, O) +l~(-br,O, pr,qr,O)] + O-r= ,
l_
P(ar, O, Pr, qr, O) +
~. r=m+l
mT ~. p(-br,O, Pms+r-m, qms+r-m, O) r=m+l
and ~=
~. v(ar
br,O, pr, Clr,O) +
-
t~r =0
l_
~
V ( a r , - b r , pr, q r , 1 ) +
~.
~r = 1
~r = - 1
l_
l
~.
v ( - b r , ar, pr, qr, X)+
[V(ar, O, pr, qr, O) + v(-br,O, pr,qr, O)] +
0"~ = ¢¢
l<_r<_m ~rts
~
V(ar, O, pr, qr, O) +
r=m+l mT
~.
v ( - b r , O, P m s + r - m ,
qms+r-m, 0).
r=m+l
PROOF. We can construct a proof by rewriting (6.17) in the form
~
(ar-br)ir+
~ r =0
~.
(arir-brjr)+
~rr = 1
l
l
~. ( a r i r - brjr) + O'r = - I
~.
(arir - brjr) +
ff~=%
l_
l
~. r=t~'t+ 1
mT
arir +
~. r=~'t+
(-br)Jms+r-m = bO - ao. 1
6.3. RECTANGULAR LOOPS, ONE-DIMENSIONAL ARRA Y
155
C o r o l l a r y 1 I f statement T depends on statement S in the model program, then the conditions o f Theorem 6.5 hold for some direction vector ~r >_0 (or ~r >- O, if T < S ).
E x a m p l e 6.2 C o n s i d e r the p r o g r a m L1 " L2 : L3 : S :
d o 11 = 10, 1 0 0 , 1 do I2 = 0, 1 0 0 , 2 do I3 = Ii,I1 + 10, 1 X(211 + 212 - 213 + 5) . . . .
enddo
L4 :
I2 + 10, I 2 , - 1 ....... X(212 + 214 + 9) • • • enddo enddo enddo d o I4 =
T:
Suppose we w a n t to test if s t a t e m e n t T d e p e n d s o n s t a t e m e n t S at level 2. The loop n e s t s d e t e r m i n e d by the two s t a t e m e n t s are regular, since the coefficients o f a n y given i n d e x variable in initial a n d final limits of e a c h loop are equal. We c a n u s e t h e t e c h n i q u e of C h a p t e r 5 to state the d e p e n d e n c e p r o b l e m in t e r m s of i t e r a t i o n variables, so that it will b e c o m e a p r o b l e m for r e c t a n g u l a r loops w i t h unit strides. Then, we w o u l d a p p l y T h e o r e m 6.4 to c h e c k for t h e d e p e n d e n c e . In the following, we s h o w the details o f h o w this is done. Since Lx h a s a stride of 1, we c h o o s e n o t to c h a n g e f r o m its i n d e x variable I1 to its i t e r a t i o n variable ~1. The d e p e n d e n c e e q u a t i o n in t e r m s o f i n d e x values is (2il
+ 2i2 - 2i3) -
(2j2 + 2j4) = 4.
Noting that the i n d e x variables I2, I3, a n d I4 are r e l a t e d to the corres p o n d i n g i t e r a t i o n variables ~2, [3, a n d ~4 b y the following equations: _-
~3
~i 4- ~3 +
lO)
-
=
-
+
lO,
we can rewrite the dependence equation in the form
4~2 - 8)2 - 2~3 + 2)4 = 24.
(6.18)
156
CHAPTER 6. METHOD OF BOUNDS
The ranges of I1, ~2, ~3, and ,~4 are given by 10 _< I1 < 100,
0 < ,f3 < 10,
0<~2_<50,
0_<~4_< 10.
From these inequalities, we get the d e p e n d e n c e c o n s t r a i n t s in t e r m s Of i l , j l , ~2,j2, ~3, a n d j4: 10 < i~ < 100,
0 _< i2 -< 50,
0 _< 23 -< 10,
10 _< j l < 100,
0 _< .~2 -< 50,
0 _< .~4 -< 10.
D e p e n d e n c e at level 2 i m p o s e s two extra conditions: il = j l and ~2 --< ,~2 -- 1. We use the equality to eliminate the variable j~. (For this problem, we effectively deal with four variables: i2, j2, ~3, ,~4.) The gcd of the coefficients on the left-hand-side of (6.18) is 2. It divides the right-hand-side, and therefore the gcd test is inconclusive. Next, we consider the b o u n d s test. To simplify c o m p u t a t i o n , we rewrite (6.18) as the e q u a t i o n 2~2 - 4j2
-
~3 +
,~4 =
12.
Denote a vector in R 5 by x = (Xl, x2, Y2, x3, Y4), a n d a linear f u n c t i o n f:R 5 ~Rby f ( x ) = 2x2 - 4y2 - X3 + Y4. In the real domain, the d e p e n d e n c e e q u a t i o n has the f o r m f ( x ) = 12. By the b o u n d s test, the specified d e p e n d e n c e c a n n o t exist u n l e s s m i n f ( x ) _< 12 _< m a x f ( x ) , xEP
x~P
where P is the polytope in R 5 defined by 10 < X1 ~< 100,
0 ~ X2 --< 50,
0 --< X3 --< 10,
0
0 _ < y 4 < 10.
< 50,
x2 -< Y2 - 1, We have minf(x) = xEP
min
(2x2 - 4y2) +
O_<x2< 50 0_
rain
O_<x3_<10
(--X3)
+
min
0_
Y4
= / 2 ( 2 , - 4 , 0 , 50, 1) + / 2 ( - 1 , 0 , 0 , 10,0) + / 2 ( 1 , 0 , 0 , 10,0) = -210
by (2.19),
6.3.
RECTANGULAR
LOOPS,
ONE-DIMENSIONAL
157
ARRAY
and similarly,
maxf(x)
=
max
x~P
(2x2-4y2)+
0 < x 2 -<50 0_
= v(2,-4,0, = 6
Since the condition
max
(-x3)+
0 __<x3 < 10
50, 1) + v ( - 1 , O , 0 ,
max
74
0 < y 4 < 10
10,0) + v(1,0,0,
10,0)
b y (2.21).
-210
rules out the dependence
< 12 < 6 d o e s n o t h o l d , t h e b o u n d s of statement
T on statement
test
S a t l e v e l 2.
EXERCISES 6.3 1. In the n o t a t i o n of Theorem 6.4, what are the necessary conditions for the d e p e n d e n c e of S o n T at a level ~?? In the n o t a t i o n of T h e o r e m 6.5, what are the necessary conditions for the d e p e n d e n c e of S on T with a direction vector ~r? 2. In the following programs, test for the d e p e n d e n c e of T o n S, S o n T, S o n S, a n d T o n T at each possible level a n d with each possible direction vector (use t h e o r e m s 6.4 a n d 6.5): (a) The p r o g r a m of Example 6.2; (b) L1 : L2 : L3 :
d o I1 = 10, 100, 1 d o 12 = 11,11 + lO0, 2 do 13 = 11 + 12, I1 + 12 + 10, 1 S : X(211 + 212 - 213 + 5) . . . . enddo L4 : d o I4 = 311 + 10, 3 I b - 1 T: ....... X(212 + 214 + 9) • • • enddo enddo enddo
(C) L1 :
d o 11 = 10, 100, 1 doI2 = 0,100,2 d o I 3 = 11,I1 + 10,1 S : X ( 2 I I + 212 - 213 + 5) . . . . enddo T: ....... X(211 + 2/2 + 9) • • • enddo enddo
L2 ". L3 :
158
CHAPTER 6. METHOD OF BOUNDS
6.4 Rectangular Loops, Multi-Dimensional Array In this section, we consider d e p e n d e n c e tests for multi-dimensional arrays in rectangular loops. First, we describe subscript-by-subscript tests. S u p p o s e we have a m e t h o d of testing for d e p e n d e n c e c a u s e d by two p r o g r a m variables that are e l e m e n t s of a o n e - d i m e n s i o n a l array. A d e p e n d e n c e p r o b l e m p o s e d by e l e m e n t s of an n - d i m e n s i o n a l array can be treated as a s e q u e n c e of n o n e - d i m e n s i o n a l problems. If our m e t h o d detects a lack of d e p e n d e n c e in any one of those n problems, t h e n there is no d e p e n d e n c e in the n - d i m e n s i o n a l problem. However, the converse is false. We describe below the subscript-by-subscript version of the b o u n d s m e t h o d that consists of a gcd test a n d a b o u n d s test in each d i m e n s i o n . (These b o u n d s tests are given by (6.5).) For each of the results in sections 6.2 and 6.3, there is a corres p o n d i n g result for n - d i m e n s i o n a l arrays (n > 1). We only show here h o w to generalize T h e o r e m 6.2; the rest is left to the reader. T h e o r e m 6.6 Consider the perfect loop nest L o f Section 6.2, and two statements S and T in its body. Let X ( I A + a0) denote a program variable o f S and X (IB + bo) a variable o f T , where X is an n-dimensional array, a0 = ( a o l , a.o2 . . . . , aon) andbo = ( b o l , b o 2 , . . . , bun) are integer n-vectors, and A = ( a r k ) and B = (brk) are rn x n integer matrices. I f these variables cause a dependence o f T on S at a level ~, then the following conditions hold: (a) For each k in 1 < k < n, the gcd o f alk
-- blk,
...,
a(~_l)k
- b ( ~ _ l ) k, a ~ k , . . . ,
amk,
b~k, ..., bmk
divides (bok - aOk); and
(b) For each k in 1 <_ k <_n, we have -ilk < bok - a O k < Vk, where
~ / 3 ( a r k - brk, O, pr, qr,O) +/~(a~k,-b¢k,P¢,q¢, 1) + ~'=1 ~t~
~. r=~+l
[/~(ark, O, Pr, qr, O) + I~(-brk, O, Pr, qr, 011
6.4. RECTANGULAR LOOPS, MULTI-DIMENSIONAL A R R A Y
159
and e-1
~k = ~. v ( a r k - brk, O, p r , q r , O ) + v(aek,-b~k,P~o,q¢, 1) + r=l ~l
2
[V(ark, O, pr, qr, O) + v ( - b r k , O , p r , q r , O)].
r=~)+l
PROOF. Instead of a single d e p e n d e n c e equation (6.6) as in Theorem 6.2, we get a system of n d e p e n d e n c e equations: m
~. ( a r k i r - b r k j r ) = bok - aok,
(1 _< k_< n).
(6.19)
r=l
The d e p e n d e n c e constraints (6.7) remain the same: Pr < ir < qr ~ Pr < j r < qr )
(1 < r < m ) ,
(6.20)
and we have the same additional conditions: il = j~,i2 = j 2 , . . . , i e - 1 = j e - l , i e <
Je.
(6.21)
Existence of the d e p e n d e n c e implies the existence of a simultaneous integer solution to the system (6.19) that satisfies (6.20) and (6.21). Then, there is an integer solution to each individual equation in (6.19) that satisfies (6.20) and (6.21). The conditions in (a) and (b) will follow, if the technique in the proof of T h e o r e m 6.2 is applied separately to each equation. [] It is trivial to translate this t h e o r e m into an algorithm. We can first p e r f o r m the n gcd tests in sequence. If one of t h e m fails, then we terminate; there is no dependence. If all the gcd tests pass, then we do the n b o u n d s tests in sequence. If one of t h e m fails, then we terminate; there is no dependence. If all b o u n d s tests pass, then there is no definite conclusion; we assume d e p e n d e n c e in this case. Next, we consider the general b o u n d s test. We state a result for elements of a two-dimensional array, that generalizes T h e o r e m 6.2. Similar generalization of the results in sections 6.2-6.3 to n dimensions is left to the reader.
CHAPTER 6. METHOD OF BOUNDS
160
T h e o r e m 6.7 Consider the perfect loop nest L o f Section 6.2, and two statements S a n d T in its body. Let X ( I A + a0) denote a program variable o f S a n d X(|B + bo) a variable o f T , where X is a two-dimensional array, a0 = ( a m , a o 2 ) a n d h o = ( b o l , b 0 2 ) a r e integer 2-vectors, a n d A = ( a r k ) a n d B = (brk) are r n x 2 integer matrices. If these variables cause a dependence o f T on S at a level g, then the following conditions hold: (a) The system o f two equations g-1
m
E ( a r k - brk)ir + (agkie - bekje) + ~ (arkir - brkJr) r=l r=#+l (k = 1,2) (6.22) = bok - aOk,
has an integer solution; a n d (b) m a x ~(A) < b01 - aol < m i n V(?Q, h
h
where $-1 ~ ( h ) = Z 12(arl - b r l
+ h ( a r 2 - br2),O, p r , q r , O)+
r=l
~(ae~ + Aae~, - b e ~ - hbe~, Pe, qe, 1) + m
Z
[/~ ( a r l + }kar2, O, Pr, qr, O) +/~( - b r l - ~br2, O, Pr, qr, O) ]
r=$+l
and /?-1
9 ( h ) = ~. V ( a r l - brl + A(ar2 - br2),O, P r , q r , O) + r=l
v ( a e l + ;~ae2, -be1
-
~bg2,
Pe, qe, 1) +
Fit
y.
[ v ( a r l + Aar2,0, Pr, qr, 0) + v ( - b r l - A b r 2 , 0 , Pr, qr, 0)].
r=g+l
PROOF. The p r o o f o f this t h e o r e m is also b a s e d o n that of Theor e m 6.2. I n s t e a d o f the single d e p e n d e n c e e q u a t i o n (6.9), we get the two e q u a t i o n s of (6.22). For the d e p e n d e n c e to exist, t h e r e m u s t be
6.4. RECTANGULAR LOOPS, MUL TI-DIMENSIONAL A R R A Y
161
an integer solution to this system. (That is to be checked by the generalized gcd test of Section 5.4.) This is Condition (a). To derive Condition (b), consider the Euclidean space R 2m-g+ 1 and denote an arbitrary vector in it by X = ( X l . . . . , Xg_l,
x e , Ye, xe+l, Y e + I , . . . , x m , Y m ) .
Define two real-valued linear functions f l and f2 g-1
on R 2m-g+l
by
rn
fk(x) = ~ . ( a r k - b r k ) x r + ( a e ~ x e - b e ~ y e ) + r=l
~'. ( a r ~ x r - b r k Y r ) r=g+l
for k = 1, 2. Let P denote the polytope in R 2m-g+ 1 defined by the conditions in (6.10), w h e n the integer variables are replaced by their real counterparts. In the real domain, the d e p e n d e n c e equations (6.22) may be written in the f o r m fl(X)
= Do1 - a O l
f2 (x) = b02 - a02. Since this system has a solution in P by hypothesis, by T h e o r e m A.10 we get max rain [ f l ( x ) + A(f2(x) - c2)] -< £1 A
x~P
_< m i n m a x If1 (x) + A (f2 (x) - c2) ]. A
x~P
The definitions of f l and f2 imply $-1
f l ( x ) + A(f2(x) - c2) = ~. [(ar~ - br~) + A(ar2 - br2)]Xr + r=l
[(a$1 + h a # 2 ) x e -
(be1 + h b e 2 ) y # ] +
m
~'. [ ( a r l + A a r 2 ) x r - (brl + h b r 2 ) Y r ] . r=g+l
Now, we can show that min If1 (x) + A(f2(x) - c2)] = ~(A) xGP
max [f~ (x) + A(f2(x) - c2)] = F(A) xGP
162
CHAPTER
6.
METHOD
OF BOUNDS
as in the p r o o f of T h e o r e m 6.2. C o n d i t i o n (b) t h e n follows a n d the p r o o f is c o m p l e t e . [] In Example 6.3, we apply the b o u n d s test to c h e c k for d e p e n d e n c e c a u s e d b y e l e m e n t s of a t w o - d i m e n s i o n a l array. We c o n s i d e r d e p e n d e n c e w i t h a given d i r e c t i o n vector to s h o w h o w T h e o r e m 6.3 can be e x t e n d e d to t h e t w o - d i m e n s i o n a l case. The general b o u n d s test a n d its s u b s c r i p t - b y - s u b s c r i p t v e r s i o n are b o t h illustrated. E x a m p l e 6.3 C o n s i d e r the p r o g r a m ( f r o m Example 3.1 in [Li 89]) L1 : L2 : S : T :
d o l l = 1,50, 1 do I2 = 2, 50, 1 X ( 2 I ~ + 312 + 50, 3I~ + I2 + 49) . . . . ....... X ( I ~ - I2 + 51, 2I~ - I2 + 48) • • • enddo enddo
Let us s t u d y the d e p e n d e n c e of T on S with the d i r e c t i o n v e c t o r ( 1, 1). The d e p e n d e n c e e q u a t i o n s are 2i~ - j l
+ 3i2 + j2 = 1
3 ~ - 2j~ + ~2 + j2 = - 1 .
(6.23) (6.24)
First, we a p p l y T h e o r e m 6.3 s e p a r a t e l y to e a c h individual d i m e n sion. The gcd test in e a c h case is inconclusive. The inequalities for the b o u n d s test applied s e p a r a t e l y to the two d i m e n s i o n s are /~(2, - 1 , 1 , 50,1) + U(3,1, 2, 50,1) < 1 < v ( 2 , - 1 , 1, 50, 1) + v ( 3 , 1 , 2 , 5 0 , 1 ) , ~ ( 3 , - 2 , 1, 50, 1) + ~u(1, 1, 2, 50, 1) < - 1 < v ( 3 , - 2 , 1, 50, 1) + v(1, 1,2, 50, 1), or - 3 9 < 1 < 245
and
- 92 < - 1 _< 146,
b o t h of w h i c h hold. Thus, if we rely o n this s u b s c r i p t - b y - s u b s c r i p t test, we have to a s s u m e d e p e n d e n c e . Next, we apply t h e b o u n d s test in its generality. In t h e real d o m a i n , e q u a t i o n s (6.23)-(6.24) can be w r i t t e n in the f o r m f l (x) = 1 ) f2(x) - 1 , ~-
(6.25)
6.4. RECTANGULAR LOOPS, MUL TI-DIMENSIONAL A R R A Y
163
where X = (xI,Yl,X2,W2) f l (X) = 2X1 - y~ + 3X2 + Y2 f 2 ( x ) = 3X1 - 2 y l + X2 + Y2. Let P d e n o t e the p o l y t o p e in R 4 d e f i n e d b y the inequalities 1 < X l -< 5 0 ,
2 < x2 < 50,
1 - < y l _<50,
2
Xl _< Yl
-
< 50,
x2 < Y2 - 1.
1,
By T h e o r e m A.10, the s y s t e m (6.25) h a s a real s o l u t i o n in P iff m a x xmi~ [f~ (x) + A (f2 (x) + 1) ] _< 1 A
_< rain m a x [f~ (x) + h ( f 2 ( x ) + 1)]. h
x~P
(6.26)
Since f l (x) + h (f2 (x) + 1) = 2Xl - Yl + 3x2 + Y2 + h(3x~ - 2 y l + x2 + Y2 + 1) = (2 + 3h)x~ - (1 + 2 h ) y x + (3 + h ) x 2 + (1 + h ) y 2 + A, b y T h e o r e m 2.11 a n d Definition (2.19), w e get
min [fx (x) + h (f~ (x) + 1) ] x~P
--
roan
1<X1_<50 l
x~<_y~-i
[(2 + 3h)Xx - (1 + 2 h ) y l ] +
min
[(3 + h)x2 + (1
2<x2_<50
+
A)y2] + h
2
=/2(2 + 3 A , - I - 2A, i, 50, I) +/2(3 + A, 1 + A, 2, 50, 1) + A = [ ( 1 + ~) - ( 1 + 2A) + 48 r n i n ( - 1 -
2A, I + A,O)] +
[(4 + 2A)2 + ( I + A ) + 4 7 rain(1 + A , 4 + 2A, O)] + A = SA + 9 + 4 8 m i n ( - 1 -
2A, 1 + A,O) + 4 7 m i n ( 1 + A,4 + 2A, O).
164
CHAPTER 6. METHOD OF BOUNDS
Since min ( - 1 - 2h, 1 + h,0) =
l+h 0 -1 -
2h
ifh < -1 if-1
and rain(1 + A,4 + 2h,0) =
f 4+2h l+h 0
ifh<-3 if-3
we have [ 8h+14 7h+11 min [fl (x) + ~(f2 (x) + 1) ] = xeP 5A + 9 3A + 8
ifh<-3 if - 3 < h < - I i f - 1 _< h _< - 1 / 2 if - 1 / 2 _< A.
Thus, this m i n i m u m is a piecewise-linear function of A. In the interval -oo < A _< - 3 , its m a x i m u m value is equal to max (Al~_m~o(8h+1 4 ) , 8 ( - 3 ) + 1 4 ) = -10. There are similar results for the other three intervals. Hence, we have max xmi~[fl (x) + Mf2(x) + 1)] -- max ( - 1 0 , 4, 13/2, ~o) h ~
00.
This shows that (6.26) cannot hold. Hence, T does not depend on S with the direction vector (1, 1). Zhiyuan Li [Li 89] was the first to do a serious study of dependence analysis for multi-dimensional array elements, that did not involve subscript-by-subscript testing, nor array linearization. (See Exercise 2.) The general bounds m e t h o d of this chapter, based on the theory presented in the appendix, includes Li's h-test. His m e t h o d of development, however, is completely different from ours, and we have not covered all his results. See [Li 89], [LiYe 89], and [LiYZ 90] for full details.
6.5. GENERAL METHOD
165
EXERCISES 6.4
1.
(a) Generalize Theorem 6.6 to give necessary conditions for dependence with a given direction vector o~. (b) Generalize Theorem 6.7 to give necessary conditions for dependence with a given direction vector tr. (c) State the n-dimensional version of Theorem 6.7 and that of the theorem for a given direction vector obtained in the previous exercise. (d) Extend all results from a perfect loop nest to a general program.
2. Discuss the effect of array linearization in the context of dependence testing for multi-dimensional arrays. Explain the difficulties that may arise due to linearization. (See [WoBa 87] and [Bane 88a].) 3. In the following programs, test for the dependence of T on S, S on T, S on S, and T on T at all possible levels and with all possible direction vectors: (a) The program of the example in Chapter 1; (b) The program of Example 3.3; (c) The program of Example 3.3 after changing the loop headers to L1 : L2 :
do I~ = 0, 100, 2 doI2 = 211 + 4,2I~ - 12,-1
(d) The program of Example 3.4; (e) The program of Example 6.3. Use both the subscript-by-subscript test and the general method of bounds. Compare your results.
6.5
General Method
In t h e o r y , t h e b o u n d s t e s t c a n b e a p p l i e d to e l e m e n t s o f a m u l t i d i m e n s i o n a l a r r a y in a g e n e r a l p r o g r a m , w h e r e l o o p l i m i t s a r e n o t n e c essarily constant. However, the complexity of the process increases r a p i d l y w i t h t h e n u m b e r o f t h e p a r a m e t e r s (the A's a n d t h e / J ' s ) . In E x a m p l e 6.4, w e i l l u s t r a t e a n a p p l i c a t i o n o f t h e m e t h o d o f b o u n d s to elements of a one-dimensional array, such that we have to deal with t w o ?~ p a r a m e t e r s . In t h i s p r o b l e m , w e u s e L a g r a n g e a n r e l a x a t i o n to c o m p u t e t h e m i n i m u m a n d m a x i m u m v a l u e s o f a l i n e a r f u n c t i o n in a p o l y t o p e (see S e c t i o n A.4)o As p o i n t e d o u t i n S e c t i o n 6.1, a r b i t r a r y s t r i d e s a n d r e g u l a r l o o p nests do not add to the essential complexity of the bounds method; w e s i m p l y c h a n g e o v e r t o i t e r a t i o n v a r i a b l e s in t h a t case. In E x e r c i s e 3,
166
CHAPTER 6. METHOD OF BOUNDS
t h e r e a d e r is a s k e d to g e n e r a l i z e the r e s u l t s of this c h a p t e r along this line. E x a m p l e 6.4 C o n s i d e r the d o u b l e loop d o I1 = 0, 100, 1 d o 12 = 0,11, 1
LI :
L2: S: T:
X(211 + 312) . . . . ....... X(411-I2+5)--. enddo enddo
S u p p o s e we w a n t to test for the d e p e n d e n c e of T o n S w i t h the direction v e c t o r ( 1, 1). The d e p e n d e n c e e q u a t i o n is 2il - 4 j l + 3i2 + j2 = 5.
(6.27)
Since gcd(2, 4, 3, 1) = 1, the gcd t e s t is inconclusive. The d e p e n d e n c e c o n s t r a i n t s are 0 < il < 100,
0 < i2 -< ii,
0 < j l < 100,
0 _< j~ < 31.
The a d d i t i o n a l c o n d i t i o n s i m p o s e d by t h e d i r e c t i o n v e c t o r are il -< j l - 1 i2 -< j2 - 1. Let P d e n o t e t h e p o l y t o p e in R 4 d e f i n e d by all t h e s e inequalities, w h e n t h e i n t e g e r variables il, i2, j l , a n d j2 are r e p l a c e d b y real variables x l , x2, Yl, a n d Y2, respectively. Let Q d e n o t e t h e p o l y t o p e in R 4 defined b y the following inequalities: 0 --< X l < 1 0 0 ,
0 < x2 < 100,
0 --< Y l --< 1 0 0 ,
0 < Y2 < 100,
)¢1 -< Yl - 1,
x2 < Y2 - 1.
Then, P is a s u b s e t of Q specified b y t h e a d d i t i o n a l c o n d i t i o n s x2 -< x1 a n d Y2 -< Yx.
6.5.
167
GENERAL METHOD
The d e p e n d e n c e e q u a t i o n (6.27) can be w r i t t e n as f ( x ) = 5, w h e r e X = (X1,x2,Yl,Y2)
f ( x ) = 2X1 - 4 y l + 3X2 + Y2. The b o u n d s t e s t s t a t e s that s t a t e m e n t T c a n n o t d e p e n d o n s t a t e m e n t S w i t h t h e d i r e c t i o n v e c t o r (1, 1), u n l e s s minf(x) x~P
(6.28)
_< 5 _< m a x f ( x ) . x~P
Now, w e have minf(x) x~P
= min f(x) xaQ x2<_Xl Y2 --
b y T h e o r e m A.9
= m a x m i n I f ( x ) +/3~ ( x l - x2) +/32(Y~ - Y~)] /~ <0 x ~ Q /~2 -<0
= m a x m i n [(/31 + 2)x~ + (/32 - 4 ) y l + (3 - / 3 ~ ) x 2 + (1 - / 3 2 ) Y 2 ] ]31 --<0 x ~ Q g~
=max
[
min
~u~
{(/21+2)x~+(/2~-4)Yl}+
] min
0<X2<100 0
{(3 - / 2 ~ ) x 2 + (1 - / 3 2 ) Y 2 } [
= m a x [(/22 - 4) + 99 m i n (/32 - 4,/21 +/32 -- 2 , 0 ) + ( 1 -- / 3 2 ) + Ul-<0 U2-<0
99 m i n ( 1 -/32, 4 -/31 -- /32, 0 ) ]
b y T h e o r e m 2.11
= - 3 + 99 m a x [rain (/22 - 4,/21 +/32 - 2 , 0 ) + /~1<0
u2-
m i n ( 1 - / 2 2 , 4 -/31 -- / 3 2 , 0 ) ] .
D e n o t i n g t h e s u m of t h e s e t w o m i n i m u m v a l u e s b y , . q ( / 3 1 , / 3 2 ) , w e get mipf(x) x~F
= - 3 + 99 m a x ~/(/21,/32). ~Ul_
(6.29)
CHAPTER 6. METHOD OF BOUNDS
168
In the q u a d r a n t of t h e //1/32-plane w h e r e /31 < 0 a n d /32 < 0, the m i n i m u m values defining y are given b y if - 2 /1,/32 -< 4 i2 - 4 1 +/32 -- 2
m i n (//2 -- 4,/31 +/32 -- 2, 0) =
if/31 -t-/32 -< 2, pl < - 2 if 4 32, 2 /1 +/32
f
_ --
J//1 + / / 2
[ //2 -- 4
- 2
i f / 3 1 -< - 2
if//1 > - 2 ,
and if/31 < 3, 1 /2 m i n (1 - / / 2 , 4
-/31 -/32, 0) = ti
-- //1 /32 - / 3 2
if 3 _31,4 31 +/32 if//2 < 1,/31 +/32 < 4
=0,
Then, we have m a x
/Jl_<0 /a2 <0
~(//1,/32)
=
m a x [ m i n (/32 - 4,//1 + /32 /al<0 /J2-<0
m a x (/31 +//2 - 2),
Ml_<-2 /J2_
--
2,0) + 0]
max
-2_/1_<0 /J2-
(//2 -
4)1
]
= max(-4,-4) = --4,
so t h a t
xm2~f(x) :
-399
f r o m (6.29). We can c o m p u t e maxxep f ( x ) in a similar way, a n d t h e n test Condition (6.28) to d e c i d e the existence of the d e p e n d e n c e . The details are left to the reader.
6.5. GENERAL METHOD
169
EXERCISES 6.5 1. Complete Example 6.4. 2. In Example 6.4, check for dependences with other direction vectors and at different dependence levels. 3. State the analogs of the theorems in this chapter (and their generalizations asked for in Exercise 6.4.1), when loops are allowed arbitrary strides and the loop nest(s) involved are regular.
Chapter 7
Method of Elimination 7.1
Introduction
In this chapter, we discuss the m e t h o d of elimination as outlined in Algorithm 1.3. We have already used elimination at several places in the book. A simple d e p e n d e n c e problem, by definition, is one where elimination is trivial. For a simple problem, the m e t h o d of elimination lets us find integer solutions almost as easily as real solutions. The prime example of that was seen in Chapter 2 for d e p e n d e n c e problems involving one-dimensional arrays in single loops. Our focus here is on problems where that is not the case. We have already seen dep e n d e n c e problems where elimination is nontrivial, for instance, in the example of Chapter 1 and in Example 3.4. In the first part of the m e t h o d of elimination (steps 1-3, Algorithm 1.3), we decide if the system of d e p e n d e n c e equations has an integer solution, and find the general solution in case it does. Deciding the existence of an integer solution constitutes the generalized gcd test that was covered in Section 5.4. We pick up the thread in Section 7.2 at the point where we left off in Chapter 5, and explain in detail how the m e t h o d of elimination works. Section 7.3 is devoted to a study of two-variable d e p e n d e n c e problems; they have some special properties. Finally, in Section 7.4, we look into other elimination approaches one might take. 171
172
7.2
CHAPTER 7. METHOD OF ELIMINATION
Dependence
Testing
by
Elimination
Consider the d e p e n d e n c e problem in a general program as developed in Chapter 5. We are focusing on two statements S and T in the program, a program variable X (Js~ + ~0) of S, and a program variable X (JTI~ + I~0) of T (in their normalized forms). The d e p e n d e n c e equation (5.3) for the problem posed by these variables is written again: i h - 3~ = ~o - ao.
It consists of rt scalar equations in ( m s + roT) variables: the m s elements of i and the ~ T elements of ). We can put this equation in the form
A) =Bo-ao, (i;3) _g
(7.1)
where (~;3) is the vector of size ( m s + roT) obtained by concatenating the elements of i with the elements of 3, and the coefficient matrix of the equation is the ( m s + roT) X ~t matrix obtained by concatenating the rows of ~, with the rows of -1~. Use Algorithm 1.2.1 to find an ( m s + roT) X ( m s +roT) unimodular matrix U and an ( m s + roT) X n echelon matrix S, such that u.
_~
= s.
By T h e o r e m 1.3.6, there is an integer solution (~;3) to (7.1), iff there is an integer vector t of size ( m s + roT) satisfying the equation ts = ~o - ~o.
(7.2)
If there is no such t, then there is no d e p e n d e n c e between S and T. So, assume there is an integer solution t to (7.2). Then, by the same theorem, the general solution to (7.1) is given by ( i ; ) ) = tO,
(7.3)
where t is the general solution to (7.2). The n u m b e r of elements of t completely determined by (7.2) equals rank(S), that is, rank ( _ ~ ) .
7.2. D E P E N D E N C E T E S T I N G B Y E L I M I N A T I O N
173
The number of undetermined elements of t is (ms + m T - rank(S)). These latter elements appear as parameters in the general solution (7.3) to Equation (7.1). We can express i and ) separately in the form
i = tu1 ~ ) = tU~, )
(7.4)
where Ut is the submatrix of U consisting of the leftmost m s columns, and U2 the submatrix consisting of the rightmost trtT columns. We rewrite the dependence constraints (5.4)-(5.5) for this problem:
O_
-< /Is0
t
(7.5)
0_<)
)t)T -< To, where 0s is the normalized final matrix and/lso the normalized final vector of Ls, and (~T is the normalized final matrix and/ITo the normalized final vector of LT. Substituting for i and ) from (7.4) into (7.5), we get
0 _
0 < tU2
t
_<
There are ( m s + m T - rank(S)) variables in this system: those undetermined elements of t. If these inequalities have an integer solution, then there is a dependence between S and T. The second part of the method of elimination (steps 5-7, Algorithm 1.3) consists of finding real solutions to this system of inequalities by Fourier elimination, and then extrapolating the existence of integer solutions. This kind of elimination was used in [Bane 79]. Example 7.1 given below is Example 3 in Chapter 3 of that thesis, changed to suit our current notation. A similar approach is taken for the Power test in [WoTs 92]. Suppose (i,)) = (tU~, tU2) is a solution to Equation (7.1) such that t satisfies the inequalities (7.6). Let i denote the index point of the loop nest Ls corresponding to the iteration point i, and j the index point of the loop nest LT corresponding to the iteration point ). Let d denote the distance from the instance S(i) of S to the instance T(j) of T.
174
CHAPTER 7. METHOD OF ELIMINATION
Let
U (1) , U (2) , . . . ,
U (ms+roT) d e n o t e the c o l u m n s of U. We c a n write
i = ( t U ( 1 ) , t U ( 2 ) , . . . , t U (ms)) ~ = (tu(ms+X),tu(ms +2) . . . . ,tU Ims+mT)) d = (tO Ires+t> - tU ~t>, tO ~ms++> - tU~+>,..., tU tm++m> - turin>) • If d >- 0, t h e n T d e p e n d s o n S w i t h the d i s t a n c e v e c t o r d. If d -< O, t h e n S d e p e n d s o n T w i t h the d i s t a n c e v e c t o r - d . If d = 0 a n d S < T, t h e n T d e p e n d s o n S w i t h the d i s t a n c e v e c t o r 0. C o n s i d e r n e x t t h e d e p e n d e n c e p r o b l e m w i t h a given d i r e c t i o n vector o- = (tyt, o 2 , . . . , ¢ r m ) . T h e r e are n o w m a d d i t i o n a l c o n d i t i o n s : ~r = +~r ~r < j r - 1 ] r < ~r -- 1
if <:rr = 0 ] if O-r = 1 ] if t~r = - 1 .
(7.7)
As m e n t i o n e d at t h e e n d of Section 5.4, we can h a n d l e t h e s e a d d i t i o n a l c o n d i t i o n s in several d i f f e r e n t ways: 1. Create a larger s y s t e m of e q u a t i o n s a n d inequalities by a t t a c h i n g the equalities of (7.7) to (7.1), a n d the inequalities in (7.7) to (7.5). P r o c e s s t h a t n e w s y s t e m by A l g o r i t h m 1.3. 2. Use the equalities of (7.7) to e l i m i n a t e variables in (7.1) a n d (7.5), a n d create a s y s t e m of e q u a t i o n s a n d inequalities with fewer variables. P r o c e s s t h a t n e w s y s t e m by A l g o r i t h m 1.3. 3. Solve the s y s t e m of e q u a t i o n s (7.1) a n d derive the s y s t e m of inequalities (7.6). Restate (7.7) in t e r m s of the u n k n o w n p a r a m eters: tU (r) -- tU (ms÷r) tU (r) < tU (ms+r) tU (ms+r) < tU (r) - 1
-
1
if ~ r = 0 if O-r = 1 if t~r = - 1 .
(7.8)
A t t a c h t h e s e c o n d i t i o n s to (7.6) a n d solve the larger s y s t e m of inequalities. E x a m p l e 7.1 C o n s i d e r the following p r o g r a m ( f r o m Example 3 in Chapter 3 of [Bane 79]):
7.2. DI~TENDENCE TESTING B Y ELIMINATION
175
d o 11 = 0, 20, 1 doI2 = 0,10,1 X(211 + 812 - 50) . . . . enddo doI3 = 0,20,1 ....... X ( 1 2 h + 1613 + 46) • • • enddo enddo
L1 : L2 : S: L3 : T :
T h e d e p e n d e n c e e q u a t i o n is 2 i l - 1 2 j l + 8i2 - 16j3 = 96. Its g e n e r a l s o l u t i o n ( T h e o r e m 1.3.5) c a n b e w r i t t e n as ( i 1 , j l , i 2 , j 3 ) = ( 4 8 + 2t2, t2 + 2t3, t2 + 3t3 + 2t4, t4), w h e r e t2, t3, a n d t4 a r e a r b i t r a r y i n t e g e r p a r a m e t e r s . S u p p o s e w e w a n t to c h e c k if t h e r e is a d e p e n d e n c e o f T o n S. T h e n , t h e c o n d i t i o n s o n il, jx, i2, a n d j3 a r e 0 _< il -< 20,
0 < i2 --< 10,
0<jl
0<j3
<20,
iX --< j l .
<20,
Substituting the general solution into these conditions, we get 0<48+2t2 0
< < < < -<
20 20 10 20 t2 + 2t3.
A l g o r i t h m 1.3.2 ( F o u r i e r ) a p p l i e d t o t h i s s y s t e m will s h o w t h a t it h a s n o r e a l s o l u t i o n . T h e r e f o r e , t h e s p e c i f i e d d e p e n d e n c e d o e s n o t exist. EXERCISES 7.2 1. Use the method of elimination (as explained in this section) to test for the dependence of T on S, S on T, S on S, and T on T in the following programs. Find the levels and direction vectors for each dependence, and the distance vectors corresponding to each valid direction vector:
(a) (b) (c) (d)
The The The The
program program program program
of the example in Chapter 1; of Example 3.4; of Example 3.5; used in examples 5.1-5.3.
176
7.3
CHAPTER 7. METHOD OF ELIMINATION
Two-variable Problems
In d e p e n d e n c e problems for real programs, it is c o m m o n l y observed that each d e p e n d e n c e equation or constraint has no m o r e t h a n two variables. On the other hand, each of the additional conditions imp o s e d by a given direction vector always involves two variables (e.g., E1 < 31)o Thus, we often see that a linear d e p e n d e n c e problem consists of a s y s t e m of two-variable linear equations and a s y s t e m of twovariable linear inequalities. Such systems have certain special properties that m a k e t h e m interesting and easier to solve than a m o r e general system. We have already considered several two-variable problems in the book. In this section, we will focus on general m e t h o d s for handling such problems. Note that some symbols u s e d here may have different meanings in the rest of the book. Consider a system of n linear diophantine equations xA = c
(7.9)
in n t variables: Xl, x 2 , . . . , Xm, where A is an m x n integer matrix and c an integer n-vector. If the coefficient of a variable in a given equation is nonzero, we m a y express that fact in m a n y different ways. For example, we m a y say that the variable is present in the equation, or that the equation has the variable. Formally, a two-variable system of linear equations is defined to be a s y s t e m of the f o r m (7.9) where each equation has at least one but no m o r e t h a n two variables. In t e r m s of the matrix A, this m e a n s that each of its columns has either one or two n o n z e r o elements. To solve a system of linear diophantine equations, we use Algor i t h m 1.2.1 (echelon reduction) and T h e o r e m 1.3.6. In the case of a two-variable system of the f o r m (7.9), we m a y reduce the n u m b e r of searches for n o n z e r o elements during the echelon reduction process, by taking into account the sparseness of the coefficient matrix A. Thus, for each column of A, we keep track of the (one or two) rows that contain the n o n z e r o elements on that column. Aspvall and Shiloach [AsSh 80] have given a fast algorithm for solving a two-variable s y s t e m of linear equations in real variables. It is possible to extend their m e t h o d so that we m a y solve a two-variable s y s t e m of linear equations in integer variables by r e p e a t e d applications of Algorithm 2.1 (extended Euclid) and T h e o r e m 2.8. Since a
7.3. TWO-VARIABLE PROBLEMS
177
d e p e n d e n c e p r o b l e m typically h a s very f e w e q u a t i o n s , the start-up cost of s u c h an a p p r o a c h m a y n o t b e justifiable. However, we can easily carry over f r o m [AsSh 80] certain c o n c e p t s a n d r e s u l t s that will help u s b e t t e r u n d e r s t a n d the p r o c e s s of solving two-variable integer s y s t e m s . (Note that we u s e a different notation.) For a two-variable s y s t e m of the f o r m (7.9), define an u n d i r e c t e d m u l t i g r a p h G w i t h m vertices a n d n e d g e s as follows. For e a c h o f the n~ variables .~1, x 2 , . . . , xm, t h e r e is a v e r t e x in G. For e a c h equation, t h e r e is an e d g e in G s u c h that if t h e e q u a t i o n h a s v a r i a b l e s x i a n d x j, t h e n t h e e d g e j o i n s the vertices c o r r e s p o n d i n g to x i a n d x j-. We say that G is the graph associated with the s y s t e m (7.9). The two-variable s y s t e m is connected if its g r a p h is connected.1 In [AsSh 80], the following r e s u l t s w e r e p r o v e d for a two-variable s y s t e m of linear e q u a t i o n s in real variables, b u t t h e y a p p l y equally well to a similar s y s t e m in integer variables. L e m m a 7.1 Let G denote the graph associated with a two-variable system x A = c o f n linear diophantine equations in ~ variables. (a) I f G is a tree, then r a n k ( A ) = m - 1. (b) I f G is connected, then r a n k ( A ) > m - 1. F r o m this lemma, we c a n derive i m p o r t a n t c o n c l u s i o n s a b o u t the o c c u r r e n c e o f u n d e t e r m i n e d p a r a m e t e r s in the general s o l u t i o n to a two-variable s y s t e m of linear e q u a t i o n s . T h e o r e m 7.2 If a two-variable system x A = c o f linear diophantine equations has an integer solution, then the general solution can be written in a form where the expression for each variable has at most one undetermined integer parameter. I f the system is connected, then the entire general solution has no more than one integer parameter. PROOF. Let G d e n o t e the g r a p h a s s o c i a t e d w i t h t h e s y s t e m x A = c. A s s u m e that G h a s b e e n b r o k e n u p into its (weakly) c o n n e c t e d comp o n e n t s . This l e a d s to a d e c o m p o s i t i o n o f the original s y s t e m x N = c
1A multigraph is a graph that allows multiple edges between a pair of vertices. The concept of connectedness in an undirected graph or multigraph can be defined in the same way we defined weak connectedness for a digraph in Section 1.1.4.
178
CHAPTER 7. METHOD OF ELIMINATION
into disjoint s u b s y s t e m s . Each c o m p o n e n t of G represents a connected subsystem; the s u b s y s t e m s corresponding to two different c o m p o n e n t s have no variables or equations in common; and the subs y s t e m s f o r m e d this way make up the entire s y s t e m of equations. In view of this, it suffices to prove the second part of the theorem. We a s s u m e that the s y s t e m xA = c is connected and has a solution. We will s h o w that the general solution has no more than one integer parameter. By Algorithm 1.2.1, we find an ~ x ~rt unimodular matrix U and an rn x n echelon matrix S such that UA = S. Since the s y s t e m xA = c has an integer solution, so d o e s the s y s t e m tS = c (Theorem 1.3.6). Some of the elements of t get unique values and some remain u n d e t e r m i n e d . These u n d e t e r m i n e d elements of t b e c o m e the (integer) p a r a m e t e r s in the general solution x = tU to xA = c. We have N u m b e r of p a r a m e t e r s = N u m b e r of zero rows of $ = m - rank(S) = m - rank(A) <1, since rank(A) > m - 1 by Lemma 7.1.
A two-variable system of linear inequalities is defined to be a syst e m of the f o r m xA < c where each inequality has at least one b u t no m o r e than two variables. Consider a d e p e n d e n c e p r o b l e m with a two-variable s y s t e m of equations and a two-variable s y s t e m of inequalities. Suppose the generalized gcd test passes, so that the equations have an integer solution. By T h e o r e m 7.2, the general solution is such that at m o s t one integer p a r a m e t e r appears in the expression for each iteration value. When this solution is substituted in the dependence constraints, we again get a two-variable s y s t e m of inequalities in those parameters. The q u e s t i o n then boils d o w n to solving a syst e m of two-variable inequalities. It may h a p p e n that the final set of inequalities can be partitioned into disjoint s u b s y s t e m s , such that each s u b s y s t e m has no more than one parameter. Then the d e p e n d e n c e p r o b l e m b e c o m e s simple. We saw this in Special Case 2 of Section 4.6 with regular loop nests. When the inequalities cannot be partitioned this way, we have to use some general algorithm that decides if a given two-variable s y s t e m of lin-
7.3. TWO-VARIABLE PROBLEMS
179
ear inequalities h a s a real solution. A lot of w o r k h a s b e e n d o n e o n finding s u c h an a l g o r i t h m over the years. S h o s t a k ' s p a p e r [Shos 81] laid the f o u n d a t i o n for a line of r e s e a r c h that h a s p r o d u c e d a n u m b e r of articles. The a l g o r i t h m s in [HoNa 94] are particularly i n t e r e s t i n g in that t h e y are u p - t o - d a t e a n d e a s y to follow. See t h e s a m e p a p e r for a list o f o t h e r r e f e r e n c e s . We e n d this s e c t i o n b y p o i n t i n g o u t that a s y s t e m o f inequalities of the f o r m x < y + c has an integer s o l u t i o n iff it h a s a real solution; see Exercise 1.3.6.2 a n d Section 4 of S h o s t a k ' s paper. E x a m p l e 7.2 C o n s i d e r the p r o b l e m of testing for d e p e n d e n c e b e t w e e n s t a t e m e n t s S a n d T in the d o u b l e l o o p dO I1 = 1,100, 1 dO I2 = 1, I1 - 1, 1
L1 : L2 : S
:
X(Ix,I2)
T
:
. . . . . . .
. . . . X(I2,II)
• • •
enddo enddo The d e p e n d e n c e e q u a t i o n s are i~ - j2 = 0 i2 - jx = 0. The general s o l u t i o n to t h e s e e q u a t i o n s c a n b e w r i t t e n as (~l,jX,~2,j2) = (tl, t2, t2, tl), w h e r e t1 a n d t2 are i n t e g e r p a r a m e t e r s . The d e p e n d e n c e c o n s t r a i n t s are 1 _< i l --~ 1 0 0
1 < j l < 100 1 < i2 ~ ~1 - 1 l<j2<j1-1. Substituting the general s o l u t i o n into t h e constraints, we get the following s y s t e m of inequalities: 1 < tl 1 < t2 1 < t2 l
< < < <
100 100 tl - 1 t2-1.
180
CHAPTER 7. METHOD OF ELIMINATION
It is e a s y tO see that this s y s t e m is inconsistent, since t h e right-hand inequalities o f t h e last two lines give 1 < tl - t2
tl
-
t2
<
- 1.
This i n c o n s i s t e n c y will be c a u g h t b y A l g o r i t h m 1.3.2. Thus, t h e r e is no d e p e n d e n c e b e t w e e n s t a t e m e n t s S a n d T. EXERCISES
7.3
1. Give examples of programs where a dependence problem consists of a system of two-variable equations and a system of two-variable inequalities. Include programs where loops do not necessarily have constant limits and unit strides. Discuss the change in the two-variable nature of a problem as we pass from index values to iteration values. 2. Modify Algorithm 1.2.1 (echelon reduction) to make it run efficiently on matrices that have no more than two nonzero elements on each column. 3. If a two-variable system of n linear equations in m variables is connected, then show that n > nl - 1.
7.4
Other Methods
In A l g o r i t h m 1.2, we first d e c i d e if t h e r e is an integer s o l u t i o n to t h e d e p e n d e n c e e q u a t i o n s . If there is s u c h a solution, t h e n we d e c i d e if t h e r e is a real s o l u t i o n to the e q u a t i o n s that satisfies the d e p e n d e n c e constraints. The m e t h o d of b o u n d s and the m e t h o d of elimination are b o t h i m p l e m e n t a t i o n s of A l g o r i t h m 1.2. Each m e t h o d u s e s the generalized gcd t e s t for the first step. In the m e t h o d of b o u n d s , the s e c o n d s t e p c o n s i s t s of the b o u n d s test w h i c h is b a s e d on c o m p u t a t i o n of b o u n d s (extreme values) of linear f u n c t i o n s in certain p o l y t o p e s . In t h e m e t h o d of elimination, the general s o l u t i o n to the e q u a t i o n s (that is easily o b t a i n e d d u r i n g a p p l i c a t i o n of the g e n e r a l i z e d gcd test) is s u b s t i t u t e d in the d e p e n d e n c e constraints, a n d t h e n Fourier's algor i t h m is a p p l i e d to the resulting s y s t e m of inequalities. We can slightly i m p r o v e the m e t h o d of elimination if we e x t e n d A l g o r i t h m 1.3.2 along the lines s u g g e s t e d in Section 1.3.6. For example, s u p p o s e that traditional Fourier elimination (of a set of linear inequalities in integer variables tl, t2 . . . . ) h a s y i e l d e d a s y s t e m of the
181
7.4. OTHER METHODS
form: 1/3 < t l < 1/2 b2(tl) -< t2 < B2(tl) b3(tl, t2) -< t3 < B3(tl, t2) :
Since the t's are integers, we can get the m o r e restricted set of inequalities, where each lower b o u n d is replaced by its ceiling a n d each u p p e r b o u n d by its floor. For this particular example, that step will clearly show that there are no integer vectors satisfying the given set of inequalities. (The range of tl is [ 1 / 3 l < tl < [1/2], which is empty.) We have u s e d this extra step after elimination t h r o u g h o u t the book. There is yet a n o t h e r way to decide the existence of a real solution to e q u a t i o n s a n d inequalities. Instead of s u b s t i t u t i n g the general solution in the d e p e n d e n c e constraints a n d t h e n eliminating the u n k n o w n p a r a m e t e r s , we eliminate the original variables directly f r o m the original s y s t e m of d e p e n d e n c e e q u a t i o n s and constraints. (We can actually find the extreme values of the b o u n d s m e t h o d this way [Duff 74]. However, finding extreme values is only a means, n o t the goal for us.) If we do n o t use the generalized gcd test with this m e t h o d , t h e n we are losing a vital test for the existence of integer solutions. On the other hand, if we do use the two tests together, t h e n we are n o t fully utilizing the w o r k already d o n e for the generalized gcd test, and e n d i n g u p eliminating m o r e variables t h a n in the m e t h o d of elimination. 2 For elimination of variables f r o m linear inequalities, Fourier's alg o r i t h m (Algorithm 1.3.2) is the s t a n d a r d m e t h o d of choice. This algor i t h m has b e e n applied to p r o b l e m s in d e p e n d e n c e analysis by m a n y a u t h o r s ([Bane 791, [Kuhn 80], [TrIf 86], [WoTs 92], and [Pugh 92a]). In the r e m a i n d e r of this section, we briefly discuss two o t h e r tests for d e p e n d e n c e : the I test and the Omega test. The I Test The I test ([KoKP 91], [PsKK 93]) tries to extend the range of applicability and the accuracy of the m e t h o d of b o u n d s in certain situations: 2Once we have found the general integer solution to a system of linear diophantine equations, the general real solution is obtained simply by changing the undetermined integer parameters into undetermined real parameters.
182
CHAPTER
7.
METHOD
OF ELIMINATION
The array involved is one-dimensional, loops have c o n s t a n t limits, and the coefficients of the d e p e n d e n c e e q u a t i o n are sufficiently small (at least one coefficient is +1). This test is b a s e d on the following observation: Given integers a l , a2 . . . . . am, n o t all zero, and integers /2 a n d v, there are integers x l , x 2 , . . . , x m s u c h that /2 < a l x 1
+ a2x2
+ . . . + amxm
< v
iff [/2/.ql -< I v / . q ] , where ~/ = gcd(a~, a 2 , . . . , a m ) . We consider below an example where the I test m a k e s a correct p r e d i c t i o n a n d the b o u n d s test is inconclusive. E x a m p l e 7.3 The following d e p e n d e n c e p r o b l e m a p p e a r s in [KoKP 91]: The d e p e n d e n c e equation is il - 3i2 + 7~3 = 8,
(7.10)
a n d the d e p e n d e n c e constraints are 1 < ~1 < 3, 1
<
i2 ~ 2, 1
<
i3 ~ 4.
(7.11)
Since gcd(1, 3, 7) --- 1, the gcd test says that there are integer solutions to (7.10). In the polytope defined by (7.11), the extreme values of the left-hand-side of the equation are 2 a n d 28. Since 8 lies bet w e e n 2 and 28, the b o u n d s test says that there is a real solution to the e q u a t i o n that satisfies the constraints. Relying on the m e t h o d of b o u n d s , we would a s s u m e a d e p e n d e n c e in this case. However, a simple e n u m e r a t i o n shows that there is no integer solution (~1, ~2, ~3) tO (7.10) that satisfies (7.11). For this example, the I test will correctly c o n c l u d e that there is no d e p e n d e n c e . The m e t h o d of elimination will also give the correct conclusion in this case. The general solution to Equation (7.10) is il = 3t2 - 7t3 + 8
~2 = t2 i3 = t3, where t2 a n d t3 are u n d e t e r m i n e d integer p a r a m e t e r s . Substituting this solution in the constraints (7.11), we get -7<3t21 < t2 1<
7t3_<-5~ _< 2 I t3<4.
(7.12)
7.4. OTHER METHODS
183
Elimination of the p a r a m e t e r t2 yields the inequalities 8/7 < t3 < 13/7
and
1 < t3 < 4.
Since the range [8/7] < t3 < [13/7] is empty, there is no integer t3 that satisfies this s y s t e m . Hence, there is no dependence. We can get the same answer if we c o m p u t e the extreme values of the function i3 = (8 - il + 3i2)/7 in the polytope defined by (7.11). They t u r n out to be 8/7 and 13/7. Since i3 is an integer, its range is [8/7] < i3 < [ 13 / 71, which is empty. Note that the limits 1 and 4 on i3 in (7.11) were not used in the above two methods; they are not n e e d e d for an application of the I test either. Note also that the order in which the variables are eliminated is important in the I test and in the elimination process.
The Omega Test The Omega test [Pugh 92a] is based on the least r e m a i n d e r algorithm [Knut 81] (which is a variation of Euclid's algorithm) and Fourier's m e t h o d of elimination. The following discussion is a s u m m a r y of the description of the test given in [Hagh 96]. We start with a linear d e p e n d e n c e problem defined by a set of d e p e n d e n c e equations and a set of d e p e n d e n c e constraints. The first part of the Omega test deals with the elimination of variables f r o m equality constraints by a k n o w n m e t h o d based on Euclid's algorithm. The general m e t h o d ([Knut 81], [Maim 80]) uses the following procedure. At each step, a n o n z e r o coefficient of smallest absolute value in the s y s t e m of equations is chosen. Without loss of generality, suppose that a l (> 0) is such a coefficient that appears in an equation of the f o r m a, l X l + a,2x2 + • • • + amXm : ao.
(7.13)
If a l = 1, use this equation to eliminate the variable x1 f r o m other equations in the system. If no more equations remain, a general solution has essentially been obtained. Suppose next that a l > 1. Then,
184
C H A P T E R 7. M E T H O D OF E L I M I N A T I O N
if a~ divides a2, a3 . . . . , a m , check that a l also divides ao. If a l does n o t divide ao, t h e n there is no integer solution to the system. Otherwise, divide b o t h sides of Equation (7.13) by a~ and eliminate x~ as in the previous case. Finally, if a l fails to divide all the coefficients a2, a 3 , . . . , am, t h e n i n t r o d u c e a new variable x~ by [al/al]Xl
+ la2/al]x2
+ ...
+ [am/al]Xm
= x'~.
(7.14)
Solve this e q u a t i o n for x1 a n d replace (7.13) by the e q u a t i o n a l x ~ + (a2 m o d al)X2 + • • • + (ant m o d a l ) X m = a0.
(7.15)
Since at each step either the n u m b e r of equations or the absolute value of the smallest coefficient in the s y s t e m is reduced, the above p r o c e d u r e has to terminate. The m e t h o d is intimately c o n n e c t e d to Euclid's a l g o r i t h m a n d is easy to u n d e r s t a n d . It is not, however, the best m e t h o d for solving the p r o b l e m even t h o u g h substantial refinem e n t s are possible [Knut 81]. One r e f i n e m e n t is to base the p r o c e d u r e on the least r e m a i n d e r alg o r i t h m ([Knut 81], Section 4.5.3, exercises 30 a n d 31). This modification r e d u c e s the average n u m b e r of division steps required by Euclid's algorithm a n d allows the s y s t e m to r e d u c e faster. The idea of the alg o r i t h m is to c h o o s e the r e m a i n d e r o f the s m a l l e s t m a g n i t u d e for the next division step. Thus, i n s t e a d of replacing b by (a m o d b) d u r i n g the division step of Euclid's algorithm, we replace it by (a m o d b - b) if ]a m o d b l > ½Ib]. This a p p r o a c h has b e e n u s e d in the Omega test to eliminate variables f r o m equality constraints. An o p e r a t o r mo~-'ais defined by a rno~'ab = a - b [ a / b + 112]. It is easy to verify that if la m o d bl < ½lbl, t h e n a nao~'ab = a rood b; otherwise, a nao~-'~b = a m o d b - b. At each division step, we replace (a, b) by (b, a ~od b). The above m o d i f i c a t i o n affects the calculation iff the following division step in the original Euclid's a l g o r i t h m would have a q u o t i e n t of one. The analysis of this a l g o r i t h m is related to the Fibonacci n u m b e r s a n d the g o l d e n ratio, a n d a p p r o x i m a t e l y 30.6% of the division steps
7.4. OTHER METHODS
185
are saved 3 o n the average [Knut 81]o The modified algorithm, however, m a k e s each division step longer on m a n y c o m p u t e r s [Knut 81]. K n u t h suggests an alternative a p p r o a c h that saves all division steps w h e n the q u o t i e n t is unity. Additional i n f o r m a t i o n related to the least r e m a i n d e r a l g o r i t h m can be f o u n d in [BrDu 71], [GoZa 52], and [Moor 92]. For further s t u d y of efficient algorithms for solving the greatest c o m m o n divisor problem, as well as other p r o b l e m s f r o m the t h e o r y of n u m b e r s , the interested reader is referred to [BaSh 96]. After variables have b e e n eliminated f r o m all the equations, we are left with a s y s t e m of linear inequalities. The Omega test employs Fourier's m e t h o d repeatedly to eliminate variables f r o m linear inequalities. Since classical Fourier elimination can only d e t e c t real solutions, certain steps are a d d e d to isolate integer solutions. To u n d e r s t a n d the process of finding integer solutions by elimination, consider a set of two inequalities of the f o r m (7.16)
a X ~0~ 1
~<~x,
where x is a variable, a and b are positive integers, and ~ a n d / 3 are integer-valued expressions. Elimination of x f r o m (7.16) by Fourier Elimination (Algorithm 1.3.2) results in the inequality a/3 < b~.
(7.17)
The s y s t e m s (7.16) a n d (7.17) are equivalent in the real domain, b u t they are n o t necessarily equivalent in the integer sense. An integer solution to (7.16) will also satisfy (7.17), b u t an integer solution to (7.17) m a y n o t always lead to an integer solution to (7.16). A sufficient (but n o t necessary) c o n d i t i o n for the existence of an integer x satisfying (7.16) is the inequality
bo~-a[3>ab-a-b+l=
(a-1)(b-1).
(7.18)
(See Exercise 4.) Note that w h e n a = 1 or b = 1, (7.18) is trivially implied by (7.17). 3This choice of least remainder in absolute value, that results in the smallest number of iterations over all gcd algorithms that replace (a, b) by (b, (_+a) rood b), has been observed by K. Vahlen in 1895 [Knut 81].
186
C H A P T E R 7. M E T H O D O F E L I M I N A T I O N
The Omega test u s e s Fourier's m e t h o d to eliminate the variable x f r o m the s y s t e m (7.16) as follows. The real solution space to this s y s t e m (i.e., the s o l u t i o n space of (7.17)) is called the r e a l s h a d o w of (7.16). The s u b s e t of the real s h a d o w that f u r t h e r satisfies the c o n s t r a i n t (7.18) is called the d a r k s h a d o w of (7.16). First, the real a n d the d a r k s h a d o w s are b o t h c o m p u t e d . (In the general case, that m e a n s two applications of the Fourier elimination algorithm.) If the real s h a d o w is empty, t h e n there is no real, a n d hence no integer, solution to the s y s t e m (7.16). Otherwise, if the dark s h a d o w is n o n e m p t y , t h e n there is an integer s o l u t i o n to (7.16). Finally, if the real s h a d o w is n o n e m p t y b u t the d a r k s h a d o w is empty, t h e n an exhaustive search is p e r f o r m e d on all integers i in the range 0 < i < ( a b - a - b ) / a , to see w h e t h e r any of the e q u a t i o n s of the f o r m b~: = fi + i has an integer solution. The search m a y be t e r m i n a t e d as s o o n as the first affirmative a n s w e r is found; otherwise, the whole range h a s to be searched. Observe that, f r o m the above integer p r o g r a m m i n g problem, the Omega test m a y create | b - b / a ] integer p r o g r a m m i n g s u b p r o b l e m s . This n u m b e r is p r o p o r t i o n a l to the absolute value of the coefficient b (let a = b, t h e n [b - b / a ] = b - 1), and, thus, is exponential with r e s p e c t to the i n p u t data. Each s u b p r o b l e m , in turn, is solved by the modified Fourier elimination m e t h o d , a m e t h o d that itself is an exponential algorithm. It has b e e n claimed [Pugh 92] that in practice t h e Omega test is fast a n d its execution time rarely exceeds twice the time required to scan the p r o b l e m i n p u t (i.e., the array subscripts and loop bounds). Several i m p l e m e n t a t i o n details have also b e e n u s e d to i m p r o v e the efficiency of the algorithms. E x a m p l e 7.4 Let us use the Omega test a p p r o a c h , described in this section, to decide if the s y s t e m
lO00x ~ 9 9 9 ~ 1 ~ iO00x
(7.19)
has an integer solution. The p a r a m e t e r s of this system, using the n o t a t i o n of our d i s c u s s i o n o n the s y s t e m (7.16), are a = 1 0 0 0 , ~ = 9 9 9 , b = lO00,fi = 1.
7.4. OTHER METHODS
187
It is clear that (7.19) has real solutions since 1 < 999. An equivalent reason, in the Omega test terminology, is that the real shadow of the s y s t e m is not e m p t y (i.e., 1000 x 9 9 9 - 1000 x I > 0). Thus, the s y s t e m m a y have integer solutions. The dark s h a d o w of the system, however, is empty, since 1000 x 9 9 9 - 1000 x 1 > ( 1 0 0 0 - 1 ) ( 1 0 0 0 - 1) does not hold. Thus, if the s y s t e m (7.19) has an integer solution x, then the relation lO00x = i + 1
(7.20)
holds for an integer i in the range 0 _< i _< (1000 x 1 0 0 0 - 1 0 0 0 - 1000)/1000. Following the Omega test approach, we m u s t p e r f o r m an exhaustive search on all integers i in the range of zero to 998 to see if the syst e m (7.20) has a solution. All the 999 integers in that range m u s t be searched, since none of the c o r r e s p o n d i n g equations has a solution. If b o t h occurrences of 1000 in the p r o b l e m s t a t e m e n t of this exercise are replaced by 10000, t h e n solving the p r o b l e m by this a p p r o a c h involves e n u m e r a t i o n of 9999 integers and solving the same n u m b e r of diophantine equations. In the general case, each of these equalities will be solved (hence, eliminated) and the solution will be substituted in the a u g m e n t e d system. Then, the a u g m e n t e d s y s t e m is solved by the above variant of Fourier elimination m e t h o d . EXERCISES 7.4 1. In the method of elimination used in Example 7.3, first eliminate the parameter t3 from the inequalities (7.12). Compare your result with that obtained in the example. 2. Showby the method of elimination that there is no dependence in the problem represented by the dependence equation il
--
2i2 + 8~3 + 8j3 = 26
and the dependence constraints 1 _< i l < 3 , 1
<
~-2 ~ 3 , 1 < i3 < J 3 <
(This problem appears in [PsKK93lo)
10.
188
CHAPTER 7. METHOD OF ELIMINATION
3. Find several examples of dependence problems where the I test ([KoKP 91], [PsKK 93]) makes a correct prediction while the method of bounds is inconclusive. (Use loops with at least 25 iterations each.) 4. A necessary and sufficient condition for the existence of an integer x satisfying the inequalities (7.16) is the following: [ f J l b l <- [ ~ / a J .
Show that (7.18) implies this condition, and hence (7.18) is a sufficient condition for the existence of an integer x. Show also that (7.18) is not a necessary condition.
Chapter 8
Conclusions In Chapters i t h r o u g h 7, we have tried to f o r m u l a t e a precise t h e o r y of d e p e n d e n c e analysis in a p r o g r a m consisting of loops a n d assignm e n t s t a t e m e n t s . We also d i s c u s s e d several m e t h o d s for solving the d e p e n d e n c e p r o b l e m . The d e p e n d e n c e p r o b l e m consists of finding integer solutions to a s y s t e m of linear e q u a t i o n s a n d inequalities. Since a s y s t e m of linear d i o p h a n t i n e e q u a t i o n s can be solved in p o l y n o m i a l time, it is always a g o o d idea to solve the e q u a t i o n s first. If there is no integer s o l u t i o n to t h e equations, there is no solution to the combined system, a n d t h e r e f o r e the particular d e p e n d e n c e d o e s n o t exist (generalized gcd test). If the e q u a t i o n s a d m i t integer solutions, t h e n there are several o p t i o n s b a s e d o n the n a t u r e of the problem. If the d e p e n d e n c e p r o b l e m falls in the category of simple problems, we can easily find a c o m p l e t e solution. For a n o n - s i m p l e problem, the m e t h o d of b o u n d s m a y be used, if those b o u n d s are easy to c o m p u t e a n d only the existence of the d e p e n d e n c e n e e d s to be settled. When the p o l y t o p e defined by the d e p e n d e n c e constraints is s u c h that its e x t r e m e p o i n t s are hard to find, or that there are too m a n y extreme points, t h e n the c o m p u t a t i o n of the b o u n d s ( m i n i m u m a n d m a x i m u m values in m o s t cases) b e c o m e s hard. 1 When the m e t h o d of b o u n d s is n o t applicable, or w h e n detailed i n f o r m a t i o n about iterations (involved in the d e p e n d e n c e ) is needed, we m a y use the m e t h o d of elimination, p e r h a p s with s o m e limited e n u m e r a t i o n . 1In fact, the efficiency of a linear programming algorithm resides in the cleverness of the way it navigates the set of extreme points. 189
190
CHAPTER 8. CONCLUSIONS
For a general d e p e n d e n c e problem, neither the m e t h o d of b o u n d s n o r t h e m e t h o d of elimination (without c o m p l e t e e n u m e r a t i o n ) will always be accurate. As a last resort, we can use an established integer p r o g r a m m i n g algorithm, for example, Gomory's cutting plane m e t h o d [Schr 87]. (See also [Wall 88] and [Feau 88].) But s u c h an algorithm m a y n o t p r o d u c e the set of all iteration pairs for the d e p e n d e n c e . D e p e n d e n c e p r o b l e m s f o u n d in real p r o g r a m s are usually simple e n o u g h to m a k e t h e hierarchical a p p r o a c h described above quite useful. As a practical matter, one m a y also add several heuristics to a d e p e n d e n c e test suite in a real compiler. For instance, p a t t e r n matching (e.g., checking for subscripts like i, i + 1, ~- 1) can take care of m a n y d e p e n d e n c e p r o b l e m s . Also, limited e n u m e r a t i o n can s o m e t i m e s be u s e d to decide if linear d i o p h a n t i n e equations have n o n n e g a t i v e integer solutions [BoTr 76]. If there is no nonnegative integer solution at all, t h e n there c a n n o t be any solution satisfying the d e p e n d e n c e constraints. (When a d e p e n d e n c e p r o b l e m is stated in t e r m s of iteration variables of loops, all variables in the p r o b l e m are nonnegative.) When we infer the existence of an integer solution f r o m the existence of a real solution (in the m e t h o d of b o u n d s or elimination), we m a y s o m e t i m e s e n d u p with a s p u r i o u s d e p e n d e n c e that s h o u l d n o t be there. But, that is the price we pay for saving time on a m o r e expensive algorithm. It s h o u l d be n o t e d that the b o u n d s test is a sufficient c o n d i t i o n for existence of integer solutions in certain situations; see [Bane 76], [Li 89], and [PsKK 91]. Furthermore, while a s p u r i o u s d e p e n d e n c e m a y result in p r e v e n t i n g s o m e t r a n s f o r m a t i o n s of the given p r o g r a m , it will n o t cause any illegal t r a n s f o r m a t i o n s . Thus, an a p p r o x i m a t e d e p e n d e n c e test is conservative in the sense that it will preserve the integrity of the program. We have tried to include a large n u m b e r of references in the bibliography, n o t all of which have b e e n cited in the text. However, this is n o t a c o m p l e t e list; m o r e references can be f o u n d in the sources we have listed.
Appendix A
Linear Equations on Polytopes A.1
Introduction
Our m a i n goal in this a p p e n d i x is to s t u d y the existence of real solutions to a s y s t e m of linear equations in a polytope. We give a m i n i m a l s e q u e n c e of definitions a n d results (many w i t h o u t proof) that s h o u l d p r e p a r e the reader to solve d e p e n d e n c e p r o b l e m s by the m e t h o d of b o u n d s . However, the b o d y of k n o w l e d g e o n p o l y t o p e s is vast. For an i n - d e p t h study, a suitable b o o k s h o u l d be used; m a n y b o o k s on the t h e o r y of linear a n d integer p r o g r a m m i n g cover p o l y t o p e s in detail. The general reference for the a p p e n d i x is [Schr 87]. However, our a p p r o a c h a n d n o t a t i o n are s o m e t i m e s different f r o m Schrijver's. Note also that s o m e s y m b o l s u s e d here could have m e a n i n g s different f r o m their m e a n i n g s in the context of d e p e n d e n c e analysis. In Section A.2, we discuss s o m e basic p r o p e r t i e s of polytopes. Section A.3 gives the n e c e s s a r y and sufficient c o n d i t i o n u n d e r which a single linear e q u a t i o n has a real s o l u t i o n in a polytope. The m e t h o d of Lagrangean relaxation is s k e t c h e d in Section A.4. This m e t h o d can o f t e n simplify c o m p u t a t i o n of e x t r e m e values of linear f u n c t i o n s in polytopes. It is applied in Section A.5 to the s t u d y of existence of real solutions to s y s t e m s of real e q u a t i o n s in a polytope. Finally, in Section A.6, we describe a situation where existence of a real solution (to a s y s t e m of linear e q u a t i o n s in a polytope) implies the existence of an integer solution. Note that s o m e of the results stated here have m o r e general versions. 191
APPENDIX A. LINEAR EQUATIONS ON POL YTOPES
192
A.2 Polytopes We c o n s i d e r v e c t o r s in R m for s o m e fixed m >_ 1. A set X c R m is convex if the v e c t o r / ~ x + (1 - )~)y is a l w a y s in X, w h e n e v e r x a n d y are in X, a n d A is a real n u m b e r s u c h t h a t 0 _< A _< 1. In o t h e r w o r d s , w h e n e v e r t h e e n d p o i n t s o f a line s e g m e n t are in a convex set, the line s e g m e n t itself is i n c l u d e d in the set. If a set is n o t convex, we m a y w a n t to k n o w the s m a l l e s t c o n v e x set t h a t i n c l u d e s it. A convex combination of a finite set of v e c t o r s Xl, x 2 , . . . , Xp is an e x p r e s s i o n of t h e f o r m y
= ~1x1
-t- , ~ 2 x 2
q- • • • q-
,~pXp,
w h e r e 2~1,A2 . . . . . Ap are n o n n e g a t i v e real n u m b e r s s u c h t h a t ~ 1 + ~ 2 -I- • • • + , ~ p =
1.
The convex hull of a set X c R rn is t h e set o f all c o n v e x c o m b i n a t i o n s of e l e m e n t s of X. The c o n v e x hull o f X is t h e s m a l l e s t c o n v e x set t h a t i n c l u d e s X (Exercise 1). If X itself is convex, it is c l o s e d u n d e r t a k i n g convex c o m b i n a t i o n s of its e l e m e n t s (Exercise 2), a n d is t h e r e f o r e its o w n c o n v e x hull. A set X c R m is b o u n d e d if t h e r e is a positive n u m b e r ~i, s u c h t h a t -6<Xr
<~
(l_
for e a c h v e c t o r (Xl, x 2 , . . . , Xm) ~ X. A polyhedron in R m is a set o f the form P= {x~Rrn:xA_ 1). A polytope is a b o u n d e d p o l y h e d r o n . L e m m a A.1 Each polytope is a convex set. PROOF. Take a n y p o l y t o p e P = {x ~ R m : x A < b}. Let x ~ P, y ~ P, a n d 0 < A < 1. The vector z = Ax + (1 - A)y is in P, b e c a u s e zA = (~x + (1 - A)y)A = AxA + (1 - A)yA
sincexA0,
yA0
A.3. REAL SOLUTIONS TO A SINGLE EQUATION
193
Hence, P is convex.
An extreme point (or vertex) of a p o l y t o p e P is a p o i n t v ~ P s u c h that v is n o t the c o n v e x c o m b i n a t i o n of t w o distinct p o i n t s in P. In o t h e r w o r d s , a n e x t r e m e p o i n t is s u c h that no n o n e m p t y line s e g m e n t c o n t a i n i n g it is i n c l u d e d in the p o l y t o p e . For example, the vertices of a triangle are its e x t r e m e points.
Theorem A.2 Each extreme point o f a polytope {x ~ R m • x A < b}, where A is an m x l matrix, is the unique solution to a set o f m linearly independent equations from the system x A = b. Theorem A.3 A polytope has a finite n u m b e r o f extreme points. Theorem A.4 Each polytope is the convex hull o f the set o f its extreme points. EXERCISES A.2 1. P r o v e t h a t t h e c o n v e x h u l l o f a s e t X o f v e c t o r s is t h e s m a l l e s t c o n v e x s e t t h a t i n c l u d e s X. 2. S h o w t h a t a c o n v e x s e t is c l o s e d u n d e r t a k i n g c o n v e x c o m b i n a t i o n s o f its elements. 3. Let x, z, Yl,Y2 . . . . . y p d e n o t e p o i n t s i n R m, s u c h t h a t x is a c o n v e x c o m b i n a t i o n o f Yl a n d z, a n d z is a c o n v e x c o m b i n a t i o n o f Yl, Y 2 , . . . , Yp. P r o v e t h a t x is a c o n v e x c o m b i n a t i o n o f Yl, Y2 . . . . . Yp.
A.3
Real Solutions to a Single Equation
A real-valued f u n c t i o n f o n R m is linear if f(x + y) = f(x) + f(y) f(Ax) = Af(x) for all x E R m, y E R m, a n d A ~ R. Such a f u n c t i o n has t h e f o r m f(x) = ax = xa for s o m e fixed v e c t o r a in R m. The following t h e o r e m is a b a s i c r e s u l t in linear p r o g r a m m i n g . It says t h a t in a p o l y t o p e , a real-valued linear f u n c t i o n h a s a m i n i m u m a n d a m a x i m u m value, a n d t h o s e v a l u e s are a t t a i n e d at e x t r e m e p o i n t s of that p o l y t o p e .
194
APPENDIX A. LINEAR EQUATIONS ON POLl/TOPES
T h e o r e m A.5 Consider a real-valued linear function f on R m and a nonempty polytope P c R m. Then, f has a m i n i m u m and a m a x i m u m value in P, and there are two extreme points u and v o f P such that f(u)
=
minf(x)
f(v) = maxf(x).
and
x~P
xGP
PROOF. Let { V I , V 2 . . . . . V p } d e n o t e the set of extreme p o i n t s of P. Without any loss of generality, we m a y a s s u m e that f ( v 1 ) < f ( v 2 ) < " ' " -< f ( V p ) . The value of f at any point x ~ P can be e x p r e s s e d in t e r m s of its values at the e x t r e m e points. Indeed, by T h e o r e m A.4, x is a convex c o m b i n a t i o n of the extreme points, so that we can write X=
AIV1 +A2V2
+ " • " +ApVp,
where the A's are nonnegative reals satisfying ?q Then, the linearity of f implies that f ( x ) = A~f(v~) + Aaf(v2) + • • " +
,~2 ÷
+
" " " + ,~p
=
1.
,~pf(Vp).
Since f ( V l ) -< f ( v O < f ( V p )
(1 _< i _< p),
it follows that P
P
P
Z h , f ( v ~ ) < Z h , f ( v , ) _< Z h , f ( v p ) i=1
/=1
i=1
that is, P
P
P
f ( v l ) ~. hi < ~ A i f ( v , ) < f ( v p ) ~. Ai i=I
/=I
i=I
or
f ( v l ) < f ( x ) -< f ( v p ) . This holds for all p o i n t s x 6 P. Therefore, f ( v l ) is the m i n i m u m value of f in P, a n d f ( V p ) is the m a x i m u m value. To c o m p l e t e the proof, we n e e d only take u = vl a n d v = V p . []
A.3. REAL SOLUTIONS TO A SINGLE EQUATION
195
Using t h e above t h e o r e m , we e s t a b l i s h the following r e s u l t t h a t gives a n e c e s s a r y a n d sufficient c o n d i t i o n for t h e e x i s t e n c e o f a real s o l u t i o n to a linear e q u a t i o n . T h e o r e m A.6 Consider a real-valued linear function f on R m and a nonempty polytope P c R m. For a given real number c, the equation f(x) = c
(A.1)
m i n f ( x ) <_ c _< m a x f ( x ) .
(A.2)
has a solution x ~ P iff x~P
x~P
PROOF. The only if p a r t is trivial. If t h e r e is a p o i n t x ~ P s u c h t h a t f ( x ) -- c, t h e n (A.2) h a s to h o l d since e a c h value o f f lies b e t w e e n the m i n i m u m and the maximum. To prove the if part, a s s u m e t h a t c is a real n u m b e r s a t i s f y i n g (A.2). By T h e o r e m A.5, t h e r e are e x t r e m e p o i n t s u a n d v o f P, s u c h that f(u) = xm~f(x)
and
f(v) = maxf(x). xEP
Hence, we have f ( u ) < c _< f ( v ) . We will find a p o i n t x ~ P s u c h t h a t f ( x ) = c. We a s s u m e t h a t f is n o t c o n s t a n t o n P, since t h e r e is n o t m u c h to p r o v e o t h e r w i s e (why?). T h e n f ( v ) - f ( u ) > 0, a n d we c a n define h = f(v) - c f(v) - f(u)" Since 0 < f(v) - c < f(v) - f(u), it follows t h a t 0 < h < 1. Since P is c o n v e x a n d u a n d v are p o i n t s in P, t h e c o n v e x c o m b i n a t i o n x = h u + (1 - h ) v o f u a n d v is also a p o i n t in P. This p o i n t x satisfies f ( x ) = c, since f ( x ) = A f ( u ) + (1 - A ) f ( v ) = h(f(u) - f(v)) + f(v) = (c - f ( v ) ) + f ( v ) = C.
The p r o o f o f the t h e o r e m is n o w c o m p l e t e .
[]
196
APPENDIX A. LINEAR EQUATIONS ON POL YTOPES
In view of T h e o r e m A.5, the extreme values of f in P are equal to its extreme values in the set of vertices of P. This is formally stated below. (Thus, it is e n o u g h to evaluate f at the vertices, which form a finite set.) Corollary 1 Let V denote the set o f e x t r e m e points o f P. Given a real n u m b e r c, the equation f ( x ) = c has a solution in P iff m i n f ( x ) < c _< m a x f ( x ) . x~V
x~V
(A.3)
Theorems A.5 and A.6 are valid in a m u c h more general setting. That generalization involves topological concepts and is beyond our scope. The interested reader is referred to sections 3.17 and 3.18 in [Dieu 69], or the description of the Intermediate Value Theorem in any advanced calculus text.
A.4
Lagrangean Relaxation
The Lagrangean relaxation m e t h o d is u s e d to obtain upper b o u n d s for certain integer p r o g r a m m i n g problems (see Section 24.3 of [Schr 87].) In this section, we drop the integrality requirement and derive some results for linear programming. First, we state a basic result in the theory of linear programming; for a proof, see Section 7.4 of [Schr 87]. Theorem A.7 (Duality Theorem of Linear Programming). Consider an m x l real matrix A a n d two vectors a ~ R m a n d b ~ R l, such that
{x~R m:xA
and
{y~R l:Ay=a,y>0}
are n o n e m p t y sets. Then, w e have
max xa = rain by.
xA_
y>_O Ay=a
There are several equivalent forms of this theorem. We will need the following form that gives the m i n i m u m of xa in the polyhedron defined by the inequalities xA < b.
A.4. LAGRANGEAN RELAXATION
197
C o r o l l a r y 1 In the notation o f Theorem A. 7, the following holds: m i n x a = m a x bz.
xA~b
z~0 Az=a
PROOF. We have min xa = - max x(-a)
xA_
xA_
=
-
min by
y>_0 Ay= - a = - rain
b(-z)
z_
by T h e o r e m A. 7 where z = - y
= m a x bz. z_<0 Az=a
T h e o r e m A. 7 a n d its c o r o l l a r y c a n be very u s e f u l in the c o m p u t a tion o f e x t r e m e v a l u e s of a linear f u n c t i o n f s u b j e c t to a set o f linear c o n s t r a i n t s . S u p p o s e the set of c o n s t r a i n t s can be b r o k e n up into an easy p a r t a n d a hard part. A t t a c h i n g the h a r d c o n s t r a i n t s to f , we create a m o r e c o m p l i c a t e d linear f u n c t i o n , a n d t h e n find its e x t r e m e values s u b j e c t to the e a s y c o n s t r a i n t s . This is s t a t e d m o r e f o r m a l l y in T h e o r e m A.8 a n d its corollary. We a s s u m e t h a t the p o l y t o p e s in the r e s u l t s of this s e c t i o n are n o n e m p t y . T h e o r e m A.8 Consider a real-valued linear function f : x ~. x a de-
fined on R m, and a m x l matrix. LetA1 extreme values o f f xA1 < b l , are given
polytope P = {x : x A < b} in R m, where A is an denote an m x ~1 matrix a n d b l a n ll-vector. The in P, subject to the set o f additional constraints by the formulas:
m a x x a = m i n m a x [xa + (bl - x A 1 ) y ]
(A.4)
m i n x a = m a x m i n [xa + (bl - x A 1 ) z ] .
(A.5)
xA_
xA:_
y>0 xA
z
PROOF. Note t h a t the two sets o f c o n s t r a i n t s x A < b a n d xA1 < b l can be c o m b i n e d as x(A; A1 ) -< (b; b l ).
APPEND~ A. LINEAR EQUATIONS ON POL YTOPES
198
We h a v e max xa =
xA_
xA1 -
min
u>_0, y->0
(bu + bly)
b y T h e o r e m A.7
Au+A~ y=a
= rnin y>O
rain
u_>O AU+Aly=a
(b~y+bu)
= min [bly + max x(a - Aly)[ / y>_O k xA_
b y T h e o r e m A.7
= m i n m a x [ x a + (b~ - x A 1 ) y ] . y>_0 xA_
T h e s e c o n d f o r m u l a c a n be d e r i v e d f r o m t h e C o r o l l a r y to T h e o r e m A.7 in t h e s a m e way. [] As we see below, w h e n t he e x t r a c o n s t r a i n t s are equalities, t h e r e a re n o r e s t r i c t i o n s o n y o r z. C o r o l l a r y 1 In the notation of Theorem A.8, the following formulas
hold: m a x x a = m i n m a x [xa + (bl - xA1 )y]
xA
xA1 =bl
yERh xA_
r ai n x a = m a x m i n [xa + ( b l xA_
xA~)y].
(A.6) (A.7)
y~R/~ xA_
PROOF. T h e s e r e s u l t s f o l l o w d i r e c t l y f r o m T h e o r e m A.8, if we n o t e t h a t t h e e q u a t i o n xA~ = b l is e q u i v a l e n t to t h e c o m b i n e d set o f ine q u alities : xA1 < b~ a n d - x A 1 < - h i . I n d e e d , we h a v e m a x x a = m i n m i n m a x [xa + (b~ - xA1 )u + ( - b ~ + xA1 ) v ] xA0 v>0 xA_
= m i n m i n m a x [ xa + (hi - x A 1 ) ( u - v ) ] u>_O v>_O xA
= m i n m a x [ xa + ( bl - x A 1 ) y ] yER/1 xA_
w h e r e we h a v e w r i t t e n y = u - v. T h e r e is n o sign r e s t r i c t i o n o n y, s i nce it is t h e d i f f e r e n c e b e t w e e n t w o n o n n e g a t i v e v e c t o r s .
A.4. LAGRANGEAN RELAXATION
199
The o t h e r f o r m u l a can b e p r o v e d similarly. We n o w r e c a s t T h e o r e m A.8 a n d its corollary in a slightly different f o r m that will b e u s e f u l for the d i s c u s s i o n in the next section. T h e o r e m A.9 Let f denote a real-valued linear function on R m and P a n o n e m p t y polytope in R m. Consider an inequality o f the form g (x) < c, where ~ is also a real-valued linear function on R m. The extreme values o f f in P, subject to the inequality constraint g ( x ) < c, are given by max f(x) = min max [f(x) + h(c - g(x))]
(A.8)
min f(x)
(A.9)
xEP g(x)_
h_>0 xEP
x~P g(x)_
= m a x rain I f ( x ) + # ( c - g ( x ) ) ] . ~_<0 x~P
PROOF. Here, we have l 1 = 1. The a b o v e f o r m u l a s follow f r o m similar f o r m u l a s in T h e o r e m A.8, if we write f ( x ) for xa, "x E P" for "xA < b," h for y, p for z, c for b l , a n d g ( x ) for x A 1 . [] C o r o l l a r y 1 In Theorem A.9, the extreme values o f f the equality constraint ~ (x) = c are given by
in P subject to
m a x f ( x ) = m i n m a x I f ( x ) + A(c - f l ( x ) ) ]
(A.10)
re_in f ( x ) = m a x m i n I f ( x ) + A(c - g ( x ) ) ] .
(A.11)
x~P .q(x)=c
h
x~P g(x)=c
h
x~P
x~P
PROOF. The p r o o f follows f r o m Corollary 1 to T h e o r e m A.8.
[]
Since t h e r e is n o sign r e s t r i c t i o n o n A, we m a y write A (B (x) - c) i n s t e a d of A (c - B (x)) in the corollary. S u p p o s e w e h a v e t w o c o n s t r a i n t s o f the f o r m ~1 (X) --< C1 a n d B2 (x) = c2. Then, T h e o r e m A.9 a n d its corollary can b e c o m b i n e d to yield the f o l l o w i n g e x p r e s s i o n s : max
x~P ~(x)-
f ( x ) = m i n m i n m a x I f ( x ) + A~ (c~ - B~ (x)) + A2 (c2 - B2 (x)) ] h~>_O h2
x~P
~O2(X) = £2
min
x~P y~(x)-
f(x) = max max min[f(x) /~1-<0
A2
x~P
+ Pl(Cl - g l ( x ) )
+ A2(c2 - g 2 ( x ) ) ] .
200
APPENDIX A. LINEAR EQUATIONS ON POL YTOPES
Similarly, we can write d o w n f o r m u l a s for e x t r e m e values of f s u b j e c t to a n y n u m b e r of linear c o n s t r a i n t s t h a t are e q u a t i o n s or inequalities. EXERCISES A.4 1. Let f l , f2 . . . . . fn denote real-valued linear f u n ct i o n s on R m and P c R m a n o n e m p t y polytope. Write down a pair of e x p r e s s i o n s giving the e x t r e m e values o f f l in P u n d e r a set of additional c o n s t r a i n t s of the form:
(a) f2(x) -< c2,f3(x) < c3 . . . . . fn(x) -< cn; (b) f2(x) = c 2 , f 3 ( x ) = c3 ..... fn(x) = cn; (c) f2 (x) >_c2, f3 (x) >_c3..... f~ (x) >_cn.
A.5
Real Solutions to a System of Equations
C o n s i d e r a s y s t e m of n scalar equations: f x ( x ) = Cl f 2 ( x ) = c2 :
f n ( x ) = cn, w h e r e f l (x), f2 ( x ) , . . . , f n ( x ) are real-valued linear f u n c t i o n s on R m. In this section, we p r e s e n t a n e c e s s a r y a n d sufficient c o n d i t i o n for the existence o f a real solution to this s y s t e m in a polytope. This result g e n e r a l i z e s T h e o r e m A.6 t h a t deals w i t h a single equation. We first establish the c o n d i t i o n for a s y s t e m of two equations, a n d t h e n state the general result w i t h o u t proof. The p r o o f in the general case is similar to the p r o o f in the two-variable case, a n d is left to the reader. T h e o r e m A.10 Consider two real-valued linear functions f l a n d f2 on R m, a n d a n o n e m p t y polytope P c R m. For a given pair o f real n u m b e r s c~ a n d c2, the s y s t e m o f equations f l ( x ) = Cl ~ f2 (x) = c2
(A.12)
has a solution x E P iff
m a x m i n [f~ (x) + A(f2(x) - c2)] < Cl A
x6P
< m i n m a x [ f ~ ( x ) + A(f2(x) - c2)]. A
x6P
(A.13)
A.5. REAL SOLUTIONS TO A SYSTEM OF EQUATIONS
201
PROOF. Define a polytope Q c R m by Q= {xEP:f2(x)
=c2}.
First, a s s u m e that the s y s t e m (A.12) has a solution x ~ P. Then, Q is n o n e m p t y and the equation f l (x) = cl has a solution x ~ Q. T h e o r e m A.6 implies that min f l ( x ) < Cl < m a x f l ( x ) . xeQ ~
x~Q
(A.14)
But, we have mi~nfl (x) = x~t~
min
x~P f2 (X)=C2
f l (x)
= max min [ f l ( x ) + A(f2(x) - c2)] A
x~P
by (A.11)
and also max f l (x) = x~Q
max f~(x)
x~P f2(x)=c2
= m i n max If1 (x) + h (f2 (x) - c2) ] A
x~P
by (A.10).
Hence, Condition (A.14) is the same as Condition (A.13) in the theorem. Next, a s s u m e that (A.13) holds. It can be shown that the polytope (2 is n o n e m p t y (Exercise 2). As in the previous paragraph, we can derive (A.14) f r o m (A.13), and then use T h e o r e m A.6 to conclude that the equation f l (x) = Cl has a solution (2. That is the same as saying that the s y s t e m (A.12) has a solution in P. [] Corollary 1 The system o f equations (A.12) has a solution x E P, iff the single equation f l ( x ) + Af2(x) = cl + Ac2
(A.15)
has a solution in P for each real ~. PROOF. The only if part is trivial. If y is a solution to the system (A. 12), then y is a solution to (A.15) for each 2~. Indeed, we have f l (y) = Cl and f2 (Y) = c2, so that f~ (y) + Af~ (y) = c~ + hc~
A P P E N D ~ A. LINEAR EQUATIONS ON POLYTOPES
202
for each/k. The i n t e r e s t i n g f e a t u r e o f the if p a r t is that for e a c h real/~, we are a s s u m i n g t h e e x i s t e n c e o f a s o l u t i o n y~ to (A.15). F r o m that the existence o f a s o l u t i o n y to (A.12) is to follow, w h e r e y is i n d e p e n d e n t of/k. Equation (A.15) can b e w r i t t e n as f l ( x ) + ?~(f2(x) - c2) = ¢1. By T h e o r e m A.6, it h a s a s o l u t i o n in P iff xmei~l[ f l (x) + ?~(f2(x) - c2)] < Cl < m a x [ f x ( x ) + / k ( f 2 ( x ) - c2)]. xEP
Since this m u s t h o l d for e a c h 2~ a n d Cl is i n d e p e n d e n t of ?~, it follows that m a x m i n [ f l ( x ) + M f 2 ( x ) - c2)] -< c~ /~
x~P
< m i n m a x [f~ (x) ÷ ?~(f~ (x) - c~) ]. ,~
xEP
By T h e o r e m A.10, t h e s y s t e m (A.12) t h e n h a s a s o l u t i o n in P.
[]
Next, w e s t a t e the e x t e n s i o n s o f T h e o r e m A. 10 a n d its corollary to the general case of rt e q u a t i o n s . The p r o o f s are quite similar a n d are o m i t t e d . The m a i n d i f f e r e n c e is that n o w we n e e d n - 1 real n u m b e r s A2, ,~3 . . . . . ,~rt-1 for t h e n - 1 e q u a t i o n s in the s y s t e m in a d d i t i o n to the first. We w r i t e / I = (/~2, ?~3, . . . , ?~n). Note also that t h e o r d e r i n g of t h e equ~ltions is irrelevant; in particular, a n y of t h e n e q u a t i o n s can be labeled as f l (x) = Cl. T h e o r e m A.11 Let f l , f 2 , . . . , f n denote a sequence o f real-valued linear functions on R m, and P a n o n e m p t y polytope in R m. For a given set o f real n u m b e r s c1, c2 . . . . , Cn, the system o f equations f k ( x ) = Ck
(1 < k < n )
(A.16)
has a solution x E P iff max min ,~ERn- 1 x ~ P
(x) + ~ , ~ k ( f k ( X ) k=2
_< m i n
,~Rn-1
max xeP
[
-- Ck.)
< C1
f l ( x ) + ~. ? ~ ( f ~ ( x ) k=2
c~)
]
.
(A.17)
A.5. REAL SOLUTIONS TO A SYSTEM OF E(~UATIONS
203
Corollary 1 The system o f linear equations (A.16) has a real solution in a nonempty polytope P, iff the single equation n
n
fl(X) + ~. ,~kfk(X) k=2
= Cl + ~.. ,~kCk
(A.18)
k=2
has a real solution in P for each (A2, h3 . . . . , An) E Rf1-1 . Using T h e o r e m A. 11, we can write a simple algorithm that decides if a given s y s t e m of linear equations has a solution in a given polytope, by first checking a single equation, then a pair of equations, etc. In the worst case, we end up testing n systems of equations. But, in practice, we can get an approximate test by quitting after a small number of steps. That m a y be desirable since the complexity of handling a s y s t e m of n equations increases rapidly with the size of n. A l g o r i t h m A.1 Given a set of real-valued linear functions f l , f2,. • •, f n on R m, a n o n e m p t y polytope P c R m, and a set of real n u m b e r s Cl, c2 . . . . , cn, this algorithm decides if the system of equations f/~(x) = c/~
(1 ~ < n )
(A.19)
has a solution in P. For 1 < k _< n, let Cond(k) denote the condition
[
max rnin f~ (x) + ~. l~(f~(×) - c~)
/~R,~- ~ x~P
_< c~
i=2
_< min
max
fi (x) +
,~R,~- 1 x~P
: i=2
When k = 1 the condition becomes simply 1 T l i n f l ( X ) ~ C1 --~ maxf~(x). x~P x~P
d o k = 1, n, 1 if Cond(k) is false then r e t u r n (the s y s t e m (A.19) has no solution in P) enddo r e t u r n (the s y s t e m (A.19) has a solution in P)
i(fi(x) - c i )
]
.
APPENDIX A. LINEAR EQUATIONS ON POLYTOPES
204
PROOF. The a l g o r i t h m is self-explanatory. We first test if f l (x) = cl h a s a s o l u t i o n in P. If not, t h e n the s y s t e m c a n n o t have a s o l u t i o n in Po If yes, t h e n we t e s t if the s u b s y s t e m ~fl (x) = c l , f 2 ( x ) = c2 h a s a s o l u t i o n in P, a n d so on. [] T h e r e c o u l d be v a r i a t i o n s of this algorithm. For example, we m a y first t e s t if t h e single e q u a t i o n s f k (x) = Ck have individual solutions. EXERCISES A.5
1. Consider a linear function F : R 3 fl and f2 on R 3 by the equation
~ R 2.
It defines two real-valued functions
F(X1,X2,X3) = ( f l ( X 1 , X 2 , X 3 ) , f 2 ( x I , X 2 , X 3 ) )
((X1,X2,X 3) E R 3 ) .
Show that f~ and f2 are both linear. (This result can be easily generalized to linear functions F from any space Rm to any space Rn. 2. In Theorem A.10, show that if the equation f2(x) = c2 has no solution in P, that is, if Q is empty, then Condition A.13 does not hold.
A.6
Integer Solutions to Linear Equations
In this a p p e n d i x , w e have dealt w i t h real s o l u t i o n s to linear e q u a t i o n s w i t h real coefficients in p o l y t o p e s . C o n s i d e r n o w a s y s t e m o f linear e q u a t i o n s w i t h i n t e g e r coefficients, a n d a given p o l y t o p e . U n d e r w h a t c o n d i t i o n s , d o e s t h e e x i s t e n c e of a real s o l u t i o n to that s y s t e m in that p o l y t o p e i m p l y the e x i s t e n c e of an i n t e g e r s o l u t i o n in the s a m e p o l y t o p e ? In this context, t h e c o n c e p t of a totally u n i m o d u l a r m a t r i x b e c o m e s useful. An i n t e g e r m a t r i x A (of a n y size) is totally unimodular if the d e t e r m i n a n t o f e a c h s q u a r e s u b m a t r i x is 0, 1, or - 1 (see C h a p t e r 19 of [Schr 87]). Thus, in particular, e a c h e l e m e n t of a totally u n i m o d u l a r m a t r i x is 0, 1, or - 1. A s q u a r e totally u n i m o d u l a r m a t r i x is n e c e s s a r i l y u n i m o d u l a r . We s t a t e a b a s i c fact a b o u t e x t r e m e p o i n t s o f a p o l y t o p e given b y a totally u n i m o d u l a r matrix, a n d t h e n p r o v e a r e s u l t o n i n t e g e r s o l u t i o n s to a s y s t e m of e q u a t i o n s . L e m m a A.12 The extreme points of a polytope P = {x : x A < b} are
integer vectors, if A is a totally unimodular matrix and b is an integer vector.
A.6. INTEGER SOLUTIONS TO LINEAR EQUATIONS
Theorem
205
A.13 Consider a s y s t e m xB = c
o f n linear e q u a t i o n s in rn variables, w h e r e B is an m × n integer m a t r i x a n d c E R n is an integer vector. Suppose it has a real solution in a polytope P = {x E R rn : xA < b}, w h e r e A is a totally u n i m o d u l a r m a t r i x a n d b is an integer vector. I f the c o n c a t e n a t e d m a t r i x (A; B) is totally u n i m o d u l a r , then this s y s t e m has an integer solution in P.
PROOF. The set of solutions to xB = c in P is the polytope Q defined by the following s y s t e m of inequalities:
xB <
x(-B)
c
_< - c .
t
(A.20)
By hypothesis, this polytope is nonempty. Since the matrix (A; B) is totally unimodular, it can be shown (Exercise 1.) that so is the matrix (A; B;-B). Then, by Lemma A.12, each e x t r e m e point of Q is an integer point. This m e a n s the s y s t e m of equations xB = c has an integer solution in P. [] The condition of this t h e o r e m is sufficient, but not necessary for the existence of an integer solution. Not each e x t r e m e point of Q needs to be an integer point; it is e n o u g h to have one extreme point that is an integer point. In fact, it suffices to have just one integer point s o m e w h e r e in Q; it does not even have to be an extreme point. The conditions u n d e r which the existence of a real solution implies the existence of an integer solution are i m p o r t a n t in d e p e n d e n c e analysis, since they describe situations where an approximate dependence test would m a k e accurate predictions. For example, we would know w h e n the passing of the b o u n d s test definitely implies the existence of dependence. We will not study this p r o b l e m in detail; see [Bane 76], [Li 891, and [PSKK 91] for results in this area. EXERCISES A . 6
1. Let A and B denote two matrices with the same number of rows. Show that if (A; B) is a totally unimodular matrix, then so is the matrix (A; B; -B).
206
APPENDIX A. LINEAR EO.UA TIONS ON POL YTOPES
2. Explain how Theorem A.13 may be used in dependence problems to show that the existence of a real solution to the dependence equations satisfying all conditions implies the existence of an integer solution with the same property. Describe the kinds of problems that can be covered this way; give examples. 3. Describe some dependence problems where Theorem A.13 is not applicable, but the same conclusion (real solution implies integer solution) can be derived by finding one integral extreme point. (Extreme points can be found by Theorem A.2.)
Bibliography [AIKe 84]
John Randal Allen & Ken Kennedy. Automatic Loop Interchange. Proceedings of the SIGPLAN '84 Symposium on Compiler Construction, Montreal, Canada, June 17-22, 1984. Available as SIGPLANNotices 19 (1984), 233-246.
[A1Ke 87]
John Randal Allen & Ken Kennedy. Automatic Translation of FORTRAN Programs to Vector Form. ACM Transactions on Programming Languages and Systems 9 (1987), 491-542.
[Alle 83]
John Randal Allen. Dependence Analysis for Subscripted Variables and its Application to Program Transformations. PhD Thesis, Department of Mathematical Sciences, Rice University, Houston, Texas, April 1983. Available as Document 83-14916 from University Microfilms, Ann Arbor, Michigan.
[AsSh 80]
Bengt Aspvall & Yossi Shiloach. A Fast Algorithm for Solving Systems of Linear Equations with Two Variables per Equation. Linear Algebra and its Applications 34 (1980), 117-124.
[Bane 761
Utpal Banerjee. Data Dependence in Ordinary Programs. MS Thesis, Report 76-837, Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, November 1976.
[Bane 79]
Utpal Banerjee. Speedup of Ordinary Programs. PhD Thesis, Report 79-989, Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, October 1979. Available as Document 80-08967 from University Microfilms, Ann Arbor, Michigan.
[Bane 88al Utpal Banerjee. Dependence Analysis for Supercomputing. Kluwer Academic Publishers, Norwell, Massachusetts, 1988. [Bane 88b] Utpal Banerjee. An Introduction to a Formal Theory of Dependence Analysis. The Journal ofSupercomputing 2 (1988), 133149. 207
208
BIBLIOGRAPHY
[Bane 931
Utpal Banerjee. Loop Transformations for Restructuring Compilers: The Foundations. Kluwer Academic Publishers, Norwell, Massachusetts, 1993.
[Bane 94]
Utpal Banerjee. Loop Transformations for Restructuring Compilers: Loop Parallelization. Kluwer Academic Publishers, Norwell, Massachusetts, 1994.
[BaSh 96]
E. Bach & J. Shallit. Algorithmic Number Theory, Volume I: EfficientAlgorithms. MIT Press, Cambridge, Massachusetts, 1996.
[BCKT 791 Utpal Banerjee, Shyh-Ching Chen, David J. Kuck & Ross A. Towle. Time and Parallel Processor Bounds for Fortran-Like Loops. IEEE Transactions on Computers C-28 (1979), 660-670. [BENP 93l
Utpal Banerjee, Rudolf Eigenmann, Alexandru Nicolau & David A. Padua. Automatic Program Parallelization. IEEE Proceedings 81 (1993), 211-243.
[BoTr 76]
I. Borosh & L. B. Treybig. Bounds on Positive Integral Solutions of Linear DiophantIne Equations. Proceedings of the American Mathematical Society 55 (1976), 299-304.
[BrDu 71]
J. L. Brown, Jr. & R. L. Duncan. The Least Remainder Algorithm. Fibonacci Quarterly 9 (1971), 347-350,401.
[BuCy 86]
Michael Burke & Ron Cytron. Interprocedural Dependence Analysis and Parallelization. Proceedings of the ACM SIGPLAN '86 Symposium on Compiler Construction, Palo Alto, California, June 25-27, 1986. Available as SIGPLANNotices 21 (1986), 162175.
[Coha 73]
William L Cohagan. Vector Optimization for the ASC. Proceedings of the 7th Annual Princeton Conference on Information Sciences and Systems, March 22-23, 1973. Princeton University Press, Princeton, New Jersey, 1973, 169-174.
[DaEa 73]
G. B. Dantzig & B. C. Eaves. Fourier-Motzkin Elimination and its Dual. Journal of Combinatorial Theory (A) 14 (1973), 288-297.
[Dieu 69]
Jean A. Dieudonn6. Foundations of Modern Analysis. Academic Press, New York, New York, 1969.
[Duff 74]
Richard J. Duffin. On Fourier's Analysis of Linear Inequality Systems. Mathematical Programming Study 1 (1974), 71-95.
[Feau 88]
Paul Feautrier. Parametric Integer Programming. RAIRO Recherche Operationnelle 22 (1988), 243-268.
BIBLIOGRAPHY
209
[Fior 72]
J. Ch. Fiorot. Generation of all Integer Points for Given Sets of Linear Inequalities. Mathematical Programming 3 (1972), 276295.
[GaJo 79]
M. R. Garey & D. S. Johnson. Computers and Intractability, A Guide to the Theory of NP-Completeness. W. H. Freeman, San Francisco, California, 1979.
[GoKT 91]
Gina Golf, Ken Kennedy & Chau-Wen Tseng. Practical Dependence Testing. Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation, Toronto, Canada, June 26-28, 1991. Available as SIGPL4NNotices 26 (1991), 15-29.
[GoZa 52]
A. W. Goodman & W. M. Zaring. Euclid's Algorithm and the Least Remainder Algorithm. American Mathematical Monthly 59 (1952), 156-159.
[Hagh 95]
Mohammad R. Haghighat. Symbolic Analysis for Parallelizing Compilers. Kluwer Academic Publishers, Norwell, Massachusetts, 1995.
[Hagh 961
Mohammad R. Haghighat. A Note on the Omega Test. To be published.
[HoNa 94]
Dorit S. Hochbaum & Joseph (Seffi) Naor. Simple and Fast Algorithms for Linear and Integer Programs with Two Variables per Inequality. SIAM Journal on Computing 23 (1994), 1179-1192.
[IrTr 89]
Francois Irigoin & R~mi Triolet. Dependence Approximation and Global Parallel Code Generation for Nested Loops. Parallel and Distributed Algorithms, (eds. M. Cosnard et al.), Elsivier (North-Holland), New York, New York, 1989, 297-308.
[Kenn 801
Ken Kennedy. Automatic Translation of FORTRAN Programs to Vector Form. Rice Technical Report 476-029-4, Department of Mathematical Sciences, Rice University, Houston, Texas, October 1980.
[Knut 73]
Donald E. Knuth. The Art of Computer Programming, Volume 1/Fundamental Algorithms. Addison-Wesley, Reading, Massachusetts, Second Edition, 1973.
[Knut 811
Donald E. Knuth. The Art of Computer Programming, Volume 2/Seminumerical Algorithms. Addison-Wesley, Reading, Massachusetts, Second Edition, 1981.
210
BIBLIOGRAPHY
[KoKP 91]
Xiangyun Kong, David Klappholz & Kleanthis Psarris. The I Test: An Improved Dependence Test for Automatic Parallelization and Vectorization. IEEETransactions on Parallel and Distributed Systems 2 (1991), 342-349.
[Kuhn 80]
Robert Henry Kuhn. Optimization and Interconnection Complexity for: Parallel Processors, Single-Stage Networks, and Decision Trees. PhD Thesis, Report 80-1009, Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, February 1980. Available as Document 80-26541 from University Microfilms, Ann Arbor, Michigan.
[Li 89]
Zhiyuan Li. Intraprocedural and Interprocedural Data Dependence Analysis for Parallel Computing, PhD Thesis, Department of Computer Science, University of Illinois at UrbanaChampaign, Urbana, Illinois, August 1989. Available as Report 910, Center for Supercomputing Research and Development.
[LiYe 89]
Zhiyuan Li & Pen-Chung Yew. Some Results on Exact Data Dependence Analysis. Proceedings of the Second Workshop on Languages and Compilers for Parallel Computing, Urbana, Illinois, August 1989. Available as Languages and Compilers for Parallel Computing (eds. D. Gelernter, A. Nicolau & D. Padua), The MIT Press, Cambridge, Massachusetts 1990, 374-401.
[LiYZ 90l
Zhiyuan Li, Pen-Chung Yew & Chuan-Qi Zhu. An Efficient Data Dependence Analysis for Parallelizing Compilers. IEEE Transactions on Parallel and Distributed Systems 1 (1990), 26-34.
[MaHL 91] Dror E. Maydan, John L. Hennessy & Monica S. Lam. Efficient and Exact Data Dependence Analysis. Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation, Toronto, Canada, June 26-28, 1991. Available as SIGPLAN Notices 26 (1991), 1-14. [Maim 80]
Donald G. Malm. A Computer Laboratory Manual for Number Theory. COMPress, Inc., Wentworth, New Hampshire, 1980.
[Moor 921
T. E. Moore. On the Least Absolute Remainder Euclidean Algorithm. Fibonacci Quarterly 30 (1992), 161-165.
[PsKK 911
Kleanthis Psarris, David Klappholz & Xiangyun Kong. On the Accuracy of the Banerjee Test. Journal of Parallel and Distributed Computing 12 (1991), 152-158.
[PsKK 931
Kleanthis Psarris, David Klappholz & Xiangyun Kong. The Direction Vector I Test. IEEETransactions on Parallel and Distributed Systems 4 (1993), 1280-1290.
BIBLIOGRAPHY
211
[Pugh 92a] William Pugh. A Practical Algorithm for F~xact Array Dependence Analysis. Communications of the ACM 35 (1992), 102114. [Pugh 92b] William Pugh. Definitions of Dependence Distance. ACMLetters on Programming Languages and Systems 1 (1992), 261-265. [Schr 87]
Alexander Schrijver. Theory of Linear and Integer Programming. John Wiley & Sons, New York, New York, 1987.
[Shos 81]
Robert Shostak. Deciding Linear Inequalities by Computing Loop Residues. Journal of the Association for Computing Machinery 28 (1981), 769-779.
[Towl 76]
Ross Albert Towle. Control and Data Dependence for Program Transformations. PhD Thesis, Report 76-788, Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, March 1976.
[TrIF 861
R~mi Triolet, Francois Irigoin & Paul Feautrier. Direct Parallelization of Call Statements. Proceedings of the ACM SIGPLAN '86 Symposium on Compiler Construction, Palo Alto, California, June 1986, 176-185.
[Wall 88]
David R. Wallace. Dependence of Multi-Dimensional Array References. Proceedings of the 1988 International Conference on Supercomputing, St. Malo, France, July 1988, 418-428. The ACM Press, New York, New York.
[Will 83]
H. P. Williams. A Characterization of all Feasible Solutions to an Integer Program. Discrete Applied Mathematics 5 (1983), 147155.
[Will 861
H. P. Williams. Fourier's Method of Linear Programming and its Dual. The American Mathematical Monthly 93 (1986), 681-695.
[Wolf 82]
Michael J. Wolfe. Optimizing Compilers for Supercomputers. PhD Thesis, Report 82-1105, Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, October 1982. Available as Report 329, Center for Supercomputing Research and Development, and also as Document 8303027 from University Microfilms, Ann Arbor, Michigan.
[WoBa 87]
Michael Wolfe & Utpal Banerjee. Data Dependence and its Application to Parallel Processing. International Journal of Parallel Programming 16 (1987), 137-178.
212
BIBLIOGRAPHY
[Wolf 94]
Michael Wolfe. The Definition of Dependence Distance. ACM Transactions on Programming Languages and Systems 16 (1994), 1114-1116.
[WoTs 92]
Michael Wolfe & Chau-Wen Tseng. The Power Test for Data Dependence. IEEE Transactions on Parallel and Distributed Systems 3 (1992), 591-601.
Index output, 21 type of, 20 uniform, 41, 109 dependence constraints in index values, 28, 72, 103 in iteration values, 28, 78, 103, 130 dependence equation(s) in index values, 27, 71, 102 in iteration values, 28, 78, 103, 129 dependence graph, 22, 66, 95 dependence level, 66, 95, 123 dependence problem, 28, 106 simple, 8 direction vector, 66, 95, 123 distance (vector) between statement instances, 20, 65, 94, 123 of dependence, 22, 66, 95, 123 uniform, 41,109 double loop, 58 rectangular, 60 regular, 60 duality theorem, 196
affine function, 2 anti-dependence, 21 array-element coefficient matrix of, 100 normalized, 100 coefficient vector of, 100 normalized, 100 bounded set, 192 bounds test, 50, 79, 139 coefficient matrix of array-element, 100 normalized, 100 coefficient vector of array-element, 100 normalized, 100 convex combination, 192 convex hull, 192 convex set, 192 dependence anti-, 21 between statement instances, 20, 66, 95 between statements, 21, 65, 94, 95, 123 caused by variables, 22 flow, 20, 21 indirect, 23 input, 21 loop-carried, 20, 21, 67, 95 loop-independent, 20, 21, 67, 96
extended Euclid's algorithm, 30 extreme point, 193 final limit, 16 normalized, 18 final matrix, 83 normalized, 87 final vector, 83 213
214
INDEX
normalized, 87 flow dependence, 20, 21 function affine, 2 linear, 193
normalized coefficient vector, 100 normalized final limit, 18 normalized final matrix, 87 normalized final vector, 87 normalized program variable, 100
gcd test, 34, 78, 135 generalized gcd test, 135
Omega test, 183 output dependence, 21
index point, 16, 58, 82 index space, 16, 58, 82 index value, 16, 58, 82 index variable, 16 index vector, 58, 82 indirect dependence, 23 initial limit, 16 initial matrix, 83 initial vector, 83 input dependence, 21 iteration point, 17, 58, 82 iteration space, 17, 58, 82 iteration value, 17, 58, 82 iteration variable, 4, 17 iteration vector, 58, 82 I test, 181
polyhedron, 192 polytope, 192 extreme point of, 193 vertex of, 193 Power test, 173
Lagrangean relaxation, 196 ?~-test, 164 least remainder algorithm, 184 linear function, 193 loop-carried dependence, 67, 95 loop-independent dependence, 67, 96 loop nest, 121 determined by a statement, 121 perfect, 121 rectangular, 85 regular, 85 loop normalization, 4, 18, 86 method of botmds, 10, 46, 139 method of elimination, 11, 171 normalized coefficient matrix, 100
rectangular double loop, 60 rectangular loop nest, 85 regular double loop, 60 regular loop nest, 85 set bounded, 192 convex, 192 simple dependence problem, 8 statement dependence graph, 22, 66, 95 stride matrix of loop nest, 83 stride of loop, 16 totally unimodular matrix, 204 two-variable system of linear equations, 176 connected, 177 graph associated with, 177 connected, 177 two-variable system of linear inequalities, 178 uniform dependence, 41, 109 uniform dependence distance, 41, 109 vertex of polytope, 193